A Pragmatist Robot: Learning to Plan Tasks by Experiencing the Real World

1Robotic Systems Lab, ETH Zürich, 2ETH AI Center, 3Huawei Noah's Ark Lab, 4UCL Centre for AI

IEEE Robotics and Automation Letters (RA-L), 2026

Robot completes a new task guided by a long-term memory of self-reflective experiences.

Abstract

Large language models (LLMs) have emerged as the dominant paradigm for robotic task planning using natural language instructions. However, because they are trained on general internet data, LLMs are not inherently aligned with the embodiment, skill sets, and limitations of real-world robotic systems. Inspired by the emerging paradigm of verbal reinforcement learning—where LLM agents improve through self-reflection and few-shot learning without parameter updates—we introduce PragmaBot, a framework that enables robots to learn task planning through real-world experience. PragmaBot employs a vision-language model (VLM) as the robot's “brain” and “eye”, allowing it to visually evaluate action outcomes and self-reflect on failures. These reflections are stored in a short-term memory (STM), enabling the robot to quickly adapt its behavior during ongoing tasks. Upon task completion, the robot summarizes the lessons learned into its long-term memory (LTM). When facing new tasks, it can leverage retrieval-augmented generation (RAG) to plan more grounded action sequences by drawing on relevant past experiences and knowledge. Experiments on four challenging robotic tasks show that STM-based self-reflection increases task success rates from 35% to 84%, with emergent intelligent object interactions. In 12 real-world scenarios (including eight previously unseen tasks), the robot effectively learns from the LTM and improves single-trial success rates from 22% to 80%, with RAG outperforming naive prompting. These results highlight the effectiveness and generalizability of PragmaBot.

Video Demonstrations

With Short-Term Memory: Self-Reflection in Action

With Long-Term Memory: Planning Correctly on the First Attempt

Approach

PragmaBot uses a VLM as both the task planner and success detector. When an action fails, the robot reflects on the failure and adapts its plan using short-term memory (STM). Once the task is completed, the experience is summarized and stored in long-term memory (LTM). For future tasks, retrieval-augmented generation (RAG) retrieves relevant past experiences to guide planning from the first attempt.

PragmaBot pipeline overview.
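The plan → execute → check → reflect loop described above can be sketched as follows. All names here (`run_task`, the `vlm` role argument, the fake stubs) are hypothetical illustrations, not the paper's actual API; a real deployment would call a VLM such as GPT-4o for each role and dispatch plans to a robot skill executor.

```python
# Hypothetical sketch of the PragmaBot control loop. The `vlm` callable
# stands in for a real vision-language model; its first argument selects
# the planning, success-detection, reflection, or summarization prompt.

def run_task(task, vlm, execute, max_attempts=3):
    stm = []  # short-term memory: reflections gathered during this task
    for _ in range(max_attempts):
        plan = vlm("plan", task=task, reflections=stm)
        outcome = execute(plan)
        if vlm("check", task=task, outcome=outcome) == "success":
            # Task done: distill the STM reflections into a long-term lesson.
            return plan, vlm("summarize", task=task, reflections=stm)
        # Failure: reflect on what went wrong and retry with the STM updated.
        stm.append(vlm("reflect", task=task, plan=plan, outcome=outcome))
    return None, None  # gave up after max_attempts


# Toy stand-ins so the sketch runs end to end (purely illustrative).
def fake_vlm(role, **kw):
    if role == "plan":
        return "move container, then grasp apple" if kw["reflections"] else "grasp apple"
    if role == "check":
        return "success" if kw["outcome"] == "ok" else "failure"
    if role == "reflect":
        return "the container obstructed the grasp; clear it first"
    return "lesson: remove obstructions before grasping"

def fake_execute(plan):
    return "ok" if plan.startswith("move container") else "collision"

plan, lesson = run_task("put apple on plate", fake_vlm, fake_execute)
```

In this toy run, the first attempt fails, a reflection enters the STM, and the second attempt succeeds; the returned lesson is what would be written to the LTM.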

Results

STM Self-Reflection

35% → 84%

Task success rate with short-term memory

LTM + RAG

22% → 80%

Single-trial success rate on 12 scenarios

Generalization

8 / 8

Unseen tasks improved with learned experience

Effect of Short-Term Memory on Task Success

Effect of short-term memory on task success rates comparing CaP-V and PragmaBot
Task CaP-V PragmaBot
Put apple on plate (container obstructs) 43% 86%
Move tiny candy (sponge/towel nearby) 22% 67%
Move egg (open view) 40% 100%
Pick up bowl (apple inside) 33% 83%

Each task is tested 5–10 times, with up to two attempts allowed per trial.

Effect of Long-Term Memory on Single-Trial Success

Effect of long-term memory on single-trial success rates comparing COME and PragmaBot
Task COME PragmaBot
Put apple on plate (container obstructs) 29% 100%
Move tiny candy (towel nearby) 11% 78%
Move egg (open view) 20% 100%
Pick up bowl (apple inside) 17% 83%
Unseen tasks (generalization)
Put tennis ball in box (mug obstructs) 29% 71%
Put orange/ball on plate (fan blocks) 10% 80%
Move crumpled paper (brush nearby) 25% 63%
Move screw (towel nearby) 0% 86%
Move sushi (open view) 14% 71%
Move grape/cherry (open view) 20% 70%
Pick up box (apple on top) 43% 86%
Pick up towel (orange on top) 50% 75%

Each task is tested 5–10 times. Top 4 rows are seen tasks; bottom 8 are unseen tasks demonstrating generalization.

Ablation Study of Memory Retrieval

Radar charts comparing RAG retrieval strategies: RAG with gpt-4o achieves the highest first-action accuracy across tasks.

RAG with gpt-4o achieves the highest first-action accuracy, outperforming both full LTM and random retrieval. gpt-4o-mini also benefits from relevant memories, though gains are less pronounced. Feeding the full LTM increases prompt length by 7.5× at substantially higher cost.
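As a rough illustration of the retrieval step, the sketch below ranks LTM entries by similarity to a new task description and returns only the top matches, rather than feeding the full LTM into the prompt. It uses a toy bag-of-words cosine similarity purely to stay self-contained; the ablation above instead uses VLM-based retrieval (gpt-4o / gpt-4o-mini), and all function names and lesson strings here are hypothetical.

```python
# Toy sketch of retrieval-augmented memory lookup: score each long-term
# memory entry against the new task and keep only the top-k matches.
from collections import Counter
import math

def embed(text):
    # Bag-of-words stand-in for a real embedding model (assumption).
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)  # Counter returns 0 for missing words
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(ltm, task, k=2):
    q = embed(task)
    return sorted(ltm, key=lambda m: cosine(q, embed(m)), reverse=True)[:k]

# Hypothetical LTM lessons; the closest one should be surfaced for planning.
ltm = [
    "lesson: move obstructing containers before grasping fruit",
    "lesson: use a sponge to push tiny objects into reach",
    "lesson: grasp eggs gently to avoid crushing",
]
top = retrieve(ltm, "move the mug obstructing the box then place the ball", k=1)
```

Retrieving only relevant lessons keeps the prompt short, which is the cost argument made above against feeding the full LTM.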

Comparison with Prior Methods

Feature comparison of PragmaBot with prior robot task planning methods (CaP, SayCan, Inner Monologue, RoboTool, DROC, REFLECT, COME, ReplanVLM, BUMBLE) across seven dimensions: self-reflection, learning by experience, interactive replanning, creative tool use, short-term memory, long-term memory, and unified feedback. Capabilities are shown with ✓/×; components are shown with ● (present) / ○ (absent). PragmaBot is the only method achieving all listed capabilities.

VLM Conversation Examples

Detailed VLM planning and self-reflection logs for representative tasks.

BibTeX

@article{qu2026pragmatist,
  title={A pragmatist robot: Learning to plan tasks by experiencing the real world},
  author={Qu, Kaixian and Lan, Guowei and Zurbr{\"u}gg, Ren{\'e} and Chen, Changan and Mower, Christopher E and Bou-Ammar, Haitham and Hutter, Marco},
  journal={IEEE Robotics and Automation Letters},
  year={2026},
  publisher={IEEE}
}