A Pragmatist Robot: Learning to Plan Tasks by Experiencing the Real World

Kaixian Qu1, Guowei Lan1, René Zurbrügg1,2, Changan Chen1, Christopher E. Mower3, Haitham Bou-Ammar3,4, Marco Hutter1
1Robotic Systems Lab, ETH Zürich, 2ETH AI Center, 3Huawei Noah's Ark Lab, 4UCL Centre for AI

IEEE Robotics and Automation Letters (RA-L)

The robot completing a new task, guided by a long-term memory of self-reflective experiences

The robot completing challenging tasks using short-term memory and self-reflection

The robot planning correct action sequences after experiencing similar ones

Abstract

Large language models (LLMs) have emerged as the dominant paradigm for robotic task planning using natural language instructions. However, trained on general internet data, LLMs are not inherently aligned with the embodiment, skill sets, and limitations of real-world robotic systems. Inspired by the emerging paradigm of verbal reinforcement learning—where LLM agents improve through self-reflection and few-shot learning without parameter updates—we introduce PragmaBot, a framework that enables robots to learn task planning through real-world experience. PragmaBot employs a vision-language model (VLM) as the robot's “brain” and “eye”, allowing it to visually evaluate action outcomes and self-reflect on failures. These reflections are stored in a short-term memory (STM), enabling the robot to quickly adapt its behavior during ongoing tasks. Upon task completion, the robot summarizes the lessons learned into its long-term memory (LTM). When facing new tasks, it can leverage retrieval-augmented generation (RAG) to plan more grounded action sequences by drawing on relevant past experiences and knowledge. Experiments on four challenging robotic tasks show that STM-based self-reflection increases task success rates from 35% to 84%, with emergent intelligent object interactions. In 12 real-world scenarios (including eight previously unseen tasks), the robot effectively learns from the LTM and improves single-trial success rates from 22% to 80%, with RAG outperforming naive prompting. These results highlight the effectiveness and generalizability of PragmaBot.

Approach

PragmaBot pipeline figure.

At the start of each task, the system takes the user instruction \( \mathbf{I} \) and an egocentric observation \( \mathbf{o}_{0} \), which the VLM summarizes into a scenario. RAG then retrieves relevant experiences from long-term memory \( \mathbf{M} \); these are passed, together with the instruction and observation, to the VLM task planner \( \mathcal{T} \). After each action is executed, the VLM checks whether the task has succeeded. If it has not, the action \( \mathbf{a} \) and its feedback \( \mathbf{r} \) are appended to short-term memory \( \mathbf{m} \) and fed back into planning. Once the task is completed, \( \mathbf{m} \) is summarized and stored in \( \mathbf{M} \) for future use.
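The control flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the PragmaBot implementation: every name here (`Experience`, `retrieve`, `run_task`, the callables passed in, and the `max_attempts` cap) is hypothetical, the VLM and robot are replaced by plain functions, and retrieval uses simple word overlap as a stand-in for whatever RAG backend the real system uses. What happens to short-term memory when a task is abandoned is also not specified in the text; the sketch simply discards it.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Experience:
    """One long-term memory entry (hypothetical structure)."""
    scenario: str
    lesson: str


def retrieve(ltm: List[Experience], scenario: str, k: int = 2) -> List[Experience]:
    """Toy retrieval by word overlap; a stand-in for embedding-based RAG."""
    query = set(scenario.lower().split())
    scored = [(len(query & set(e.scenario.lower().split())), e) for e in ltm]
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [e for _, e in scored[:k]]


def run_task(
    instruction: str,
    observe: Callable[[], dict],
    summarize_scenario: Callable[[str, dict], str],
    plan: Callable[[str, dict, List[Experience], List[Tuple[str, str]]], str],
    execute: Callable[[str], None],
    verify: Callable[[str, dict], Tuple[bool, str]],
    summarize_lessons: Callable[[List[Tuple[str, str]], str], str],
    ltm: List[Experience],
    max_attempts: int = 5,  # assumption: the text does not state an attempt cap
) -> bool:
    scenario = summarize_scenario(instruction, observe())  # from I and o_0
    retrieved = retrieve(ltm, scenario)                    # experiences from M
    stm: List[Tuple[str, str]] = []                        # short-term memory m
    for _ in range(max_attempts):
        action = plan(instruction, observe(), retrieved, stm)
        execute(action)
        success, feedback = verify(instruction, observe())
        if success:
            # Summarize m into M once the task is completed.
            ltm.append(Experience(scenario, summarize_lessons(stm, action)))
            return True
        stm.append((action, feedback))                     # accumulate (a, r)
    return False


def demo():
    """Toy world: a direct pick fails, pushing the apple closer first works."""
    state = {"on_plate": False}
    ltm: List[Experience] = []
    ok = run_task(
        instruction="put the apple on the plate",
        observe=lambda: dict(state),
        summarize_scenario=lambda instr, obs: instr,
        plan=lambda instr, obs, exps, stm: (
            "pick apple" if not stm else "push apple closer, then pick apple"
        ),
        execute=lambda a: state.update(on_plate=a.startswith("push")),
        verify=lambda instr, obs: (
            obs["on_plate"], "" if obs["on_plate"] else "apple out of reach"
        ),
        summarize_lessons=lambda stm, a: (
            f"'{a}' worked after {len(stm)} failed attempt(s)"
        ),
        ltm=ltm,
    )
    return ok, ltm
```

Calling `demo()` runs one failed attempt, records the feedback in short-term memory, succeeds on the second attempt, and leaves a single summarized entry in `ltm` that a later, similar instruction could retrieve.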

Examples

STM Task: "Put the apple on the plate"

LTM Task: "Put the tennis ball on the box"

BibTeX

@article{qu2025pragmatist,
  author    = {Qu, Kaixian and Lan, Guowei and Zurbr{\"u}gg, Ren{\'e} and Chen, Changan and Mower, Christopher E. and Bou-Ammar, Haitham and Hutter, Marco},
  title     = {A Pragmatist Robot: Learning to Plan Tasks by Experiencing the Real World},
  journal   = {IEEE Robotics and Automation Letters},
  year      = {2026},
  note      = {accepted},
}