In-Context Abstraction Learning (ICAL) is a method that helps Vision-Language Models (VLMs) learn from their experiences across situations, allowing them to adapt and perform better on new tasks. It guides VLMs to build multimodal abstractions in novel domains, learning abstractions that capture a task's dynamics and critical knowledge rather than merely storing successful action plans or trajectories.
ICAL guides VLMs by providing noisy demonstrations in new domains. The VLM abstracts these demonstrations into generalized programs, correcting inefficient actions and annotating cognitive abstractions such as task relationships, object state changes, temporal subgoals, and task construals. These abstractions are refined through human feedback while the agent attempts to execute the trajectory in a similar environment. The resulting abstractions are then stored and used as exemplars in the prompt, improving decision-making in retrieval-augmented LLM and VLM agents.
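To make that last step concrete, the sketch below shows one way stored abstractions could be retrieved and formatted as in-context exemplars for a new task. Everything here is an illustrative assumption rather than ICAL's actual implementation: the `AbstractionExample` container, the toy bag-of-words `embed` function, and `build_prompt` are hypothetical names, and a real agent would presumably retrieve over learned (and likely multimodal) embeddings rather than word counts.

```python
from dataclasses import dataclass
from collections import Counter
import math


@dataclass
class AbstractionExample:
    """One learned exemplar: a cleaned-up trajectory plus its annotations."""
    task: str
    program: list[str]          # revised action sequence
    annotations: dict[str, str] # cognitive abstractions, keyed by type


def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; stands in for a learned encoder.
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def build_prompt(memory: list[AbstractionExample], new_task: str, k: int = 2) -> str:
    """Retrieve the k most similar stored abstractions and format them
    as in-context exemplars ahead of the new task instruction."""
    query = embed(new_task)
    ranked = sorted(memory, key=lambda ex: cosine(query, embed(ex.task)), reverse=True)
    blocks = []
    for ex in ranked[:k]:
        notes = "\n".join(f"# {kind}: {note}" for kind, note in ex.annotations.items())
        blocks.append(f"Task: {ex.task}\n{notes}\n" + "\n".join(ex.program))
    return "\n\n".join(blocks) + f"\n\nTask: {new_task}\n"
```

Retrieval here keys only on the task text for brevity; conditioning retrieval on visual observations as well would be a natural extension for a multimodal agent.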
ICAL targets four types of cognitive abstractions:
1) Task and causal relationships, which capture the underlying principles or actions needed to achieve a goal and how its elements depend on one another.
2) Changes in object states, which record the different forms or states an object can take.
3) Temporal abstractions, which decompose a task into smaller subgoals.
4) Task construals, which highlight the visually important elements of a task.
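As a rough illustration of how these four annotation types might be attached to a revised trajectory, here is a minimal Python sketch. The `CognitiveAbstractions` container and its field names are hypothetical, and the example values are invented household-task annotations for illustration only.

```python
from dataclasses import dataclass


@dataclass
class CognitiveAbstractions:
    """Hypothetical container for the four annotation types ICAL attaches
    to a revised trajectory (field names are assumptions, not the paper's API)."""
    causal: list[str]         # task and causal relationships
    object_states: list[str]  # observed object state changes
    subgoals: list[str]       # temporal abstractions / intermediate objectives
    construals: list[str]     # visually salient task elements


# Invented example values, illustrating the kind of content each type holds.
example = CognitiveAbstractions(
    causal=["the microwave must be open before the mug can be placed inside"],
    object_states=["mug: dirty -> clean after rinsing in the sink"],
    subgoals=["1. find the mug", "2. rinse it", "3. heat it in the microwave"],
    construals=["attend to the mug's handle and the microwave door"],
)
```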