LaVague's technology comprises several key components, including the World Model and Action Engine, which work together to automate tasks based on user-defined objectives.
World Model: The World Model is responsible for converting the user's global objective into the next instruction for the Action Engine to carry out, based on visual and textual information [1]. It uses a multi-modal model to perform this conversion. By default, LaVague uses GPT-4o, but developers can use any LlamaIndex multi-modal LLM.
Action Engine: This component receives a natural language instruction and generates the action needed to carry it out. By default, the Action Engine performs Retrieval-Augmented Generation (RAG) and generates the code for the action using OpenAI's text-embedding-3-large embedding model and GPT-4o.
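To make the retrieval step more concrete, here is a minimal sketch of the RAG idea in plain Python. It is not LaVague's implementation: the bag-of-words similarity stands in for a real embedding model such as text-embedding-3-large, and the names tokenize, cosine, retrieve, and build_prompt are hypothetical helpers introduced only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words counts; a toy stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(instruction: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k page chunks most similar to the instruction."""
    query = tokenize(instruction)
    ranked = sorted(chunks, key=lambda c: cosine(query, tokenize(c)), reverse=True)
    return ranked[:top_k]


def build_prompt(instruction: str, context: list[str]) -> str:
    """Assemble the text a code-generating LLM would receive (hypothetical format)."""
    joined = "\n".join(context)
    return f"Relevant page elements:\n{joined}\n\nInstruction: {instruction}\nAction code:"


# Hypothetical page fragments, as a driver might expose them.
page_chunks = [
    '<a href="/docs/peft">PEFT documentation</a>',
    '<button id="login">Log in</button>',
    '<input name="search" placeholder="Search docs">',
]

instruction = "Click on the PEFT documentation link"
top_chunks = retrieve(instruction, page_chunks, top_k=1)
prompt = build_prompt(instruction, top_chunks)
```

Retrieving only the most relevant page fragments keeps the prompt small, which is the usual motivation for RAG when a full page would not fit comfortably in the model's context.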
Driver: The Driver component is used to perform actions on web browsers and get information about the current web page. LaVague currently provides a Selenium Driver component by default, as well as a Playwright Driver option.
Web Agent: The Web Agent brings all of these components together. It can perform tasks defined by the user's objective argument with the run() method, or launch an interactive Gradio interface with the demo() method.
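In a simplified workflow, these components function together as follows: the World Model turns the user's objective and the current page state into the next instruction, the Action Engine generates the action code for that instruction, and the Driver executes it in the browser.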
This architecture allows LaVague to automate tasks such as logging into an account and changing a username, as demonstrated in the example provided.
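The interplay between the components can be sketched as a simple loop. The classes below are toy stand-ins written for illustration only; ToyDriver, ToyWorldModel, ToyActionEngine, and run_agent are hypothetical names and do not reflect LaVague's actual classes or method signatures.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyDriver:
    """Hypothetical stand-in for a browser driver."""
    url: str = ""
    executed: list = field(default_factory=list)

    def get(self, url: str) -> None:
        self.url = url

    def observe(self) -> str:
        # A real driver would return page source and a screenshot.
        return f"page at {self.url}, {len(self.executed)} actions done"

    def execute(self, action_code: str) -> None:
        # A real driver would run the generated code against the browser.
        self.executed.append(action_code)


class ToyWorldModel:
    """Maps (objective, observation) to the next instruction.

    A real World Model would query a multi-modal LLM; this one replays a fixed plan.
    """

    def __init__(self, plan: list):
        self._plan = list(plan)

    def next_instruction(self, objective: str, observation: str) -> Optional[str]:
        return self._plan.pop(0) if self._plan else None


class ToyActionEngine:
    """Turns an instruction into action code (here, just a placeholder string)."""

    def get_action(self, instruction: str) -> str:
        return f"# code for: {instruction}"


def run_agent(objective, world_model, action_engine, driver, max_steps=10):
    """Observe, plan the next instruction, generate code, execute; repeat."""
    for _ in range(max_steps):
        observation = driver.observe()
        instruction = world_model.next_instruction(objective, observation)
        if instruction is None:  # the world model considers the objective done
            break
        driver.execute(action_engine.get_action(instruction))
    return driver.executed


driver = ToyDriver()
driver.get("https://huggingface.co/docs")
steps = run_agent(
    "Go on the quicktour of PEFT",
    ToyWorldModel(["Click on 'PEFT'", "Click on 'Quicktour'"]),
    ToyActionEngine(),
    driver,
)
```

The max_steps cap is a design choice of this sketch: it prevents an agent from looping forever if the planner never signals completion.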
Developers can use LaVague to create an AI agent that navigates the Hugging Face documentation by following these steps:
```shell
pip install lavague
```

```python
from lavague.core import WorldModel, ActionEngine
from lavague.core.agents import WebAgent
from lavague.drivers.selenium import SeleniumDriver

# Set up the browser driver, the World Model, and the Action Engine
selenium_driver = SeleniumDriver(headless=False)
world_model = WorldModel()
action_engine = ActionEngine(selenium_driver)

# Combine the components into an agent, open the start page, and run the objective
agent = WebAgent(world_model, action_engine)
agent.get("https://huggingface.co/docs")
agent.run("Go on the quicktour of PEFT")
```
LaVague's default OpenAI configuration requires the OPENAI_API_KEY variable to be set in the local environment with a valid API key. This example demonstrates how a LaVague agent can open the Hugging Face documentation and navigate to the PEFT quicktour from a single natural language objective.
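Since the default configuration depends on OPENAI_API_KEY, it can help to check for the variable before constructing the agent. The following is a small sketch using only the Python standard library; require_api_key is a hypothetical helper, not part of LaVague.

```python
import os


def require_api_key(name: str = "OPENAI_API_KEY") -> str:
    """Return the API key from the environment, or fail early with a clear message."""
    key = os.environ.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before creating the agent.")
    return key
```

Calling require_api_key() at the top of a script surfaces a missing key immediately, rather than partway through an agent run.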
The LaVague framework is an open-source Large Action Model (LAM) framework developed by Mithril Security that aims to revolutionize the building and sharing of AI Web Agents [2]. It offers a simplified yet powerful approach to creating and deploying AI agents, making it accessible to developers of varying skill levels [2]. LaVague's core components include a World Model, which translates objectives and current web states into executable instructions, and an Action Engine, which compiles these instructions into action code [1]. This setup enables LaVague agents to execute tasks autonomously on the web, lowering the barrier to entry for AI agent development.
LaVague's contribution to AI-driven information retrieval lies in its ability to automate mundane web tasks, making everyday workflows more efficient. With LaVague, developers can create community-shared AI agents that retrieve up-to-date information with just a few lines of code. LaVague's potential also extends beyond public data retrieval: it enables agents that can access and use private data from SaaS tools such as Notion and Salesforce. This opens up possibilities for automating tasks that involve sensitive and proprietary information, in both personal and professional use.