Agent
Main interface for web automation.
Constructor
Agent(
    llm: LLM,
    context: Context,
    mode: str = "dom",
    wait_after_action: float = 1.0,
    typing_delay: float = 0.05
)
Parameters:
- llm - LLM instance for reasoning
- context - Browser context
- mode - Agent mode: "dom" (element IDs) or "pixel" (screen coordinates)
- wait_after_action - Default wait time after each action in seconds (default: 1.0)
- typing_delay - Delay between keystrokes in seconds (default: 0.05)
Methods
do()
async def do(
    task: str,
    max_steps: int = 20,
    wait_after_action: Optional[float] = None,
    mode: Optional[str] = None,
    files: Optional[List[str]] = None,
    output_schema: Optional[Type[BaseModel]] = None,
) -> Result
Execute a task described in natural language.
Parameters:
- task - Task description
- max_steps - Maximum steps to execute (default: 20)
- wait_after_action - Wait time after each action (uses agent default if not specified)
- mode - Agent mode override: "dom" or "pixel" (uses agent default if not specified)
- files - Optional list of file paths for upload
- output_schema - Optional Pydantic model for structured output
Returns: Result with optional output and feedback
Raises: TaskAbortedError if task is aborted
Example:
result = await agent.do("Add 2 screws to the cart")
print(result.feedback)
# With structured output
from pydantic import BaseModel
class ProductInfo(BaseModel):
    name: str
    price: float
result = await agent.do("Extract product info", output_schema=ProductInfo)
print(f"{result.output.name}: ${result.output.price}")
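Per-call arguments such as wait_after_action and mode fall back to the agent-level defaults when left as None. A minimal sketch of that fallback rule follows. The helper name `resolve_option` is hypothetical and not part of the library, and how the library itself treats an explicit 0.0 is an assumption here.

```python
from typing import Optional, TypeVar

T = TypeVar("T")


def resolve_option(override: Optional[T], default: T) -> T:
    """Return the per-call override when one is given, else the agent default."""
    # Compare against None explicitly so a falsy override such as 0.0 is kept
    # (assumption: the library would also treat 0.0 as a real value).
    return default if override is None else override
```

With this rule, `resolve_option(None, 1.0)` yields the agent default, while `resolve_option("pixel", "dom")` yields the per-call override.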
verify()
async def verify(
    condition: str,
    max_steps: int = 10,
    wait_after_action: Optional[float] = None,
    mode: Optional[str] = None,
) -> Verdict
Check if a condition is true.
Parameters:
- condition - Condition to check
- max_steps - Maximum steps (default: 10)
- wait_after_action - Wait time after each action (uses agent default if not specified)
- mode - Agent mode override: "dom" or "pixel" (uses agent default if not specified)
Returns: Verdict that can be used as a boolean
Raises: TaskAbortedError if verification is aborted
Example:
verdict = await agent.verify("the cart contains 7 items")
if verdict:
    print("Success!")
assert verdict == True
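The reference does not show how Verdict supports both `if verdict:` and `verdict == True`. One way a result object can behave like that in Python is to define `__bool__` and `__eq__`. The `SketchVerdict` class below is a hypothetical stand-in, not the library's Verdict, and its `passed` and `feedback` fields are assumed names.

```python
from dataclasses import dataclass


@dataclass(eq=False)  # eq=False: the comparison is written by hand below
class SketchVerdict:
    passed: bool
    feedback: str = ""

    def __bool__(self) -> bool:
        # Makes `if verdict:` and `assert verdict` reflect the outcome.
        return self.passed

    def __eq__(self, other: object) -> bool:
        # Comparing with a plain bool checks only the outcome.
        if isinstance(other, bool):
            return self.passed == other
        if isinstance(other, SketchVerdict):
            return (self.passed, self.feedback) == (other.passed, other.feedback)
        return NotImplemented
```

Under this sketch, `assert verdict` and `assert verdict == True` are equivalent checks; the first is the more idiomatic form.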
extract()
async def extract(
    what: str,
    output_schema: Optional[Type[BaseModel]] = None,
    max_steps: int = 10,
    wait_after_action: Optional[float] = None,
    mode: Optional[str] = None,
) -> str | BaseModel
Extract information from the current page.
Parameters:
- what - What to extract in natural language
- output_schema - Optional Pydantic model for structured output
- max_steps - Maximum steps (default: 10)
- wait_after_action - Wait time after each action (uses agent default if not specified)
- mode - Agent mode override: "dom" or "pixel" (uses agent default if not specified)
Returns: str if no output_schema provided, otherwise instance of output_schema
Raises: TaskAbortedError if extraction is aborted
Example:
# Simple string extraction
price = await agent.extract("total price")
print(f"Price: {price}")
# Structured extraction
from pydantic import BaseModel
class ProductInfo(BaseModel):
    name: str
    price: float
    in_stock: bool
product = await agent.extract("product information", ProductInfo)
print(f"{product.name}: ${product.price}")
goto()
Navigate to a URL.
Example:
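A minimal sketch of navigating and then waiting for the page to load, based on the `await agent.goto("example.com")` call shown under wait_for_load(). It assumes goto() is awaited and takes the URL as its first argument. The helper `open_page` is hypothetical, and `agent` is typed as `Any` because the library import path is not given here.

```python
from typing import Any


async def open_page(agent: Any, url: str, timeout: float = 10.0) -> None:
    """Navigate to `url`, then block until the page has finished loading."""
    await agent.goto(url)
    # wait_for_load() raises TimeoutError by default if the page is too slow.
    await agent.wait_for_load(timeout=timeout)
```

Usage, inside an async function: `await open_page(agent, "example.com")`.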
screenshot()
Take a screenshot of the current page.
Parameters:
- path - Optional file path to save screenshot
- full_page - Screenshot the full scrollable page (default: False)
Returns: Screenshot as bytes (PNG format)
Example:
# Save to file
await agent.screenshot("page.png")
# Full page screenshot
await agent.screenshot("full.png", full_page=True)
# Get bytes without saving
screenshot_bytes = await agent.screenshot()
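Because screenshot() returns PNG bytes, a caller can sanity-check the result before storing or uploading it. The sketch below tests for the standard 8-byte PNG file signature; the helper name `is_png` is hypothetical.

```python
# Every PNG file starts with this fixed 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def is_png(data: bytes) -> bool:
    """Return True if `data` begins with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)
```

Usage, inside an async function: `assert is_png(await agent.screenshot())`.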
wait()
Wait for a specific amount of time.
Parameters:
- seconds - Number of seconds to wait
Example:
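A minimal sketch of wait() used between repeated verify() checks, assuming wait() is awaited like the other action methods and takes the number of seconds as its first argument. The helper `poll_until` and its `attempts` and `delay` parameters are hypothetical.

```python
from typing import Any


async def poll_until(
    agent: Any, condition: str, attempts: int = 5, delay: float = 2.0
) -> bool:
    """Check `condition` up to `attempts` times, pausing `delay` seconds between checks."""
    for attempt in range(attempts):
        if await agent.verify(condition):
            return True
        # Do not wait after the final failed check.
        if attempt < attempts - 1:
            await agent.wait(delay)
    return False
```

Usage, inside an async function: `ok = await poll_until(agent, "the order confirmation is visible")`.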
wait_for_load()
Wait for the current page to fully load.
Parameters:
- timeout - Maximum time to wait in seconds (default: 10.0)
- raise_on_timeout - If True, raise TimeoutError on timeout. If False, silently return (default: True)
Raises:
- RuntimeError if no page is active
- TimeoutError if page doesn't load within timeout and raise_on_timeout=True
Example:
await agent.goto("example.com")
await agent.wait_for_load()
# Lenient: continue even if the wait times out
await agent.wait_for_load(timeout=5.0, raise_on_timeout=False)
wait_for_network_idle()
Wait for the network to become idle (no requests for 500 ms). Useful for SPAs and pages that make AJAX requests.
Parameters:
- timeout - Maximum time to wait in seconds (default: 10.0)
- raise_on_timeout - If True, raise TimeoutError on timeout. If False, silently return (default: True)
Raises:
- RuntimeError if no page is active
- TimeoutError if network doesn't become idle within timeout and raise_on_timeout=True
Example:
await agent.do("Click the search button")
await agent.wait_for_network_idle()  # Wait for results to load
# Lenient: continue even if the wait times out
await agent.wait_for_network_idle(timeout=5.0, raise_on_timeout=False)
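When the strict form is used, the timeout can be caught and turned into a diagnostic. The sketch below saves a screenshot if the network never settles. It assumes the documented TimeoutError is Python's built-in `TimeoutError`; the helper `settle_or_snapshot` and the default file name are hypothetical.

```python
from typing import Any


async def settle_or_snapshot(
    agent: Any, path: str = "network_timeout.png", timeout: float = 10.0
) -> bool:
    """Wait for network idle; on timeout, save a screenshot and return False."""
    try:
        await agent.wait_for_network_idle(timeout=timeout)
        return True
    except TimeoutError:
        # Keep evidence of the page state at the moment the wait gave up.
        await agent.screenshot(path)
        return False
```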
clear_history()
Clear conversation history. Resets the agent's memory of previous tasks, starting fresh.
Example:
await agent.do("Add item to cart")
await agent.do("Checkout") # Agent remembers cart context
agent.clear_history() # Start fresh
await agent.do("Search for shoes") # No memory of previous tasks
get_debug_context()
Get the text context that the LLM sees (for debugging). Returns the DOM snapshot and tabs context as a string. Useful for debugging when elements can't be found in "dom" mode.
Returns: The text representation of the current page state
Example:
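A minimal sketch that saves the debug context to a file for inspection. This reference does not state whether get_debug_context() is a coroutine, so the sketch accepts either form. The helper `dump_debug_context` and the default file name are hypothetical.

```python
import inspect
from pathlib import Path
from typing import Any


async def dump_debug_context(agent: Any, path: str = "debug_context.txt") -> str:
    """Write the text the LLM sees to `path` and return it."""
    context = agent.get_debug_context()
    # Assumption: the method may be sync or async; await only if needed.
    if inspect.isawaitable(context):
        context = await context
    Path(path).write_text(context, encoding="utf-8")
    return context
```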
get_current_page()
Get the current active page.
Returns: Current Page instance, or None if no page is active
Example:
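Since get_current_page() can return None, callers usually need a guard before using the page. A minimal sketch follows; whether the method is a coroutine is not stated here, so both forms are handled. The helper `require_page` is hypothetical.

```python
import inspect
from typing import Any


async def require_page(agent: Any) -> Any:
    """Return the active page, or raise RuntimeError if there is none."""
    page = agent.get_current_page()
    # Assumption: the method may be sync or async; await only if needed.
    if inspect.isawaitable(page):
        page = await page
    if page is None:
        raise RuntimeError("no page is active")
    return page
```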
focus_tab()
Focus a specific tab.
Parameters:
- page - Page instance to focus
Example:
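A minimal sketch of running one task on another tab and then returning to the original tab. It assumes focus_tab() takes the Page as its only argument; whether focus_tab() and get_current_page() are coroutines is not stated here, so both forms are handled. The helpers `_maybe_await` and `do_on_tab` are hypothetical.

```python
import inspect
from typing import Any


async def _maybe_await(value: Any) -> Any:
    """Await `value` if it is awaitable; otherwise return it unchanged."""
    if inspect.isawaitable(value):
        return await value
    return value


async def do_on_tab(agent: Any, page: Any, task: str) -> Any:
    """Focus `page`, run `task` there, then restore the previously focused tab."""
    previous = await _maybe_await(agent.get_current_page())
    await _maybe_await(agent.focus_tab(page))
    try:
        return await agent.do(task)
    finally:
        # Restore focus even if the task raised, unless nothing changed.
        if previous is not None and previous is not page:
            await _maybe_await(agent.focus_tab(previous))
```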