Skip to content

webtask

PyPI version Tests License: MIT Documentation

Easy-to-use LLM-powered browser automation — from autonomous tasks to element-level control.

Why webtask?

  • High-level tasks: Describe what you want done — the agent figures out the steps
  • Low-level control: Select any element with natural language — no CSS/XPath selectors needed

Quick Start

from webtask import Webtask
from webtask.integrations.llm import Gemini

wt = Webtask()
agent = await wt.create_agent(
    llm=Gemini(model="gemini-2.5-flash"),
    wait_after_action=1.0,
)

await agent.goto("https://practicesoftwaretesting.com")
await agent.wait(3)

# select: pick elements with natural language
search = await agent.select("the search input")
await search.fill("pliers")

# do: simple or complex tasks — agent figures out the steps
await agent.do("click search and add the first product to cart")

# extract: get structured data from the page
price = await agent.extract("the cart total price")

# verify: check conditions
assert await agent.verify("cart has 1 item")

Features

Four core operations

await agent.do("click search and add first product to cart")  # Autonomous tasks
element = await agent.select("the search input")              # Element selection
data = await agent.extract("the cart total", MySchema)        # Data extraction
assert await agent.verify("cart has 1 item")                  # Verification

Stateful agents — Agent remembers context across tasks

await agent.do("Add pliers to cart")
await agent.do("Add a hammer too")  # Remembers previous action
agent.clear_history()               # Reset when needed

Two modes — DOM-based or pixel-based interaction

agent = await wt.create_agent(llm=llm, mode="dom")     # Element IDs (default)
agent = await wt.create_agent(llm=llm, mode="pixel")   # Screen coordinates

Browser integration — Works with new or existing browsers

agent = await wt.create_agent(llm=llm)                                  # New browser
agent = await wt.create_agent_with_browser(llm=llm, browser=browser)    # Existing browser
agent = wt.create_agent_with_context(llm=llm, context=context)          # Existing context
agent = wt.create_agent_with_page(llm=llm, page=page)                   # Existing page

Error handling — Handle task failures gracefully

try:
    await agent.do("Add item to cart")
except TaskAbortedError as e:
    print(f"Task failed: {e}")

Get Started