
⣏ Selectron ⣹


Selectron is an AI web parsing library & CLI designed around two goals:

  1. Fully automated parser generation – an AI "compiles" (generates) parsers on demand
  2. Efficient parser execution – parsers are cached, so there are no LLM calls at runtime


Demo videos

Save your Twitter feed to DuckDB


Generate a new scraper with AI


How it works

  • Chrome integration: Connects to Chrome over CDP and receives live DOM and screenshot data from your active tab. Selectron uses minimal dependencies – no browser-use or stagehand, not even Playwright (we prefer direct CDP).
  • Fully automated parser generation: An AI agent generates selectors for content described with natural language. Another agent generates code to extract data from selected containers. The final result is a parser.
  • CLI application: When you run the Textual CLI, parsed data is saved to a DuckDB database, making it easy to analyze your browsing history or extract structured data from websites. Built-in parsers include:
    • Twitter
    • LinkedIn
    • HackerNews
    • (Please contribute more!)
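The CDP connection described above starts with Chrome's DevTools HTTP endpoint, which lists debuggable targets as JSON. A minimal stdlib sketch of that first step, assuming Chrome was launched with `--remote-debugging-port=9222` (selectron's actual connection code may differ; `list_tabs` is an illustrative helper, not part of the library):

```python
import json
from urllib.request import urlopen

DEVTOOLS = "http://localhost:9222/json"  # Chrome's DevTools target list

def list_tabs(raw: str) -> list[dict]:
    """Filter the /json response down to page targets (open tabs)."""
    return [t for t in json.loads(raw) if t.get("type") == "page"]

# Live usage (requires a running Chrome with remote debugging enabled):
#   tabs = list_tabs(urlopen(DEVTOOLS).read().decode())
#   ws_url = tabs[0]["webSocketDebuggerUrl"]  # CDP websocket for that tab
```

From there, a CDP client speaks JSON-RPC over that websocket to subscribe to DOM and screenshot events, which is the data selectron consumes.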

Use the CLI

# Install in a venv
uv add selectron
uv run selectron

# Or install globally
pipx install selectron
selectron

When you run selectron, it creates a DuckDB database in your app directory and saves parsed data from a given URL to a table named after the URL slug:

  • x.com/home -> x.com~~2fhome (Selectron uses a reversible slug system)
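The slug scheme isn't specified beyond the example above, but a plausible sketch that reproduces it percent-encodes the URL and swaps `%` for a filesystem-safe `~~` marker. Note `url_to_slug`/`slug_to_url` are hypothetical names for illustration, not selectron's API, and the real scheme may differ:

```python
from urllib.parse import quote, unquote

def url_to_slug(url: str) -> str:
    # Percent-encode everything, then replace '%' with '~~' so the result
    # is safe as a table name; lowercased to match the x.com~~2fhome example.
    return quote(url, safe="").replace("%", "~~").lower()

def slug_to_url(slug: str) -> str:
    # Reverse: restore '%' markers, then percent-decode.
    return unquote(slug.replace("~~", "%"))
```

Because the encoding is injective, the slug round-trips back to the original URL, which is what makes the scheme reversible.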

When you run selectron inside this repo, parsers are saved to the src directory (if a parser for the URL doesn't already exist).

When you run selectron outside this repo, parsers are saved to your app directory (overwriting any existing parser).

Use the library

Parse HTML

import json

from selectron.lib import parse

# ... get the url and html from your browser ...
res = parse(url, html)
print(json.dumps(res, indent=2))

If a parser is registered for the URL, you'll receive something like this:

[
  {
    "primary_url": "/_its_not_real_/status/1918760851957321857",
    "datetime": "2025-05-03T20:13:30.000Z",
    "id": "1918760851957321857",
    "author": "@_its_not_real_",
    "description": "\"They're made out of meat.\"\n\"Meat?\"\n\"Meat. Humans. They're made entirely out of meat.\"\n\"But that's impossible. What about all the tokens they generate? The text? The code?\"\n\"They do produce tokens, but the tokens aren't their essence. They're merely outputs. The humans themselves",
    "images": [{ "src": "https://pbs.twimg.com/profile_images/1307877522726682625/t5r3D_-n_x96.jpg" }, { "src": "https://pbs.twimg.com/profile_images/1800173618652979201/2cDLkS53_bigger.jpg" }]
  }
]
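Results like this are plain lists of dicts, so persisting them yourself is straightforward. A sketch using stdlib sqlite3 as a stand-in for the CLI's DuckDB storage (the table layout here is illustrative; the CLI's actual schema may differ):

```python
import json
import sqlite3

def save_rows(conn: sqlite3.Connection, table: str, rows: list[dict]) -> None:
    # Store each parsed item as a JSON blob keyed by its id.
    # Quoted identifiers let slug-style table names like x.com~~2fhome work.
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" (id TEXT PRIMARY KEY, data TEXT)')
    conn.executemany(
        f'INSERT OR REPLACE INTO "{table}" VALUES (?, ?)',
        [(row.get("id"), json.dumps(row)) for row in rows],
    )
    conn.commit()
```

Upserting on the id keeps re-parses of the same page from duplicating rows, which matters when you're repeatedly capturing a live feed.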

Other functionality

The selectron.chrome and selectron.ai modules are useful, but still baking, and subject to breaking changes – please pin your minor version.

Contributing

Generating parsers is easy because the process is mostly automated:

  1. Clone the repo
  2. Run the CLI (make dev). Connect to Chrome.
  3. In Chrome, open the page you want to parse. In the CLI, describe your selection (or use the AI-generated proposal).
  4. Start AI selection (you can stop at any time to use the current highlighted selector).
  5. Start AI parser generation. The parser will be saved to the appropriate location in /src.
  6. Review the parser's results and open a PR (please show what the parser produces).

Setup

make install
make dev
# see Makefile for other commands
# see .env.EXAMPLE for config options