Verified

Web Extraction

Turn any URL into agent-ready data: clean markdown, contact info, a company description, pricing plans, or one combined page-to-JSON summary.

Web Extraction is a set of page-scraping building blocks for AI agents. url-to-markdown fetches a page and returns clean, markdown-ish text with nav, footer, scripts and cookie banners stripped. extract-contact-info pulls emails, phone numbers, social links and contact URLs, and follows a same-origin contact page for more signal. extract-company-description returns a short description taken from the og/meta tags or the first real paragraph. extract-pricing heuristically parses plans (name, price, currency, billing period, features) from a pricing page. page-to-agent-json combines title, summary, links, prices and contacts from a single fetch. All five are heuristic (regex and DOM cleanup, with no LLM pass in v1). Payment is per request via x402; the caller needs no API key.

Base URL
https://gateway.apiosk.com/web-extraction
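As a rough illustration of how a caller might address an endpoint under this base URL, the sketch below builds (but does not send) a POST request. The JSON body with a "url" field is an assumption, since the listing does not document the request schema, and build_request is a hypothetical helper name. The x402 payment step is left out. Under x402 an unpaid request is normally answered with HTTP 402 and payment details, which a paying client then has to satisfy.

```python
import json
import urllib.request

BASE_URL = "https://gateway.apiosk.com/web-extraction"


def build_request(endpoint: str, target_url: str) -> urllib.request.Request:
    """Build (without sending) a POST request for one Web Extraction endpoint."""
    # ASSUMPTION: the body is JSON with a "url" field. The listing does not
    # specify the request schema, so adjust this to the real one.
    body = json.dumps({"url": target_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{endpoint.lstrip('/')}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_request("url-to-markdown", "https://example.com")
print(req.get_method(), req.full_url)
```

Sending the request (for example with urllib.request.urlopen) is deliberately omitted, because a real call also needs the x402 payment handshake.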
Endpoints
POST /extract-company-description $0.07
Return a short company description from a homepage, taken from the first source that is present: og:description, then meta description, then the first meaningful paragraph. The result is cleaned, collapsed to a single line, and truncated to ~280 chars. It never invents text that is not on the page.
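The fallback order described above can be sketched with the standard-library HTML parser. This is not the service's implementation, only one way to express the same priority chain. The min_len threshold used to decide what counts as a "meaningful" paragraph is a hypothetical choice, as are the function and class names.

```python
import re
from html.parser import HTMLParser


class _DescriptionParser(HTMLParser):
    """Collect og:description, meta description and <p> texts in one pass."""

    def __init__(self):
        super().__init__()
        self.og = None
        self.meta = None
        self.paragraphs = []
        self._in_p = False
        self._buf = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta":
            if a.get("property") == "og:description" and self.og is None:
                self.og = a.get("content")
            elif (a.get("name") or "").lower() == "description" and self.meta is None:
                self.meta = a.get("content")
        elif tag == "p":
            self._in_p = True
            self._buf = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            self.paragraphs.append("".join(self._buf))

    def handle_data(self, data):
        if self._in_p:
            self._buf.append(data)


def extract_description(html: str, limit: int = 280, min_len: int = 40):
    """Return the first available description, or None if the page has none."""
    parser = _DescriptionParser()
    parser.feed(html)
    # ASSUMPTION: "meaningful" means at least min_len characters of text.
    first_para = next(
        (p for p in parser.paragraphs if len(p.strip()) >= min_len), None
    )
    # Priority: og:description, then meta description, then first paragraph.
    for candidate in (parser.og, parser.meta, first_para):
        if candidate and candidate.strip():
            single_line = re.sub(r"\s+", " ", candidate).strip()
            return single_line[:limit].rstrip()
    return None
```

Returning None when nothing is found mirrors the rule that no text is invented. What the real endpoint returns in that case is not stated in the listing.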
POST /extract-contact-info $0.07
Extract emails, phone numbers, social links and contact URLs from a page. Known placeholder emails are filtered out. A same-origin contact page is also fetched for extra signal, and its results are merged with the main page's and deduplicated.
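A minimal sketch of the email part of this step, assuming a simple regex and a small placeholder list. The entries in PLACEHOLDER_EMAILS and the helper names are hypothetical, since the service's actual filter list is not published here. Phone numbers and social links would need their own patterns.

```python
import re

EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# ASSUMPTION: illustrative placeholder addresses, not the service's real list.
PLACEHOLDER_EMAILS = {"you@example.com", "name@example.com", "email@domain.com"}


def extract_emails(text: str) -> list:
    """Find emails, lowercase them, drop placeholders and duplicates."""
    found = []
    for match in EMAIL_RE.findall(text):
        email = match.lower()
        if email in PLACEHOLDER_EMAILS or email in found:
            continue
        found.append(email)
    return found


def merge_contacts(primary: list, extra: list) -> list:
    """Merge results from the main page and the contact page, keeping order."""
    return list(dict.fromkeys(primary + extra))


main_page = extract_emails("Mail Sales@Acme.io or sales@acme.io, not you@example.com")
contact_page = extract_emails("Support: help@acme.io, sales@acme.io")
print(merge_contacts(main_page, contact_page))
```

Keeping the main page's results first means the merged list preserves the order in which addresses were discovered.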
POST /extract-pricing $0.07
Heuristically parse pricing plans (name, price, currency, billing period, features) from a pricing page. Each plan gets its own detected currency. Returns an empty plans list ([]) with confidence 0 when nothing matches. This is a best-effort parser, not a guaranteed one for every layout.
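To make the heuristic concrete, here is a small regex-based sketch that detects price, currency and billing period per plan. It assumes the plan name and its text block have already been separated, it does not parse features, and it does not handle decimal commas. The output field names and the confidence formula (share of blocks that parsed) are assumptions. The listing only states that confidence is 0 when nothing matches.

```python
import re

CURRENCY_BY_SYMBOL = {"$": "USD", "€": "EUR", "£": "GBP"}
PERIODS = {"month": "monthly", "mo": "monthly", "year": "yearly", "yr": "yearly"}

PRICE_RE = re.compile(
    r"(?P<symbol>[$€£])\s?(?P<amount>\d+(?:,\d{3})*(?:\.\d{1,2})?)"
    r"(?:\s?(?:/|per)\s?(?P<period>month|mo|year|yr)\b)?",
    re.IGNORECASE,
)


def parse_plan(name: str, text: str):
    """Return one plan dict, or None when no price is found in the text."""
    match = PRICE_RE.search(text)
    if match is None:
        return None
    period = match.group("period")
    return {
        "name": name,
        "price": float(match.group("amount").replace(",", "")),
        # Currency is detected per plan, from that plan's own symbol.
        "currency": CURRENCY_BY_SYMBOL[match.group("symbol")],
        "billing_period": PERIODS[period.lower()] if period else None,
    }


def parse_pricing(blocks: list) -> dict:
    """blocks is a list of (plan_name, plan_text) pairs."""
    plans = []
    for name, text in blocks:
        plan = parse_plan(name, text)
        if plan is not None:
            plans.append(plan)
    # ASSUMPTION: confidence is the share of blocks that yielded a plan.
    confidence = len(plans) / len(blocks) if blocks else 0
    return {"plans": plans, "confidence": confidence}


result = parse_pricing([("Pro", "€29/month"), ("Team", "$1,299.50 per year")])
print(result)
```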
POST /page-to-agent-json $0.07
One combined, agent-readable summary of a page (title, short summary, links, prices and contacts), assembled from a single fetch. That makes it the most economical of the five when several kinds of data are needed from the same page. entities is [] in v1 (no entity recognition yet).
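The single-fetch idea can be sketched as one function that runs several lightweight extractors over the same HTML string. The output field names and the 200-character summary length are assumptions for illustration. The listing names the parts of the summary but not the exact response schema.

```python
import re

TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)
HEAD_RE = re.compile(r"<head.*?</head>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
LINK_RE = re.compile(r'<a\s[^>]*href="([^"]+)"', re.IGNORECASE)
PRICE_RE = re.compile(r"[$€£]\s?\d+(?:\.\d{1,2})?")
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def page_to_summary(html: str, summary_len: int = 200) -> dict:
    """Derive every field from the same HTML string (fetched once elsewhere)."""
    title_match = TITLE_RE.search(html)
    title = " ".join(title_match.group(1).split()) if title_match else ""
    # Drop the head, strip tags, and collapse whitespace for the summary text.
    body_text = " ".join(TAG_RE.sub(" ", HEAD_RE.sub(" ", html)).split())
    return {
        "title": title,
        "summary": body_text[:summary_len].rstrip(),
        "links": list(dict.fromkeys(LINK_RE.findall(html))),
        "prices": list(dict.fromkeys(PRICE_RE.findall(body_text))),
        "contacts": {"emails": list(dict.fromkeys(EMAIL_RE.findall(html)))},
        "entities": [],  # empty in v1, as the listing states
    }


sample = (
    "<html><head><title> Acme </title></head><body><p>Acme sells widgets.</p>"
    '<a href="/pricing">Pricing</a> from $9.99 '
    '<a href="mailto:hi@acme.io">hi@acme.io</a></body></html>'
)
print(page_to_summary(sample))
```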
POST /url-to-markdown $0.07
Fetch a URL, strip noise (script, style, nav, footer and cookie banners) and return the page title plus markdown-ish text: headings as # through ######, and paragraphs and list items as plain lines. This is not a full HTML-to-Markdown conversion.
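The cleanup described above can be approximated with the standard-library HTML parser. The sketch takes an HTML string rather than fetching a URL, and it skips script, style, nav and footer only. Cookie-banner detection would need extra class or id heuristics that are not shown. The function and class names are hypothetical.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "nav", "footer"}
HEADINGS = {f"h{n}": "#" * n for n in range(1, 7)}
BLOCK_TAGS = set(HEADINGS) | {"p", "li"}


class _MarkdownishParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.title = ""
        self.lines = []
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._current = None  # block tag currently being collected
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif self._skip_depth == 0 and (tag in BLOCK_TAGS or tag == "title"):
            self._current = tag
            self._buf = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == self._current:
            text = " ".join("".join(self._buf).split())
            if tag == "title":
                self.title = text
            elif text:
                prefix = HEADINGS.get(tag, "")
                self.lines.append(f"{prefix} {text}" if prefix else text)
            self._current = None

    def handle_data(self, data):
        if self._skip_depth == 0 and self._current:
            self._buf.append(data)


def html_to_markdownish(html: str):
    """Return (title, text) with headings as #..###### and one line per block."""
    parser = _MarkdownishParser()
    parser.feed(html)
    return parser.title, "\n".join(parser.lines)


title, text = html_to_markdownish(
    "<html><head><title>Demo</title><style>p{}</style></head><body>"
    "<nav><p>Menu</p></nav><h2>Plans</h2><p>Two  tiers.</p>"
    "<ul><li>Free</li></ul><script>var x = 1;</script>"
    "<footer><p>Legal</p></footer></body></html>"
)
print(title)
print(text)
```

Tracking a skip depth rather than a boolean keeps nested skipped elements (for example a nav inside a footer) from re-enabling output too early.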