The DataSonar API
We built DataSonar because extracting reliable data from the modern web is unnecessarily complicated. Instead of managing fleets of headless browsers, rotating proxies, and writing brittle Puppeteer scripts to solve CAPTCHAs, you can use our API as your single source of truth for web intelligence.
Core Scraping Engine
Our core engine is built on top of a custom Rust-based headless browser. It handles JavaScript rendering, bot mitigation, and proxy rotation transparently.
The Universal Scrape Endpoint
POST /v1/scrape
This is the workhorse of the platform. Send us a URL and we'll return the fully rendered HTML or Markdown. Need to click a button, type into a search box, or accept a cookie banner before extracting? Use our Native Actions Pipeline.
The Native Actions Pipeline
Instead of writing custom JavaScript to manipulate the DOM, you can pass an array of human-readable actions. These are compiled directly into our underlying browser engine and execute natively, without the overhead of the Chrome DevTools Protocol (CDP).
{
  "url": "https://example.com/login",
  "format": "html",
  "stealth": true,
  "actions": [
    { "action": "type", "selector": "input[name='email']", "text": "agent@company.com" },
    { "action": "type", "selector": "input[name='password']", "text": "secure123" },
    { "action": "click", "selector": "button[type='submit']" },
    { "action": "scroll_to_bottom" }
  ],
  "js_eval": "document.title"
}
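To show how the payload above might be sent from code, here is a minimal Python sketch using only the standard library. The base URL `https://api.datasonar.example`, the bearer-token `Authorization` header, and the JSON response shape are all hypothetical placeholders, since this page does not document them; the request body itself mirrors the example above.

```python
import json
import urllib.request

# Hypothetical placeholders: the real base URL and auth scheme are not
# documented on this page.
BASE_URL = "https://api.datasonar.example"
API_KEY = "YOUR_API_KEY"


def build_scrape_request(url, actions=None, fmt="html", stealth=True, js_eval=None):
    """Build (but do not send) a POST /v1/scrape request."""
    payload = {"url": url, "format": fmt, "stealth": stealth}
    if actions:
        payload["actions"] = actions
    if js_eval is not None:
        payload["js_eval"] = js_eval
    return urllib.request.Request(
        f"{BASE_URL}/v1/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed bearer-token auth; not confirmed by this page.
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def scrape(request, timeout=60):
    """Send the request and decode a JSON response (response shape assumed)."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


login_actions = [
    {"action": "type", "selector": "input[name='email']", "text": "agent@company.com"},
    {"action": "type", "selector": "input[name='password']", "text": "secure123"},
    {"action": "click", "selector": "button[type='submit']"},
    {"action": "scroll_to_bottom"},
]
req = build_scrape_request("https://example.com/login", login_actions, js_eval="document.title")
# result = scrape(req)  # uncomment to actually send the request
```

Building the request separately from sending it makes the payload easy to inspect or log before any network call is made.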
Scale seamlessly with Batch & Async
- POST /v1/scrape/batch: Pass an array of up to 100 URLs. We scrape them concurrently using our worker pool and return a unified result set.
- POST /v1/scrape/async: For massive jobs, queue the work. We return a Job ID immediately and call your webhook when the extraction finishes.
- POST /v1/scrape/smart: Not sure whether a page needs full browser rendering? Smart scrape evaluates the target first. If it's a static site, we bypass the browser and fetch it over plain HTTP, saving you time and bandwidth.
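Because the batch endpoint accepts at most 100 URLs per call, a larger URL list has to be split client-side. The sketch below assumes a hypothetical request body of the form `{"urls": [...], "format": "markdown"}`; the `urls` key and the `"markdown"` format value are assumptions, not documented fields.

```python
from itertools import islice

MAX_BATCH = 100  # documented per-request limit for POST /v1/scrape/batch


def chunked(urls, size=MAX_BATCH):
    """Yield successive lists of at most `size` URLs."""
    it = iter(urls)
    while chunk := list(islice(it, size)):
        yield chunk


def batch_payloads(urls, fmt="markdown"):
    """Build one request body per batch. The "urls" key is an assumed field name."""
    return [{"urls": chunk, "format": fmt} for chunk in chunked(urls)]


urls = [f"https://example.com/page/{i}" for i in range(250)]
payloads = batch_payloads(urls)
print([len(p["urls"]) for p in payloads])  # [100, 100, 50]
```

Once a job grows well beyond a handful of batches, the async endpoint with a webhook is the documented alternative to issuing many synchronous batch calls.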
Data Extraction & Enrichment
Getting the HTML is only half the battle. Our extraction endpoints clean the noise and give you structured, ready-to-use data.
POST /v1/extract/clean
Readability Engine
Strips out cookie banners, navigation menus, sidebars, and ads, and returns clean article text ready to feed into LLMs.
POST /v1/extract/structured
Structured Data Parser
Finds and parses the JSON-LD, Microdata, RDFa, Twitter Card, and Open Graph metadata embedded in the page.
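To illustrate what this embedded metadata looks like, here is a small local sketch that pulls JSON-LD blocks and Open Graph `og:*` meta tags out of raw HTML with Python's standard `html.parser`. It is not DataSonar's parser and covers only two of the listed formats (Microdata, RDFa, and Twitter Cards are omitted).

```python
import json
from html.parser import HTMLParser


class StructuredDataParser(HTMLParser):
    """Collects JSON-LD blocks and Open Graph <meta property="og:*"> tags."""

    def __init__(self):
        super().__init__()
        self.json_ld = []
        self.open_graph = {}
        self._in_json_ld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta" and (attrs.get("property") or "").startswith("og:"):
            self.open_graph[attrs["property"]] = attrs.get("content")

    def handle_data(self, data):
        # Script contents may arrive in several chunks, so buffer them.
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip a malformed block rather than failing the whole page


html = (
    '<meta property="og:title" content="Example Product">'
    '<script type="application/ld+json">{"@type": "Product", "name": "Widget"}</script>'
)
parser = StructuredDataParser()
parser.feed(html)
print(parser.json_ld)     # [{'@type': 'Product', 'name': 'Widget'}]
print(parser.open_graph)  # {'og:title': 'Example Product'}
```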
POST /v1/intel/page
Deep Page Intelligence
A 4-in-1 endpoint that returns the site's detected technology stack (e.g., React, Next.js, Stripe), high-resolution logos, social media profiles, and RSS feeds.
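Of the four outputs, feed discovery is the easiest to reproduce locally: sites conventionally advertise feeds with `<link rel="alternate" type="application/rss+xml" href="...">` (or `application/atom+xml`) in the page head. The sketch below shows that convention with the standard library only; it is an illustration, not how DataSonar detects feeds, and it does not attempt technology, logo, or social-profile detection.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkParser(HTMLParser):
    """Collects absolute URLs of <link rel="alternate"> feed declarations."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = dict(attrs)
        rel = (attrs.get("rel") or "").lower().split()  # rel is a space-separated token list
        ftype = (attrs.get("type") or "").lower()
        href = attrs.get("href")
        if "alternate" in rel and ftype in FEED_TYPES and href:
            # Resolve relative hrefs against the page URL.
            self.feeds.append(urljoin(self.base_url, href))


def discover_feeds(html, base_url):
    parser = FeedLinkParser(base_url)
    parser.feed(html)
    return parser.feeds


page = '<link rel="alternate" type="application/rss+xml" href="/feed.xml">'
print(discover_feeds(page, "https://example.com/blog/"))  # ['https://example.com/feed.xml']
```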
POST /v1/pdf
High-Fidelity Snapshots
Generates a PDF of any webpage, fully rendered, exactly as a human visitor would see it.
Network & Infrastructure Intelligence
Look beneath the surface. Our intelligence endpoints query DNS records, mail servers, and TLS certificates directly to profile a domain's infrastructure.
POST /v1/dns/intelligence
Domain Intelligence
We query MX, TXT, A, and CNAME records to automatically categorize the domain's email provider (e.g., Google Workspace, Microsoft 365) and hosting infrastructure (e.g., AWS, Vercel).
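The email-provider part of this categorization can be approximated by matching MX hostnames against well-known suffixes: Google Workspace mail routes through hosts under `google.com` (such as `aspmx.l.google.com`), and Microsoft 365 through hosts under `mail.protection.outlook.com`. This sketch takes MX hostnames already resolved by any DNS resolver as input; the suffix table is a small illustrative sample, not DataSonar's rule set.

```python
# Illustrative sample of MX host suffixes; a real classifier would need many more.
MX_PROVIDERS = {
    "google.com": "Google Workspace",
    "googlemail.com": "Google Workspace",
    "mail.protection.outlook.com": "Microsoft 365",
}


def classify_email_provider(mx_hosts):
    """Return the provider for the first MX host matching a known suffix."""
    for host in mx_hosts:
        host = host.lower().rstrip(".")  # DNS answers often carry a trailing dot
        for suffix, provider in MX_PROVIDERS.items():
            if host == suffix or host.endswith("." + suffix):
                return provider
    return "Unknown"


print(classify_email_provider(["aspmx.l.google.com.", "alt1.aspmx.l.google.com."]))
# Google Workspace
```

Matching on a dot-prefixed suffix, rather than a plain substring, avoids false positives such as a host named `notgoogle.com`.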
POST /v1/intel/ssl
SSL & Security Profiling
Pulls the complete certificate chain, validates issuers and expiration dates, and lists every Subject Alternative Name (SAN) the certificate covers.
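For a rough local comparison, Python's `ssl` module can fetch and validate a server's certificate and expose its issuer, expiry, and SANs. Note that `getpeercert()` returns only the leaf certificate as a dict, so this sketch does not reproduce the full-chain retrieval described above; `summarize_cert` is a hypothetical helper for illustration.

```python
import socket
import ssl
import time


def fetch_leaf_cert(host, port=443, timeout=10):
    """Connect with default verification and return the leaf cert as a dict."""
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()


def summarize_cert(cert, now=None):
    """Extract issuer organization, DNS SANs, and whole days until expiry."""
    now = time.time() if now is None else now
    # "issuer" is a tuple of RDNs, each a tuple of (key, value) pairs.
    issuer = {key: value for rdn in cert["issuer"] for key, value in rdn}
    expires = ssl.cert_time_to_seconds(cert["notAfter"])
    return {
        "issuer": issuer.get("organizationName"),
        "sans": [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"],
        "days_left": int((expires - now) // 86400),
    }


# Usage (requires network access):
# print(summarize_cert(fetch_leaf_cert("example.com")))
```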
POST /v1/verify/email
Deep Email Validation
We don't just check syntax. We perform a live SMTP handshake with the recipient's mail server to verify that the mailbox exists, and we check the address against known disposable email providers and catch-all policies.
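The SMTP part of this check can be sketched with Python's `smtplib`: open a session with the domain's mail server, issue `MAIL FROM` and `RCPT TO`, and quit without sending any message. Probing a random address at the same domain is one common way to spot catch-all servers. This is a simplified illustration, not DataSonar's verifier: it assumes the MX host is already resolved, uses a placeholder sender address, skips disposable-provider lists, and many networks block outbound port 25 entirely.

```python
import smtplib
import uuid


def probe_mailbox(mx_host, address, sender="probe@example.com", timeout=10):
    """Return RCPT TO reply codes for `address` and for a random address at
    the same domain. No DATA command is sent, so no email is delivered."""
    domain = address.rsplit("@", 1)[1]
    random_addr = f"{uuid.uuid4().hex}@{domain}"
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:  # QUIT on exit
        smtp.ehlo()
        smtp.mail(sender)
        code, _ = smtp.rcpt(address)
        random_code, _ = smtp.rcpt(random_addr)
    return code, random_code


def interpret(code, random_code):
    """Map the two RCPT reply codes to a coarse verdict."""
    if 200 <= code < 300 and 200 <= random_code < 300:
        return "catch-all"  # server accepts any recipient, so the result is inconclusive
    if 200 <= code < 300:
        return "deliverable"
    if 500 <= code < 600:
        return "undeliverable"
    return "unknown"  # e.g. 4xx temporary failures or greylisting


# Usage (requires network access and an open port 25):
# print(interpret(*probe_mailbox("mx.example.com", "someone@example.com")))
```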
Crawling & Discovery
When you need to map out an entire domain, our discovery endpoints run highly parallelized routines to find every reachable page.
POST /v1/crawl
Enterprise Site Crawler
Powered by our blazing-fast Rust engine, the crawler explores a domain according to your parameters. It can respect or ignore robots.txt, follows sitemaps, and discovers the domain's internal pages.
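At its core, a same-domain crawler is a breadth-first traversal over links with a visited set. The sketch below shows that idea single-threaded, with link extraction injected as a `get_links(url)` callable (a hypothetical interface) so it can be tested without network access. It is not DataSonar's engine: parallelism, robots.txt handling, and sitemap seeding are omitted.

```python
from collections import deque
from urllib.parse import urldefrag, urljoin, urlparse


def crawl(start_url, get_links, max_pages=500):
    """Breadth-first crawl restricted to start_url's host.

    get_links(url) -> iterable of hrefs found on that page (relative or absolute).
    Returns page URLs in the order they were visited.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for href in get_links(url):
            # Resolve relative links and drop #fragments so each page is queued once.
            link, _ = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited


site = {
    "https://example.com/": ["/about", "/blog#top", "https://other.com/x"],
    "https://example.com/about": ["/", "/team"],
    "https://example.com/blog": ["/blog/post-1"],
    "https://example.com/team": [],
    "https://example.com/blog/post-1": ["../about"],
}
print(crawl("https://example.com/", lambda u: site.get(u, [])))
```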
POST /v1/intel/sitemap
Instant Sitemap Unrolling
Provide a domain and we'll automatically locate, fetch, and parse all associated XML sitemaps, flattening them into a clean array of URLs.
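Flattening works because the sitemap protocol has two document types: a `<sitemapindex>` that points to other sitemaps, and a `<urlset>` that lists pages, both in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace. This local sketch recurses through indexes with `xml.etree.ElementTree`, taking a hypothetical `fetch(url)` callable that returns XML text. It does not locate sitemaps or handle gzipped `.xml.gz` files, and it is not DataSonar's implementation.

```python
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
INDEX_TAG = "{" + NS["sm"] + "}sitemapindex"


def unroll_sitemap(sitemap_url, fetch, _seen=None):
    """Recursively flatten a sitemap or sitemap index into a list of page URLs.

    fetch(url) -> XML text of that sitemap.
    """
    seen = set() if _seen is None else _seen
    if sitemap_url in seen:  # guard against indexes that reference each other
        return []
    seen.add(sitemap_url)
    root = ET.fromstring(fetch(sitemap_url))
    if root.tag == INDEX_TAG:
        urls = []
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            urls.extend(unroll_sitemap(loc.text.strip(), fetch, seen))
        return urls
    # Otherwise treat it as a <urlset> of page entries.
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]
```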
POST /v1/intel/robots
Robots.txt Analysis
Check if specific paths are allowed or disallowed for scraping, giving you peace of mind before kicking off a massive job.
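For a quick local check of the same question, Python's standard `urllib.robotparser` can parse a robots.txt file and answer per-URL queries. The file contents and the `MyBot` user agent below are placeholders; this does not show DataSonar's request or response format.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Allow: /admin/public/
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for path in ["/blog/post-1", "/admin/settings", "/admin/public/help"]:
    allowed = parser.can_fetch("MyBot", "https://example.com" + path)
    print(path, "allowed" if allowed else "disallowed")

print(parser.site_maps())  # ['https://example.com/sitemap.xml']
```

The `Allow` line is placed before `Disallow` on purpose: Python's parser applies the first matching rule in file order, while RFC 9309 specifies that the most specific (longest) match wins. Ordering rules from most to least specific gives the same answer under both interpretations.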