The DataSonar API

We built DataSonar because extracting reliable data from the modern web is unnecessarily complicated. Instead of managing fleets of headless browsers, rotating proxies, and writing brittle Puppeteer scripts to solve CAPTCHAs, you can use our API as your single source of truth for web intelligence.

Core Scraping Engine

Our core engine is built on top of a custom Rust-based headless browser. It handles JavaScript rendering, bot mitigation, and proxy rotation completely transparently.

The Universal Scrape Endpoint

POST /v1/scrape

This is the workhorse of the platform. Send us a URL, and we'll return the fully rendered HTML or Markdown. Need to click a button, type into a search box, or accept a cookie banner before extracting? Use our Native Actions Pipeline.

The Native Actions Pipeline

Instead of writing custom JavaScript to manipulate the DOM, you can pass an array of human-readable actions. These are compiled directly into our underlying browser engine and execute natively, without the overhead of the Chrome DevTools Protocol (CDP).

Request Payload Example
{
  "url": "https://example.com/login",
  "format": "html",
  "stealth": true,
  "actions": [
    { "action": "type", "selector": "input[name='email']", "text": "agent@company.com" },
    { "action": "type", "selector": "input[name='password']", "text": "secure123" },
    { "action": "click", "selector": "button[type='submit']" },
    { "action": "scroll_to_bottom" }
  ],
  "js_eval": "document.title"
}
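
A minimal Python sketch of preparing such a request is shown here. The base URL and the bearer-token header are placeholders assumed for illustration; this section does not document the real host or authentication scheme. The request is built but not sent.

```python
import json
import urllib.request

# Assumption: placeholder host; the real base URL is not given in this section.
BASE_URL = "https://api.datasonar.example"


def build_scrape_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST /v1/scrape request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/scrape",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
        },
        method="POST",
    )


# Actions run in array order: fill the form, submit, then scroll.
payload = {
    "url": "https://example.com/login",
    "format": "html",
    "stealth": True,
    "actions": [
        {"action": "type", "selector": "input[name='email']", "text": "agent@company.com"},
        {"action": "click", "selector": "button[type='submit']"},
        {"action": "scroll_to_bottom"},
    ],
}

request = build_scrape_request(payload, api_key="YOUR_API_KEY")
# To send it: urllib.request.urlopen(request, timeout=30)
```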
          

Scale Seamlessly with Batch & Async

  • POST /v1/scrape/batch

    Pass an array of up to 100 URLs. We will scrape them concurrently using our worker pool and return a unified result set.

  • POST /v1/scrape/async

    For massive jobs, queue the work. We'll return a Job ID immediately and ping your webhook when the extraction finishes.

  • POST /v1/scrape/smart

    Not sure if a page needs full browser rendering? Smart scrape evaluates the target. If it's a static site, we bypass the browser and fetch it instantly over raw HTTP, saving you time and bandwidth.
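
Because the batch endpoint accepts at most 100 URLs per call, a larger list has to be split on the client side first. The sketch below shows one way to do that; the "urls" field name is an assumption, since the batch request body is not specified here.

```python
from typing import Iterator

MAX_BATCH_SIZE = 100  # limit stated for POST /v1/scrape/batch


def chunk_urls(urls: list[str], size: int = MAX_BATCH_SIZE) -> Iterator[list[str]]:
    """Yield consecutive slices of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_batch_payloads(urls: list[str]) -> list[dict]:
    # Assumption: the batch body carries its URLs under a "urls" key.
    return [{"urls": chunk} for chunk in chunk_urls(urls)]


urls = [f"https://example.com/page/{i}" for i in range(250)]
payloads = build_batch_payloads(urls)
print([len(p["urls"]) for p in payloads])  # [100, 100, 50]
```

Each payload can then be posted as its own batch call, or handed to the async endpoint when the total job is large.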

Data Extraction & Enrichment

Getting the HTML is only half the battle. Our extraction endpoints clean the noise and give you structured, ready-to-use data.

POST /v1/extract/clean

Readability Engine

Strips out cookie banners, navigation menus, sidebars, and ads. Returns pure, clean article text perfect for feeding into LLMs.

POST /v1/extract/structured

Structured Data Parser

Automatically finds and parses hidden JSON-LD, Microdata, RDFa, Twitter Cards, and OpenGraph tags embedded in the page.
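
To make the idea concrete, here is a small local sketch that pulls JSON-LD blocks and OpenGraph tags out of an HTML string using only the Python standard library. It illustrates the kind of data involved and is not DataSonar's parser; Microdata, RDFa, and Twitter Cards are left out for brevity.

```python
import json
from html.parser import HTMLParser


class StructuredDataParser(HTMLParser):
    """Collects JSON-LD blocks and OpenGraph meta tags from an HTML document."""

    def __init__(self) -> None:
        super().__init__()
        self.json_ld: list = []
        self.open_graph: dict = {}
        self._in_json_ld = False
        self._buffer: list = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "script" and attributes.get("type") == "application/ld+json":
            self._in_json_ld = True
            self._buffer = []
        elif tag == "meta":
            prop = attributes.get("property") or ""
            if prop.startswith("og:") and attributes.get("content") is not None:
                self.open_graph[prop] = attributes["content"]

    def handle_data(self, data):
        if self._in_json_ld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_ld:
            self._in_json_ld = False
            try:
                self.json_ld.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than fail the whole page


html = """
<html><head>
<meta property="og:title" content="Example Product">
<script type="application/ld+json">{"@type": "Product", "name": "Widget"}</script>
</head><body></body></html>
"""

parser = StructuredDataParser()
parser.feed(html)
print(parser.json_ld)
print(parser.open_graph)
```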

POST /v1/intel/page

Deep Page Intelligence

A four-in-one endpoint that returns the website's detected technology stack (React, Next.js, Stripe, etc.), high-resolution logos, social media profiles, and RSS feeds.

POST /v1/pdf

High-Fidelity Snapshots

Generates a pristine PDF document of any webpage exactly as a human would see it, fully rendered.

Network & Infrastructure Intelligence

Look beneath the surface. Our intelligence endpoints query DNS records, SMTP servers, and certificate authorities directly to profile domain infrastructure.

POST /v1/dns/intelligence

Domain Intelligence

We query MX, TXT, A, and CNAME records to automatically categorize the domain's email provider (e.g. Google Workspace, Outlook) and hosting infrastructure (e.g. AWS, Vercel).
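
The categorization step can be pictured as suffix matching on MX hostnames. The sketch below assumes the MX lookup has already been done and uses a tiny illustrative rule table; the function name and the table are hypothetical, and a real classifier would need far more rules.

```python
# Illustrative rules only; both suffix families are well-known MX targets.
MX_PROVIDER_SUFFIXES = {
    ".google.com": "Google Workspace",
    ".googlemail.com": "Google Workspace",
    ".mail.protection.outlook.com": "Outlook",
}


def classify_email_provider(mx_hosts: list[str]) -> str:
    """Return the first provider whose suffix matches any MX hostname."""
    for host in mx_hosts:
        # DNS answers are case-insensitive and often end with a trailing dot.
        normalized = host.rstrip(".").lower()
        for suffix, provider in MX_PROVIDER_SUFFIXES.items():
            if normalized.endswith(suffix):
                return provider
    return "unknown"


print(classify_email_provider(["ASPMX.L.GOOGLE.COM."]))
print(classify_email_provider(["example-com.mail.protection.outlook.com"]))
print(classify_email_provider(["mx.example.net"]))
```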

POST /v1/intel/ssl

SSL & Security Profiling

Pulls the complete certificate chain, validates issuers and expiration dates, and lists every Subject Alternative Name (SAN) the certificate covers.
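
As a rough local analogue, the following sketch summarizes a certificate dictionary in the shape returned by Python's ssl.SSLSocket.getpeercert(). It covers only the leaf certificate, not the full chain, and the summary field names are chosen for illustration rather than taken from the API's response.

```python
import ssl
from datetime import datetime, timezone
from typing import Optional


def summarize_certificate(cert: dict, now: Optional[datetime] = None) -> dict:
    """Summarize a dict shaped like ssl.SSLSocket.getpeercert() output."""
    now = now or datetime.now(timezone.utc)
    expires = datetime.fromtimestamp(
        ssl.cert_time_to_seconds(cert["notAfter"]), tz=timezone.utc
    )
    # The issuer is a tuple of RDNs, each a tuple of (key, value) pairs.
    issuer = {key: value for rdn in cert.get("issuer", ()) for key, value in rdn}
    sans = [value for kind, value in cert.get("subjectAltName", ()) if kind == "DNS"]
    return {
        "issuer": issuer.get("organizationName"),
        "expires": expires,
        "days_remaining": (expires - now).days,
        "sans": sans,
    }


sample = {
    "issuer": ((("countryName", "US"),), (("organizationName", "Example CA"),)),
    "notAfter": "Jan  5 09:34:43 2018 GMT",
    "subjectAltName": (("DNS", "example.com"), ("DNS", "www.example.com")),
}
summary = summarize_certificate(sample, now=datetime(2018, 1, 1, tzinfo=timezone.utc))
print(summary["sans"], summary["days_remaining"])
```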

POST /v1/verify/email

Deep Email Validation

We don't just check syntax. We perform a live SMTP handshake with the recipient's mail server to verify that the mailbox exists, and we flag disposable email providers and catch-all domains.
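
One common way to perform this kind of check, sketched here with Python's smtplib, is to issue RCPT TO for the real address and for a random address on the same domain, then compare the reply codes. The function names, the default sender, and the result labels are hypothetical, the MX host is assumed to be known already, and many mail servers deliberately obscure these answers, so treat the result as a hint rather than proof.

```python
import smtplib
import uuid


def rcpt_code(mx_host: str, sender: str, recipient: str, timeout: float = 10.0) -> int:
    """Open an SMTP session and return the reply code for RCPT TO."""
    with smtplib.SMTP(mx_host, 25, timeout=timeout) as smtp:
        smtp.helo()
        smtp.mail(sender)
        code, _message = smtp.rcpt(recipient)
        return code  # no DATA is sent, so no message is delivered


def classify_mailbox(target_code: int, random_code: int) -> str:
    """Interpret RCPT replies for the real address and a random probe address."""
    if 200 <= target_code < 300:
        # If a made-up mailbox is also accepted, the domain accepts everything.
        return "catch_all" if 200 <= random_code < 300 else "deliverable"
    if 500 <= target_code < 600:
        return "undeliverable"
    return "unknown"  # e.g. 4xx: greylisting or a temporary failure


def verify(mx_host: str, address: str, sender: str = "probe@example.com") -> str:
    """Probe the real address and a random one on the same domain."""
    domain = address.rsplit("@", 1)[1]
    random_address = f"{uuid.uuid4().hex}@{domain}"
    return classify_mailbox(
        rcpt_code(mx_host, sender, address),
        rcpt_code(mx_host, sender, random_address),
    )


print(classify_mailbox(250, 550))  # deliverable
print(classify_mailbox(250, 250))  # catch_all
```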

Crawling & Discovery

When you need to map out an entire domain, our discovery endpoints run highly parallelized routines to find everything.

Enterprise Site Crawler

POST /v1/crawl

Powered by our blazing-fast Rust engine, the crawler explores a domain based on your parameters. It can respect or ignore robots.txt as you configure it, follows sitemaps, and discovers all internal pages.

POST /v1/intel/sitemap

Instant Sitemap Unrolling

Provide a domain and we'll automatically locate, fetch, and parse all associated XML sitemaps, flattening them into a clean array of URLs.
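
The flattening step can be sketched with the standard library: a sitemap index points at child sitemaps, and each child lists page URLs. The example below takes a fetch callable so it can run against in-memory documents; flatten_sitemap and the fetch parameter are hypothetical names, and this is not the endpoint's actual implementation.

```python
import xml.etree.ElementTree as ET
from typing import Callable, Optional

# Namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def flatten_sitemap(
    url: str, fetch: Callable[[str], str], seen: Optional[set] = None
) -> list:
    """Recursively expand sitemap indexes into a flat list of page URLs."""
    seen = set() if seen is None else seen
    if url in seen:  # guard against indexes that reference each other
        return []
    seen.add(url)
    root = ET.fromstring(fetch(url))
    if root.tag.endswith("sitemapindex"):
        pages = []
        for loc in root.findall("sm:sitemap/sm:loc", NS):
            pages.extend(flatten_sitemap(loc.text.strip(), fetch, seen))
        return pages
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", NS)]


documents = {
    "https://example.com/sitemap.xml": (
        '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<sitemap><loc>https://example.com/posts.xml</loc></sitemap>"
        "</sitemapindex>"
    ),
    "https://example.com/posts.xml": (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/a</loc></url>"
        "<url><loc>https://example.com/b</loc></url>"
        "</urlset>"
    ),
}

urls = flatten_sitemap("https://example.com/sitemap.xml", documents.__getitem__)
print(urls)  # ['https://example.com/a', 'https://example.com/b']
```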

POST /v1/intel/robots

Robots.txt Analysis

Check if specific paths are allowed or disallowed for scraping, giving you peace of mind before kicking off a massive job.
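
For comparison, the same kind of allow/disallow check can be done locally with Python's urllib.robotparser. The robots.txt content and the "my-crawler" user agent below are made up for illustration; the standard-library parser applies the first matching rule, which may differ from how other crawlers resolve conflicts.

```python
from urllib.robotparser import RobotFileParser

# Sample rules, parsed from a string so no network access is needed.
robots_txt = """\
User-agent: *
Disallow: /admin/
Allow: /
"""

robots = RobotFileParser()
robots.parse(robots_txt.splitlines())

blog_allowed = robots.can_fetch("my-crawler", "https://example.com/blog/post")
admin_allowed = robots.can_fetch("my-crawler", "https://example.com/admin/users")
print(blog_allowed, admin_allowed)  # True False
```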