Everything you need to integrate hintrix into your application or AI agent.
Get up and running in under a minute.
Sign up at hintrix.com/register. Check your email for a verification link — you must verify before making API calls. You get 500 free credits instantly, no credit card required — earn 500 more by sharing hintrix on X.
After logging in, go to your dashboard to find your API key. It starts with hx_live_sk_.
$ curl -X POST https://hintrix.com/v1/scrape \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_your_key_here" \
  -d '{"url": "https://example.com", "mode": ["content"]}'

// Response
{
  "agent": "glance",
  "url": "https://example.com",
  "status_code": 200,
  "response_time_ms": 312,
  "js_rendered": false,
  "metadata": {
    "title": "Example Domain",
    "description": "This domain is for use in illustrative examples.",
    "language": "en",
    "canonical": "https://example.com",
    "og": {}
  },
  "content": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "word_count": 83
  },
  "credits_used": 1,
  "request_id": "req_a1b2c3d4"
}
The official Node.js SDK wraps the REST API with typed methods, automatic retries, and polling helpers. Requires Node 18+.
$ npm install hintrix
import { Hintrix } from 'hintrix';

const hx = new Hintrix('hx_live_sk_your_api_key');

// Scrape a page
const page = await hx.scrape('https://example.com');
console.log(page.content.markdown);

// Scrape with audit
const audited = await hx.scrape('https://example.com', { mode: ['content', 'audit'] });
console.log(audited.audit.geo_score);
The API key can also be read from the HINTRIX_API_KEY environment variable — omit the first argument in that case.
// Single-page operations (synchronous response)
await hx.scrape(url, options?)            // content extraction
await hx.audit(url, options?)             // GEO audit
await hx.extract(url, schema?, options?)  // structured extraction

// Async jobs — returns a job handle immediately
await hx.crawl(url, options?)             // start multi-page crawl
await hx.batch(urls, options?)            // batch URLs

// Job management
await hx.getJob(jobId)                    // check job status
await hx.getJobPages(jobId)               // get paginated results
await hx.waitForJob(jobId, options?)      // poll until done

// High-level helpers (start + wait + collect in one call)
await hx.crawlAndCollect(url, options?)   // crawl + collect all pages
await hx.batchAndCollect(urls, options?)  // batch + collect all results
await hx.scrapeMany(urls, options?)       // parallel scrapes (concurrency cap)
const result = await hx.crawlAndCollect('https://example.com', {
  max_pages: 50,
  mode: ['content', 'audit'],
  onProgress: (status) => console.log(`${status.pages_crawled} pages...`),
});
console.log(`Done: ${result.pages.length} pages`);
import { Hintrix, AuthenticationError, RateLimitError } from 'hintrix';

try {
  await hx.scrape(url);
} catch (err) {
  if (err instanceof RateLimitError) {
    console.log(`Rate limited, retry after ${err.retryAfter}s`);
  } else if (err instanceof AuthenticationError) {
    console.log('Invalid API key');
  }
}
Other exported error classes: ConnectionError, TimeoutError, and ValidationError, all of which extend the base APIError class (also exported).
const hx = new Hintrix('hx_live_sk_...', {
  baseUrl: 'https://hintrix.com', // default
  timeout: 30000,                 // 30s default
  maxRetries: 3,                  // auto-retry on 429 / 5xx
});
All API requests require an API key passed via the X-API-Key header.
API keys follow this pattern: hx_live_sk_...
Keep your API key secret. Do not expose it in client-side code or public repositories. If compromised, regenerate it from your dashboard.
X-API-Key: hx_live_sk_a1b2c3d4e5f6...
Base URL: https://hintrix.com — five endpoints: /v1/scrape, /v1/audit, /v1/extract, /v1/crawl, /v1/batch.
Scrape a single URL. Returns clean Markdown content, and optionally a GEO audit with scores and issues.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to scrape |
| mode | array | No | ["content"], ["audit"], or ["content","audit"]. Default: ["content"] |
| output_format | string | No | markdown, html, or text. Default: markdown |
| wait_for_js | boolean | No | JS rendering is enabled by default. Set to false for faster plain HTTP scraping. No extra cost. |
| include_links | boolean | No | Include extracted links in response. Default: true |
| include_schema | boolean | No | Include Schema.org / JSON-LD data. Default: true |
| respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true |
| include_screenshot | boolean | No | Return a base64-encoded PNG screenshot of the page. Default: false. Screenshots are not stored: save the data from the response, since it cannot be retrieved later. +1 credit. |
| include_diff | boolean | No | Compare with the previous scrape of this URL and return what changed (additions, deletions, changed lines). Previous content is stored per user per URL for 7 days, then automatically deleted. The first scrape of a URL returns null (no previous version to compare). Subsequent scrapes return a diff object. No extra credit cost. |
$ curl -X POST https://hintrix.com/v1/scrape \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{
    "url": "https://example.com",
    "mode": ["content", "audit"],
    "include_links": true,
    "include_schema": true
  }'

// Response
{
  "agent": "reveal",
  "url": "https://example.com",
  "status_code": 200,
  "response_time_ms": 418,
  "js_rendered": false,
  "metadata": {
    "title": "About Us — Example",
    "description": "We build tools for developers.",
    "language": "en",
    "canonical": "https://example.com/about",
    "og": {}
  },
  "content": {
    "markdown": "# About Us\n\nWe build tools for developers...",
    "word_count": 890
  },
  "links": [
    { "href": "https://example.com/contact", "text": "Contact", "type": "internal", "nofollow": false },
    { "href": "https://example.com/blog", "text": "Blog", "type": "internal", "nofollow": false }
  ],
  "schema_data": [{ "@type": "Organization", "name": "Example" }],
  "audit": {
    "geo_score": 72,
    "tech_score": 85,
    "issues": [
      {
        "title": "PerplexityBot blocked in robots.txt",
        "severity": "critical",
        "category": "ai_bot_access",
        "fix": "Remove User-agent: PerplexityBot / Disallow: /"
      }
    ],
    "pagespeed": { "performance": 91, "accessibility": 87 },
    "assets": {
      "llms_txt": "# Example\n\n> We build tools for developers.\n\n..."
    }
  },
  "credits_used": 2,
  "request_id": "req_b2c3d4e5"
}
Run a GEO readiness audit on a single URL. Uses plain HTTP (no JS rendering). Costs 2 credits. For JS-rendered audit results, use /v1/scrape with mode: ['content', 'audit'] instead (costs 2 credits with JS rendering included).
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to audit |
| wait_for_js | boolean | No | /v1/audit uses plain HTTP by default. Set to true to render JavaScript before auditing. No extra cost. |
| include_links | boolean | No | Include extracted links in response. Default: true |
| include_schema | boolean | No | Include Schema.org / JSON-LD data. Default: true |
| respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true |
$ curl -X POST https://hintrix.com/v1/audit \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{"url": "https://example.com"}'

// Response
{
  "url": "https://example.com",
  "status_code": 200,
  "response_time_ms": 389,
  "geo_score": 72,
  "tech_score": 85,
  "issues": [
    {
      "title": "Missing Schema.org Organization markup",
      "severity": "high",
      "category": "structured_data",
      "fix": "Add JSON-LD Organization schema to <head>"
    },
    {
      "title": "ChatGPT-User bot blocked",
      "severity": "critical",
      "category": "ai_bot_access",
      "fix": "Remove 'User-agent: ChatGPT-User' block from robots.txt"
    },
    {
      "title": "No author attribution on content",
      "severity": "medium",
      "category": "eeat",
      "fix": "Add visible author name and link to author page"
    }
  ],
  "pagespeed": { "performance": 91, "accessibility": 87 },
  "assets": {
    "llms_txt": "# Example\n\n> We build tools for developers.\n\n..."
  },
  "credits_used": 2
}
Extract structured data from any page. Define a schema with CSS selectors or let auto-detection handle it. Works with SPAs and JSON endpoints.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to extract from |
| schema | object | No | Field-to-CSS-selector mapping. Omit for auto-detection. |
| wait_for_js | boolean | No | JS rendering is enabled by default. Set to false for faster plain HTTP scraping. No extra cost. |
| respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true |
$ curl -X POST https://hintrix.com/v1/extract \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{
    "url": "https://shop.example.com/product/123",
    "schema": {
      "name": "h1.product-title",
      "price": ".price-current",
      "description": ".product-description",
      "in_stock": ".availability"
    },
    "wait_for_js": true
  }'

// Response
{
  "agent": "pinch",
  "url": "https://shop.example.com/product/123",
  "status_code": 200,
  "response_time_ms": 1204,
  "js_rendered": true,
  "data": {
    "name": "Wireless Headphones Pro",
    "price": "$149.99",
    "description": "Premium noise-cancelling headphones with 30h battery...",
    "in_stock": "In Stock"
  },
  "credits_used": 2
}
Start an asynchronous multi-page crawl. Returns a job ID for polling progress and retrieving results.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Starting URL for the crawl |
| max_pages | integer | No | Maximum pages to crawl (1–5000). Default: 10 |
| max_depth | integer | No | Maximum link depth from starting URL (1–10). Default: 2 |
| mode | array | No | ["content"], ["audit"], or ["content","audit"]. Default: ["content"] |
| output_format | string | No | markdown, html, or text. Default: markdown |
| wait_for_js | boolean | No | JS rendering is enabled by default. Set to false for faster plain HTTP scraping. No extra cost. |
| include_links | boolean | No | Include extracted links per page. Default: true |
| include_schema | boolean | No | Include Schema.org / JSON-LD data per page. Default: true |
| respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true |
| check_links | boolean | No | Check all discovered links for broken URLs (404s, redirects). Results appear in link_health on job completion. Default: false |
| webhook_url | string | No | URL to receive a POST request when the job completes. Optional. |
$ curl -X POST https://hintrix.com/v1/crawl \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{
    "url": "https://docs.example.com",
    "max_pages": 100,
    "max_depth": 3,
    "mode": ["content"]
  }'

// Response (HTTP 201)
{
  "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "queued",
  "url": "https://docs.example.com",
  "mode": ["content"],
  "max_pages": 100,
  "pages_crawled": 0,
  "credits_used": 100,
  "created_at": "2026-04-02T10:00:00+00:00"
}
Check the status and progress of a crawl job. Status values: queued, running, completed, failed.
$ curl https://hintrix.com/v1/crawl/3fa85f64-5717-4562-b3fc-2c963f66afa6 \
  -H "X-API-Key: hx_live_sk_..."

// Response
{
  "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "completed",
  "url": "https://docs.example.com",
  "pages_crawled": 47,
  "pages_total": 47,
  "credits_used": 47,
  "error": null,
  "created_at": "2026-04-02T10:00:00+00:00",
  "started_at": "2026-04-02T10:00:02+00:00",
  "completed_at": "2026-04-02T10:02:14+00:00"
}
When check_links: true was requested, a link_health object is included in the completed response showing broken and redirected URLs found during the crawl.
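The SDK's waitForJob helper wraps this polling loop for you. If you call the REST API directly, the loop looks roughly like the TypeScript sketch below; getStatus is an injected stand-in for a fetch of GET /v1/crawl/{job_id} with your X-API-Key header, so the loop itself has no network dependency.

```typescript
// Minimal polling sketch for async crawl/batch jobs. The status values
// match the documented set: queued, running, completed, failed.
type JobStatus = {
  status: "queued" | "running" | "completed" | "failed";
  pages_crawled: number;
};

async function pollJob(
  getStatus: () => Promise<JobStatus>,
  intervalMs = 2000,                 // how often to re-check
  timeoutMs = 10 * 60 * 1000,        // give up after 10 minutes
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const job = await getStatus();
    if (job.status === "completed" || job.status === "failed") return job;
    if (Date.now() > deadline) throw new Error("timed out waiting for job");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A 2-second interval is a reasonable default given the 100 requests/minute rate limit; back off further for very large crawls.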
Retrieve paginated results from a completed crawl job. Query params: page (default: 1), page_size (default: 20).
$ curl "https://hintrix.com/v1/crawl/3fa85f64-5717-4562-b3fc-2c963f66afa6/pages?page=1&page_size=20" \
  -H "X-API-Key: hx_live_sk_..."

// Response
{
  "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "total": 47,
  "page": 1,
  "page_size": 20,
  "items": [
    {
      "url": "https://docs.example.com",
      "status_code": 200,
      "response_time_ms": 310,
      "word_count": 340,
      "content_markdown": "# Documentation\n\nWelcome to the docs...",
      "metadata": { "title": "Documentation", "description": "..." },
      "links": [
        { "href": "https://docs.example.com/getting-started", "text": "Getting Started", "type": "internal", "nofollow": false }
      ],
      "schema_data": null,
      "audit_result": null
    },
    {
      "url": "https://docs.example.com/getting-started",
      "status_code": 200,
      "response_time_ms": 284,
      "word_count": 520,
      "content_markdown": "# Getting Started\n\nFollow these steps...",
      "metadata": { "title": "Getting Started", "description": "..." },
      "links": [],
      "schema_data": null,
      "audit_result": null
    }
  ]
}
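If you page through results by hand rather than using the SDK's getJobPages or crawlAndCollect helpers, a collector might look like the following sketch. fetchPage is a hypothetical callback standing in for the GET .../pages request; the field names (total, page_size, items) follow the response shape above.

```typescript
// Walk every page of a completed job's results. `fetchPage` is injected
// so the paging logic can be exercised without a live API key; in real
// use it would wrap fetch() with your X-API-Key header.
type PageResult = {
  total: number;
  page: number;
  page_size: number;
  items: unknown[];
};

async function collectAllPages(
  fetchPage: (page: number, pageSize: number) => Promise<PageResult>,
  pageSize = 20,
): Promise<unknown[]> {
  const all: unknown[] = [];
  let page = 1;
  while (true) {
    const res = await fetchPage(page, pageSize);
    all.push(...res.items);
    // Stop once we've collected `total` items, or if an empty page arrives.
    if (all.length >= res.total || res.items.length === 0) break;
    page += 1;
  }
  return all;
}
```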
Scrape up to 100 independent URLs in a single asynchronous job. Unlike /v1/crawl, batch does not follow links — each URL in the list is scraped independently.
Submit a list of URLs for parallel scraping. Returns a job_id immediately. Poll GET /v1/crawl/{job_id} for status and GET /v1/crawl/{job_id}/pages for per-URL results.
| Parameter | Type | Required | Description |
|---|---|---|---|
| urls | array | Yes | List of URLs to scrape. Max 100. |
| mode | array | No | ["content"], ["audit"], or ["content","audit"]. Default: ["content"] |
| output_format | string | No | markdown, html, or text. Default: markdown |
| wait_for_js | boolean | No | JS rendering is enabled by default. Set to false for faster plain HTTP scraping. No extra cost. |
| include_screenshot | boolean | No | Capture a base64 PNG screenshot per URL. Default: false. Screenshots are not stored: save the data from each response, since it cannot be retrieved later. +1 credit per URL. |
| respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true |
| webhook_url | string | No | URL to receive a POST request when the batch job completes. Optional. |
$ curl -X POST https://hintrix.com/v1/batch \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{
    "urls": [
      "https://example.com",
      "https://example.com/about",
      "https://example.com/pricing"
    ],
    "mode": ["content"],
    "webhook_url": "https://yourapp.com/webhooks/hintrix"
  }'

// Response (HTTP 201)
{
  "job_id": "7cb89a12-3f4e-4a9b-b1d2-0e8c5f9a6b3d",
  "status": "queued",
  "url": "https://example.com",
  "mode": ["content"],
  "max_pages": 3,
  "pages_crawled": 0,
  "credits_used": 3,
  "created_at": "2026-04-02T10:05:00+00:00"
}

// Poll for status
$ curl https://hintrix.com/v1/crawl/7cb89a12-3f4e-4a9b-b1d2-0e8c5f9a6b3d \
  -H "X-API-Key: hx_live_sk_..."

// Retrieve results when completed
$ curl https://hintrix.com/v1/crawl/7cb89a12-3f4e-4a9b-b1d2-0e8c5f9a6b3d/pages \
  -H "X-API-Key: hx_live_sk_..."
Both /v1/crawl and /v1/batch support an optional webhook_url parameter. When a job completes (successfully or with an error), hintrix sends a POST request to your webhook URL with a JSON payload containing the final job status.
{
"job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"status": "completed",
"url": "https://docs.example.com",
"pages_crawled": 47,
"pages_total": 47,
"credits_used": 47,
"completed_at": "2026-04-02T10:02:14+00:00"
}
Respond to the webhook with any 2xx status within 10 seconds. Webhooks are not retried on failure. Use GET /v1/crawl/{job_id} to poll for status if you miss a delivery.
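A receiver that follows those rules (acknowledge fast, process later) could be sketched like this in Node. The payload fields match the example above; the port is illustrative.

```typescript
// Webhook receiver sketch: respond 200 within the 10-second window, then
// defer any heavy work. Since failed deliveries are not retried, keep the
// handler itself trivial and fall back to polling if something goes wrong.
import { createServer } from "node:http";

type JobWebhook = {
  job_id: string;
  status: "completed" | "failed";
  pages_crawled: number;
};

// Parsing/validation kept as a pure function so it is easy to test.
function parseJobWebhook(body: string): JobWebhook {
  const data = JSON.parse(body);
  if (typeof data.job_id !== "string") throw new Error("missing job_id");
  return data as JobWebhook;
}

const server = createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    res.writeHead(200).end(); // acknowledge first; hintrix does not retry
    try {
      const job = parseJobWebhook(body);
      // Defer the real work (e.g. fetching /pages) so the 200 goes out fast.
      setImmediate(() => console.log(`job ${job.job_id}: ${job.status}`));
    } catch {
      // Ignore payloads that don't parse; we already acknowledged.
    }
  });
});
// server.listen(3000); // illustrative port
```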
Some pages rely on JavaScript to render their content. SPAs built with React, Vue, Next.js, or Angular often return empty HTML to traditional crawlers.
hintrix uses full browser rendering by default for reliable content extraction. JS rendering is included at no extra cost — it does not affect the credit price.
Set wait_for_js: false for plain HTTP scraping. This is faster but may return incomplete content for JS-heavy pages; the credit cost is the same:
$ curl -X POST https://hintrix.com/v1/scrape \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{"url": "https://example.com", "wait_for_js": false}'
// costs 1 credit — same as with JS rendering
Rate limits are applied per API key and per domain to ensure fair usage.
| Limit | Value |
|---|---|
| General requests | 100 requests per minute per key |
| Crawl jobs | 10 crawl starts per minute per key |
| Per domain (global) | 10 requests per minute per domain |
| Per domain (per user) | 100 requests per domain per day per user |
When you hit a rate limit, the API returns 429 Too Many Requests with a Retry-After header indicating when you can retry.
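A minimal retry loop honoring Retry-After might look like the sketch below. doRequest is an injected stand-in for your actual HTTP call; it is assumed to surface the response status and the parsed Retry-After value in seconds. (The Node SDK does this for you via maxRetries.)

```typescript
// Retry on 429, sleeping for the number of seconds the API asked for.
// `doRequest` is injected so the retry logic can be tested without network
// access; a real implementation would wrap fetch() and read the
// Retry-After response header.
type Resp = { status: number; retryAfter?: number };

async function withRetry(
  doRequest: () => Promise<Resp>,
  maxAttempts = 3,
): Promise<Resp> {
  for (let attempt = 1; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt >= maxAttempts) return res;
    // Fall back to a 1-second wait if no Retry-After value was provided.
    const waitMs = (res.retryAfter ?? 1) * 1000;
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}
```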
Certain domains cannot be scraped through hintrix.
Certain country-level and restricted TLDs are blocked based on legal jurisdiction. Requests to blocked domains or TLDs return HTTP 403.
Our User-Agent is HintrixBot/1.0 (+https://hintrix.com/bot). Website owners can control access via robots.txt.
All errors return a JSON body with an error field and a human-readable message.
| Status | Meaning | Common Cause |
|---|---|---|
| 400 | Bad Request | Missing or invalid parameters (e.g., invalid URL) |
| 401 | Unauthorized | Missing or invalid API key |
| 402 | Payment Required | Insufficient credits for this request |
| 403 | Forbidden | Domain blocked by content policy or robots.txt |
| 404 | Not Found | Endpoint or crawl job not found |
| 429 | Too Many Requests | Rate limit exceeded, check Retry-After header |
| 500 | Internal Server Error | Something went wrong on our end |
{
"error": "insufficient_credits",
"message": "This request requires 2 credits but you only have 1 remaining.",
"request_id": "req_c3d4e5f6"
}
Every API call costs credits. Credits never expire and can be topped up anytime. New accounts receive 500 free credits on signup — plus 500 bonus credits for sharing hintrix on X/Twitter.
| Action | Credits |
|---|---|
| /v1/scrape (content only) | 1 |
| /v1/scrape (audit only) | 1 |
| /v1/scrape (content + audit) | 2 |
| /v1/audit | 2 |
| /v1/extract | 2 |
| /v1/crawl — content mode | 1 per page |
| /v1/crawl — content + audit mode | 2 per page |
| /v1/batch | same as /v1/scrape per URL |
| JS rendering | included (no extra cost) |
| Screenshot add-on | +1 per request/URL |
JS rendering is included at no extra cost. For example: scrape = 1 credit. Scrape with audit = 2 credits. Crawl with content+audit = 2 credits per page. All of the above with full JS rendering costs the same. Credits for crawl jobs are pre-deducted and unused credits are refunded on completion.
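To budget credits before launching a large job, the pricing table above can be encoded directly. This is a sketch under the documented pricing; the function names are illustrative, not part of the SDK:

```typescript
// Credit cost estimator based on the documented pricing table.
// JS rendering adds nothing; the screenshot add-on is +1 per request/URL.
type Mode = Array<"content" | "audit">;

function scrapeCredits(mode: Mode, screenshot = false): number {
  // content-only or audit-only scrape = 1 credit; content + audit = 2
  const base = mode.includes("content") && mode.includes("audit") ? 2 : 1;
  return base + (screenshot ? 1 : 0);
}

// /v1/crawl: 1 credit per page in content mode, 2 per page with audit added.
// /v1/batch charges the same as /v1/scrape per URL, so this covers both.
function crawlCredits(pages: number, mode: Mode): number {
  return pages * scrapeCredits(mode);
}
```

For example, a 50-page crawl in content+audit mode pre-deducts 100 credits; if only 40 pages are found, the unused 20 are refunded on completion.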
Examples for the /v1/scrape endpoint with audit mode in multiple languages.
$ curl -X POST https://hintrix.com/v1/scrape \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_your_key_here" \
  -d '{
    "url": "https://example.com",
    "mode": ["content", "audit"],
    "include_links": true
  }'
import requests

response = requests.post(
    "https://hintrix.com/v1/scrape",
    headers={
        "Content-Type": "application/json",
        "X-API-Key": "hx_live_sk_your_key_here",
    },
    json={
        "url": "https://example.com",
        "mode": ["content", "audit"],
        "include_links": True,
    },
)

data = response.json()
print(data["content"]["markdown"])
print(f"GEO Score: {data['audit']['geo_score']}")
for issue in data["audit"]["issues"]:
    print(f"  [{issue['severity']}] {issue['title']}")
    print(f"  Fix: {issue['fix']}")
const response = await fetch("https://hintrix.com/v1/scrape", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-API-Key": "hx_live_sk_your_key_here",
  },
  body: JSON.stringify({
    url: "https://example.com",
    mode: ["content", "audit"],
    include_links: true,
  }),
});

const data = await response.json();
console.log(data.content.markdown);
console.log(`GEO Score: ${data.audit.geo_score}`);
data.audit.issues.forEach((issue) => {
  console.log(`  [${issue.severity}] ${issue.title}`);
  console.log(`  Fix: ${issue.fix}`);
});
hintrix provides a Model Context Protocol (MCP) server for native integration with AI coding tools like Claude Code, Cursor, and Windsurf.
Add to your ~/.claude/settings.json:
{
"mcpServers": {
"hintrix": {
"type": "sse",
"url": "https://hintrix.com/mcp/sse",
"headers": {
"X-API-Key": "hx_live_sk_your_key_here"
}
}
}
}
Add to your .cursor/mcp.json in your project root:
{
"mcpServers": {
"hintrix": {
"type": "sse",
"url": "https://hintrix.com/mcp/sse",
"headers": {
"X-API-Key": "hx_live_sk_your_key_here"
}
}
}
}
Once configured, your AI assistant can use hintrix tools directly: scrape URLs for context, audit pages, extract data, and crawl domains — all within your coding workflow.