API Documentation

Everything you need to integrate hintrix into your application or AI agent.

Contents: Quickstart · SDK / npm · Authentication · Endpoints · Batch · Webhooks · JS Rendering · Rate Limits · Content Policy · Errors · Credits · Code Examples · MCP Server

Quickstart

Get up and running in under a minute.

1. Create an account

Sign up at hintrix.com/register. Check your email for a verification link — you must verify before making API calls. You get 500 free credits instantly, no credit card required — earn 500 more by sharing hintrix on X.

2. Get your API key

After logging in, go to your dashboard to find your API key. It starts with hx_live_sk_.

3. Make your first request

your first scrape
$ curl -X POST https://hintrix.com/v1/scrape \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_your_key_here" \
  -d '{"url": "https://example.com", "mode": ["content"]}'

// Response
{
  "agent": "glance",
  "url": "https://example.com",
  "status_code": 200,
  "response_time_ms": 312,
  "js_rendered": false,
  "metadata": {
    "title": "Example Domain",
    "description": "This domain is for use in illustrative examples.",
    "language": "en",
    "canonical": "https://example.com",
    "og": {}
  },
  "content": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "word_count": 83
  },
  "credits_used": 1,
  "request_id": "req_a1b2c3d4"
}

SDK / npm

The official Node.js SDK wraps the REST API with typed methods, automatic retries, and polling helpers. Requires Node 18+.

Installation

install
$ npm install hintrix

Quick Start

quick-start.js
import { Hintrix } from 'hintrix';

const hx = new Hintrix('hx_live_sk_your_api_key');

// Scrape a page
const page = await hx.scrape('https://example.com');
console.log(page.content.markdown);

// Scrape with audit
const audited = await hx.scrape('https://example.com', {
  mode: ['content', 'audit']
});
console.log(audited.audit.geo_score);

The API key can also be read from the HINTRIX_API_KEY environment variable — omit the first argument in that case.

All Methods

methods.js
// Single-page operations (synchronous response)
await hx.scrape(url, options?)          // content extraction
await hx.audit(url, options?)           // GEO audit
await hx.extract(url, schema?, options?) // structured extraction

// Async jobs — returns a job handle immediately
await hx.crawl(url, options?)            // start multi-page crawl
await hx.batch(urls, options?)           // batch URLs

// Job management
await hx.getJob(jobId)                   // check job status
await hx.getJobPages(jobId)              // get paginated results
await hx.waitForJob(jobId, options?)     // poll until done

// High-level helpers (start + wait + collect in one call)
await hx.crawlAndCollect(url, options?)  // crawl + collect all pages
await hx.batchAndCollect(urls, options?) // batch + collect all results
await hx.scrapeMany(urls, options?)      // parallel scrapes (concurrency cap)
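The concurrency cap behind scrapeMany can be reproduced over the raw API with a small worker-pool helper. A sketch; mapWithConcurrency is an illustrative name, not part of the SDK:

```javascript
// Run `worker` over `items` with at most `limit` calls in flight at once,
// the same idea as scrapeMany's concurrency cap. Results keep input order.
async function mapWithConcurrency(items, worker, limit = 5) {
  const results = new Array(items.length);
  let next = 0;
  async function run() {
    while (next < items.length) {
      const i = next++;               // claim the next index
      results[i] = await worker(items[i], i);
    }
  }
  // Start `limit` workers that drain the shared index counter.
  await Promise.all(Array.from({ length: Math.min(limit, items.length) }, run));
  return results;
}
```

Pass an async function that calls /v1/scrape as the worker to get scrapeMany-like behavior without the SDK.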

Crawl with Polling

crawl-collect.js
const result = await hx.crawlAndCollect('https://example.com', {
  max_pages: 50,
  mode: ['content', 'audit'],
  onProgress: (status) =>
    console.log(`${status.pages_crawled} pages...`),
});
console.log(`Done: ${result.pages.length} pages`);
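If you are not using the SDK, the same polling pattern can be written against the REST API directly. A minimal sketch: pollJob and its options are illustrative, and getStatus stands in for any async function that fetches GET /v1/crawl/{job_id}:

```javascript
// Poll `getStatus` until the job reaches a terminal state
// ('completed' or 'failed'), with an interval and an overall deadline.
async function pollJob(getStatus, { intervalMs = 2000, timeoutMs = 300000, onProgress } = {}) {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const job = await getStatus();
    if (onProgress) onProgress(job);
    if (job.status === 'completed' || job.status === 'failed') return job;
    if (Date.now() >= deadline) throw new Error(`Job still ${job.status} after ${timeoutMs}ms`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

The SDK's waitForJob presumably works along these lines; the 2-second interval here is an arbitrary choice.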

Error Handling

errors.js
import { Hintrix, AuthenticationError, RateLimitError } from 'hintrix';

try {
  await hx.scrape(url);
} catch (err) {
  if (err instanceof RateLimitError) {
    console.log(`Rate limited, retry after ${err.retryAfter}s`);
  } else if (err instanceof AuthenticationError) {
    console.log('Invalid API key');
  }
}

Other exported error classes: ConnectionError, TimeoutError, ValidationError. All error classes, including AuthenticationError and RateLimitError, extend the base APIError.

Configuration

config.js
const hx = new Hintrix('hx_live_...', {
  baseUrl:    'https://hintrix.com', // default
  timeout:    30000,                  // 30s default
  maxRetries: 3,                      // auto-retry on 429 / 5xx
});

Authentication

All API requests require an API key passed via the X-API-Key header.

Key format

API keys follow this pattern: hx_live_sk_...

Keep your API key secret. Do not expose it in client-side code or public repositories. If compromised, regenerate it from your dashboard.

authentication header
X-API-Key: hx_live_sk_a1b2c3d4e5f6...
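When wiring up requests by hand, a tiny helper that builds the headers and sanity-checks the documented key prefix catches misconfigured keys before they ever hit the API. The helper name is illustrative:

```javascript
// Build request headers for hintrix, rejecting keys that don't match
// the documented hx_live_sk_ prefix.
function authHeaders(apiKey) {
  if (!/^hx_live_sk_/.test(apiKey)) {
    throw new Error('API key must start with hx_live_sk_');
  }
  return { 'Content-Type': 'application/json', 'X-API-Key': apiKey };
}
```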

Endpoints

Base URL: https://hintrix.com — five endpoints: /v1/scrape, /v1/audit, /v1/extract, /v1/crawl, /v1/batch.

POST /v1/scrape agent: glance (content) | reveal (audit)

Scrape a single URL. Returns clean Markdown content, and optionally a GEO audit with scores and issues.

Parameters

Parameter | Type | Required | Description
--- | --- | --- | ---
url | string | Yes | The URL to scrape
mode | array | No | ["content"], ["audit"], or ["content","audit"]. Default: ["content"]
output_format | string | No | markdown, html, or text. Default: markdown
wait_for_js | boolean | No | JS rendering is enabled by default; set to false for faster plain-HTTP scraping at the same cost
include_links | boolean | No | Include extracted links in the response. Default: true
include_schema | boolean | No | Include Schema.org / JSON-LD data. Default: true
respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true
include_screenshot | boolean | No | Return a base64-encoded PNG screenshot of the page. Default: false. Screenshots are not stored; save the data from the response, as it cannot be retrieved later. +1 credit
include_diff | boolean | No | Compare with the previous scrape of this URL and return what changed (additions, deletions, changed lines). Previous content is stored per user per URL for 7 days, then automatically deleted. The first scrape of a URL returns null (no previous version to compare); subsequent scrapes return a diff object. No extra credit cost
POST /v1/scrape — content + audit
$ curl -X POST https://hintrix.com/v1/scrape \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{
    "url": "https://example.com",
    "mode": ["content", "audit"],
    "include_links": true,
    "include_schema": true
  }'

// Response
{
  "agent": "reveal",
  "url": "https://example.com",
  "status_code": 200,
  "response_time_ms": 418,
  "js_rendered": false,
  "metadata": {
    "title": "About Us — Example",
    "description": "We build tools for developers.",
    "language": "en",
    "canonical": "https://example.com/about",
    "og": {}
  },
  "content": {
    "markdown": "# About Us\n\nWe build tools for developers...",
    "word_count": 890
  },
  "links": [
    { "href": "https://example.com/contact", "text": "Contact", "type": "internal", "nofollow": false },
    { "href": "https://example.com/blog", "text": "Blog", "type": "internal", "nofollow": false }
  ],
  "schema_data": [{ "@type": "Organization", "name": "Example" }],
  "audit": {
    "geo_score": 72,
    "tech_score": 85,
    "issues": [
      {
        "title": "PerplexityBot blocked in robots.txt",
        "severity": "critical",
        "category": "ai_bot_access",
        "fix": "Remove User-agent: PerplexityBot / Disallow: /"
      }
    ],
    "pagespeed": { "performance": 91, "accessibility": 87 },
    "assets": { "llms_txt": "# Example\n\n> We build tools for developers.\n\n..." }
  },
  "credits_used": 2,
  "request_id": "req_b2c3d4e5"
}
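As an illustration of what include_diff computes, here is a naive line-level comparison. The real diff object's exact field names are not documented here, so treat this shape as an assumption:

```javascript
// Naive line diff between two content snapshots: lines present only in
// the new text count as additions, lines present only in the old text
// as deletions. Real diff algorithms (e.g. Myers) also track changed
// and moved lines; this is only a conceptual sketch.
function lineDiff(oldText, newText) {
  const oldLines = new Set(oldText.split('\n'));
  const newLines = new Set(newText.split('\n'));
  return {
    additions: [...newLines].filter((l) => !oldLines.has(l)),
    deletions: [...oldLines].filter((l) => !newLines.has(l)),
  };
}
```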
POST /v1/audit agent: reveal

Run a GEO readiness audit on a single URL. Uses plain HTTP (no JS rendering). Costs 2 credits. For JS-rendered audit results, use /v1/scrape with mode: ['content', 'audit'] instead (costs 2 credits with JS rendering included).

Parameters

Parameter | Type | Required | Description
--- | --- | --- | ---
url | string | Yes | The URL to audit
wait_for_js | boolean | No | JS rendering is enabled by default; set to false for faster plain-HTTP scraping at the same cost
include_links | boolean | No | Include extracted links in the response. Default: true
include_schema | boolean | No | Include Schema.org / JSON-LD data. Default: true
respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true
POST /v1/audit
$ curl -X POST https://hintrix.com/v1/audit \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{"url": "https://example.com"}'

// Response
{
  "url": "https://example.com",
  "status_code": 200,
  "response_time_ms": 389,
  "geo_score": 72,
  "tech_score": 85,
  "issues": [
    {
      "title": "Missing Schema.org Organization markup",
      "severity": "high",
      "category": "structured_data",
      "fix": "Add JSON-LD Organization schema to <head>"
    },
    {
      "title": "ChatGPT-User bot blocked",
      "severity": "critical",
      "category": "ai_bot_access",
      "fix": "Remove 'User-agent: ChatGPT-User' block from robots.txt"
    },
    {
      "title": "No author attribution on content",
      "severity": "medium",
      "category": "eeat",
      "fix": "Add visible author name and link to author page"
    }
  ],
  "pagespeed": { "performance": 91, "accessibility": 87 },
  "assets": { "llms_txt": "# Example\n\n> We build tools for developers.\n\n..." },
  "credits_used": 2
}
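A response like the one above is straightforward to post-process. For example, a sketch that surfaces only the blocking issues, using the severity names shown in these docs:

```javascript
// Severity ranks matching the audit responses in these docs.
const SEVERITY_RANK = { critical: 3, high: 2, medium: 1, low: 0 };

// Return issues at or above a severity threshold, most severe first.
function blockingIssues(audit, minSeverity = 'high') {
  return audit.issues
    .filter((i) => SEVERITY_RANK[i.severity] >= SEVERITY_RANK[minSeverity])
    .sort((a, b) => SEVERITY_RANK[b.severity] - SEVERITY_RANK[a.severity]);
}
```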
POST /v1/extract agent: pinch

Extract structured data from any page. Define a schema with CSS selectors or let auto-detection handle it. Works with SPAs and JSON endpoints.

Parameters

Parameter | Type | Required | Description
--- | --- | --- | ---
url | string | Yes | The URL to extract from
schema | object | No | Field-to-CSS-selector mapping. Omit for auto-detection
wait_for_js | boolean | No | JS rendering is enabled by default; set to false for faster plain-HTTP scraping at the same cost
respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true
POST /v1/extract
$ curl -X POST https://hintrix.com/v1/extract \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{
    "url": "https://shop.example.com/product/123",
    "schema": {
      "name": "h1.product-title",
      "price": ".price-current",
      "description": ".product-description",
      "in_stock": ".availability"
    },
    "wait_for_js": true
  }'

// Response
{
  "agent": "pinch",
  "url": "https://shop.example.com/product/123",
  "status_code": 200,
  "response_time_ms": 1204,
  "js_rendered": true,
  "data": {
    "name": "Wireless Headphones Pro",
    "price": "$149.99",
    "description": "Premium noise-cancelling headphones with 30h battery...",
    "in_stock": "In Stock"
  },
  "credits_used": 2
}
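Extracted values arrive as raw strings, so a common next step is coercing them into usable types. A sketch based on the example response above; the parsing rules (currency stripping, the "In Stock" phrase) are assumptions about this particular page, not API behavior:

```javascript
// Coerce the raw string fields returned by /v1/extract into
// numbers and booleans for downstream use.
function normalizeProduct(data) {
  return {
    name: data.name.trim(),
    // "$149.99" -> 149.99 (strips everything except digits and dots)
    price: parseFloat(data.price.replace(/[^0-9.]/g, '')),
    inStock: /in stock/i.test(data.in_stock),
  };
}
```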
POST /v1/crawl agent: sweep

Start an asynchronous multi-page crawl. Returns a job ID for polling progress and retrieving results.

Parameters

Parameter | Type | Required | Description
--- | --- | --- | ---
url | string | Yes | Starting URL for the crawl
max_pages | integer | No | Maximum pages to crawl (1–5000). Default: 10
max_depth | integer | No | Maximum link depth from the starting URL (1–10). Default: 2
mode | array | No | ["content"], ["audit"], or ["content","audit"]. Default: ["content"]
output_format | string | No | markdown, html, or text. Default: markdown
wait_for_js | boolean | No | JS rendering is enabled by default; set to false for faster plain-HTTP scraping at the same cost
include_links | boolean | No | Include extracted links per page. Default: true
include_schema | boolean | No | Include Schema.org / JSON-LD data per page. Default: true
respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true
check_links | boolean | No | Check all discovered links for broken URLs (404s, redirects). Results appear in link_health on job completion. Default: false
webhook_url | string | No | URL to receive a POST request when the job completes. Optional
POST /v1/crawl — start a crawl job
$ curl -X POST https://hintrix.com/v1/crawl \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{
    "url": "https://docs.example.com",
    "max_pages": 100,
    "max_depth": 3,
    "mode": ["content"]
  }'

// Response (HTTP 201)
{
  "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "queued",
  "url": "https://docs.example.com",
  "mode": ["content"],
  "max_pages": 100,
  "pages_crawled": 0,
  "credits_used": 100,
  "created_at": "2026-04-02T10:00:00+00:00"
}
GET /v1/crawl/{job_id}

Check the status and progress of a crawl job. Status values: queued, running, completed, failed.

GET /v1/crawl/{job_id} — check status
$ curl https://hintrix.com/v1/crawl/3fa85f64-5717-4562-b3fc-2c963f66afa6 \
  -H "X-API-Key: hx_live_sk_..."

// Response
{
  "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "completed",
  "url": "https://docs.example.com",
  "pages_crawled": 47,
  "pages_total": 47,
  "credits_used": 47,
  "error": null,
  "created_at": "2026-04-02T10:00:00+00:00",
  "started_at": "2026-04-02T10:00:02+00:00",
  "completed_at": "2026-04-02T10:02:14+00:00"
}

When check_links: true was requested, a link_health object is included in the completed response showing broken and redirected URLs found during the crawl.

GET /v1/crawl/{job_id}/pages

Retrieve paginated results from a completed crawl job. Query params: page (default: 1), page_size (default: 20).

GET /v1/crawl/{job_id}/pages — get results
$ curl "https://hintrix.com/v1/crawl/3fa85f64-5717-4562-b3fc-2c963f66afa6/pages?page=1&page_size=20" \
  -H "X-API-Key: hx_live_sk_..."

// Response
{
  "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "total": 47,
  "page": 1,
  "page_size": 20,
  "items": [
    {
      "url": "https://docs.example.com",
      "status_code": 200,
      "response_time_ms": 310,
      "word_count": 340,
      "content_markdown": "# Documentation\n\nWelcome to the docs...",
      "metadata": { "title": "Documentation", "description": "..." },
      "links": [{ "href": "https://docs.example.com/getting-started", "text": "Getting Started", "type": "internal", "nofollow": false }],
      "schema_data": null,
      "audit_result": null
    },
    {
      "url": "https://docs.example.com/getting-started",
      "status_code": 200,
      "response_time_ms": 284,
      "word_count": 520,
      "content_markdown": "# Getting Started\n\nFollow these steps...",
      "metadata": { "title": "Getting Started", "description": "..." },
      "links": [],
      "schema_data": null,
      "audit_result": null
    }
  ]
}
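To gather a whole job's results, walk the page query parameter until every item has been collected. A sketch where fetchPage stands in for a fetch against GET /v1/crawl/{job_id}/pages:

```javascript
// Collect every item from a paginated job by incrementing `page`
// until `total` items have been seen. `fetchPage(page)` should resolve
// to the response shape shown above: { total, page, page_size, items }.
async function collectAllPages(fetchPage) {
  const items = [];
  for (let page = 1; ; page++) {
    const res = await fetchPage(page);
    items.push(...res.items);
    // Stop once everything is collected, or on an empty page as a guard.
    if (items.length >= res.total || res.items.length === 0) break;
  }
  return items;
}
```

This mirrors what the SDK's getJobPages/crawlAndCollect helpers do for you.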

Batch Endpoint

Scrape up to 100 independent URLs in a single asynchronous job. Unlike /v1/crawl, batch does not follow links — each URL in the list is scraped independently.

POST /v1/batch agent: sweep

Submit a list of URLs for parallel scraping. Returns a job_id immediately. Poll GET /v1/crawl/{job_id} for status and GET /v1/crawl/{job_id}/pages for per-URL results.

Parameters

Parameter | Type | Required | Description
--- | --- | --- | ---
urls | array | Yes | List of URLs to scrape. Max 100
mode | array | No | ["content"], ["audit"], or ["content","audit"]. Default: ["content"]
output_format | string | No | markdown, html, or text. Default: markdown
wait_for_js | boolean | No | JS rendering is enabled by default; set to false for faster plain-HTTP scraping at the same cost
include_screenshot | boolean | No | Capture a base64 PNG screenshot per URL. Default: false. Screenshots are not stored; save the data from the response, as it cannot be retrieved later. +1 credit per URL
respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true
webhook_url | string | No | URL to receive a POST request when the batch job completes. Optional
POST /v1/batch — scrape multiple URLs
$ curl -X POST https://hintrix.com/v1/batch \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{
    "urls": [
      "https://example.com",
      "https://example.com/about",
      "https://example.com/pricing"
    ],
    "mode": ["content"],
    "webhook_url": "https://yourapp.com/webhooks/hintrix"
  }'

// Response (HTTP 201)
{
  "job_id": "7cb89a12-3f4e-4a9b-b1d2-0e8c5f9a6b3d",
  "status": "queued",
  "url": "https://example.com",
  "mode": ["content"],
  "max_pages": 3,
  "pages_crawled": 0,
  "credits_used": 3,
  "created_at": "2026-04-02T10:05:00+00:00"
}

// Poll for status
$ curl https://hintrix.com/v1/crawl/7cb89a12-3f4e-4a9b-b1d2-0e8c5f9a6b3d \
  -H "X-API-Key: hx_live_sk_..."

// Retrieve results when completed
$ curl https://hintrix.com/v1/crawl/7cb89a12-3f4e-4a9b-b1d2-0e8c5f9a6b3d/pages \
  -H "X-API-Key: hx_live_sk_..."

Webhooks

Both /v1/crawl and /v1/batch support an optional webhook_url parameter. When a job completes (successfully or with an error), hintrix sends a POST request to your webhook URL with a JSON payload containing the final job status.

Webhook payload

webhook payload example
{
  "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "completed",
  "url": "https://docs.example.com",
  "pages_crawled": 47,
  "pages_total": 47,
  "credits_used": 47,
  "completed_at": "2026-04-02T10:02:14+00:00"
}

Respond to the webhook with any 2xx status within 10 seconds. Webhooks are not retried on failure. Use GET /v1/crawl/{job_id} to poll for status if you miss a delivery.
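Since deliveries are not retried, acknowledge first and do the real work afterwards. A minimal handler sketch; handleHintrixWebhook and enqueue are illustrative names, with enqueue deferring processing (fetching pages, etc.) to your own job queue:

```javascript
// Acknowledge the webhook immediately, then process the payload
// asynchronously so the 10-second response window is never at risk.
function handleHintrixWebhook(payload, enqueue) {
  if (!payload || typeof payload.job_id !== 'string') {
    return { status: 400 };   // malformed delivery, reject
  }
  enqueue(payload);           // defer real work to a queue/worker
  return { status: 200 };     // fast 2xx ack
}
```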

JavaScript Rendering

Some pages rely on JavaScript to render their content. SPAs built with React, Vue, Next.js, or Angular often return empty HTML to traditional crawlers.

Enabled by default

hintrix uses full browser rendering by default for reliable content extraction. JS rendering is included at no extra cost — it does not affect the credit price.

Opting out for plain HTTP

Set wait_for_js: false if you want faster plain HTTP scraping. This may return incomplete content for JS-heavy pages, but is faster and still costs the same:

plain HTTP scraping (opt-out of JS)
$ curl -X POST https://hintrix.com/v1/scrape \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{"url": "https://example.com", "wait_for_js": false}'

// costs 1 credit — same as with JS rendering

When you need it

Keep the default browser rendering for SPAs and other JS-heavy pages (React, Vue, Next.js, Angular). Opt out with wait_for_js: false for static sites and plain HTML pages, where speed matters more than completeness.

Rate Limits

Rate limits are applied per API key and per domain to ensure fair usage.

Limit | Value
--- | ---
General requests | 100 requests per minute per key
Crawl jobs | 10 crawl starts per minute per key
Per domain (global) | 10 requests per minute per domain
Per domain (per user) | 100 requests per domain per day per user

When you hit a rate limit, the API returns 429 Too Many Requests with a Retry-After header indicating when you can retry.
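A client can honor Retry-After with a small wrapper. A sketch: withRateLimitRetry is an illustrative name, and doRequest is any async function resolving to { status, headers } (e.g. a closure around fetch with header names lowercased):

```javascript
// Retry a request when the API answers 429, waiting the number of
// seconds given in the Retry-After header before each retry.
async function withRateLimitRetry(doRequest, maxRetries = 3) {
  for (let attempt = 0; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt >= maxRetries) return res;
    const waitSec = Number(res.headers['retry-after'] ?? 1);
    await new Promise((resolve) => setTimeout(resolve, waitSec * 1000));
  }
}
```

The SDK's maxRetries option applies the same idea (and also covers 5xx responses).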

Content Policy

Certain domains cannot be scraped through hintrix.

Blocked platforms

Blocked TLDs

Certain country-level and restricted TLDs are blocked based on legal jurisdiction. Requests to blocked domains or TLDs return HTTP 403.

Our User-Agent is HintrixBot/1.0 (+https://hintrix.com/bot). Website owners can control access via robots.txt.

Error Codes

All errors return a JSON body with an error field and a human-readable message.

Status | Meaning | Common Cause
--- | --- | ---
400 | Bad Request | Missing or invalid parameters (e.g., invalid URL)
401 | Unauthorized | Missing or invalid API key
402 | Payment Required | Insufficient credits for this request
403 | Forbidden | Domain blocked by content policy or robots.txt
404 | Not Found | Endpoint or crawl job not found
429 | Too Many Requests | Rate limit exceeded; check the Retry-After header
500 | Internal Server Error | Something went wrong on our end
error response example
{
  "error": "insufficient_credits",
  "message": "This request requires 2 credits but you only have 1 remaining.",
  "request_id": "req_c3d4e5f6"
}

Credits

Every API call costs credits. Credits never expire and can be topped up anytime. New accounts receive 500 free credits on signup — plus 500 bonus credits for sharing hintrix on X/Twitter.

Action | Credits
--- | ---
/v1/scrape (content only) | 1
/v1/scrape (audit only) | 1
/v1/scrape (content + audit) | 2
/v1/audit | 2
/v1/extract | 2
/v1/crawl — content mode | 1 per page
/v1/crawl — content + audit mode | 2 per page
/v1/batch | same as /v1/scrape per URL
JS rendering | included (no extra cost)
Screenshot add-on | +1 per request/URL

JS rendering is included at no extra cost: a scrape is 1 credit, a scrape with audit is 2, and a crawl in content+audit mode is 2 per page, all with full browser rendering at the same price. Credits for crawl jobs are pre-deducted, and unused credits are refunded on completion.
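The pricing table above can be written down as a small estimator, handy for budgeting a crawl before starting it. Illustrative only, not an official API; it covers /v1/scrape, /v1/crawl, and /v1/batch (which price per page or URL), while /v1/audit and /v1/extract are a flat 2 credits:

```javascript
// Estimate credits per the table: content or audit alone = 1 per page,
// content+audit = 2 per page, screenshots +1 per page/URL.
// `pages` is max_pages for a crawl or urls.length for a batch.
function estimateCredits({ mode = ['content'], screenshot = false, pages = 1 }) {
  const perPage = mode.length >= 2 ? 2 : 1;
  return (perPage + (screenshot ? 1 : 0)) * pages;
}
```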

Code Examples

Examples for the /v1/scrape endpoint with audit mode in multiple languages.

curl
$ curl -X POST https://hintrix.com/v1/scrape \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_your_key_here" \
  -d '{
    "url": "https://example.com",
    "mode": ["content", "audit"],
    "include_links": true
  }'
python
import requests

response = requests.post(
    "https://hintrix.com/v1/scrape",
    headers={
        "Content-Type": "application/json",
        "X-API-Key": "hx_live_sk_your_key_here",
    },
    json={
        "url": "https://example.com",
        "mode": ["content", "audit"],
        "include_links": True,
    },
)

data = response.json()
print(data["content"]["markdown"])
print(f"GEO Score: {data['audit']['geo_score']}")

for issue in data["audit"]["issues"]:
    print(f"  [{issue['severity']}] {issue['title']}")
    print(f"  Fix: {issue['fix']}")
javascript
const response = await fetch("https://hintrix.com/v1/scrape", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-API-Key": "hx_live_sk_your_key_here",
  },
  body: JSON.stringify({
    url: "https://example.com",
    mode: ["content", "audit"],
    include_links: true,
  }),
});

const data = await response.json();
console.log(data.content.markdown);
console.log(`GEO Score: ${data.audit.geo_score}`);

data.audit.issues.forEach((issue) => {
  console.log(`  [${issue.severity}] ${issue.title}`);
  console.log(`  Fix: ${issue.fix}`);
});

MCP Server

hintrix provides a Model Context Protocol (MCP) server for native integration with AI coding tools like Claude Code, Cursor, and Windsurf.

Claude Code

Add to your ~/.claude/settings.json:

~/.claude/settings.json
{
  "mcpServers": {
    "hintrix": {
      "type": "sse",
      "url": "https://hintrix.com/mcp/sse",
      "headers": {
        "X-API-Key": "hx_live_sk_your_key_here"
      }
    }
  }
}

Cursor

Add to your .cursor/mcp.json in your project root:

.cursor/mcp.json
{
  "mcpServers": {
    "hintrix": {
      "type": "sse",
      "url": "https://hintrix.com/mcp/sse",
      "headers": {
        "X-API-Key": "hx_live_sk_your_key_here"
      }
    }
  }
}

Once configured, your AI assistant can use hintrix tools directly: scrape URLs for context, audit pages, extract data, and crawl domains — all within your coding workflow.