Everything you need to integrate hintrix into your application or AI agent.
Get up and running in under a minute.
Sign up at hintrix.com/register. Check your email for a verification link — you must verify before making API calls. You get 500 free credits instantly, no credit card required — earn 500 more by sharing hintrix on X.
After logging in, go to your dashboard to find your API key. It starts with hx_live_sk_.
$ curl -X POST https://hintrix.com/v1/scrape \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_your_key_here" \
  -d '{"url": "https://example.com", "mode": ["content"]}'

// Response
{
  "agent": "glance",
  "url": "https://example.com",
  "status_code": 200,
  "response_time_ms": 312,
  "js_rendered": false,
  "metadata": {
    "title": "Example Domain",
    "description": "This domain is for use in illustrative examples.",
    "language": "en",
    "canonical": "https://example.com",
    "og": {}
  },
  "content": {
    "markdown": "# Example Domain\n\nThis domain is for use in illustrative examples...",
    "word_count": 83
  },
  "credits_used": 1,
  "request_id": "req_a1b2c3d4"
}
The official Node.js SDK wraps the REST API with typed methods, automatic retries, and polling helpers. Requires Node 18+.
$ npm install hintrix
import { Hintrix } from 'hintrix';

const hx = new Hintrix('hx_live_sk_your_api_key');

// Scrape a page
const page = await hx.scrape('https://example.com');
console.log(page.content.markdown);

// Scrape with audit
const audited = await hx.scrape('https://example.com', { mode: ['content', 'audit'] });
console.log(audited.audit.geo_score);
The API key can also be read from the HINTRIX_API_KEY environment variable — omit the first argument in that case.
// Single-page operations (synchronous response)
await hx.scrape(url, options?)            // content extraction
await hx.audit(url, options?)             // GEO audit
await hx.extract(url, schema?, options?)  // structured extraction

// Async jobs — returns a job handle immediately
await hx.crawl(url, options?)             // start multi-page crawl
await hx.batch(urls, options?)            // batch URLs

// Job management
await hx.getJob(jobId)                    // check job status
await hx.getJobPages(jobId)               // get paginated results
await hx.waitForJob(jobId, options?)      // poll until done

// High-level helpers (start + wait + collect in one call)
await hx.crawlAndCollect(url, options?)   // crawl + collect all pages
await hx.batchAndCollect(urls, options?)  // batch + collect all results
await hx.scrapeMany(urls, options?)       // parallel scrapes (concurrency cap)
const result = await hx.crawlAndCollect('https://example.com', {
  max_pages: 50,
  mode: ['content', 'audit'],
  onProgress: (status) => console.log(`${status.pages_crawled} pages...`),
});
console.log(`Done: ${result.pages.length} pages`);
import { Hintrix, AuthenticationError, RateLimitError } from 'hintrix';

try {
  await hx.scrape(url);
} catch (err) {
  if (err instanceof RateLimitError) {
    console.log(`Rate limited, retry after ${err.retryAfter}s`);
  } else if (err instanceof AuthenticationError) {
    console.log('Invalid API key');
  }
}
Other exported error classes: ConnectionError, TimeoutError, and ValidationError, all of which extend the base APIError class (also exported).
const hx = new Hintrix('hx_live_sk_...', {
  baseUrl: 'https://hintrix.com', // default
  timeout: 30000,                 // 30s default
  maxRetries: 3,                  // auto-retry on 429 / 5xx
});
All API requests require an API key passed via the X-API-Key header.
API keys follow this pattern: hx_live_sk_...
Keep your API key secret. Do not expose it in client-side code or public repositories. If compromised, regenerate it from your dashboard.
X-API-Key: hx_live_sk_a1b2c3d4e5f6...
Base URL: https://hintrix.com — five endpoints: /v1/scrape, /v1/audit, /v1/extract, /v1/crawl, /v1/batch.
Scrape a single URL. Returns clean Markdown content, and optionally a GEO audit with scores and issues.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to scrape |
| mode | array | No | ["content"], ["audit"], or ["content","audit"]. Default: ["content"] |
| output_format | string | No | markdown, html, or text. Default: markdown |
| wait_for_js | boolean | No | JS rendering is enabled by default. Set to false for faster plain HTTP scraping. No extra cost. |
| include_links | boolean | No | Include extracted links in response. Default: true |
| include_schema | boolean | No | Include Schema.org / JSON-LD data. Default: true |
| respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true |
| include_screenshot | boolean | No | Return a base64-encoded PNG screenshot of the page. Default: false. Screenshots are not stored: save the data from the response, since it cannot be retrieved later. +1 credit. |
| include_diff | boolean | No | Compare with the previous scrape of this URL and return what changed (additions, deletions, changed lines). Previous content is stored per user per URL for 7 days, then automatically deleted. The first scrape of a URL returns null (no previous version to compare). Subsequent scrapes return a diff object. No extra credit cost. |
$ curl -X POST https://hintrix.com/v1/scrape \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{
    "url": "https://example.com",
    "mode": ["content", "audit"],
    "include_links": true,
    "include_schema": true
  }'

// Response
{
  "agent": "reveal",
  "url": "https://example.com",
  "status_code": 200,
  "response_time_ms": 418,
  "js_rendered": false,
  "metadata": {
    "title": "About Us — Example",
    "description": "We build tools for developers.",
    "language": "en",
    "canonical": "https://example.com/about",
    "og": {}
  },
  "content": {
    "markdown": "# About Us\n\nWe build tools for developers...",
    "word_count": 890
  },
  "links": [
    { "href": "https://example.com/contact", "text": "Contact", "type": "internal", "nofollow": false },
    { "href": "https://example.com/blog", "text": "Blog", "type": "internal", "nofollow": false }
  ],
  "schema_data": [{ "@type": "Organization", "name": "Example" }],
  "audit": {
    "geo_score": 72,
    "tech_score": 85,
    "issues": [
      {
        "title": "PerplexityBot blocked in robots.txt",
        "severity": "critical",
        "category": "ai_bot_access",
        "fix": "Remove User-agent: PerplexityBot / Disallow: /"
      }
    ],
    "pagespeed": { "performance": 91, "accessibility": 87 },
    "assets": {
      "llms_txt": "# Example\n\n> We build tools for developers.\n\n..."
    }
  },
  "credits_used": 2,
  "request_id": "req_b2c3d4e5"
}
Run a GEO readiness audit on a single URL. Uses plain HTTP (no JS rendering). Costs 2 credits. For JS-rendered audit results, use /v1/scrape with mode: ['content', 'audit'] instead (costs 2 credits with JS rendering included).
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to audit |
| wait_for_js | boolean | No | /v1/audit uses plain HTTP by default. Set to true to render JavaScript before auditing. No extra cost. |
| include_links | boolean | No | Include extracted links in response. Default: true |
| include_schema | boolean | No | Include Schema.org / JSON-LD data. Default: true |
| respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true |
$ curl -X POST https://hintrix.com/v1/audit \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{"url": "https://example.com"}'

// Response
{
  "url": "https://example.com",
  "status_code": 200,
  "response_time_ms": 389,
  "geo_score": 72,
  "tech_score": 85,
  "issues": [
    {
      "title": "Missing Schema.org Organization markup",
      "severity": "high",
      "category": "structured_data",
      "fix": "Add JSON-LD Organization schema to <head>"
    },
    {
      "title": "ChatGPT-User bot blocked",
      "severity": "critical",
      "category": "ai_bot_access",
      "fix": "Remove 'User-agent: ChatGPT-User' block from robots.txt"
    },
    {
      "title": "No author attribution on content",
      "severity": "medium",
      "category": "eeat",
      "fix": "Add visible author name and link to author page"
    }
  ],
  "pagespeed": { "performance": 91, "accessibility": 87 },
  "assets": {
    "llms_txt": "# Example\n\n> We build tools for developers.\n\n..."
  },
  "credits_used": 2
}
Extract structured data from any page. Define a schema with CSS selectors or let auto-detection handle it. Works with SPAs and JSON endpoints.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | The URL to extract from |
| schema | object | No | Field-to-CSS-selector mapping. Omit for auto-detection. |
| wait_for_js | boolean | No | JS rendering is enabled by default. Set to false for faster plain HTTP scraping. No extra cost. |
| respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true |
$ curl -X POST https://hintrix.com/v1/extract \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{
    "url": "https://shop.example.com/product/123",
    "schema": {
      "name": "h1.product-title",
      "price": ".price-current",
      "description": ".product-description",
      "in_stock": ".availability"
    },
    "wait_for_js": true
  }'

// Response
{
  "agent": "pinch",
  "url": "https://shop.example.com/product/123",
  "status_code": 200,
  "response_time_ms": 1204,
  "js_rendered": true,
  "data": {
    "name": "Wireless Headphones Pro",
    "price": "$149.99",
    "description": "Premium noise-cancelling headphones with 30h battery...",
    "in_stock": "In Stock"
  },
  "credits_used": 2
}
Start an asynchronous multi-page crawl. Returns a job ID for polling progress and retrieving results.
| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Starting URL for the crawl |
| max_pages | integer | No | Maximum pages to crawl (1–5000). Default: 10 |
| max_depth | integer | No | Maximum link depth from starting URL (1–10). Default: 2 |
| mode | array | No | ["content"], ["audit"], or ["content","audit"]. Default: ["content"] |
| output_format | string | No | markdown, html, or text. Default: markdown |
| wait_for_js | boolean | No | JS rendering is enabled by default. Set to false for faster plain HTTP scraping. No extra cost. |
| include_links | boolean | No | Include extracted links per page. Default: true |
| include_schema | boolean | No | Include Schema.org / JSON-LD data per page. Default: true |
| respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true |
| check_links | boolean | No | Check all discovered links for broken URLs (404s, redirects). Results appear in link_health on job completion. Default: false |
| webhook_url | string | No | URL to receive a POST request when the job completes. Optional. |
$ curl -X POST https://hintrix.com/v1/crawl \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{
    "url": "https://docs.example.com",
    "max_pages": 100,
    "max_depth": 3,
    "mode": ["content"]
  }'

// Response (HTTP 201)
{
  "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "queued",
  "url": "https://docs.example.com",
  "mode": ["content"],
  "max_pages": 100,
  "pages_crawled": 0,
  "credits_used": 100,
  "created_at": "2026-04-02T10:00:00+00:00"
}
Check the status and progress of a crawl job. Status values: queued, running, completed, failed.
$ curl https://hintrix.com/v1/crawl/3fa85f64-5717-4562-b3fc-2c963f66afa6 \
  -H "X-API-Key: hx_live_sk_..."

// Response
{
  "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "status": "completed",
  "url": "https://docs.example.com",
  "pages_crawled": 47,
  "pages_total": 47,
  "credits_used": 47,
  "error": null,
  "created_at": "2026-04-02T10:00:00+00:00",
  "started_at": "2026-04-02T10:00:02+00:00",
  "completed_at": "2026-04-02T10:02:14+00:00"
}
When check_links: true was requested, a link_health object is included in the completed response showing broken and redirected URLs found during the crawl.
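The SDK's waitForJob helper wraps this polling loop for you. If you call the REST API directly, the loop looks roughly like the TypeScript sketch below; getStatus is an injected stand-in for a fetch of GET /v1/crawl/{job_id} with your X-API-Key header, so the loop itself has no network dependency.

```typescript
// Minimal polling sketch for async crawl/batch jobs. The status values
// match the documented set: queued, running, completed, failed.
type JobStatus = {
  status: "queued" | "running" | "completed" | "failed";
  pages_crawled: number;
};

async function pollJob(
  getStatus: () => Promise<JobStatus>,
  intervalMs = 2000,                 // how often to re-check
  timeoutMs = 10 * 60 * 1000,        // give up after 10 minutes
): Promise<JobStatus> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    const job = await getStatus();
    if (job.status === "completed" || job.status === "failed") return job;
    if (Date.now() > deadline) throw new Error("timed out waiting for job");
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A 2-second interval is a reasonable default given the 100 requests/minute rate limit; back off further for very large crawls.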
Retrieve paginated results from a completed crawl job. Query params: page (default: 1), page_size (default: 20).
$ curl "https://hintrix.com/v1/crawl/3fa85f64-5717-4562-b3fc-2c963f66afa6/pages?page=1&page_size=20" \
  -H "X-API-Key: hx_live_sk_..."

// Response
{
  "job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
  "total": 47,
  "page": 1,
  "page_size": 20,
  "items": [
    {
      "url": "https://docs.example.com",
      "status_code": 200,
      "response_time_ms": 310,
      "word_count": 340,
      "content_markdown": "# Documentation\n\nWelcome to the docs...",
      "metadata": { "title": "Documentation", "description": "..." },
      "links": [
        { "href": "https://docs.example.com/getting-started", "text": "Getting Started", "type": "internal", "nofollow": false }
      ],
      "schema_data": null,
      "audit_result": null
    },
    {
      "url": "https://docs.example.com/getting-started",
      "status_code": 200,
      "response_time_ms": 284,
      "word_count": 520,
      "content_markdown": "# Getting Started\n\nFollow these steps...",
      "metadata": { "title": "Getting Started", "description": "..." },
      "links": [],
      "schema_data": null,
      "audit_result": null
    }
  ]
}
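If you page through results by hand rather than using the SDK's getJobPages or crawlAndCollect helpers, a collector might look like the following sketch. fetchPage is a hypothetical callback standing in for the GET .../pages request; the field names (total, page_size, items) follow the response shape above.

```typescript
// Walk every page of a completed job's results. `fetchPage` is injected
// so the paging logic can be exercised without a live API key; in real
// use it would wrap fetch() with your X-API-Key header.
type PageResult = {
  total: number;
  page: number;
  page_size: number;
  items: unknown[];
};

async function collectAllPages(
  fetchPage: (page: number, pageSize: number) => Promise<PageResult>,
  pageSize = 20,
): Promise<unknown[]> {
  const all: unknown[] = [];
  let page = 1;
  while (true) {
    const res = await fetchPage(page, pageSize);
    all.push(...res.items);
    // Stop once we've collected `total` items, or if an empty page arrives.
    if (all.length >= res.total || res.items.length === 0) break;
    page += 1;
  }
  return all;
}
```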
Scrape up to 100 independent URLs in a single asynchronous job. Unlike /v1/crawl, batch does not follow links — each URL in the list is scraped independently.
Submit a list of URLs for parallel scraping. Returns a job_id immediately. Poll GET /v1/crawl/{job_id} for status and GET /v1/crawl/{job_id}/pages for per-URL results.
| Parameter | Type | Required | Description |
|---|---|---|---|
| urls | array | Yes | List of URLs to scrape. Max 100. |
| mode | array | No | ["content"], ["audit"], or ["content","audit"]. Default: ["content"] |
| output_format | string | No | markdown, html, or text. Default: markdown |
| wait_for_js | boolean | No | JS rendering is enabled by default. Set to false for faster plain HTTP scraping. No extra cost. |
| include_screenshot | boolean | No | Capture a base64 PNG screenshot per URL. Default: false. Screenshots are not stored: save the data from each response, since it cannot be retrieved later. +1 credit per URL. |
| respect_robots_txt | boolean | No | Respect robots.txt directives. Default: true |
| webhook_url | string | No | URL to receive a POST request when the batch job completes. Optional. |
$ curl -X POST https://hintrix.com/v1/batch \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{
    "urls": [
      "https://example.com",
      "https://example.com/about",
      "https://example.com/pricing"
    ],
    "mode": ["content"],
    "webhook_url": "https://yourapp.com/webhooks/hintrix"
  }'

// Response (HTTP 201)
{
  "job_id": "7cb89a12-3f4e-4a9b-b1d2-0e8c5f9a6b3d",
  "status": "queued",
  "url": "https://example.com",
  "mode": ["content"],
  "max_pages": 3,
  "pages_crawled": 0,
  "credits_used": 3,
  "created_at": "2026-04-02T10:05:00+00:00"
}

// Poll for status
$ curl https://hintrix.com/v1/crawl/7cb89a12-3f4e-4a9b-b1d2-0e8c5f9a6b3d \
  -H "X-API-Key: hx_live_sk_..."

// Retrieve results when completed
$ curl https://hintrix.com/v1/crawl/7cb89a12-3f4e-4a9b-b1d2-0e8c5f9a6b3d/pages \
  -H "X-API-Key: hx_live_sk_..."
Both /v1/crawl and /v1/batch support an optional webhook_url parameter. When a job completes (successfully or with an error), hintrix sends a POST request to your webhook URL with a JSON payload containing the final job status.
{
"job_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
"status": "completed",
"url": "https://docs.example.com",
"pages_crawled": 47,
"pages_total": 47,
"credits_used": 47,
"completed_at": "2026-04-02T10:02:14+00:00"
}
Respond to the webhook with any 2xx status within 10 seconds. Webhooks are not retried on failure. Use GET /v1/crawl/{job_id} to poll for status if you miss a delivery.
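A receiver that follows those rules (acknowledge fast, process later) could be sketched like this in Node. The payload fields match the example above; the port is illustrative.

```typescript
// Webhook receiver sketch: respond 200 within the 10-second window, then
// defer any heavy work. Since failed deliveries are not retried, keep the
// handler itself trivial and fall back to polling if something goes wrong.
import { createServer } from "node:http";

type JobWebhook = {
  job_id: string;
  status: "completed" | "failed";
  pages_crawled: number;
};

// Parsing/validation kept as a pure function so it is easy to test.
function parseJobWebhook(body: string): JobWebhook {
  const data = JSON.parse(body);
  if (typeof data.job_id !== "string") throw new Error("missing job_id");
  return data as JobWebhook;
}

const server = createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    res.writeHead(200).end(); // acknowledge first; hintrix does not retry
    try {
      const job = parseJobWebhook(body);
      // Defer the real work (e.g. fetching /pages) so the 200 goes out fast.
      setImmediate(() => console.log(`job ${job.job_id}: ${job.status}`));
    } catch {
      // Ignore payloads that don't parse; we already acknowledged.
    }
  });
});
// server.listen(3000); // illustrative port
```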
Some pages rely on JavaScript to render their content. SPAs built with React, Vue, Next.js, or Angular often return empty HTML to traditional crawlers.
hintrix uses full browser rendering by default for reliable content extraction. JS rendering is included at no extra cost — it does not affect the credit price.
Set wait_for_js: false for plain HTTP scraping. This is faster but may return incomplete content for JS-heavy pages; the credit cost is the same:
$ curl -X POST https://hintrix.com/v1/scrape \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_..." \
  -d '{"url": "https://example.com", "wait_for_js": false}'
// costs 1 credit — same as with JS rendering
Rate limits are applied per API key and per domain to ensure fair usage.
| Limit | Value |
|---|---|
| General requests | 100 requests per minute per key |
| Crawl jobs | 10 crawl starts per minute per key |
| Per domain (global) | 10 requests per minute per domain |
| Per domain (per user) | 100 requests per domain per day per user |
When you hit a rate limit, the API returns 429 Too Many Requests with a Retry-After header indicating when you can retry.
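A minimal retry loop honoring Retry-After might look like the sketch below. doRequest is an injected stand-in for your actual HTTP call; it is assumed to surface the response status and the parsed Retry-After value in seconds. (The Node SDK does this for you via maxRetries.)

```typescript
// Retry on 429, sleeping for the number of seconds the API asked for.
// `doRequest` is injected so the retry logic can be tested without network
// access; a real implementation would wrap fetch() and read the
// Retry-After response header.
type Resp = { status: number; retryAfter?: number };

async function withRetry(
  doRequest: () => Promise<Resp>,
  maxAttempts = 3,
): Promise<Resp> {
  for (let attempt = 1; ; attempt++) {
    const res = await doRequest();
    if (res.status !== 429 || attempt >= maxAttempts) return res;
    // Fall back to a 1-second wait if no Retry-After value was provided.
    const waitMs = (res.retryAfter ?? 1) * 1000;
    await new Promise((resolve) => setTimeout(resolve, waitMs));
  }
}
```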
Certain domains cannot be scraped through hintrix.
Certain country-level and restricted TLDs are blocked based on legal jurisdiction. Requests to blocked domains or TLDs return HTTP 403.
Our User-Agent is HintrixBot/1.0 (+https://hintrix.com/bot). Website owners can control access via robots.txt.
All errors return a JSON body with an error field and a human-readable message.
| Status | Meaning | Common Cause |
|---|---|---|
| 400 | Bad Request | Missing or invalid parameters (e.g., invalid URL) |
| 401 | Unauthorized | Missing or invalid API key |
| 402 | Payment Required | Insufficient credits for this request |
| 403 | Forbidden | Domain blocked by content policy or robots.txt |
| 404 | Not Found | Endpoint or crawl job not found |
| 429 | Too Many Requests | Rate limit exceeded, check Retry-After header |
| 500 | Internal Server Error | Something went wrong on our end |
{
"error": "insufficient_credits",
"message": "This request requires 2 credits but you only have 1 remaining.",
"request_id": "req_c3d4e5f6"
}
Every API call costs credits. Credits never expire and can be topped up anytime. New accounts receive 500 free credits on signup — plus 500 bonus credits for sharing hintrix on X/Twitter.
| Action | Credits |
|---|---|
| /v1/scrape (content only) | 1 |
| /v1/scrape (audit only) | 1 |
| /v1/scrape (content + audit) | 2 |
| /v1/audit | 2 |
| /v1/extract | 2 |
| /v1/crawl — content mode | 1 per page |
| /v1/crawl — content + audit mode | 2 per page |
| /v1/batch | same as /v1/scrape per URL |
| JS rendering | included (no extra cost) |
| Screenshot add-on | +1 per request/URL |
JS rendering is included at no extra cost. For example: scrape = 1 credit. Scrape with audit = 2 credits. Crawl with content+audit = 2 credits per page. All of the above with full JS rendering costs the same. Credits for crawl jobs are pre-deducted and unused credits are refunded on completion.
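To budget credits before launching a large job, the pricing table above can be encoded directly. This is a sketch under the documented pricing; the function names are illustrative, not part of the SDK:

```typescript
// Credit cost estimator based on the documented pricing table.
// JS rendering adds nothing; the screenshot add-on is +1 per request/URL.
type Mode = Array<"content" | "audit">;

function scrapeCredits(mode: Mode, screenshot = false): number {
  // content-only or audit-only scrape = 1 credit; content + audit = 2
  const base = mode.includes("content") && mode.includes("audit") ? 2 : 1;
  return base + (screenshot ? 1 : 0);
}

// /v1/crawl: 1 credit per page in content mode, 2 per page with audit added.
// /v1/batch charges the same as /v1/scrape per URL, so this covers both.
function crawlCredits(pages: number, mode: Mode): number {
  return pages * scrapeCredits(mode);
}
```

For example, a 50-page crawl in content+audit mode pre-deducts 100 credits; if only 40 pages are found, the unused 20 are refunded on completion.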
Examples for the /v1/scrape endpoint with audit mode in multiple languages.
$ curl -X POST https://hintrix.com/v1/scrape \
  -H "Content-Type: application/json" \
  -H "X-API-Key: hx_live_sk_your_key_here" \
  -d '{
    "url": "https://example.com",
    "mode": ["content", "audit"],
    "include_links": true
  }'
import requests

response = requests.post(
    "https://hintrix.com/v1/scrape",
    headers={
        "Content-Type": "application/json",
        "X-API-Key": "hx_live_sk_your_key_here",
    },
    json={
        "url": "https://example.com",
        "mode": ["content", "audit"],
        "include_links": True,
    },
)

data = response.json()
print(data["content"]["markdown"])
print(f"GEO Score: {data['audit']['geo_score']}")
for issue in data["audit"]["issues"]:
    print(f"  [{issue['severity']}] {issue['title']}")
    print(f"  Fix: {issue['fix']}")
const response = await fetch("https://hintrix.com/v1/scrape", {
  method: "POST",
  headers: {
    "Content-Type": "application/json",
    "X-API-Key": "hx_live_sk_your_key_here",
  },
  body: JSON.stringify({
    url: "https://example.com",
    mode: ["content", "audit"],
    include_links: true,
  }),
});

const data = await response.json();
console.log(data.content.markdown);
console.log(`GEO Score: ${data.audit.geo_score}`);
data.audit.issues.forEach((issue) => {
  console.log(`  [${issue.severity}] ${issue.title}`);
  console.log(`  Fix: ${issue.fix}`);
});
hintrix provides a Model Context Protocol (MCP) server for native integration with AI coding tools like Claude Code, Cursor, and Windsurf.
Add to your ~/.claude/settings.json:
{
"mcpServers": {
"hintrix": {
"type": "sse",
"url": "https://hintrix.com/mcp/sse",
"headers": {
"X-API-Key": "hx_live_sk_your_key_here"
}
}
}
}
Add to your .cursor/mcp.json in your project root:
{
"mcpServers": {
"hintrix": {
"type": "sse",
"url": "https://hintrix.com/mcp/sse",
"headers": {
"X-API-Key": "hx_live_sk_your_key_here"
}
}
}
}
Once configured, your AI assistant can use hintrix tools directly: scrape URLs for context, audit pages, extract data, and crawl domains — all within your coding workflow.