
๐ŸŒ Crawl Endpoint

Initiate a full website crawl starting from a given URL. Crawlio will recursively follow links and extract content from each page, subject to the options you provide.

🧰 Using with SDKs

Prefer code over curl? Crawlio offers official SDKs for seamless integration with your stack:

📖 View full usage docs: 👉 Node.js SDK Docs 👉 Python SDK Docs

We are working on extensive documentation for our SDKs. Thanks for your patience!

Cost

| Name | Cost | Type |
| --- | --- | --- |
| Crawl | Number of crawled pages | Scrape |
| Crawl Limit | 1 | Deduction of crawl limit. Only for Free plans |

๐ŸŒ POST /crawl

📥 Request

Endpoint:

POST https://crawlio.xyz/api/crawl

Headers:

Authorization: Bearer YOUR_API_KEY  
Content-Type: application/json

Request Body Parameters:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | ✅ Yes | The starting URL for the crawl. Crawlio will follow internal links. |
| exclude | array of strings | ❌ No | CSS selectors to remove specific elements from all crawled pages. |
| markdown | boolean | ❌ No | Whether to extract and store each page's content in Markdown format. |
| count | number | ❌ No | Limit the number of pages to crawl. Useful for controlling job size. |

🧾 Example Request

POST /crawl
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

{
  "url": "https://example.com/blog",
  "exclude": ["nav", ".ads"],
  "markdown": true,
  "count": 50
}

📤 Response

On success, you'll receive a 200 OK response containing a unique crawl identifier.

| Field | Type | Description |
| --- | --- | --- |
| crawlId | string | A unique ID that represents this crawl job. You can use this to track status or retrieve results later. |

📦 Example Response

{
  "crawlId": "crawl_456def"
}

This endpoint is ideal for scraping multiple pages within the same domain, such as blogs, documentation sites, and product catalogs. Use it in combination with the job status or results endpoints (if available) to retrieve data once the crawl completes.
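
As a rough sketch of what polling for completion might look like: the status endpoint is not documented in this section (refer to the Postman Collection below for the real URL and response shape), so the URL and status field names used here are placeholders to be replaced with the actual ones:

import os
import time

import requests

API_KEY = os.environ["CRAWLIO_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

crawl_id = "crawl_456def"  # returned by POST /crawl

# NOTE: placeholder URL - substitute the status/results endpoint from the Postman Collection.
status_url = f"https://crawlio.xyz/api/crawl/{crawl_id}"

while True:
    status = requests.get(status_url, headers=HEADERS, timeout=30).json()
    if status.get("status") in ("completed", "failed"):  # field name and values are assumptions
        break
    time.sleep(5)  # wait a few seconds between polls

print(status)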

See the Postman Collection for more detailed information on retrieving the crawled data and checking job status:


What and Why?

The Crawl feature is designed to extract content from an entire website or section of a site, starting from a single URL and following internal links automatically. It's perfect when you want to capture all related pages without having to submit each URL manually.

Use Cases:

  • 📚 Crawling all blog posts under /blog
  • 🛍️ Extracting all product pages in a category
  • 🧾 Indexing a documentation site for analysis or backup
  • 🧠 Feeding a knowledge base or AI model with structured content

Key Capabilities:

  • ๐Ÿ” Recursive link following โ€” Crawlio will discover and scrape connected pages automatically.
  • ๐Ÿƒ Wildcard and path-based targeting โ€” Focus on specific areas of a site.
  • ๐Ÿงน Exclude unwanted elements like ads, footers, or nav bars.
  • ๐Ÿ“ Markdown output for clean, structured content storage or processing.
  • ๐Ÿ”ข Page limit control via the count parameter.

Use /crawl when you want to automate comprehensive scraping of a website without managing a list of URLs manually. It pairs perfectly with batch processing, AI pipelines, or structured content ingestion workflows.
