🌐 Crawl Endpoint
Initiate a full website crawl starting from a given URL. Crawlio will recursively follow links and extract content from each page, subject to the options you provide.
🧰 Using with SDKs
Prefer code over curl? Crawlio offers official SDKs for seamless integration with your stack:
- Node.js SDK (npm): Perfect for backend automation, agents, and JS projects.
- Python SDK (PyPI): Ideal for data science, AI/ML workflows, and scripting.
📚 View full usage docs: 📗 Node.js SDK Docs · 🐍 Python SDK Docs
We are working on extensive documentation for our SDKs. Thanks for your patience!
Cost
Name | Cost | Type |
---|---|---|
Crawl | Number of crawled pages | Scrape |
Crawl Limit | 1 | Crawl limit deduction (Free plan only) |
🔗 POST /crawl
📥 Request
Endpoint: POST /crawl
Headers:
Request Body Parameters:
Field | Type | Required | Description |
---|---|---|---|
url | string | ✅ Yes | The starting URL for the crawl. Crawlio will follow internal links. |
exclude | array of strings | ❌ No | CSS selectors to remove specific elements from all crawled pages. |
markdown | boolean | ❌ No | Whether to extract and store each page's content in Markdown format. |
count | number | ❌ No | Limit the number of pages to crawl. Useful for controlling job size. |
🧾 Example Request
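A minimal sketch of a crawl request using Python's requests library. The base URL and the bearer-token Authorization header are assumptions for illustration (use the values from your Crawlio account or the Postman Collection); the body fields are the documented parameters above.

```python
import requests

# Assumed base URL and auth scheme; substitute the real values for your account.
BASE_URL = "https://api.crawlio.example"  # placeholder, not an official endpoint
API_KEY = "YOUR_API_KEY"

payload = {
    "url": "https://example.com/blog",     # starting URL; internal links are followed
    "exclude": ["nav", "footer", ".ads"],  # CSS selectors removed from every crawled page
    "markdown": True,                      # store each page's content as Markdown
    "count": 50,                           # cap the crawl at 50 pages
}

resp = requests.post(
    f"{BASE_URL}/crawl",
    json=payload,
    headers={"Authorization": f"Bearer {API_KEY}"},
)
resp.raise_for_status()
print(resp.json())  # e.g. {"crawlId": "..."}
```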
📤 Response
On success, you'll receive a 200 OK response containing a unique crawl identifier.
Field | Type | Description |
---|---|---|
crawlId | string | A unique ID that represents this crawl job. You can use this to track status or retrieve results later. |
📦 Example Response
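An illustrative response body. Only the crawlId field is documented here; the value below is a placeholder.

```json
{
  "crawlId": "f3b1c2d4-5678-4e9a-a1b2-0c0ffee00000"
}
```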
This endpoint is ideal for scraping multiple pages within the same domain, like blogs, documentation sites, product catalogs, etc. Use it in combination with the job status or results endpoints (if available) to retrieve data once the crawl completes.
See the Postman Collection for more detailed information on retrieving the crawled data and job status.
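As a rough sketch of the status-polling flow, the snippet below assumes a hypothetical GET /crawl/{crawlId}/status route and a status field in its response; the actual paths and response shapes are defined in the Postman Collection.

```python
import time
import requests

BASE_URL = "https://api.crawlio.example"  # placeholder, not an official endpoint
HEADERS = {"Authorization": "Bearer YOUR_API_KEY"}  # assumed auth scheme

def wait_for_crawl(crawl_id: str, poll_seconds: int = 10) -> dict:
    """Poll a hypothetical status endpoint until the crawl job finishes."""
    while True:
        # NOTE: illustrative path; use the route from the Postman Collection.
        status = requests.get(f"{BASE_URL}/crawl/{crawl_id}/status", headers=HEADERS).json()
        if status.get("status") in ("completed", "failed"):
            return status
        time.sleep(poll_seconds)
```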
What and Why?
The Crawl feature is designed to extract content from an entire website or section of a site, starting from a single URL and following internal links automatically. It's perfect when you want to capture all related pages without having to submit each URL manually.
Use Cases:
- 📰 Crawling all blog posts under /blog
- 🛍️ Extracting all product pages in a category
- 🧾 Indexing a documentation site for analysis or backup
- 🧠 Feeding a knowledge base or AI model with structured content
Key Capabilities:
- 🔁 Recursive link following: Crawlio will discover and scrape connected pages automatically.
- 🎯 Wildcard and path-based targeting: Focus on specific areas of a site.
- 🧹 Exclude unwanted elements like ads, footers, or nav bars.
- 📝 Markdown output for clean, structured content storage or processing.
- 🔢 Page limit control via the count parameter.
Use /crawl when you want to automate comprehensive scraping of a website without managing a list of URLs manually. It pairs perfectly with batch processing, AI pipelines, or structured content ingestion workflows.
📦 Batch Scrape
Scrape multiple webpages in a single batch request. This endpoint is ideal for bulk extraction jobs where you need to process multiple URLs with shared options.
🔍 Search Endpoint
Submit a search query and retrieve the top 10 results from a supported search engine. This endpoint is useful for discovery, automation workflows, SEO tools, and more.