
๐ŸŒ Crawl Endpoint

Initiate a full website crawl starting from a given URL. Crawlio will recursively follow links and extract content from each page, subject to the options you provide.

🧰 Using with SDKs

Prefer code over curl? Crawlio offers official SDKs for seamless integration with your stack:

📖 View full usage docs: 👉 Node.js SDK Docs 👉 Python SDK Docs

We are working on extensive documentation for our SDKs. Thanks for your patience!

Cost

| Name | Cost | Type |
| --- | --- | --- |
| Crawl | Number of crawled pages | Scrape |
| Crawl Limit | 1 | Deduction of crawl limit. Only for Free plans |

๐ŸŒ POST /crawl

📥 Request

Endpoint:

POST https://crawlio.xyz/api/crawl

Headers:

Authorization: Bearer YOUR_API_KEY  
Content-Type: application/json

Request Body Parameters:

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | ✅ Yes | The starting URL for the crawl. Crawlio will follow internal links. |
| exclude | array of strings | ❌ No | CSS selectors to remove specific elements from all crawled pages. |
| markdown | boolean | ❌ No | Whether to extract and store each page's content in Markdown format. |
| count | number | ❌ No | Limit the number of pages to crawl. Useful for controlling job size. |

🧾 Example Request

POST /crawl
Content-Type: application/json
Authorization: Bearer YOUR_API_KEY

{
  "url": "https://example.com/blog",
  "exclude": ["nav", ".ads"],
  "markdown": true,
  "count": 50
}

📤 Response

On success, you'll receive a 200 OK response containing a unique crawl identifier.

| Field | Type | Description |
| --- | --- | --- |
| crawlId | string | A unique ID that represents this crawl job. You can use this to track status or retrieve results later. |

📦 Example Response

{
  "crawlId": "crawl_456def"
}

This endpoint is ideal for scraping multiple pages within the same domain, such as blogs, documentation sites, and product catalogs. Use it in combination with the job status or results endpoints (if available) to retrieve data once the crawl completes.
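
As a rough sketch of what polling for completion might look like: the status endpoint is not documented in this section (refer to the Postman Collection below for the real URL and response shape), so the URL and status field names used here are placeholders to be replaced with the actual ones:

import os
import time

import requests

API_KEY = os.environ["CRAWLIO_API_KEY"]
HEADERS = {"Authorization": f"Bearer {API_KEY}"}

crawl_id = "crawl_456def"  # returned by POST /crawl

# NOTE: placeholder URL - substitute the status/results endpoint from the Postman Collection.
status_url = f"https://crawlio.xyz/api/crawl/{crawl_id}"

while True:
    status = requests.get(status_url, headers=HEADERS, timeout=30).json()
    if status.get("status") in ("completed", "failed"):  # field name and values are assumptions
        break
    time.sleep(5)  # wait a few seconds between polls

print(status)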

See the Postman Collection for more detailed information on retrieving the crawled data and checking job status:


What and Why?

The Crawl feature is designed to extract content from an entire website or section of a site, starting from a single URL and following internal links automatically. It's perfect when you want to capture all related pages without having to submit each URL manually.

Use Cases:

  • 📚 Crawling all blog posts under /blog
  • 🛍️ Extracting all product pages in a category
  • 🧾 Indexing a documentation site for analysis or backup
  • 🧠 Feeding a knowledge base or AI model with structured content

Key Capabilities:

  • ๐Ÿ” Recursive link following โ€” Crawlio will discover and scrape connected pages automatically.
  • ๐Ÿƒ Wildcard and path-based targeting โ€” Focus on specific areas of a site.
  • ๐Ÿงน Exclude unwanted elements like ads, footers, or nav bars.
  • ๐Ÿ“ Markdown output for clean, structured content storage or processing.
  • ๐Ÿ”ข Page limit control via the count parameter.

Use /crawl when you want to automate comprehensive scraping of a website without managing a list of URLs manually. It pairs perfectly with batch processing, AI pipelines, or structured content ingestion workflows.
