Generate a list of URLs to crawl based on URL parameters

One crawl strategy is to crawl a specific set of URLs that you generate by varying URL paths and query parameters. This is useful when the URLs you want to crawl follow a predictable pattern.


To set up a multi-page crawl by generating URLs:

1. Create a kimono API that scrapes the data you want to gather from each URL.

2. From the detail page for your Desired Data API, select 'Generate URLs' from the crawl strategy drop-down.



3. Use kimono's URL generator to vary your path and query parameters. For each parameter, choose the default value, a numeric range, or a custom list.
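To make the idea behind step 3 concrete, here is a minimal Python sketch of what a URL generator of this kind does: every combination of path and query parameter values becomes one URL. This is not kimono's implementation. The base URL, the `{category}` path placeholder, and the `page` and `lang` parameters are hypothetical, chosen only to show a custom list, a numeric range, and a fixed default value side by side.

```python
from itertools import product
from urllib.parse import urlencode


def generate_urls(base, path_template, path_params, query_params):
    """Expand every combination of path and query values into a URL.

    path_params / query_params map a parameter name to an iterable of values.
    A one-item list behaves like a fixed default value.
    (Illustrative sketch only; not kimono's actual generator.)
    """
    path_names = list(path_params)
    query_names = list(query_params)
    urls = []
    # Outer loop: every combination of path parameter values
    for path_values in product(*(path_params[n] for n in path_names)):
        path = path_template.format(**dict(zip(path_names, path_values)))
        # Inner loop: every combination of query parameter values
        for query_values in product(*(query_params[n] for n in query_names)):
            query = urlencode(list(zip(query_names, query_values)))
            urls.append(base + path + ("?" + query if query else ""))
    return urls


# Hypothetical example: 2 categories x 3 pages x 1 language = 6 URLs
urls = generate_urls(
    "https://www.example.com",
    "/{category}/list",
    {"category": ["shoes", "hats"]},           # custom list
    {"page": range(1, 4), "lang": ["en"]},     # numeric range + fixed default
)

for url in urls:
    print(url)
```

The number of generated URLs is the product of the number of values for each parameter, so adding a wide numeric range to several parameters can grow the crawl list quickly.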


You can now configure your API's crawl settings to crawl the generated URLs. With multi-page crawling, you can either run crawls on a schedule or trigger them on demand, manually or through the kimono RESTful API.

Note that the pages you crawl must have the same structure. This is because kimono finds the element you want to extract on each page by that element's CSS selector; if you crawl a page that does not contain an element matching that selector, your crawl will fail.
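The following Python sketch illustrates why differently structured pages break selector-based extraction. It is only a stand-in for kimono's matching, which is not described here: it uses the standard library's `html.parser` and supports just simple `tag.class` selectors. The selector `span.price` and both HTML snippets are hypothetical.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Record whether any <tag> element carries the target CSS class."""

    def __init__(self, tag, css_class):
        super().__init__()
        self.tag = tag
        self.css_class = css_class
        self.found = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names for us
        if tag != self.tag:
            return
        for name, value in attrs:
            if name == "class" and value and self.css_class in value.split():
                self.found = True


def matches_selector(html, tag, css_class):
    """Return True if the page has an element matching the selector tag.css_class."""
    finder = ClassFinder(tag, css_class)
    finder.feed(html)
    finder.close()
    return finder.found


# Hypothetical pages: same data, different markup
listing_page = '<div class="item"><span class="price">$10</span></div>'
redesigned_page = '<div class="product"><b class="cost">$10</b></div>'

for name, page in [("listing", listing_page), ("redesigned", redesigned_page)]:
    print(name, matches_selector(page, "span", "price"))
```

Both pages show the same price, but only the first one contains a `span.price` element, so an extractor keyed to that selector finds nothing on the second page.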

If you want to crawl multiple pages, you can also source URLs from a kimono API or enter URLs manually.
