Source URLs to crawl from another kimono API

One crawl strategy is to crawl a specific set of URLs that you source from another kimono API. This is useful when the links to the pages you want to crawl are located in a central place (e.g., an overview page of products, people, or real estate listings). Check out this video tutorial for more information. Here is an example centralized page with links to product pages:

 

To set up a multi-page crawl by sourcing URLs from another kimono API, there are a few steps:

1. Create a kimono API that scrapes the links to the pages you want to crawl

Select the data:

Save the API:

2. Create a kimono API that scrapes the detail you want to gather from each URL. You should build this API on one of the links you just scraped.

3. From the detail page for your desired data API, select 'URLs from source API' from the crawl strategy drop down.

Then select the LinksAPI from the drop-down menu (this menu lists all of your kimono APIs), and select the data property that contains the URLs you want to crawl.
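To make the idea of a "data property that contains the URLs" more concrete, here is a minimal Python sketch of pulling link URLs out of a links API's JSON output. The response shape, the collection name `collection1`, and the property name `product_link` are assumptions made for illustration; check the actual output of your own API before relying on them.

```python
# Hypothetical response from the links API. The real output of your API
# may be shaped differently; this structure is assumed for illustration.
sample_response = {
    "name": "LinksAPI",
    "results": {
        "collection1": [
            {"product_link": {"text": "Widget A", "href": "https://example.com/products/a"}},
            {"product_link": {"text": "Widget B", "href": "https://example.com/products/b"}},
            {"product_link": {"text": "Widget A (again)", "href": "https://example.com/products/a"}},
            {"product_link": "not a link"},
        ]
    },
}


def extract_urls(response, collection, prop):
    """Return the unique http(s) URLs stored under `prop`, in order of appearance."""
    urls = []
    seen = set()
    for row in response.get("results", {}).get(collection, []):
        value = row.get(prop)
        # A link property may be a dict with an "href" key or a plain string.
        href = value.get("href") if isinstance(value, dict) else value
        if (
            isinstance(href, str)
            and href.startswith(("http://", "https://"))
            and href not in seen
        ):
            seen.add(href)
            urls.append(href)
    return urls


urls = extract_urls(sample_response, "collection1", "product_link")
print(urls)
```

Duplicate links and values that are not URLs are skipped, so the resulting list holds each page to crawl exactly once.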

Now you can choose your API crawl settings to grab the data on your schedule. With multi-page crawling, you can either schedule your crawls or trigger them manually or via the kimono RESTful API.

Note that the pages you crawl must have the same structure as the source URL. This is because kimono finds the element you want to extract on a page based on the CSS selectors for that element. If you try to crawl a page that does not have that CSS selector, your crawl will fail.
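If you want to spot-check pages before crawling them, the sketch below shows one way to test whether a page contains an element with a given class. It is not how kimono matches elements internally; it only handles a simple class name rather than full CSS selectors, and the sample pages and the `price` class are made up for illustration.

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Records whether any start tag carries the given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.found = False

    def handle_starttag(self, tag, attrs):
        # The class attribute may be absent or empty; treat both as no classes.
        classes = (dict(attrs).get("class") or "").split()
        if self.class_name in classes:
            self.found = True


def has_class(html, class_name):
    finder = ClassFinder(class_name)
    finder.feed(html)
    return finder.found


# Hypothetical pages: the second one lacks the element the API expects.
pages = {
    "https://example.com/products/a": '<div><span class="price big">$10</span></div>',
    "https://example.com/products/b": "<div><p>Coming soon</p></div>",
}

missing = [url for url, html in pages.items() if not has_class(html, "price")]
print(missing)
```

Any URL that ends up in `missing` has a different structure from the page the API was built on and is likely to cause problems in the crawl.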

If you want to crawl multiple pages but the URLs are not located in a central place, you can also specify which URLs to crawl by generating URLs with a predictable path/query pattern, or by entering URLs manually.
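As a rough illustration of what a predictable query pattern looks like, this short Python sketch builds a list of URLs that differ only in a page number. The base URL and the `page` parameter are placeholders, not values taken from kimono.

```python
from urllib.parse import urlencode


def generate_urls(base, page_numbers, param="page"):
    """Build one URL per page number by varying a single query parameter."""
    return [f"{base}?{urlencode({param: n})}" for n in page_numbers]


# Hypothetical site whose listing pages are numbered 1 through 3.
generated = generate_urls("https://example.com/products", range(1, 4))
print(generated)
```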
