Address failed URLs in your multi-page crawl

When kimono can't scrape a specific page during a multi-page crawl, it will attempt the problematic page three times before considering the URL failed.

If pages in your crawl fail, kimono will display the URLs of the failed pages in the 'Crawl Setup' tab of the API detail page.

With these failed URLs, you should first check that the URLs are valid (e.g., the pages really exist) and that they are structured the same way as the source URL. If they are structured differently, kimono will not be able to extract any data from the page, and the URL will be reported as failed.
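As a quick first pass, you can check each failed URL's HTTP status from a short script. Below is a minimal sketch in Python; the URLs in the list are placeholders for the failed URLs kimono reports:

    import urllib.request
    import urllib.error

    # Paste the failed URLs from the 'Crawl Setup' tab here (placeholders shown).
    failed_urls = [
        "http://example.com/page-17",
        "http://example.com/page-42",
    ]

    for url in failed_urls:
        try:
            # HEAD request: we only care whether the page exists, not its body.
            req = urllib.request.Request(url, method="HEAD")
            with urllib.request.urlopen(req, timeout=10) as resp:
                print(url, "->", resp.getcode())
        except urllib.error.HTTPError as e:
            # e.g., 404 means the page really is gone.
            print(url, "-> HTTP error", e.code)
        except urllib.error.URLError as e:
            print(url, "-> unreachable:", e.reason)

A 404 here points to an invalid URL; a 200 means the page exists and the problem is more likely page structure or a timeout, as described below.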

If the failed pages are structured differently from the source URL, you can create a new API that crawls those specified URLs, built on the alternate page structure of the failed pages. If you would still like the output from these two APIs to be combined, you can set up a meta API to return the API data together.
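If you prefer to merge the two result sets yourself, you can also fetch each API's JSON endpoint and concatenate the rows. A minimal sketch, assuming two kimono endpoints and the default collection name "collection1" (the API IDs, key, and collection name below are placeholders; adjust them to match your own APIs):

    import json
    import urllib.request

    # Placeholder endpoint URLs -- substitute your own API IDs and key.
    ENDPOINTS = [
        "https://www.kimonolabs.com/api/API_ID_1?apikey=YOUR_KEY",
        "https://www.kimonolabs.com/api/API_ID_2?apikey=YOUR_KEY",
    ]

    combined = []
    for endpoint in ENDPOINTS:
        with urllib.request.urlopen(endpoint) as resp:
            data = json.load(resp)
        # kimono groups rows by collection; rename "collection1" if you
        # renamed your collections in the API editor.
        combined.extend(data["results"]["collection1"])

    print(len(combined), "rows combined")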

If the failed pages are valid URLs with the same structure as the source URL, then kimono may have timed out when trying to access those pages. Try retriggering your crawl by clicking stop and then start again on the 'Crawl Setup' tab of the API detail page.
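If you would rather retrigger the crawl from a script, a minimal sketch follows, assuming kimono's documented startcrawl crawl-control endpoint; the API ID and key are placeholders:

    import json
    import urllib.request

    # Assumed crawl-control endpoint; API_ID and YOUR_KEY are placeholders.
    url = "https://www.kimonolabs.com/kimonoapis/API_ID/startcrawl"
    body = json.dumps({"apikey": "YOUR_KEY"}).encode("utf-8")

    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.getcode(), resp.read().decode("utf-8"))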

A tricky edge case for multi-page crawling arises when a site is running an A/B test. In that case, some of the extraction rules in your kimono API won't work on the alternate version of the page. If you suspect this is happening, reload the page a few times and note the alternate CSS selectors generated for the elements you want to extract. You can then expand your CSS selectors to account for both cases.
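For example, if the control version of the page puts titles at div.result > h2.title while the test variant uses div.result-alt > h2.name (hypothetical selectors, shown for illustration), a comma in the selector acts as an OR and matches elements from either variant:

    div.result > h2.title, div.result-alt > h2.name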
