Here are the limitations and restrictions you have to consider when you analyze a single URL with hyScore.
The list so far:
Prerequisite: you are using the API endpoint v3/url.
- The URL you send to the API must be valid. If it is not, you will usually receive an "Error" message/status.
- Highly dynamic URLs (such as one-time URLs) may cause problems.
- The content behind the URL must contain more than 300 characters for hyScore to return a meaningful result.
- Ad-serving URLs, CDN URLs, and any other URLs that point directly to a technology service are blocked by hyScore as soon as we become aware of them. Simple rule: no content, no point. Ideally, filter out these kinds of URLs upfront and don't send them at all.
- Plain login pages, payment gateways, and paywalls cannot be analyzed either. URLs that require you to be logged in to reach the content (e.g. paid content) won't work unless you are the owner of the originating website and allow hyScore to access it.
- The analysis of a URL only starts once our crawlers can load its full content. If a website takes longer than 30 seconds to load completely, the request times out and returns "Error - Target not reachable". In that case, ask the website owner to optimize the site's loading time.
- If the domain owner/publisher has not explicitly white-listed our crawler, we limit the number of requests per second to that domain, both to avoid triggering DDoS protection mechanisms and to keep the load on the target web server low. This can sometimes make the analysis slower than usual. For example, sending 100 to 1,000 requests per second for a single domain might affect the target web server. In such cases, we queue the URLs for that domain and process them in batches over time, which may result in an "Incomplete, Analysis in Progress, Status: 9" API response.
- Some domain owners/publishers block bots, services, and unknown crawlers by default. In these cases, you'll receive "Error, Blocked by robots.txt". Ask the publisher to white-list hyScore's user agent (see Crawler) so that we can analyze their content/URLs.
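Several of the points above (invalid URLs, technology-service URLs, robots.txt blocks) can be checked on your side before you call v3/url. The following is a minimal Python sketch of such a pre-flight filter using only the standard library. It assumes you have already downloaded the target site's robots.txt. The blocklist hosts and the user-agent string `hyScoreBot` are hypothetical placeholders, not values published by hyScore: take the real user agent from the Crawler page and maintain your own list of ad-serving/CDN hosts.

```python
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

# Hypothetical example hosts; replace with your own list of ad-serving/CDN domains.
TECHNOLOGY_HOSTS = {"ads.example.com", "cdn.example.net"}

# Hypothetical placeholder; use the user agent documented on the Crawler page.
CRAWLER_USER_AGENT = "hyScoreBot"


def is_valid_url(url: str) -> bool:
    """Accept only absolute http(s) URLs that have a host name."""
    try:
        parts = urlsplit(url)
    except ValueError:
        return False
    return parts.scheme in ("http", "https") and bool(parts.hostname)


def is_technology_host(url: str) -> bool:
    """True if the URL's host (or a parent domain) is on the blocklist."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in TECHNOLOGY_HOSTS)


def allowed_by_robots(url: str, robots_lines: list[str]) -> bool:
    """Check an already-downloaded robots.txt for the crawler's user agent."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(CRAWLER_USER_AGENT, url)


def should_submit(url: str, robots_lines: list[str]) -> bool:
    """Only submit URLs that pass all pre-flight checks."""
    return (
        is_valid_url(url)
        and not is_technology_host(url)
        and allowed_by_robots(url, robots_lines)
    )
```

For example, `should_submit("https://news.example.org/article", ["User-agent: *", "Disallow: /private/"])` passes, while the same call for a URL under `/private/` does not. These checks only reduce avoidable errors; they cannot predict the 300-character minimum or the 30-second timeout.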
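hyScore throttles requests per domain on its side, but you can also avoid sending long bursts of URLs for a single domain. The sketch below (Python, standard library only) reorders a batch of URLs round-robin by host name so that consecutive submissions alternate between domains. This is an illustrative client-side approach, not part of the hyScore API; the function name `interleave_by_domain` is hypothetical, and it does not change how hyScore queues URLs, so you may still receive "Status: 9" responses for domains that have not white-listed the crawler.

```python
from collections import deque
from urllib.parse import urlsplit


def interleave_by_domain(urls: list[str]) -> list[str]:
    """Reorder URLs round-robin by host so no domain receives a long burst."""
    # Group URLs per host, keeping the original order within each host.
    queues: dict[str, deque] = {}
    for url in urls:
        host = (urlsplit(url).hostname or "").lower()
        queues.setdefault(host, deque()).append(url)

    ordered: list[str] = []
    # Take one URL per host per round until every queue is empty.
    while queues:
        for host in list(queues):
            ordered.append(queues[host].popleft())
            if not queues[host]:
                del queues[host]
    return ordered
```

Given three URLs for `a.com`, two for `b.org`, and one for `c.net`, the result alternates `a.com`, `b.org`, `c.net`, `a.com`, `b.org`, `a.com`. Spacing the submissions over time is still up to your client.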