Thursday, April 02, 2009

Live recent network statistics available

The 0.3 releases can show live statistics, so that some network issues can be diagnosed without having to look at log files. The URL is /admin/stats. The page:
  • Shows the last time the internal provider blacklist was reset. Because the statistics are cleared every time the blacklist is reset, this indicates how much activity is being displayed. This blacklist exists only to stop further queries being sent to malfunctioning providers.
  • By default shows the IP addresses accessing the server, with the total number and duration of their queries. In low-use and private situations it can be configured to also show the queries being performed.
  • Shows the servers which have been unresponsive since the last blacklist reset, including a basic reason such as an HTTP 503 or 400 error.
Version 0.3.2 also provides live blacklisting to block crawlers that regularly use functionality that the Bio2RDF robots.txt file tells them not to. The thresholds have been set rather high by default, and people who download and install the package and datasets locally can turn this functionality off completely. Specifically, a regular user of the public mirrors should either make no more than 40 requests in each 12 minute statistics period or, if they do make more than 40 requests in a period, ensure that more than 25% of those requests are for queries that are not covered by robots.txt. These parameters may change after further investigation. Anyone can access /error/blacklist, even if they are not currently blacklisted, to see a list of the requests made from their IP address since the start of the last 12 minute statistics period.
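
As a rough illustration of the thresholds described above, the check could be sketched as follows. This is not the actual Bio2RDF implementation: the function and constant names are hypothetical, and it assumes that "non-robots.txt queries" means requests to paths that robots.txt does not disallow.

```python
# Thresholds quoted in this post; the names are hypothetical, not Bio2RDF's.
MAX_REQUESTS_PER_PERIOD = 40   # requests allowed per 12 minute statistics period
MIN_ALLOWED_FRACTION = 0.25    # share of requests that must avoid robots.txt paths


def would_be_blacklisted(total_requests: int, disallowed_requests: int) -> bool:
    """Return True if a client's activity in one period breaks the rule.

    total_requests: all requests from one IP address in the period.
    disallowed_requests: how many of those hit paths that robots.txt
        disallows (an assumed reading of the rule in the post).
    """
    if total_requests < 0 or not 0 <= disallowed_requests <= total_requests:
        raise ValueError("counts must satisfy 0 <= disallowed <= total")

    # At or under the request limit, the mix of requests does not matter.
    if total_requests <= MAX_REQUESTS_PER_PERIOD:
        return False

    # Over the limit: more than 25% of requests must be robots.txt-friendly.
    allowed_fraction = (total_requests - disallowed_requests) / total_requests
    return allowed_fraction <= MIN_ALLOWED_FRACTION


if __name__ == "__main__":
    print(would_be_blacklisted(40, 40))    # within the request limit
    print(would_be_blacklisted(100, 74))   # 26% of requests are allowed
    print(would_be_blacklisted(100, 75))   # only 25% of requests are allowed
```

Under this reading, a client that stays at or below 40 requests per period is never affected, whatever it requests.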
