Version 0.30 of Smart Cache Loader

Release Notes: The crawler can now extract links from page content using regular expressions (with optional replacement for URL rewriting). The crawler can now log depth for easier debugging, and tracking of known URLs can be set to one of two modes (the first saves memory, the second saves CPU).
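
As an illustration of regex-based link extraction with an optional replacement, here is a minimal Python sketch. It is not the project's own code or configuration syntax: the function name extract_links, the sample page, and the URL patterns are all hypothetical, chosen only to show the idea.

```python
import re

def extract_links(content, pattern, replacement=None):
    """Collect URLs matched by a regex; optionally rewrite each match.

    Hypothetical helper for illustration; not Smart Cache Loader's API.
    """
    links = []
    for match in re.finditer(pattern, content):
        if replacement is None:
            # No rewriting: keep the matched text as the link.
            links.append(match.group(0))
        else:
            # Rewriting: expand backreferences such as \1 in the template.
            links.append(match.expand(replacement))
    return links

# Hypothetical page content where links hide inside script text.
page = 'var a = "/view.php?id=12"; var b = "/view.php?id=34";'

plain = extract_links(page, r'/view\.php\?id=\d+')
rewritten = extract_links(page, r'/view\.php\?id=(\d+)', r'/item/\1.html')
```

Without a replacement the matched text is used directly; with one, each match is rewritten before it would be queued for crawling.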

Other releases

  •  10 Aug 2007 11:51

Release Notes: Saving pages to local disk now works, and the program now uses the Host: header in outgoing requests for better virtual server support. HTML entities are decoded before extracting links, and gzip-encoded pages are requested from the server.

  •  08 Aug 2007 13:37

Release Notes: The crawler can now extract links from page content using regular expressions (with optional replacement for URL rewriting). The crawler can now log depth for easier debugging, and tracking of known URLs can be set to one of two modes (the first saves memory, the second saves CPU).

  •  25 Jul 2007 06:53

Release Notes: Support was added for escaping "&" and "," in URLs. The delay parameter can now take time units like 1.3s and 2h. A new per-site parameter, "crawltime" (which also works on the command line), was added to limit the time spent crawling a site.

  •  15 Apr 2007 06:35

Release Notes: Support was added for crawling delays. Links of reject type are now logged, which is useful for extracting URLs from a site. A crash that occurred when no default masks were used was fixed.

  •  13 Apr 2007 10:49

Release Notes: This release fixes various crashes. The documentation has been converted to DocBook and updated. A .jar file is now distributed instead of .class files.
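
The 25 Jul 2007 notes mention delay values with time units such as 1.3s and 2h. A minimal Python sketch of how such values could be parsed into seconds follows. It is an assumption-laden illustration, not the program's parser: only the "s" and "h" suffixes come from the notes, while the "m" suffix, the bare-number default, and the name parse_delay are hypothetical.

```python
import re

# Seconds per unit. "s" and "h" appear in the release notes;
# "m" (minutes) is an assumption added for illustration.
UNIT_SECONDS = {"s": 1.0, "m": 60.0, "h": 3600.0}

def parse_delay(text, default_unit="s"):
    """Convert a value like '1.3s' or '2h' to seconds.

    Hypothetical helper; treating a bare number as seconds is an assumption.
    """
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([a-z]*)\s*", text)
    if match is None:
        raise ValueError(f"not a delay value: {text!r}")
    number, unit = match.groups()
    unit = unit or default_unit
    if unit not in UNIT_SECONDS:
        raise ValueError(f"unknown time unit: {unit!r}")
    return float(number) * UNIT_SECONDS[unit]

short_delay = parse_delay("1.3s")
long_delay = parse_delay("2h")
```

Rejecting unknown suffixes with an error, rather than silently ignoring them, keeps a mistyped unit from turning into a much shorter or longer delay than intended.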
