Smart Cache Loader is a highly configurable Web grabber with special Smart Cache support.
|Tags||Internet, Web, Site Management, Link Checking, Indexing/Search, web crawler|
|Operating Systems||OS Independent|
Release Notes: Saving pages to local disk now works, and the program now uses the Host: header in outgoing requests for better virtual server support. HTML entities are decoded before extracting links, and gzip-encoded pages are requested from the server.
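To illustrate why entities are decoded before link extraction: an attribute such as `href="page.php?a=1&amp;b=2"` must become `page.php?a=1&b=2` before it can be requested. The following is a minimal sketch, not the program's actual code. The class name `EntityDecoder` is hypothetical, and it handles only four named entities plus numeric references, whereas a real decoder would cover the full HTML entity table.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: decodes a small subset of HTML entities.
final class EntityDecoder {
    // Decimal and hexadecimal numeric references (digit counts are capped so the
    // value is always a valid code point) and four common named entities.
    private static final Pattern ENTITY =
            Pattern.compile("&(#\\d{1,6}|#[xX][0-9a-fA-F]{1,5}|amp|lt|gt|quot);");

    static String decode(String s) {
        Matcher m = ENTITY.matcher(s);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1);
            String text;
            if (name.startsWith("#x") || name.startsWith("#X")) {
                text = new String(Character.toChars(Integer.parseInt(name.substring(2), 16)));
            } else if (name.startsWith("#")) {
                text = new String(Character.toChars(Integer.parseInt(name.substring(1))));
            } else if (name.equals("amp")) {
                text = "&";
            } else if (name.equals("lt")) {
                text = "<";
            } else if (name.equals("gt")) {
                text = ">";
            } else {
                text = "\""; // "quot" is the only remaining alternative
            }
            // quoteReplacement keeps "$" and "\" in the decoded text literal.
            m.appendReplacement(out, Matcher.quoteReplacement(text));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With this sketch, `EntityDecoder.decode("page.php?a=1&amp;b=2")` returns `page.php?a=1&b=2`.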
Release Notes: The crawler can now extract links from page content using regular expressions (with optional replacement for URL rewriting). It can also log crawl depth for easier debugging, and tracking of known URLs can be set to one of two modes (the first saves memory, the second saves CPU).
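The idea of regex-based link extraction with rewriting can be sketched as follows. This is an illustration only: the class `RegexLinkExtractor`, its method signature, and the sample pattern and URLs are assumptions, not the program's configuration syntax or implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds every match of a pattern in page content and,
// if a replacement is given, rewrites each match into the URL to crawl.
final class RegexLinkExtractor {
    static List<String> extract(String content, String regex, String replacement) {
        List<String> urls = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(content);
        int appendPos = 0; // mirrors the matcher's internal append position
        while (m.find()) {
            if (replacement == null) {
                urls.add(m.group()); // no rewriting: use the matched text as-is
                continue;
            }
            // appendReplacement copies the text between the previous match and
            // this one, then the expanded replacement ($1, $2, ...). Dropping
            // the copied prefix leaves only the rewritten URL.
            StringBuffer sb = new StringBuffer();
            m.appendReplacement(sb, replacement);
            urls.add(sb.substring(m.start() - appendPos));
            appendPos = m.end();
        }
        return urls;
    }
}
```

For example, with the pattern `/view\.php\?id=(\d+)` and the replacement `http://example.com/item/$1.html`, content containing `/view.php?id=7` and `/view.php?id=12` yields `http://example.com/item/7.html` and `http://example.com/item/12.html`.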
Release Notes: Support was added for escaping "&" and "," in URLs. The delay parameter can now take time units, such as 1.3s and 2h. A new per-site parameter, "crawltime" (which also works on the command line), was added to limit the time spent crawling a site.
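A delay value with a time unit could be parsed roughly as shown below. This is a sketch under stated assumptions: only the "s" and "h" suffixes appear in the release notes, so the "ms", "m", and "d" suffixes, the class name `DelayParser`, and the choice to reject values without a unit are all hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: converts strings such as "1.3s" or "2h" to milliseconds.
final class DelayParser {
    // A whole or fractional number followed by a unit suffix.
    // "ms", "m" and "d" are assumed units, not confirmed by the release notes.
    private static final Pattern DELAY =
            Pattern.compile("(\\d+(?:\\.\\d+)?)(ms|s|m|h|d)");

    static long parseMillis(String text) {
        Matcher m = DELAY.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized delay: " + text);
        }
        double value = Double.parseDouble(m.group(1));
        long unitMillis;
        switch (m.group(2)) {
            case "ms": unitMillis = 1L; break;
            case "s":  unitMillis = 1_000L; break;
            case "m":  unitMillis = 60_000L; break;
            case "h":  unitMillis = 3_600_000L; break;
            default:   unitMillis = 86_400_000L; break; // "d"
        }
        return Math.round(value * unitMillis);
    }
}
```

Here `DelayParser.parseMillis("1.3s")` returns 1300 and `DelayParser.parseMillis("2h")` returns 7200000.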
Release Notes: Support was added for crawling delays. Links of reject type are now logged, which is good for extracting URLs from a site. A crash which occurred when no default masks were used was fixed.
Release Notes: This release fixes various crashes. The documentation has been converted to DocBook and updated. A .jar file is now distributed instead of .class files.