SpiderBot crawls the Web, retrieves content, and performs actions on it. It is an effort to design and develop a truly pipelined, distributed Web crawler.
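SpiderBot's own internals are not described here. As a rough illustration of what "pipelined" can mean, the following minimal sketch assumes three stages (fetch, parse, act) that run concurrently as threads and pass work through queues, so a page can be parsed while the next one is being fetched. `FAKE_WEB`, `crawl`, and the word-count "action" are hypothetical stand-ins, and the in-memory pages replace real HTTP requests so the sketch runs offline.

```python
import queue
import re
import threading

# Hypothetical in-memory "web" so the sketch runs offline; a real crawler
# would issue HTTP requests in the fetch stage instead.
FAKE_WEB = {
    "http://a.example/": '<a href="http://b.example/">b</a> <a href="http://c.example/">c</a>',
    "http://b.example/": '<a href="http://c.example/">c</a> hello world',
    "http://c.example/": "no links here",
}

LINK_RE = re.compile(r'href="([^"]+)"')
STOP = object()  # sentinel that tells a stage to shut down


def crawl(seeds):
    """Run a three-stage fetch -> parse -> act pipeline (hypothetical design)."""
    if not seeds:
        return {}
    url_q, parse_q, action_q = queue.Queue(), queue.Queue(), queue.Queue()
    seen = set(seeds)
    pending = len(seeds)  # URLs enqueued but not yet parsed
    lock = threading.Lock()
    results = {}

    def fetch_stage():
        while (url := url_q.get()) is not STOP:
            parse_q.put((url, FAKE_WEB.get(url, "")))
        parse_q.put(STOP)  # pass shutdown downstream

    def parse_stage():
        nonlocal pending
        while (item := parse_q.get()) is not STOP:
            url, body = item
            with lock:
                # Feed newly discovered links back to the fetch stage.
                for link in LINK_RE.findall(body):
                    if link not in seen:
                        seen.add(link)
                        pending += 1
                        url_q.put(link)
                pending -= 1
                done = pending == 0
            action_q.put((url, body))
            if done:
                url_q.put(STOP)  # frontier drained: stop the fetcher
        action_q.put(STOP)

    def action_stage():
        while (item := action_q.get()) is not STOP:
            url, body = item
            results[url] = len(body.split())  # example "action": word count

    for url in seeds:
        url_q.put(url)
    threads = [threading.Thread(target=f) for f in (fetch_stage, parse_stage, action_stage)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `crawl(["http://a.example/"])` discovers and processes all three pages. Shutdown is driven by a count of URLs still in flight, so the fetch stage is only told to stop once every discovered page has been parsed; a distributed crawler would spread these stages across machines rather than threads.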
Methanol is a modular, customizable Web crawling system whose crawlers are optimized for speed. It is designed to let the administrator set up any kind of rule for file-type handling, parsing, and indexing.
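Methanol's actual configuration format is not described here. The idea of administrator-defined file-type rules can be sketched, under that caveat, as a hypothetical `HandlerRegistry` that maps MIME types to parse-and-index functions, so that supporting a new file type means registering one more handler rather than changing the crawler core. The class, the decorator, and both handlers are illustrative assumptions.

```python
import re
from typing import Callable, Dict, Optional

Handler = Callable[[bytes], dict]


class HandlerRegistry:
    """Hypothetical registry mapping content types to parse/index rules."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, content_type: str):
        # Decorator that installs a handler for one MIME type.
        def decorator(fn: Handler) -> Handler:
            self._handlers[content_type.lower()] = fn
            return fn
        return decorator

    def handle(self, content_type: str, body: bytes) -> Optional[dict]:
        # Drop parameters such as "; charset=utf-8" before lookup.
        base = content_type.split(";", 1)[0].strip().lower()
        handler = self._handlers.get(base)
        return handler(body) if handler else None  # unknown types are skipped


registry = HandlerRegistry()


@registry.register("text/html")
def index_html(body: bytes) -> dict:
    # Crude tag stripping, for illustration only.
    text = re.sub(rb"<[^>]+>", b" ", body).decode("utf-8", "replace")
    return {"words": text.split()}


@registry.register("text/plain")
def index_text(body: bytes) -> dict:
    return {"words": body.decode("utf-8", "replace").split()}
```

With these rules, `registry.handle("text/html; charset=utf-8", b"<p>Hello <b>web</b></p>")` yields `{"words": ["Hello", "web"]}`, while an unregistered type such as `image/png` returns `None` and is ignored.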