Larbin is an HTTP Web crawler with an easy-to-use interface that runs under Linux. On a standard PC with a good network connection, it can fetch more than 5 million pages a day.
Tags: Internet, Web, Indexing/Search
Operating Systems: POSIX, Linux, BSD, FreeBSD
Release Notes: This release fixes compilation problems with recent gcc versions, improves the configuration file parser, and adds new options for following links selectively.
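Selective link following usually comes down to a predicate applied to each extracted URL before it is queued. The sketch below illustrates the idea under stated assumptions: the option names and rules (stay on the start host, skip obviously non-HTML extensions) are hypothetical examples, not Larbin's actual configuration directives.

```cpp
#include <cassert>
#include <cstring>
#include <string>

// Hypothetical link filter of the kind selective-following options enable:
// restrict the crawl to the start host and skip URLs whose extension marks
// non-HTML content. The rules shown here are illustrative, not Larbin's.
bool followLink(const std::string &host, const std::string &path,
                const std::string &startHost) {
    if (host != startHost)
        return false;                              // stay on the start site
    static const char *skip[] = {".jpg", ".png", ".gz", ".pdf"};
    for (const char *ext : skip) {
        size_t n = std::strlen(ext);
        if (path.size() >= n && path.compare(path.size() - n, n, ext) == 0)
            return false;                          // skip binary content
    }
    return true;                                   // enqueue this link
}
```

Keeping the decision in one predicate like this makes each new "follow links selectively" option a small, local change.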
Release Notes: This release compiles on Solaris, adds cookie management, allows images to be fetched along with pages, and includes many rewrites for efficiency and portability.
Release Notes: With this release, crawling through a proxy works again, all configurations should compile on Linux and BSD, images can be downloaded along with pages, and the robots.txt parser has been enhanced.
Release Notes: Many efficiency updates were made to the sequencer, to buffer recycling, and to DNS management. A new output module for statistics has been added.
Release Notes: Output and buffer interfaces have been simplified. A dynamic buffer option has been added. The web server has been reworked.
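What a dynamic buffer option buys over a fixed one is that storage grows as page data arrives, so an oversized page is kept whole instead of being truncated at a compile-time limit. A minimal sketch of that interface, with illustrative names rather than Larbin's own:

```cpp
#include <cassert>
#include <string>

// A growable page buffer: append() extends storage on demand (std::string
// already grows geometrically), in contrast with a fixed-size array that
// caps the page length. (Sketch; not Larbin's actual buffer class.)
class DynamicBuffer {
    std::string data_;
public:
    void append(const char *bytes, size_t n) { data_.append(bytes, n); }
    size_t size() const { return data_.size(); }
    const std::string &contents() const { return data_; }
};
```

A fetch loop would call `append()` once per chunk read from the socket and hand `contents()` to the output module when the page is complete.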