
Larbin

Larbin is an HTTP Web crawler with an easy interface that runs under Linux. It can fetch more than 5 million pages a day on a standard PC, given a good network connection.
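For scale, the daily figure converts to a sustained per-second rate as follows. This is simple arithmetic that assumes the crawl runs evenly around the clock:

```latex
\frac{5\,000\,000\ \text{pages/day}}{86\,400\ \text{s/day}} \approx 57.9\ \text{pages/s}
```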

Recent releases

  •  16 Jul 2003 12:40

Release Notes: This release fixes some compilation problems with recent gcc versions, improves the configuration file parser, and adds new options for following links selectively.

  •  14 Apr 2002 17:02

Release Notes: This release compiles on Solaris and adds cookie management. Images can now be fetched along with pages, and many parts have been rewritten for efficiency and portability.

  •  10 Mar 2002 05:31

Release Notes: With this release, it is possible again to crawl through a proxy, all configurations should compile (Linux and BSD), images can now be downloaded with pages, and the robots.txt parser has been enhanced.

  •  12 Jan 2002 17:02

Release Notes: Many efficiency updates were made to the sequencer, to buffer recycling, and to DNS management. A new output module for statistics has been added.

  •  12 Dec 2001 12:42

Release Notes: Output and buffer interfaces have been simplified. A dynamic buffer option has been added. The web server has been reworked.

Recent comments

13 Oct 2001 12:20 nazgul

larbin@somewhere.com
I tried to reach the larbin project owner, but I got the
following error, so I'm posting this here.

<sebastien.ailleret@inr...>
(reason: 550 5.7.1 <sebastien.ailleret@inr...>...
Access denied)

Either larbin does this by default, or someone has
configured their copy that way, but I would appreciate it if
someone made sure that the user-agent field for
larbin is *not* larbin@somewhere.com. I've gotten
several complaints, and I don't really appreciate it.