Ebot

Ebot is a scalable and distributed Web crawler. URLs are saved to a NoSQL database (which supports map/reduce queries) that you can query via RESTful HTTP requests or from your preferred programming language. URLs that need to be analyzed are sent to AMQP queues, so several crawlers can run in parallel and be stopped and restarted without losing URLs.
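The queue-based design described above can be illustrated with a minimal sketch. It is not Ebot's code (Ebot is written in Erlang and uses a real AMQP broker); it assumes an in-memory `queue.Queue` as a stand-in for the AMQP queue, and the names `crawl_worker`, `run_crawlers`, `demo`, and the `budget` parameter are hypothetical. It only shows why pending URLs survive when crawlers stop: the work lives in the queue, not in the workers.

```python
import queue
import threading


def crawl_worker(url_queue, visited, lock, budget):
    """Process at most `budget` URLs, then stop.

    Stopping after `budget` items simulates a crawler being shut down;
    anything it did not take simply stays in the shared queue.
    """
    for _ in range(budget):
        try:
            url = url_queue.get_nowait()
        except queue.Empty:
            return  # nothing left to do
        with lock:
            visited.add(url)  # stand-in for fetching and storing the URL


def run_crawlers(url_queue, visited, n_workers, budget):
    """Run several workers in parallel against the same queue and wait for them."""
    lock = threading.Lock()
    workers = [
        threading.Thread(target=crawl_worker, args=(url_queue, visited, lock, budget))
        for _ in range(n_workers)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()


def demo():
    # Hypothetical URLs, used only for illustration.
    url_queue = queue.Queue()
    for i in range(10):
        url_queue.put(f"http://example.com/page{i}")

    visited = set()

    # First run: 3 crawlers, each stops after 2 URLs.
    run_crawlers(url_queue, visited, n_workers=3, budget=2)
    after_first = (len(visited), url_queue.qsize())

    # Second run: restarted crawlers pick up the URLs that were left queued.
    run_crawlers(url_queue, visited, n_workers=3, budget=2)
    after_second = (len(visited), url_queue.qsize())

    return after_first, after_second
```

Calling `demo()` returns `((6, 4), (10, 0))`: after the first run six URLs are processed and four remain queued, and after the restart all ten are processed. With a real broker the queue would also outlive the crawler processes themselves, which an in-memory queue cannot show.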

Recent releases

  •  29 Dec 2010 18:10

    Release Notes: A better plugin architecture. New plugins for saving image URLs and titles of HTML pages.

  •  18 Dec 2010 08:33

    Release Notes: Updated to the latest releases of RabbitMQ (2.2.0) and couchbeam. Some bugs were fixed.

  •  17 Sep 2010 20:18

    Release Notes: Compatibility with Erlang R14A. Tested with Debian Testing.

  •  26 Aug 2010 18:18

    Release Notes: Compatibility with the latest (development) releases of required libraries and software (RabbitMQ, CouchDB, Webmachine, Riak, etc.).

  •  20 Jun 2010 10:00

    Release Notes: For better scalability, new AMQP queues are used (ebot.new.*, ebot.fetched.*, ebot.completed.*, ebot.refused.*), and the old core of the crawler (ebot_web) is now split into two separate modules/processes that run in parallel (ebot_html and ebot_web).
