Ebot is a scalable, distributed Web crawler. URLs are saved to a NoSQL database that supports map/reduce queries; you can query it via RESTful HTTP requests or from your preferred programming language. URLs that still need to be analyzed are sent to AMQP queues, so several crawlers can run in parallel and be stopped and restarted without losing URLs.
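The queue-based design described for Ebot can be illustrated with a small Python sketch. It is not Ebot's code and does not use a real AMQP client: `UrlQueue` is a hypothetical in-memory stand-in that only mimics acknowledgement-style delivery, to show why a URL handed to a crawler that stops before finishing is not lost.

```python
from collections import deque


class UrlQueue:
    """Hypothetical in-memory stand-in for a broker queue with acknowledgements."""

    def __init__(self):
        self._ready = deque()   # URLs waiting to be handed to a crawler
        self._unacked = {}      # delivery tag -> URL handed out but not yet confirmed
        self._next_tag = 0

    def publish(self, url):
        """Add a URL that still needs to be analyzed."""
        self._ready.append(url)

    def get(self):
        """Hand out one URL; it stays tracked until ack() is called."""
        if not self._ready:
            return None
        self._next_tag += 1
        url = self._ready.popleft()
        self._unacked[self._next_tag] = url
        return self._next_tag, url

    def ack(self, tag):
        """Confirm that the URL with this delivery tag was fully processed."""
        del self._unacked[tag]

    def requeue_unacked(self):
        """Return unconfirmed URLs to the front of the queue, oldest first.

        This models what happens when a crawler stops before acknowledging.
        """
        for tag in sorted(self._unacked, reverse=True):
            self._ready.appendleft(self._unacked.pop(tag))

    def __len__(self):
        return len(self._ready)


if __name__ == "__main__":
    queue = UrlQueue()
    for page in ("a", "b", "c"):
        queue.publish("http://example.org/" + page)

    tag, url = queue.get()      # a crawler takes one URL ...
    queue.requeue_unacked()     # ... and stops before acknowledging it
    print(len(queue))           # the URL is back in the queue
```

Because a URL leaves the queue for good only on `ack`, any number of crawlers can pull from the same queue and be interrupted at any point.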
Plasma implements the map/reduce framework on a compute cluster. It has its own distributed filesystem, PlasmaFS, which is transactional (ACID), reliable, and fast, and which provides a complete set of file operations. PlasmaFS can be accessed via an RPC protocol or via NFS (i.e., it is mountable). Additionally, there is a key/value database on top of PlasmaFS.
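For readers unfamiliar with the map/reduce model that Plasma implements, the following single-process Python sketch shows the three phases (map, group by key, reduce) on a word-count job. It does not use Plasma or PlasmaFS, and every function name in it is an illustrative assumption; a real framework distributes the same phases across the machines of a cluster.

```python
from collections import defaultdict


def map_phase(records, mapper):
    """Apply the mapper to every record and yield its (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def group_by_key(pairs):
    """Collect all values emitted for the same key (the 'shuffle' step)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Combine the values of each key into a single result."""
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Emit (word, 1) for every whitespace-separated word in a line."""
    for word in line.split():
        yield word.lower(), 1


def count_reducer(key, values):
    """Sum the partial counts for one word."""
    return sum(values)


def word_count(lines):
    pairs = map_phase(lines, word_mapper)
    return reduce_phase(group_by_key(pairs), count_reducer)


if __name__ == "__main__":
    print(word_count(["to be or not", "To be"]))
```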
Neo4j is a graph database, a fully transactional database that stores data structured as graphs. A graph is a flexible data structure that allows for a more agile and rapid style of development. You can think of Neo4j as a high-performance graph engine with all the features of a mature and robust database. The programmer works with an object-oriented, flexible network structure rather than with strict and static tables, yet enjoys all the benefits of a fully transactional, enterprise-strength database. The community edition is GPLv3 licensed, while the advanced and enterprise editions are AGPLv3 licensed.
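To make the contrast with static tables concrete, here is a minimal Python sketch of data structured as a graph: nodes with free-form properties, typed relationships, and a traversal. It is an in-memory toy, not Neo4j's API, and it has none of Neo4j's transactional guarantees; `PropertyGraph` and its methods are hypothetical names chosen for the example.

```python
from collections import defaultdict, deque


class PropertyGraph:
    """Toy graph: nodes carry properties, relationships carry a type."""

    def __init__(self):
        self.nodes = {}                      # node id -> property dict
        self.out_edges = defaultdict(list)   # node id -> [(rel_type, target id)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def relate(self, source, rel_type, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before relating them")
        self.out_edges[source].append((rel_type, target))

    def neighbors(self, node_id, rel_type):
        """Targets directly connected to node_id by the given relationship type."""
        return [t for r, t in self.out_edges.get(node_id, []) if r == rel_type]

    def reachable(self, start, rel_type):
        """Breadth-first traversal along one relationship type, excluding start."""
        seen, order, frontier = {start}, [], deque([start])
        while frontier:
            current = frontier.popleft()
            for nxt in self.neighbors(current, rel_type):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    frontier.append(nxt)
        return order


if __name__ == "__main__":
    g = PropertyGraph()
    g.add_node("alice", age=34)
    g.add_node("bob")                 # nodes need not share the same properties
    g.add_node("carol", city="Oslo")
    g.relate("alice", "KNOWS", "bob")
    g.relate("bob", "KNOWS", "carol")
    print(g.reachable("alice", "KNOWS"))
```

Adding a new kind of relationship or property needs no schema change, which is the flexibility the description refers to.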
couchCurl is a simple static PHP class that generates curl commands to work with CouchDB databases. It aims for quick and easy access to local CouchDB databases without authentication, and with little PHP processing overhead. It assumes that your PHP installation has exec() enabled and that the user can run curl. It supports most of the API (PUT, POST, GET), adds a few helpers that make views easier to work with, and includes a function to help compress or generate custom _ids.
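The idea of generating curl commands for CouchDB's HTTP API can be sketched as follows. This is Python rather than PHP and is not couchCurl's interface: the function names are hypothetical, and the base URL assumes an unauthenticated CouchDB on its default local port 5984. The sketch only builds command strings; running them is left to the caller.

```python
import json
import shlex
from urllib.parse import quote, urlencode

# Assumption: a local CouchDB without authentication on the default port.
BASE_URL = "http://127.0.0.1:5984"


def curl_command(method, path, body=None):
    """Build a shell-safe curl command string for one CouchDB request."""
    args = ["curl", "-s", "-X", method, BASE_URL + path]
    if body is not None:
        args += ["-H", "Content-Type: application/json", "-d", json.dumps(body)]
    return " ".join(shlex.quote(arg) for arg in args)


def get_doc_command(db, doc_id):
    """GET a single document by its _id."""
    return curl_command("GET", "/%s/%s" % (db, quote(doc_id, safe="")))


def put_doc_command(db, doc_id, doc):
    """PUT a document under a caller-chosen _id."""
    return curl_command("PUT", "/%s/%s" % (db, quote(doc_id, safe="")), doc)


def view_command(db, design_doc, view, key=None):
    """GET the rows of a view, optionally restricted to one key.

    CouchDB expects the key parameter to be JSON-encoded.
    """
    path = "/%s/_design/%s/_view/%s" % (db, design_doc, view)
    if key is not None:
        path += "?" + urlencode({"key": json.dumps(key)})
    return curl_command("GET", path)


if __name__ == "__main__":
    print(get_doc_command("books", "moby"))
    print(put_doc_command("books", "moby", {"title": "Moby-Dick"}))
    print(view_command("books", "library", "by_author", key="melville"))
```

Quoting each argument with `shlex.quote` matters here because the generated string is meant to be passed to a shell, as couchCurl does through exec().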