Ruby/Google offers a higher-level abstraction of Google's SOAP-driven Web API. It allows the user to programmatically query the Google search engine from Ruby. The aim of the library is to shield the programmer from the details of the raw data structures returned by the Web API, and in the process make the API more accessible for everyday use.
This is a tool for collecting information from web servers and spidering web sites. It was written for the Open Source Security Testing Methodology (OSSTM), located at http://www.ideahamster.org/osstmm-description.htm. The spider is a multi-threaded, reusable module that can be used in other projects.
libextractor is a library used to extract meta-data from files of arbitrary type. It is designed to use helper libraries to perform the actual extraction, and to be trivially extensible by linking against external extractors for additional file types. The goal is to provide developers of file-sharing networks, file managers, and WWW-indexing bots with a universal library to obtain meta-data about files. It includes a shell command and bindings for Java (JNI) and Python.
Docindex is an open, extensible system that permits Web-based catalog searches and access-controlled fetches from a group of document repositories on multiple CVS servers (extensible to other server types). Documents remain under CVS version control and are made available to Web users through bookmarkable URLs pointing to specific versions or branches.