GPP is a general-purpose preprocessor with customizable syntax, suitable for a wide range of preprocessing tasks. Its independence from any programming language makes it much more versatile than cpp, while its syntax is lighter and more flexible than that of m4. The syntax is fully customizable, which makes it possible to process text files, HTML, or source code equally efficiently in a variety of languages.
screen-scraper is a tool for extracting data from Web sites. It works much like a database that provides access to the information of the Web. It provides a graphical interface allowing you to designate URLs, data elements to be extracted, and scripting logic to traverse pages and work with scraped data. Once these items have been created, screen-scraper can be invoked from external languages such as .NET, Java, PHP, and Active Server Pages. It can be scheduled to scrape information at periodic intervals, and can automatically write extracted data to CSV files.
focuseek searchbox is a family of easily installable full-text search engines that can spider Internet and intranet data sources (Web sites, newsgroups, FTP sites, and others) or index data you feed to it and make it available for searching. It supports a variety of input formats (among them HTML, PDF, Microsoft Word DOC, and RTF), and is easily scriptable via SOAP and extendable through plugins. It can scale to millions of documents and comes with a full-fledged GUI client, a built in Web search portal, and an RSS server.
Xapian is a search engine library, scalable to collections containing hundreds of millions of documents. It's written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also a rich set of boolean query operators. Omega is a Web search application built upon the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/Powerpoint/Works, WordPerfect, RTF, PS, etc.), or data exported from arbitrary sources (e.g. SQL databases).
SOHT (Socket over HTTP Tunneling) allows you to tunnel socket connections through an HTTP proxy. Restrictive firewalls often prohibit all outgoing trafic except for HTTP. This application allows you to tunnel socket connections over the HTTP protocol. This application consists of a server that serves as a proxy and a client which tunnels a socket connection over an HTTP connection to the server. The current server is written in Java, and there are clients in Java and .NET.