screen-scraper is a tool for extracting data from Web sites. It acts much like a database that gives you access to information on the Web. Its graphical interface lets you designate URLs, the data elements to be extracted, and the scripting logic used to traverse pages and work with scraped data. Once these items have been created, screen-scraper can be invoked from external languages such as .NET, Java, PHP, and Active Server Pages. It can be scheduled to scrape information at periodic intervals, and can automatically write extracted data to CSV files.
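
The extract-then-export workflow described above can be illustrated with a minimal sketch in plain Python. This is not screen-scraper's own API or scripting language; the class `TableCellParser`, the function `scrape_to_csv`, and the sample page are all hypothetical names invented for illustration, and the HTML is an inline string so that no network access is needed.

```python
import csv
import io
from html.parser import HTMLParser


class TableCellParser(HTMLParser):
    """Hypothetical helper: collects the text of <td> cells, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # completed rows of cell text
        self._row = None        # row currently being built, or None
        self._in_cell = False   # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row is not None:
            # Header rows contain only <th> cells, so they end up empty and are skipped.
            if self._row:
                self.rows.append([cell.strip() for cell in self._row])
            self._row = None
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data


def scrape_to_csv(html, header):
    """Extract table rows from an HTML string and return them as CSV text."""
    parser = TableCellParser()
    parser.feed(html)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(parser.rows)
    return out.getvalue()


# Made-up sample page standing in for a fetched URL.
SAMPLE_PAGE = """
<table>
  <tr><th>Item</th><th>Price</th></tr>
  <tr><td>Widget</td><td>4.50</td></tr>
  <tr><td>Gadget</td><td>12.00</td></tr>
</table>
"""

CSV_TEXT = scrape_to_csv(SAMPLE_PAGE, ["item", "price"])
```

In a real scraping job the HTML would come from a fetched page and the CSV text would be written to a file on a schedule; the sketch only shows the extraction and export steps.
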
focuseek searchbox is a family of easily installable full-text search engines that can spider Internet and intranet data sources (Web sites, newsgroups, FTP sites, and others) or index data you feed to it and make it available for searching. It supports a variety of input formats (among them HTML, PDF, Microsoft Word DOC, and RTF), is easily scriptable via SOAP, and is extensible through plugins. It can scale to millions of documents and comes with a full-fledged GUI client, a built-in Web search portal, and an RSS server.
Xapian is a search engine library, scalable to collections containing hundreds of millions of documents. It's written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model as well as a rich set of Boolean query operators. Omega is a Web search application built upon the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/PowerPoint/Works, WordPerfect, RTF, PS, etc.), or data exported from arbitrary sources (e.g. SQL databases).
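
To give a feel for what probabilistic ranking means in practice, here is a minimal sketch of BM25, a widely used weighting formula from the probabilistic retrieval family. It is written in plain Python and does not use Xapian's API; the function `bm25_rank`, the parameter defaults `k1=1.2` and `b=0.75`, and the sample documents are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_rank(docs, query, k1=1.2, b=0.75):
    """Return indices of matching documents, best match first (illustrative only)."""
    if not docs:
        return []
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    scored = []
    for idx, toks in enumerate(tokenized):
        tf = Counter(toks)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency weight.
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        if score > 0:
            scored.append((score, idx))

    # Highest score first; ties broken by original position.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [idx for _, idx in scored]


# Made-up three-document collection.
DOCS = [
    "xapian is a search engine library",
    "omega is a web search application",
    "mojoportal is a web site framework",
]

RANKING = bm25_rank(DOCS, "search library")
```

Here the first document matches both query terms and the second only one, so the first ranks higher; the third matches nothing and is left out. A Boolean operator, by contrast, would only decide whether a document is in or out of the result set, without ordering the matches.
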
libcurl.mono is an object-oriented binding for the libcurl Internet client API. It supports rapid development of powerful Internet clients in C# that can run within the Mono runtime. With libcurl.mono, for example, you can develop FTP and HTTP clients without having to code down at the socket level. There are a number of sample applications to get you started.
mojoPortal is a cross-platform, object-oriented Web site framework. It supports PostgreSQL, MySQL, Firebird, SQLite, and MS SQL for the backend. It includes a content management system, forums, blogs, photo galleries, newsletters, polls, surveys, an event calendar, an RSS feed aggregator, and a skinnable design.