Alkaline is a full-featured standalone search and index server. The spider is a fully remote indexing daemon which includes support for all standards like robots.txt and "skip" meta tags, and allows multiple distinct configurations and search groups (searching many different sites from your server), including complex regexp indexing paths, authentification, filters for various document formats, XML-based online management and statistics, mrtg-compatible perf numbers, and more.
Greenstone is a complete digital library creation, management, and distribution package for Unix, Windows, and Mac OS X. Users create collections by gathering a set of input documents, specifying a configuration file, and running the build script. It provides full-text and fielded searching, browsable indexes, customised formatting, metadata extraction (acronyms, languages, etc), a Z39.50 client, and many other features. It supports many input formats, the interface is configurable and multi-lingual, and collections can be distributed on the Web or on CD-ROM.
HTTrack is an easy-to-use offline browser utility. It allows you to download a Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the mirrored Web site in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads. WebHTTrack is a Web-based GUI for HTTrack.
Namazu is a full-text search system intended for easy use. Not only does it work as a small or medium scale Web search engine, but also as a personal search system for email or other files. Supported document types: HTML, Mail/News, MHonArc, RFC, TeX (with detex), man (with groff), Word (with wvWare), PDF (with pdftotext) and plain text.
Net::Z3950::SimpleServer is a Perl module which implements the server side of the Z39.50 (information retrieval) protocol. It hides the complexity of network exchanges, packet serialization, and session handling. You are required only to implement simple callbacks to support searching and record retrieval. It is the basis of the "Zoogle" project, which is a Z39.50 gateway to the Google web index.
PHP Content Management System (phpCMS) makes it possible to need only one template for your whole Web site. It allows you to provide dynamic menus with unlimited levels, and use templates and sub-templates without a database. It is search engine-friendly and proxy-friendly, as the pages it generates can not be distinguished from static HTML pages. PHP code can be added to any template and content file with an optional module. It supports the caching of parsed pages and gzip compression.
Sitegrab is a URL grabber that parses IRC logs for data and then inserts it into a database that can be searched through a web interface using a combination of announcer name, date, partial url, and type of url. Also supported is an administration mode whereby urls can be edited and deleted through the web interface.