ALTSE is an alternative search engine technology. It can index up to a couple of million Web pages. ALTSE does not aim to replace existing search engine technologies such as Google or Yahoo. Instead, it aims to provide an affordable way for small businesses and organizations to index Web pages.
Catacomb is a WebDAV repository module for use with the Apache WebDAV module, mod_dav. Apache mod_dav parses WebDAV protocol requests into operations on a repository that provides persistent storage of resources and their properties. The default repository for mod_dav is provided by a separate module, mod_dav_fs, which stores resource bodies as files in the filesystem and stores properties in a (G)DBM database. Catacomb provides a replacement for mod_dav_fs called mod_dav_repos, which stores resources and their properties in a relational database (MySQL). The primary advantage of this approach is that the search capabilities of the database can be used to implement the DASL protocol.
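To give a rough idea of what a DASL query looks like on the wire, the sketch below builds the XML body of a WebDAV SEARCH request using the DAV:basicsearch grammar from RFC 5323. The helper names (q, build_basicsearch), the /docs/ scope, and the size threshold are illustrative assumptions; the description above does not say which parts of the grammar Catacomb accepts.

```python
import xml.etree.ElementTree as ET

DAV = "DAV:"
# Serialize the DAV: namespace with the conventional "D" prefix.
ET.register_namespace("D", DAV)


def q(name):
    """Qualify a local element name with the DAV: namespace."""
    return f"{{{DAV}}}{name}"


def build_basicsearch(scope_href, min_length):
    """Build a SEARCH body selecting resources larger than min_length bytes.

    Hypothetical helper for illustration; not part of Catacomb or mod_dav.
    """
    root = ET.Element(q("searchrequest"))
    basic = ET.SubElement(root, q("basicsearch"))

    # select: which properties to return for each match
    select = ET.SubElement(basic, q("select"))
    ET.SubElement(ET.SubElement(select, q("prop")), q("getcontentlength"))

    # from: where to search, and how deep
    scope = ET.SubElement(ET.SubElement(basic, q("from")), q("scope"))
    ET.SubElement(scope, q("href")).text = scope_href
    ET.SubElement(scope, q("depth")).text = "infinity"

    # where: getcontentlength > min_length
    gt = ET.SubElement(ET.SubElement(basic, q("where")), q("gt"))
    ET.SubElement(ET.SubElement(gt, q("prop")), q("getcontentlength"))
    ET.SubElement(gt, q("literal")).text = str(min_length)

    return ET.tostring(root, encoding="utf-8", xml_declaration=True)


# Example values are assumptions, chosen only to show the shape of the body.
body = build_basicsearch("/docs/", 10000)
```

A client would send body as the payload of an HTTP SEARCH request with a text/xml content type. A database-backed repository can translate the where clause into an SQL condition, which is the advantage the description refers to.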
Catacomb is a WebDAV repository module for use with the Apache WebDAV module, mod_dav. Apache mod_dav parses WebDAV and DeltaV protocol requests into operations on a repository that provides persistent storage of resources and their properties. The default repository for mod_dav is provided by a separate module, mod_dav_fs, which stores resource bodies as files in the filesystem and stores properties in a (G)DBM database. Catacomb can be used for server-side searching and versioning of files over the HTTP protocol.
DataparkSearch is a Web search engine tool. It supports http, https, ftp, nntp, and news URLs, as well as htdb virtual URLs for indexing SQL databases. It has built-in support for the text/html, text/xml, text/plain, audio/mpeg (MP3), and image/gif MIME types, and supports external parsers for other document types. It can index multilingual sites using content negotiation, search all word forms using ispell affixes and dictionaries, and use stopword and synonym lists. It offers a boolean query language and can sort results by relevancy, popularity rank, last modified time, or importance (the product of the relevancy and popularity ranks). It supports various character sets and phrase segmentation for the Chinese, Japanese, Korean, and Thai languages. It also has accent-insensitive search, mod_dpsearch for Apache, and support for internationalized domain names.
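As a minimal sketch of the "importance" ordering mentioned above, the following multiplies a relevancy score by a popularity rank and sorts on the result. The Hit class, its field names, the URLs, and the scores are all hypothetical; this is not DataparkSearch's own code or API.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A hypothetical search hit; field names are illustrative only."""
    url: str
    relevancy: float   # how well the document matches the query
    popularity: float  # query-independent popularity rank


def importance(hit):
    """Importance as described above: relevancy multiplied by popularity."""
    return hit.relevancy * hit.popularity


def sort_by_importance(hits):
    """Return the hits ordered from most to least important."""
    return sorted(hits, key=importance, reverse=True)


# Made-up scores, chosen so that neither factor alone decides the order.
hits = [
    Hit("http://example.org/a", relevancy=0.9, popularity=0.2),
    Hit("http://example.org/b", relevancy=0.5, popularity=0.8),
    Hit("http://example.org/c", relevancy=0.7, popularity=0.4),
]
ranked = [h.url for h in sort_by_importance(hits)]
```

In this example the most relevant page (a) ends up last because its popularity is low, which shows how the combined rank differs from sorting by relevancy alone.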
DirList is a user directory system that runs as a CGI. It serves user lists, searches on various user attributes, links to users' web sites, and lets you define personalised user attributes. It keeps everything synchronized automatically with the underlying operating system's user database at periodic intervals using cron.
libextractor is a library used to extract meta-data from files of arbitrary type. It is designed to use helper libraries to perform the actual extraction, and to be trivially extensible by linking against external extractors for additional file types. The goal is to provide developers of file-sharing networks, file managers, and WWW-indexing bots with a universal library to obtain meta-data about files. It includes a shell command and bindings for Java (JNI) and Python.
Greenstone is a complete digital library creation, management, and distribution package for Unix, Windows, and Mac OS X. Users create collections by gathering a set of input documents, specifying a configuration file, and running the build script. It provides full-text and fielded searching, browsable indexes, customised formatting, metadata extraction (acronyms, languages, etc.), a Z39.50 client, and many other features. It supports many input formats, its interface is configurable and multilingual, and collections can be distributed on the Web or on CD-ROM.