PDFTextStream is a PDF text and metadata extraction library available for Java and .NET. It supports all versions of the PDF document specification (including v1.7, used by Acrobat 8, 9, and X), extraction of text encoded using double-byte character sets (including Chinese, Japanese, and Korean), decryption of documents encrypted using 40-bit, 128-bit, 256-bit, and variable bit length ciphers, and extraction of all document metadata (including form data, bookmarks, and annotations). It also includes easy integration with Jakarta Lucene and the ability to update interactive forms.
PowerSeek SQL allows you to create, manage, and run your own search engine and directory portal. It is designed to be user-friendly and is built for demanding uses and customization needs. It comes with an extensive admin panel, the ability to sell link listings, SEO-friendly URLs, link reviews/ratings, a content-sensitive banner rotator, a spam filter, a broken link checker, custom data fields, mailers, crawlers, pre-designed template sets, a reciprocal link checker, image/video/file uploading, RSS feeds, optional PPC functionality, and much more. It can be used for Yellow Pages, real estate, and travel directories, complex product catalogs, image galleries, and more.
WebGlimpse is a scalable, feature-rich search engine for indexing your Web site or any collection of local and remote sites you choose. Features include customizable output formats, custom ranking/ordering of hits, fuzzy matching, boolean queries, a Web administration interface for multiple archives, logging of queries, caching of results, and more. Localized search interfaces are provided in multiple languages including Spanish, German, French, Italian, Norwegian, Finnish, Russian, Hebrew, and others. It supports third-party filters for indexing PDF, Word, and Excel files. It is free for academic and most nonprofit users.
Douglas Thrift's Search Engine is an indexing search engine for use on small Web sites such as personal or small business sites. It is designed to be very similar to Google for end users and its output is customizable. For indexing, it supports both the Robots Exclusion Protocol and the Robots META Tag.
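As a rough illustration of the two conventions named above, the following sketch checks a URL against Robots Exclusion Protocol rules and reads the directives from a Robots META tag. It uses only the Python standard library and is not taken from Douglas Thrift's Search Engine; the names `SAMPLE_ROBOTS_TXT`, `ExampleBot`, `RobotsMetaParser`, and `may_index` are assumptions made up for this example.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content; a real crawler would fetch it from the site.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_robots_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of any <meta name="robots" content="..."> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def may_index(html: str) -> bool:
    """Return False when the page asks robots not to index it."""
    meta = RobotsMetaParser()
    meta.feed(html)
    return "noindex" not in meta.directives


robots = build_robots_parser(SAMPLE_ROBOTS_TXT)
allowed_home = robots.can_fetch("ExampleBot", "http://example.com/index.html")
allowed_private = robots.can_fetch("ExampleBot", "http://example.com/private/a.html")
```

An indexer following both conventions would first consult `can_fetch` before requesting a page, then call something like `may_index` on the downloaded HTML before adding it to the index.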
PHP Content Management System (phpCMS) lets you use a single template for your whole Web site. It allows you to provide dynamic menus with unlimited levels, and use templates and sub-templates without a database. It is search engine-friendly and proxy-friendly, as the pages it generates cannot be distinguished from static HTML pages. With an optional module, PHP code can be added to any template or content file. It supports the caching of parsed pages and gzip compression.
Amberfish is a general-purpose text/XML retrieval utility. It features indexing of both free text and nested fields, built-in support for XML documents, structured queries allowing generalized field/tag paths, hierarchical result sets, automatic searching across multiple databases, efficient indexing, and relatively low memory requirements.
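To make the idea of a query restricted to a field/tag path concrete, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. It does not use Amberfish or its query syntax; the sample document, the path strings, and the helper `search_field` are all assumptions chosen for illustration, and it scans the document directly rather than building an index.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with nested fields.
DOC = """<library>
  <book><title>Text Retrieval</title><author><name>Ada</name></author></book>
  <book><title>XML Basics</title><author><name>Grace</name></author></book>
</library>"""


def search_field(root: ET.Element, path: str, term: str) -> list[str]:
    """Return the text of elements at `path` whose text contains `term`.

    Matching is a case-insensitive substring test, limited to the given tag path.
    """
    needle = term.lower()
    return [
        el.text
        for el in root.findall(path)
        if el.text and needle in el.text.lower()
    ]


root = ET.fromstring(DOC)
title_hits = search_field(root, "book/title", "xml")
author_hits = search_field(root, "book/author/name", "ada")
```

The same term can match in one field and not another: searching for "xml" under `book/author/name` returns nothing, which is the practical difference between a field-scoped query and a free-text one.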
Tdbengine is an RDBMS with an integrated programming language. It is an enhanced version of the well-known DOS-TDB, and is designed to handle databases on the Web. It connects to the Web server using the standard CGI interface, or runs on the command line. It is very small (about 400 KB), extremely quick, and easy to administer. Its features include full text indexing, an automatic data link system, and the script language EASY, which replaces the commonly used SQL with its modular code.