PDFTextStream is a PDF text and metadata extraction library available for Java and .NET. It supports all versions of the PDF document specification (including v1.7, used by Acrobat 8, 9, and X), extraction of text encoded using double-byte character sets (including Chinese, Japanese, and Korean), decryption of documents encrypted using 40-bit, 128-bit, 256-bit, and variable bit length ciphers, and extraction of all document metadata provided by PDF documents (including form data, bookmarks, and annotations). Easy integration with Jakarta Lucene is included, as well as interactive form update capability.
Tdbengine is an RDBMS with an integrated programming language. It is an enhancement of the well-known DOS-TDB, and is designed to handle databases on the Web. It connects to the Web server using the standard CGI interface, or runs on the command line. It is very small (about 400 KB), extremely quick, and easy to administer. Its features include full text indexing, an automatic data link system, and the modular script language EASY, which takes the place of the commonly used SQL.
PowerSeek SQL allows you to create, manage, and run your own search engine and directory portal with total control and ease. It is user-friendly in every aspect and built for the most demanding uses and customization needs. It comes with an extensive admin panel, the ability to sell link listings, SEO-friendly URLs, link reviews/ratings, a content-sensitive banner rotator, a spam filter, a broken link checker, custom data fields, mailers, crawlers, pre-designed template sets, a reciprocal link checker, image/video/file uploading, RSS feeds, optional PPC functionality, and much more. It can be used for Yellow Pages, real estate, and travel directories, complex product catalogs, image galleries, and more.
Douglas Thrift's Search Engine is an indexing search engine for use on small Web sites such as personal or small business sites. It is designed to be very similar to Google for end users, and its output is customizable. For indexing, it supports both the Robots Exclusion Protocol and the Robots META Tag.
PHP Content Management System (phpCMS) lets you use a single template for your whole Web site. It allows you to provide dynamic menus with unlimited levels, and use templates and sub-templates without a database. It is search-engine-friendly and proxy-friendly, as the pages it generates cannot be distinguished from static HTML pages. PHP code can be added to any template and content file with an optional module. It supports the caching of parsed pages and gzip compression.
Alkaline is a full-featured standalone search and index server. The spider is a fully remote indexing daemon which supports standards such as robots.txt and "skip" meta tags, and allows multiple distinct configurations and search groups (searching many different sites from your server). Other features include complex regexp indexing paths, authentication, filters for various document formats, XML-based online management and statistics, mrtg-compatible performance numbers, and more.