SILVERCODERS DocStorage is a utility to improve document management. You can have one database for all invoices, guarantees, protocols, and other documents. DocStorage can extract plain text from documents in DOC, XLS, PPT, PDF, RTF, ODT, ODS, ODP, DOCX, XLSX, PPTX, and many other formats. It can use an OCR engine to extract plain text even from scanned documents. It can perform a global full-text search across all documents regardless of format. It supports document versioning, duplicate detection, document notes, and document signing. It provides full integration with software suites like Microsoft Office and OpenOffice.
LuSql is a command-line Java application for constructing a Lucene index from an arbitrary SQL query against a JDBC-accessible SQL database. It allows a user to control a number of parameters, including the SQL query to use, the individual indexing/storage/term-vector nature of fields, the analyzer, the stop word list, and other tuning parameters. In its default mode, it uses threading to take advantage of multiple cores. LuSql can handle complex queries, allows for additional per-record sub-queries, and has a plug-in architecture for arbitrary Lucene document manipulation.
SCAN is a personal information retrieval framework that combines search, text analysis, tagging, and metadata functions for managing document collections. SCAN is component-based software that uses a number of plugins for specific features. The basic SCAN platform can be easily extended with plugins for different document formats and document location types.
TiTLi is a Google-like search tool for relational databases. It builds on top of Apache Lucene to provide an API and a GWT-based UI for searching multiple databases from various vendors simultaneously. It is very fast because searches run against the index, and the database is queried only when a record is chosen.
WAscii is a Web frontend intended to display an AsciiDoc documentation repository. It allows you to search and browse your documentation files and automatically converts AsciiDoc to HTML, PDF, and ODF documents. It is intended to work directly from a Subversion repository containing your AsciiDoc files.
TextSearch is a program to search through a set of text files in a directory structure. Each document is searched using a regular expression, and an overview of the results is shown as a tree structure. Clicking on a file opens it with the matches highlighted. Unlike many similar programs, it focuses less on corpus-wide statistics (how often a word occurs across all files) and more on occurrences within individual files.
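The kind of per-file regular expression search described for TextSearch can be sketched in a few lines of Python. This is only an illustration of the general technique, not TextSearch's own code: the function names `search_tree` and `highlight`, the UTF-8 decoding, and the bracket markers used for highlighting are all assumptions made for the example.

```python
import re
from pathlib import Path


def search_tree(root, pattern):
    """Map each file under root (by relative path) to its matching lines.

    Hypothetical helper: returns {path: [(line_number, line_text), ...]}
    and leaves out files that contain no match.
    """
    regex = re.compile(pattern)
    results = {}
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            # Assumption: files are text; undecodable bytes are replaced.
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it
        hits = [
            (number, line)
            for number, line in enumerate(text.splitlines(), start=1)
            if regex.search(line)
        ]
        if hits:
            results[path.relative_to(root).as_posix()] = hits
    return results


def highlight(line, pattern, marks=("[", "]")):
    """Wrap every match in the line with marker strings (plain-text highlighting)."""
    return re.sub(pattern, lambda m: f"{marks[0]}{m.group(0)}{marks[1]}", line)
```

Grouping results by file rather than counting totals across the corpus mirrors the focus on occurrences in single files; a tree view could be built from the relative paths used as keys.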
Invenio (formerly CDSware) is a suite of applications that provides the framework and tools for building and managing an autonomous digital library server. It complies with the Open Archives Initiative metadata harvesting protocol (OAI-PMH) and uses MARC 21 as its underlying bibliographic standard. Its flexibility and performance make it a comprehensive solution for the management of document repositories of moderate to large size.
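For readers unfamiliar with OAI-PMH, a harvesting request is an ordinary HTTP request whose query string carries a `verb` and its arguments. The sketch below only builds such a URL; it is not part of Invenio, the helper name `oai_request_url` is made up for the example, and the base URL shown in the usage note is a placeholder rather than a real endpoint.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify",
    "ListMetadataFormats",
    "ListSets",
    "ListIdentifiers",
    "ListRecords",
    "GetRecord",
}


def oai_request_url(base_url, verb, arguments=None):
    """Build an OAI-PMH request URL (hypothetical helper, no network access).

    `arguments` is a dict because some argument names, such as "from",
    are Python keywords and cannot be passed as keyword arguments.
    """
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    params = {"verb": verb}
    params.update(arguments or {})
    return f"{base_url}?{urlencode(params)}"
```

Calling `oai_request_url("https://repository.example.org/oai", "ListRecords", {"metadataPrefix": "oai_dc"})` would ask a repository at that placeholder address for its records in unqualified Dublin Core, the one metadata format every OAI-PMH repository is required to offer.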
Search::Xapian is a Perl XS frontend to the Xapian C++ search library. It is a fairly complete wrapper: most features of the Xapian library are made available for use from Perl. Xapian is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model as well as a rich set of boolean query operators. It's fast and scalable to hundreds of millions of documents.