Solr is an enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g. Word and PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
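As a rough illustration of the HTTP API mentioned above, the following Python sketch builds a Solr query URL with faceting and hit highlighting turned on. The host, port, core name ("mycore"), and field names are assumptions made for the example, and the helper build_solr_select_url is hypothetical, not part of Solr. The sketch only constructs the URL and sends no request.

```python
from urllib.parse import urlencode


def build_solr_select_url(base_url, core, query,
                          facet_field=None, highlight_field=None, rows=10):
    """Build a URL for Solr's /select handler (hypothetical helper)."""
    # q, wt and rows are standard Solr query parameters.
    params = [("q", query), ("wt", "json"), ("rows", str(rows))]
    if facet_field:
        # Ask Solr to return facet counts for one field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    if highlight_field:
        # Ask Solr to return highlighted snippets for one field.
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


# Assumed local installation and core name; adjust for a real deployment.
url = build_solr_select_url(
    "http://localhost:8983/solr", "mycore", "title:lucene",
    facet_field="category", highlight_field="title",
)
print(url)
```

The resulting URL can be fetched with any HTTP client, which is what makes the API usable from virtually any programming language.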
Clipboard Modifier is a flexible system for modifying the text in a clipboard in a variety of ways. It can take a copied spreadsheet and change the clipboard so that it can be pasted into a wiki, with vertical bars (|) instead of tabs. It can modify multi-line clipboard text so that it can be pasted into Java or Python as strings. A URL in the clipboard pointing to Amazon can be modified so that it has your Associate ID in it. It can pipe the clipboard to a shell command and retrieve the output from it. Clipboard contents can be forced to plain text, removing things like formatting. A complicated URL can be converted into its Python equivalent, using urlencode.
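The transformations described above can be sketched in a few lines of Python. This is not Clipboard Modifier's own code: the function names are hypothetical, and the sketch works on plain strings rather than on a real clipboard.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tabs_to_wiki(text):
    """Replace spreadsheet tabs with vertical bars for pasting into a wiki."""
    return text.replace("\t", "|")


def lines_to_python_literal(text):
    """Turn multi-line text into a parenthesized Python string expression."""
    pieces = [repr(line + "\n") for line in text.splitlines()]
    if not pieces:
        return '""'
    # Adjacent string literals inside parentheses are concatenated by Python.
    return "(\n    " + "\n    ".join(pieces) + "\n)"


def url_to_python(url):
    """Rewrite a URL as a Python expression that rebuilds it with urlencode."""
    parts = urlsplit(url)
    base = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    params = parse_qsl(parts.query)  # decoded (name, value) pairs
    return f"{base!r} + '?' + urlencode({params!r})"


print(tabs_to_wiki("name\tqty\napple\t3"))
print(lines_to_python_literal("first line\nsecond line"))
print(url_to_python("http://example.com/search?q=a+b&page=2"))
```

Each function maps one clipboard string to another, so the same shape would fit the shell-pipe and force-to-text transformations as well.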
LuSql is a command line Java application for the construction of a Lucene index from an arbitrary SQL query of a JDBC-accessible SQL database. It allows a user to control a number of parameters, including the SQL query to use, individual indexing/storage/term-vector nature of fields, analyzer, stop word list, and other tuning parameters. In its default mode, it uses threading to take advantage of multiple cores. LuSql can handle complex queries, allows for additional per record sub-queries, and has a plug-in architecture for arbitrary Lucene document manipulation.
OpenHTMM is an implementation specifically designed to facilitate research into Hidden Topic Markov Model methods. The Hidden Topic Markov Model (HTMM) is a method of analyzing a document by imposing a temporal Markov structure on it. In this way, it is able to account for shifting topics within a document. In so doing, it provides a topic segmentation of the document and also seems to distinguish effectively among the multiple senses that the same word may have in different contexts within the same document.
ProteomeCommons.org IO Framework is a Java framework for handling spectra and peak lists. The framework can read and write a number of different spectra and peak list formats, and it provides a simple, intuitive Java object model for working with spectra or peak lists. All classes support two methods of handling peak list and spectrum data: in-memory or stream. The goal of this framework is to support all the popular MS and MS/MS data formats, and to eliminate the time and effort involved in figuring out how to read and write peak list or spectrum files.
SCAN is a personal information retrieval framework that combines search, text analysis, tagging, and metadata functions for managing document collections. SCAN is component-based software that uses a number of plugins for specific features. The basic SCAN platform can be easily extended with plugins for different document formats and document location types.
Serene is a validation engine that implements the JAXP 1.3 Validation Framework API for RELAX NG. It is based on an algorithm centered on providing good messages and handling ambiguity and conflicts clearly. It also has an implementation of the JAXP Validation Framework API for ISO Schematron and supports Schematron markup embedded in RELAX NG schemas.