Xapian is a search engine library, scalable to collections containing hundreds of millions of documents. It's written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model as well as a rich set of boolean query operators. Omega is a Web search application built upon the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/PowerPoint/Works, WordPerfect, RTF, PS, etc.), or data exported from arbitrary sources (e.g. SQL databases).
Virtual Token Descriptor for eXtensible Markup Language (VTD-XML) refers to a collection of efficient XML processing technologies centered around a non-extractive XML parsing technique called Virtual Token Descriptor (VTD). Depending on the perspective, VTD-XML can be viewed as an XML parser, a native XML indexer (or a file format that uses binary data to enhance the text XML), an incremental XML content modifier, an XML slicer/splitter/assembler, or an XML editor/eraser.
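To illustrate what "non-extractive" means here, the following is a minimal Python sketch, not VTD-XML's actual API or record layout. It assumes a simplified XML subset (no attributes, comments, or entities), and the names `TokenDescriptor`, `index_tokens`, and `token_text` are hypothetical. Each token is described only by its offset, length, and depth in the original buffer; no substring is copied until the caller asks for it.

```python
import re
from typing import List, NamedTuple


class TokenDescriptor(NamedTuple):
    """Hypothetical descriptor: points into the source text instead of copying it."""
    offset: int  # start position in the original buffer
    length: int  # number of characters covered
    depth: int   # nesting depth at which the token appears
    kind: str    # "start_tag" or "text"


# Simplified tag pattern: <name>, </name>, or <name/> with no attributes.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.\-]*)\s*(/?)>")


def index_tokens(xml: str) -> List[TokenDescriptor]:
    """Scan once and record where element names and text runs are located."""
    tokens = []
    depth = 0
    pos = 0
    for m in TAG.finditer(xml):
        # Non-blank text between the previous tag and this one.
        if m.start() > pos and xml[pos:m.start()].strip():
            tokens.append(TokenDescriptor(pos, m.start() - pos, depth, "text"))
        closing, name, self_closing = m.group(1), m.group(2), m.group(3)
        if closing:
            depth -= 1
        else:
            tokens.append(TokenDescriptor(m.start(2), len(name), depth, "start_tag"))
            if not self_closing:
                depth += 1
        pos = m.end()
    return tokens


def token_text(xml: str, tok: TokenDescriptor) -> str:
    """Materialize a token's text only on demand, by slicing the original buffer."""
    return xml[tok.offset:tok.offset + tok.length]
```

Because the descriptors only reference the original text, the source document stays intact, which is the property that makes operations such as incremental modification and slicing possible in this style of parsing.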
GPP is a general-purpose preprocessor with customizable syntax, suitable for a wide range of preprocessing tasks. Its independence from any programming language makes it much more versatile than cpp, while its syntax is lighter and more flexible than that of m4. The syntax is fully customizable, which makes it possible to process text files, HTML, or source code equally efficiently in a variety of languages.
Grammatica is a parser generator (compiler compiler) for C# and Java. It improves upon similar tools (like yacc and ANTLR) by creating well-commented and readable source code, by providing automatic error recovery and detailed error messages, and by supporting the testing and debugging of grammars without generating source code. Grammatica supports LL(k) grammars with an unlimited number of look-ahead tokens.
AntiCutAndPaste is designed to search for text fragments that have been copied and pasted in programming language source code or plain text. It has been tested on sources from large C++, Pascal, Java, and C# (Mono) projects. The algorithms used are very fast and can handle up to three million lines of C++ code in one minute. Minor modifications of code are ignored during the search. Reports are sorted by the total size of all similar fragments, and there are many report customization options.
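As a rough illustration of how copy-and-paste detection can ignore minor modifications, here is a minimal Python sketch. It is not AntiCutAndPaste's algorithm, which the description does not specify. The sketch assumes that "minor modifications" means whitespace differences only, and the names `normalize` and `find_duplicates` and the `window` parameter are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize(line: str) -> str:
    """Collapse runs of whitespace so purely cosmetic edits do not hide a match."""
    return re.sub(r"\s+", " ", line).strip()


def find_duplicates(text: str, window: int = 3) -> Dict[Tuple[str, ...], List[int]]:
    """Map each repeated run of `window` normalized lines to its starting line numbers."""
    # Keep 1-based line numbers, and skip blank lines so spacing changes are ignored.
    numbered = [(i + 1, normalize(l)) for i, l in enumerate(text.splitlines())]
    numbered = [(n, l) for n, l in numbered if l]

    seen = defaultdict(list)
    for i in range(len(numbered) - window + 1):
        key = tuple(l for _, l in numbered[i:i + window])
        seen[key].append(numbered[i][0])

    # Only runs that occur more than once are candidate copy-and-paste fragments.
    return {k: v for k, v in seen.items() if len(v) > 1}
```

A real tool would likely normalize more aggressively (for example, renamed identifiers or changed literals) and merge overlapping windows into larger fragments before reporting them.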
The Stylus/Handwriting Input Panel (SHIP) is a system for gesture text entry on tablet computers using an X11 user interface. Text may be entered from an on-screen keyboard or by handwriting (either printed or cursive), but handwriting requires a server application (which is included) to be installed on a copy of Windows XP Tablet PC Edition or Vista.
Bibliographer is a BibTeX bibliography database editor which aims to be easy to use. Its features include linking files to your records, with indexing and searching support. The interface is designed for easy navigation of your bibliography. Double-clicking a record will open the linked file.
The MfGames.Template library is a native C# library for creating template libraries. It was inspired by NVelocity, but was designed from the ground up to use the CIL internals, such as System.CodeDom and internal compilation, to handle the template language. Because of this, it supports C# code as the template "language". In addition, templates are compiled down into bytecode using the built-in compiler.