Sanzang is a compact, simple, cross-platform machine translation system. It is especially useful for translating from the CJK languages (Chinese, Japanese, and Korean), and it is well suited to ancient and otherwise difficult texts. Unlike most other machine translation systems, Sanzang is small and approachable. Users can develop their own translation rules, which are stored in a plain text file and applied at runtime.
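To make the idea of text-file translation rules concrete, here is a minimal sketch in Python. It is not Sanzang's code or its actual table format: the `source|target` line format, the `#` comment convention, and the function names `parse_rules` and `translate` are all assumptions made for illustration. The sketch applies rules by scanning the input left to right and preferring the longest matching source term.

```python
def parse_rules(text):
    """Parse hypothetical 'source|target' lines into a dict.

    Blank lines and lines starting with '#' are skipped. This format is an
    assumption for the example, not Sanzang's real table format.
    """
    rules = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        source, _, target = line.partition("|")
        source, target = source.strip(), target.strip()
        if source and target:
            rules[source] = target
    return rules


def translate(text, rules):
    """Replace source terms with targets, longest match first at each position."""
    if not rules:
        return text
    max_len = max(len(source) for source in rules)
    out = []
    i = 0
    while i < len(text):
        # Try the longest candidate substring first, then shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            chunk = text[i:i + length]
            if chunk in rules:
                out.append(" " + rules[chunk] + " ")
                i += length
                break
        else:
            # No rule matched here: copy the character through unchanged.
            out.append(text[i])
            i += 1
    # Collapse runs of whitespace (including any in the input) to single spaces.
    return " ".join("".join(out).split())
```

Because the rules live in plain text, a user could edit the rule file and rerun the translation without touching the program itself, which is the property the description emphasizes.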
Xapian is a search engine library, scalable to collections containing hundreds of millions of documents. It is written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also a rich set of boolean query operators. Omega is a Web search application built upon the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/PowerPoint/Works, WordPerfect, RTF, PS, etc.), or data exported from arbitrary sources (e.g. SQL databases).
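The combination of boolean matching and probabilistic ranking can be sketched in a few lines of pure Python. This does not use the Xapian bindings and is not Xapian's implementation: the `TinyIndex` class, its method names, and the whitespace tokenizer are hypothetical, and BM25 is used here simply as a well-known formula from the probabilistic retrieval family. A boolean filter (all terms or any term) selects candidate documents, and the score orders them.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer for the example: lowercase and split on whitespace."""
    return text.lower().split()


class TinyIndex:
    """Toy in-memory index (hypothetical; real engines use inverted indexes)."""

    def __init__(self, k1=1.2, b=0.75):
        self.k1 = k1
        self.b = b
        self.docs = []  # one Counter of term frequencies per document

    def add(self, text):
        self.docs.append(Counter(tokenize(text)))
        return len(self.docs) - 1  # document id

    def _idf(self, term):
        n = sum(1 for doc in self.docs if term in doc)
        return math.log(1 + (len(self.docs) - n + 0.5) / (n + 0.5))

    def score(self, doc_id, terms):
        """BM25 score of one document for the given query terms."""
        doc = self.docs[doc_id]
        avg_len = sum(sum(d.values()) for d in self.docs) / len(self.docs) or 1.0
        norm = self.k1 * (1 - self.b + self.b * sum(doc.values()) / avg_len)
        return sum(
            self._idf(t) * doc[t] * (self.k1 + 1) / (doc[t] + norm)
            for t in terms
        )

    def search(self, query, require_all=False):
        """Boolean filter (AND if require_all, else OR), then rank by score."""
        terms = tokenize(query)
        matcher = all if require_all else any
        hits = [
            i for i, doc in enumerate(self.docs)
            if matcher(t in doc for t in terms)
        ]
        return sorted(hits, key=lambda i: self.score(i, terms), reverse=True)
```

In this sketch a document that repeats a query term more often ranks above one of the same length that mentions it once, while the boolean flag controls which documents are considered at all.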
Winnow efficiently trains and operates any number of unique Bayesian (Naive Bayes) classifiers on large sets of content. It has very high performance and works with very small and unbalanced training sets. It has been used to power an innovative Web feed reader that uses smart tags, which learn and find the content you want to see, from more sources than you can follow with traditional feed readers. It works particularly well with Ruby and Ruby on Rails.
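For readers unfamiliar with Naive Bayes tagging, the following Python sketch shows the general technique. It is not Winnow's code or API: the `NaiveBayesTagger` class, its methods, the whitespace tokenizer, and the add-one smoothing are assumptions chosen to keep the example short. Each tag learns word counts from its training texts, and a new text receives the tag with the highest log-probability.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTagger:
    """Toy multinomial Naive Bayes classifier (hypothetical, for illustration)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # tag -> word -> count
        self.doc_counts = Counter()              # tag -> training documents seen
        self.vocabulary = set()

    def train(self, tag, text):
        words = text.lower().split()
        self.word_counts[tag].update(words)
        self.doc_counts[tag] += 1
        self.vocabulary.update(words)

    def log_scores(self, text):
        """Return log P(tag) + sum of log P(word | tag) for every known tag."""
        total_docs = sum(self.doc_counts.values())
        vocab_size = len(self.vocabulary)
        words = text.lower().split()
        scores = {}
        for tag, counts in self.word_counts.items():
            total_words = sum(counts.values())
            score = math.log(self.doc_counts[tag] / total_docs)
            for word in words:
                # Add-one smoothing so unseen words do not zero out a tag.
                score += math.log((counts[word] + 1) / (total_words + vocab_size))
            scores[tag] = score
        return scores

    def classify(self, text):
        """Return the best-scoring tag; needs at least one training example."""
        scores = self.log_scores(text)
        if not scores:
            raise ValueError("classifier has not been trained")
        return max(scores, key=scores.get)
```

Smoothing is what lets a sketch like this behave sensibly on the very small training sets the description mentions; how Winnow itself handles small or unbalanced sets is not stated in the description.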
deplate converts wiki-like markup to LaTeX (standard classes, koma, dramatist, sweave), HTML/PHP (single page, chunked/website, HTML, or s5-based slideshow), DocBook (article, book, man/ref page), and plain text. Currently supported input formats are viki and Ruby's rdoc. The viki markup supports footnotes, citations, an index, a table of contents, embedded LaTeX for mathematics, integration with R for dynamically generated figures and tables, and more. Output can be customized via page templates.
SiSU (Structured information, Serialized Units) is a text structuring and publishing framework based on lightweight markup, with support for granular search. From a plain-text file with minimal markup, it produces plain text, HTML, XHTML, XML, ODF, LaTeX, and PDF, and it populates an SQL database at the object/paragraph level for granular searches. Prepare documents using your text editor of choice, then use SiSU to generate the desired output formats. SiSU is controlled from the command line.
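The idea of storing a document in SQL one paragraph at a time can be illustrated with Python's standard `sqlite3` module. This is only a sketch and not SiSU's schema or tooling: the `objects` table, its column names, and the functions `load_objects` and `search_objects` are hypothetical, and paragraphs are assumed to be separated by blank lines. A search then returns the numbered paragraphs that match rather than whole documents.

```python
import sqlite3


def load_objects(conn, title, text):
    """Store each blank-line-separated paragraph as its own numbered row."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS objects "
        "(title TEXT, object_number INTEGER, body TEXT)"
    )
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    conn.executemany(
        "INSERT INTO objects (title, object_number, body) VALUES (?, ?, ?)",
        [(title, n, p) for n, p in enumerate(paragraphs, start=1)],
    )
    conn.commit()
    return len(paragraphs)


def search_objects(conn, phrase):
    """Return (title, object_number) for every paragraph containing phrase.

    Uses a simple LIKE match, so '%' and '_' in phrase act as wildcards.
    """
    rows = conn.execute(
        "SELECT title, object_number FROM objects "
        "WHERE body LIKE ? ORDER BY title, object_number",
        ("%" + phrase + "%",),
    )
    return rows.fetchall()
```

With rows at paragraph granularity, a hit can point to a specific numbered location in a text, which is what makes the search "granular" in the sense described above.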
rwdgutenberg is a book reading tool. It can find text files, build lists of text files, auto-read texts, and submit bug reports with one click, and it has context-sensitive help. Additional applets can be downloaded. It includes the rwdtinker framework and should not require any other downloads. It requires only that Ruby be installed, and it should work on all platforms.
rwdhypernote is a hierarchical note editor. It uses a directory structure for notes, and can record internal links and Web links. It has context-sensitive help. Additional applets can be downloaded. The GUI is RubyWebDialogs, which runs in a Web browser, so the application is completely cross-platform. It is part of the Ruby-based Tinker framework, so applets can be added and removed.