index.rb is a general indexing framework for Ruby. With it, you can create collections of documents, then index and search them. The traditional inverted index is supported, as is Latent Semantic Indexing (LSI). Input documents may be stemmed so that user queries match related word forms. It also provides TextTiling to break input documents covering multiple topics into topic-specific sub-documents.
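To illustrate what a traditional inverted index does, here is a minimal sketch in plain Ruby. It is not index.rb's API: the class `TinyInvertedIndex` and its methods are hypothetical names chosen for this example, and stemming, LSI, and TextTiling are left out.

```ruby
require "set"

# Hypothetical minimal inverted index; NOT the index.rb API.
class TinyInvertedIndex
  def initialize
    # term => set of ids of the documents that contain the term
    @postings = Hash.new { |hash, term| hash[term] = Set.new }
  end

  # Split text into lowercase alphanumeric tokens (no stemming here).
  def tokenize(text)
    text.downcase.scan(/[a-z0-9]+/)
  end

  # Record every term of the document under the given id.
  def add(doc_id, text)
    tokenize(text).each { |term| @postings[term] << doc_id }
    self
  end

  # Return the sorted ids of documents that contain every query term.
  def search(query)
    terms = tokenize(query)
    return [] if terms.empty?

    terms.map { |term| @postings.fetch(term, Set.new) }
         .reduce { |a, b| a & b }
         .to_a
         .sort
  end
end

index = TinyInvertedIndex.new
index.add(1, "Ruby is an object-oriented language")
index.add(2, "Latent semantic indexing of Ruby documents")
index.add(3, "Inverted index of documents")

p index.search("ruby")             # => [1, 2]
p index.search("documents index")  # => [3]
```

The second query misses document 2 because "indexing" and "index" are different tokens; this is the kind of mismatch that stemming is meant to remove.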
BDDs [bry86] (or, more precisely, ROBDDs) are efficient data structures for representing Boolean formulas. They are widely used in formal verification, in particular in symbolic model checking. Ruby-BDD, based on BuDDy, provides access to BDDs from Ruby, a powerful and easy-to-use object-oriented language. It is intended for quick prototyping and education.
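For readers new to ROBDDs, the following pure-Ruby sketch shows the two reduction rules (skip redundant tests, share identical nodes) and why they make equivalent formulas end up as the same node. It does not use Ruby-BDD or BuDDy: `TinyBDD` and all of its method names are hypothetical, and variables are ordered simply by their integer index.

```ruby
# Hypothetical minimal ROBDD; NOT the Ruby-BDD or BuDDy API.
class TinyBDD
  FALSE_ID = 0
  TRUE_ID  = 1

  def initialize
    @nodes  = { FALSE_ID => nil, TRUE_ID => nil } # id => [var, low, high]
    @unique = {}                                  # [var, low, high] => id
  end

  # Node for "if var then high else low", with both reduction rules applied.
  def mk(var, low, high)
    return low if low == high # redundant test: no node needed
    key = [var, low, high]
    @unique[key] ||= begin    # identical nodes are created only once
      id = @nodes.size
      @nodes[id] = key
      id
    end
  end

  def var(i)
    mk(i, FALSE_ID, TRUE_ID)
  end

  def terminal?(id)
    id == FALSE_ID || id == TRUE_ID
  end

  def top_var(id)
    terminal?(id) ? Float::INFINITY : @nodes[id][0]
  end

  # Restrict a node to var = value if var is the node's top variable.
  def cofactor(id, var, value)
    return id if terminal?(id) || top_var(id) != var
    value ? @nodes[id][2] : @nodes[id][1]
  end

  # Combine two BDDs with a binary Boolean operator given as a block.
  def apply(a, b, memo = {}, &op)
    return memo[[a, b]] if memo.key?([a, b])

    result =
      if terminal?(a) && terminal?(b)
        op.call(a == TRUE_ID, b == TRUE_ID) ? TRUE_ID : FALSE_ID
      else
        v = [top_var(a), top_var(b)].min
        low  = apply(cofactor(a, v, false), cofactor(b, v, false), memo, &op)
        high = apply(cofactor(a, v, true),  cofactor(b, v, true),  memo, &op)
        mk(v, low, high)
      end
    memo[[a, b]] = result
  end

  def conj(a, b)
    apply(a, b) { |x, y| x && y }
  end

  def disj(a, b)
    apply(a, b) { |x, y| x || y }
  end

  def neg(a)
    apply(a, TRUE_ID) { |x, _| !x }
  end
end

bdd = TinyBDD.new
x = bdd.var(0)
y = bdd.var(1)

# De Morgan: both sides reduce to the very same node id.
p bdd.neg(bdd.conj(x, y)) == bdd.disj(bdd.neg(x), bdd.neg(y)) # => true
# A contradiction collapses to the FALSE terminal.
p bdd.conj(x, bdd.neg(x)) == TinyBDD::FALSE_ID                # => true
```

Because the representation is canonical for a fixed variable order, checking equivalence of two formulas reduces to comparing two integers.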
utf8proc is a library for processing UTF-8 encoded Unicode strings. Its features include Unicode normalization, stripping of default ignorable characters, case folding, and detection of grapheme cluster boundaries. The library can be used in C programs, and most of its functionality is also available as a Ruby library. For PostgreSQL, there is an extension providing a function that prepares strings for case-insensitive indices. The currently supported Unicode version is 5.0.0.
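The operations listed above can be tried out with methods built into modern Ruby's `String` class. The sketch below uses those built-ins, not utf8proc's own bindings, and the Unicode version they follow depends on the Ruby release. The helper `caseless_key` is a hypothetical name, and folding followed by NFC is only a rough stand-in for what a case-insensitive index key might look like.

```ruby
# Uses Ruby's built-in String methods, NOT the utf8proc bindings.

decomposed = "e\u0301" # "e" followed by a combining acute accent
composed   = "\u00E9"  # the precomposed character

# The two strings render alike but differ until normalized.
p decomposed == composed                         # => false
p decomposed.unicode_normalize(:nfc) == composed # => true

# Full case folding maps the sharp s to "ss".
p "Stra\u00DFe".downcase(:fold)                  # => "strasse"

# Grapheme clusters: the accented letter counts as one unit.
p "e\u0301x".each_grapheme_cluster.to_a.length   # => 2

# Hypothetical helper: a key for case-insensitive comparison.
def caseless_key(str)
  str.downcase(:fold).unicode_normalize(:nfc)
end

p caseless_key("STRASSE") == caseless_key("Stra\u00DFe") # => true
```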
Tartan is a text parsing engine targeted at wiki text. The syntax specification is written in YAML as a set of regex-based rules. It supports layering and multiple output types. Rules for Markdown to HTML are included, with optional layered extensions for tables. It is implemented in Ruby, and the project is looking to add implementations in other languages.
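As a rough illustration of regex-based rules kept in YAML, the sketch below loads a small rule list and applies it in order. The rule format (`name`, `pattern`, `replace`) and the helpers `load_rules` and `render` are assumptions made for this example and do not reflect Tartan's actual rule schema; layering is approximated by concatenating rule lists.

```ruby
require "yaml"

# Hypothetical rule format; NOT Tartan's actual schema.
BASE_RULES = <<~'RULES'
  - name: strong
    pattern: '\*\*(.+?)\*\*'
    replace: '<strong>\1</strong>'
  - name: emphasis
    pattern: '\*(.+?)\*'
    replace: '<em>\1</em>'
RULES

# A second "layer" that adds inline code on top of the base rules.
CODE_LAYER = <<~'RULES'
  - name: code
    pattern: '`(.+?)`'
    replace: '<code>\1</code>'
RULES

# Parse the YAML and compile each pattern into a Regexp.
def load_rules(yaml_text)
  YAML.safe_load(yaml_text).map do |rule|
    { name: rule["name"],
      pattern: Regexp.new(rule["pattern"]),
      replace: rule["replace"] }
  end
end

# Apply the rules in order; earlier rules see the text first.
def render(text, rules)
  rules.reduce(text) { |out, rule| out.gsub(rule[:pattern], rule[:replace]) }
end

rules = load_rules(BASE_RULES) + load_rules(CODE_LAYER)
puts render("Some **bold** and *emphasized* text with `code`.", rules)
# prints: Some <strong>bold</strong> and <em>emphasized</em> text with <code>code</code>.
```

Rule order matters in this sketch: the `strong` rule must run before `emphasis`, otherwise the single-asterisk pattern would consume part of the double-asterisk markup.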