Xapian is a search engine library, scalable to collections containing hundreds of millions of documents. It's written in C++ with bindings for Perl, Python, PHP, Java, Tcl, C#, Ruby, and Lua. It is a highly adaptable toolkit that allows developers to easily add advanced indexing and search facilities to their own applications. It supports the Probabilistic Information Retrieval model and also a rich set of boolean query operators. Omega is a Web search application built upon the Xapian library. It can index a Web server's document tree (including HTML, PDF, OpenOffice, MS Word/Excel/Powerpoint/Works, WordPerfect, RTF, PS, etc.), or data exported from arbitrary sources (e.g. SQL databases).
Redet is a tool for developing and executing regular expressions using any of more than 50 search programs, editors, and programming languages, intended both for developing regular expressions for use elsewhere and as a search tool in its own right. For each program in each locale, a palette showing the available constructs is provided. The properties of each program are determined by runtime tests, which guarantees that they will be correct for the program version and locale. Additional features include persistent history, extensive help, a variety of character entry tools, and the ability to change locale while running. Redet is highly configurable and fully supports Unicode.
Msort sorts files in sophisticated ways. Records may be fixed size, newline-separated blocks, or terminated by any specified character. Key fields may be selected by position, tag, or character range. For each key, distinct exclusions, multigraphs, substitutions, and a sort order may be defined or locale collation rules used. Comparisons may be lexicographic, numeric, numeric string, hybrid, random, by string length, angle, domain name, date, time, month name, or ISO8601 timestamp. Keys may be reversed so as to generate reverse dictionaries. Optional keys are supported. Unicode is supported, including full case-folding. Msort itself has a somewhat complex command line interface, but may be driven by an optional GUI.
uni2ascii and ascii2uni provide conversion in both directions between UTF-8 Unicode and more than thirty 7-bit ASCII equivalents, including RFC 2396 URI format and RFC 2045 Quoted Printable format, the representations used in HTML, SGML, XML, OOXML, the Unicode standard, Rich Text Format, POSIX portable charmaps, POSIX locale specifications, and Apache log files. It can also convert between the escapes used for Unicode in languages such as Ada, C, Common Lisp, Java, Pascal, Perl, Postscript, Python, Scheme, and Tcl.
MIB Smithy SDK is a dynamic extension to Tcl/Tk (8.4+) that allows development of custom scripts for controlling SNMP agents, manipulating SMI definitions, doing conversions, and more. It is based on the core of Muonics' MIB Smithy, and the SDK supports SMIv1 and SMIv2, as well as SNMPv1/v2c/v3 with HMAC-SHA-96 and HMAC-MD5-96 authentication and DES/CBC and AES128/CFB privacy. It also provides complete read-write access to all elements of SMI/MIB Module definitions, unlike similar extensions that provide only read access to a limited subset. The SDK allows multiple discrete SMI databases and SNMP sessions, and provides all of the built-in validation and error recovery capabilites of the full product, without the visual MIB development environment.
libuninum is a library for converting Unicode strings to integers and integers to Unicode strings. Internal computation is done using arbitrary precision arithmetic, so there is no limit on the size of the integer that can be converted. Values are passed and returned as ASCII decimal strings, GNU MP mpz_t objects, or unsigned long integers. Auto-detection of the number system is provided. Very many number systems are supported. Group delimitation for output strings is fully controllable. Command line and graphical interfaces are also provided.
WordNet® is an on-line lexical reference system whose design is inspired by current psycholinguistic theories of human lexical memory. English nouns, verbs, adjectives and adverbs are organized into synonym sets, each representing one underlying lexical concept. Different relations link the synonym sets.
Xlit converts text from one writing system into another. It allows the user to define a transliteration simply by typing the input strings in one window and the strings to which they are to be mapped in another. Transliteration may be restricted to regions bounded by specified delimiters or their complements. Transliteration may also be performed by external commands or plugins. Xlit can also convert one type of delimiter to another, e.g. from HZ escapes to XML. Xlit can read and write transliteration definitions in its own format and as Yudit keymaps. It can be run in batch mode without the GUI.
WordGenerator generates hypothetical words from specifications of their syllable structure. The user specifies the maximum length of the words in syllables, the abstract structure of syllables in the language (in terms of such units as consonants and vowels or onsets and rhymes), and the actual sounds that comprise each abstract class (e.g. the list of vowels in the language); WordGenerator then generates the words that conform to this specification. Such lists are useful to field linguists exploring the vocabulary of a language, and to designers of artificial languages.