JWPL is a language independent, database-driven, high performance Wikipedia API that provides structured access to information nuggets like redirects, categories, articles, and link structure. It contains a Mediawiki Markup parser that can be used to further analyze the contents of a Wikipedia page or standalone with other text, TimeMachine, which reconstructs a snapshot of Wikipedia from a specific date, or multiple snapshots from a time span, and RevisionMachine, which offers efficient access to the history of articles using a dedicated storage format which decreases storage space by 98%. This enables random access to the whole revision history without requiring several terabytes of storage for a single Wikipedia dump.
The Lean Mean C++ Option Parser handles program arguments (argc, argv). It supports the short and long option formats of getopt(), getopt_long(), and getopt_long_only(), but has a more convenient interface. It is a freestanding, header-only library with no dependencies, not even libc or STL. It comes with a usage message formatter which supports column alignment and line wrapping, making it ideal for localized messages with different lengths.
Yap4j is the simplest library for parsing CSV files in Java. It deserializes CSV files into a list of POJOs using a set of Java annotations, while allowing you to specify Object-CSV mappings. It automatically converts to and from a wide range of data types, and includes support for types from popular libraries such as Joda Time, and support for custom record delimiters.
Apache OpenNLP is a machine learning based toolkit for the processing of natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services.