stupid-xml is a ridiculously simple annotation-based XML stream parser for Java. The main goal of this project is to get the strings you care about out of XML and into Java as quickly as possible. You define a simple model class, specify the relative paths for its fields, and it generates instances for you from an XML stream. The functionality is deliberately limited: it only parses Strings into your model, which keeps everything extremely simple. Once you have the Strings in your model, you can perform filtering or more complex conversions.
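To make the idea concrete, here is a minimal sketch of annotation-driven, String-only XML streaming built on the JDK's StAX reader. It is not stupid-xml's actual API: the XmlPath annotation, the XmlPathSketch class, the parse method, and the Book model are all hypothetical names invented for illustration.

```java
import java.io.StringReader;
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

class XmlPathSketch {

    /** Hypothetical annotation: element path relative to the record element. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface XmlPath {
        String value();
    }

    /** Example model (an assumption): String fields only. */
    static class Book {
        @XmlPath("title") String title;
        @XmlPath("author/name") String author;
    }

    /** Streams the XML and builds one instance per recordElement encountered. */
    static <T> List<T> parse(String xml, String recordElement, Class<T> type) throws Exception {
        List<T> out = new ArrayList<>();
        Deque<String> path = new ArrayDeque<>(); // element names below the record element
        StringBuilder text = new StringBuilder();
        T current = null;
        XMLStreamReader reader =
            XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        while (reader.hasNext()) {
            switch (reader.next()) {
                case XMLStreamConstants.START_ELEMENT:
                    if (current == null) {
                        if (reader.getLocalName().equals(recordElement)) {
                            current = type.getDeclaredConstructor().newInstance();
                        }
                    } else {
                        path.addLast(reader.getLocalName());
                        text.setLength(0);
                    }
                    break;
                case XMLStreamConstants.CHARACTERS:
                case XMLStreamConstants.CDATA:
                    if (current != null) {
                        text.append(reader.getText());
                    }
                    break;
                case XMLStreamConstants.END_ELEMENT:
                    if (current == null) {
                        break;
                    }
                    if (path.isEmpty()) {
                        // The record element itself just closed.
                        out.add(current);
                        current = null;
                    } else {
                        assign(current, String.join("/", path), text.toString().trim());
                        path.removeLast();
                        text.setLength(0);
                    }
                    break;
                default:
                    break;
            }
        }
        reader.close();
        return out;
    }

    /** Copies value into every field whose annotated path matches relPath. */
    private static void assign(Object target, String relPath, String value)
            throws IllegalAccessException {
        for (Field field : target.getClass().getDeclaredFields()) {
            XmlPath annotation = field.getAnnotation(XmlPath.class);
            if (annotation != null && annotation.value().equals(relPath)) {
                field.setAccessible(true);
                field.set(target, value);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<library>"
            + "<book><title>Dune</title><author><name>Frank Herbert</name></author></book>"
            + "<book><title>Emma</title><author><name>Jane Austen</name></author></book>"
            + "</library>";
        for (Book book : parse(xml, "book", Book.class)) {
            System.out.println(book.title + " by " + book.author);
        }
    }
}
```

The sketch ignores attributes, namespaces, and nested record elements; the point is only that a String-only model keeps the mapping logic to a few dozen lines.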
Apache OpenNLP is a machine-learning-based toolkit for processing natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services.
Yap4j is the simplest library for parsing CSV files in Java. It deserializes CSV files into a list of POJOs using a set of Java annotations, while allowing you to specify Object-CSV mappings. It automatically converts to and from a wide range of data types, supports types from popular libraries such as Joda Time, and handles custom record delimiters.
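The annotation-to-POJO approach can be sketched with plain reflection. The code below is not Yap4j's API: the Column annotation, the CsvMappingSketch class, the read method, and the Person model are hypothetical, and java.time.LocalDate stands in for a Joda Time type so the example needs nothing beyond the JDK. It also assumes a header row and comma-separated cells without quoting.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CsvMappingSketch {

    /** Hypothetical annotation: name of the CSV header column bound to a field. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Column {
        String value();
    }

    /** Example POJO (an assumption) with three differently typed fields. */
    static class Person {
        @Column("name") String name;
        @Column("age") int age;
        @Column("joined") LocalDate joined;
    }

    /** Reads CSV text whose first line is a header; no quoting or escaping is handled. */
    static <T> List<T> read(String csv, Class<T> type) throws Exception {
        String[] lines = csv.split("\\R");
        List<String> header = Arrays.asList(lines[0].split(",", -1));
        List<T> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isEmpty()) {
                continue; // skip blank lines
            }
            String[] cells = lines[i].split(",", -1);
            T row = type.getDeclaredConstructor().newInstance();
            for (Field field : type.getDeclaredFields()) {
                Column column = field.getAnnotation(Column.class);
                if (column == null) {
                    continue; // unannotated fields are left untouched
                }
                int index = header.indexOf(column.value());
                if (index < 0 || index >= cells.length) {
                    continue; // column missing from this file or row
                }
                field.setAccessible(true);
                field.set(row, convert(cells[index].trim(), field.getType()));
            }
            rows.add(row);
        }
        return rows;
    }

    /** Converts a raw cell to the field's type; only three types are supported here. */
    private static Object convert(String raw, Class<?> target) {
        if (target == String.class) {
            return raw;
        }
        if (target == int.class || target == Integer.class) {
            return Integer.valueOf(raw);
        }
        if (target == LocalDate.class) {
            return LocalDate.parse(raw); // expects ISO-8601, e.g. 2020-01-31
        }
        throw new IllegalArgumentException("Unsupported field type: " + target);
    }

    public static void main(String[] args) throws Exception {
        String csv = "name,age,joined\nAda,36,1851-01-01\nAlan,41,1953-06-23\n";
        for (Person person : read(csv, Person.class)) {
            System.out.println(person.name + ", " + person.age + ", " + person.joined);
        }
    }
}
```

A real library would add quoting rules, configurable delimiters, and a pluggable converter registry; the sketch only shows how annotations can carry the Object-CSV mapping.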
gradle-sablecc-plugin is a Gradle plugin that creates parsers using SableCC. SableCC supports automatic CST-to-AST transformation, emits all the visitor patterns and analysis helpers you will likely ever need, and generates LR parsers rather than LL(k) ones. Many example grammars are available for modern languages; the author of this plugin has written dozens.