jsoup is a Java library for working with real-world HTML. It can parse HTML from a URL, file, or string; find and extract data using DOM traversal or CSS selectors; manipulate HTML elements, attributes, and text; and clean user-submitted content against a safe whitelist. jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag soup, and it will create a sensible parse tree in every case.
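The capabilities listed for jsoup can be sketched in a few lines of Java. This is only an illustration: it assumes jsoup 1.14.2 or later is on the classpath (where the cleaner's allow-list class is named Safelist; older releases called it Whitelist). The class name JsoupSketch, its helper methods, and the sample markup are made up for this example.

```java
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;

// Hypothetical helper class; only the org.jsoup calls are jsoup's API.
class JsoupSketch {

    // Extract data: parse a string and collect the href of every link
    // matched by a CSS selector.
    static List<String> linkTargets(String html) {
        Document doc = Jsoup.parse(html);
        return doc.select("a[href]").eachAttr("href");
    }

    // Manipulate the tree: set rel="nofollow" on every link and return
    // the serialized contents of <body>.
    static String addNoFollow(String html) {
        Document doc = Jsoup.parse(html);
        for (Element link : doc.select("a[href]")) {
            link.attr("rel", "nofollow");
        }
        return doc.body().html();
    }

    // Clean untrusted markup against jsoup's built-in "basic" safelist,
    // which permits simple text formatting and drops <script>.
    static String sanitize(String untrustedHtml) {
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    // Tag soup: count the <p> elements jsoup recovers from the input.
    static int paragraphCount(String html) {
        return Jsoup.parse(html).select("p").size();
    }

    public static void main(String[] args) {
        // Deliberately sloppy input: unclosed <p> and <b> tags.
        String html = "<p>See <a href='https://example.com/docs'>the docs</a>"
                    + "<p>Unclosed paragraph with <b>bold text";

        System.out.println(linkTargets(html));    // [https://example.com/docs]
        System.out.println(paragraphCount(html)); // 2
        System.out.println(addNoFollow(html));
        System.out.println(sanitize("<p>Hello<script>alert(1)</script></p>"));
    }
}
```

To read from a file or a URL instead of a string, jsoup also offers Jsoup.parse(File, String) and Jsoup.connect(url).get(); both return the same Document type, so the helpers above would work unchanged on their results.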
Piglet is a tool for lexing and parsing text for the .NET framework. Its purpose is to provide an easy-to-use parsing tool that can be included in any .NET project as a single assembly. In contrast to most parser generators, Piglet provides a fluent interface that lets you express your grammar in a syntax accessible to users with no prior experience of parser generators. Piglet generates efficient, type-safe, reentrant LALR(1) parsers at runtime, which saves you a pre-compile step to generate your parsing tables. It also includes a lexical scanner generator that can be used independently of the parser generator.