Steev's HTML Parser is an HTML parsing library that builds a complete tree of the elements and attributes in the supplied HTML file. Each element is its own C++ class, complete with child nodes, which gives full control over processing. An 'HTML beautifier' example is included.
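The library itself is C++, and its class names are not given here. As a rough illustration of the same idea (a tree in which every element keeps its attributes and children, plus a beautifier that walks that tree), here is a Python sketch built on the standard library's `html.parser`. `Element`, `TreeBuilder`, `parse` and `beautify` are hypothetical names chosen for this sketch, not the library's API.

```python
from html.parser import HTMLParser

# HTML void elements never get an end tag, so they must not stay open on the stack.
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "source", "track", "wbr"}


class Element:
    """One node in the tree: a tag name, its attributes, and its children."""

    def __init__(self, tag, attrs=None):
        self.tag = tag
        self.attrs = dict(attrs or [])  # valueless attributes map to None
        self.children = []              # Element instances or text strings


class TreeBuilder(HTMLParser):
    """Builds an Element tree from the standard-library tokenizer's events."""

    def __init__(self):
        super().__init__()
        self.root = Element("#document")
        self.stack = [self.root]  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        node = Element(tag, attrs)
        self.stack[-1].children.append(node)
        if tag not in VOID_ELEMENTS:
            self.stack.append(node)

    def handle_startendtag(self, tag, attrs):
        # Self-closing syntax such as <br/>: attach the node but never open it.
        self.stack[-1].children.append(Element(tag, attrs))

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open element; ignore stray end tags.
        for i in range(len(self.stack) - 1, 0, -1):
            if self.stack[i].tag == tag:
                del self.stack[i:]
                break

    def handle_data(self, data):
        if data.strip():
            self.stack[-1].children.append(data.strip())


def parse(html):
    """Return the root of the Element tree for an HTML string."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    return builder.root


def beautify(node, depth=0):
    """Return the tree as indented HTML lines, one node per line."""
    pad = "  " * depth
    lines = []
    for child in node.children:
        if isinstance(child, str):
            lines.append(pad + child)
            continue
        attrs = "".join(f" {k}" if v is None else f' {k}="{v}"'
                        for k, v in child.attrs.items())
        lines.append(f"{pad}<{child.tag}{attrs}>")
        if child.tag not in VOID_ELEMENTS:
            lines.extend(beautify(child, depth + 1))
            lines.append(f"{pad}</{child.tag}>")
    return lines
```

For example, `"\n".join(beautify(parse(html)))` prints the document re-indented. Stray end tags are simply ignored here, which is a simplification for the sketch; full HTML5 tree construction has many more error-recovery rules.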
urlwatch is a script that watches URLs and notifies you (via email) of any changes. Each change notification includes the URL that has changed and a unified diff of what has changed. The script works out of a single directory, so there is no need to install anything, and its state files are kept in the same directory. Parts of a page that change on every visit can be stripped out with a filter hook function. It is typically run as a cron job.
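A minimal sketch of the same workflow in Python follows. It is not urlwatch's own code: it assumes one state file per URL, named after the URL's SHA-1 hash, and the names `fetch`, `check` and the example hook `strip_volatile`, as well as the hook's `(url, data)` signature, are hypothetical. Sending the email is left out; `check` returns the report text that would be mailed.

```python
import difflib
import hashlib
import os
import re
import urllib.request


def fetch(url):
    """Download a page and decode it as UTF-8 (an assumption; real pages vary)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return resp.read().decode("utf-8", errors="replace")


def strip_volatile(url, data):
    """Hypothetical filter hook: drop parts of a page that change on every load.

    This one removes 'Generated at HH:MM:SS' stamps; a real hook would be
    written for each site being watched.
    """
    return re.sub(r"Generated at \d{2}:\d{2}:\d{2}", "", data)


def check(url, new_data, state_dir, filter_hook=None):
    """Compare new_data with the saved copy; return a report, or None if unchanged.

    The first run for a URL only stores its state and reports nothing.
    """
    if filter_hook is not None:
        new_data = filter_hook(url, new_data)
    path = os.path.join(state_dir, hashlib.sha1(url.encode()).hexdigest())
    old_data = None
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            old_data = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_data)
    if old_data is None or old_data == new_data:
        return None
    # lineterm="" plus "\n".join keeps headers and hunks on separate lines.
    diff = difflib.unified_diff(
        old_data.splitlines(), new_data.splitlines(),
        fromfile=url + " (old)", tofile=url + " (new)", lineterm="")
    return "CHANGED: " + url + "\n" + "\n".join(diff)
```

A cron job would call `check(url, fetch(url), state_dir, strip_volatile)` for each watched URL and mail any result that is not `None`, for instance with the standard `smtplib` module.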
xml-test checks that one XML document is contained in another. It is handy for testing an application's output against a reference document when element order may differ (GData and Atom are examples of specifications in which element order is unimportant). Its notion of containment is relaxed: element order is ignored, whitespace is trimmed, comments are ignored, specific elements can be skipped by passing XPath-like paths on the command line, and text nodes (element and attribute content) can be ignored by passing '-notext' on the command line.
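The relaxed containment rules can be sketched in Python with `xml.etree.ElementTree`, whose default parser already drops comments. This is an illustration, not xml-test's implementation: the `contains` function, the `/feed/title` form of the ignore paths (local element names separated by slashes), and the rule that attribute names must still be present under `notext` are all assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop an ElementTree namespace prefix such as '{http://...}'."""
    return tag.rsplit("}", 1)[-1]


def contains(actual, expected, ignore=frozenset(), notext=False, path=""):
    """True if `expected` is included in `actual` under relaxed rules.

    ignore: hypothetical '/a/b'-style paths of expected elements to skip.
    notext: when True, element text and attribute values are not compared.
    """
    path = path + "/" + local_name(expected.tag)
    if expected.tag != actual.tag:
        return False
    for name, value in expected.attrib.items():
        if name not in actual.attrib:
            return False
        if not notext and actual.attrib[name] != value:
            return False
    # Whitespace is trimmed before text is compared.
    if not notext and (expected.text or "").strip() != (actual.text or "").strip():
        return False
    wanted = [c for c in expected if path + "/" + local_name(c.tag) not in ignore]
    return match_children(wanted, list(actual), ignore, notext, path)


def match_children(wanted, available, ignore, notext, path):
    """Pair every wanted child with a distinct actual child, in any order."""
    if not wanted:
        return True
    first, rest = wanted[0], wanted[1:]
    for i, candidate in enumerate(available):
        if contains(candidate, first, ignore, notext, path):
            # Backtrack: try the remaining children without reusing this one.
            if match_children(rest, available[:i] + available[i + 1:],
                              ignore, notext, path):
                return True
    return False
```

Children are paired by backtracking so that each expected child consumes a distinct actual child; a greedy first match would wrongly reject `<r><e/><e><k/></e></r>` against `<r><e><k/></e><e/></r>`. The search can be slow for many near-identical siblings, where a bipartite matching would scale better.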