2 projects tagged "Parser"

HtmlCleaner (updated 12 Feb 2013)

Pop 16.73
Vit 20.72

HtmlCleaner is an HTML parser. HTML found on the Web is usually dirty, ill-formed, and unsuitable for further processing. For any serious consumption of such documents, it is necessary to first clean up the mess and bring order to the tags, attributes, and ordinary text. For a given HTML document, HtmlCleaner reorders individual elements and produces well-formed XML. By default, it follows rules similar to those which most Web browsers use to create a Document Object Model. However, the user may provide custom tag and rule sets for tag filtering and balancing.
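To make the idea of tag balancing concrete, here is a minimal sketch in plain Java. It is not HtmlCleaner's algorithm or API: the class `TagBalancerSketch` and its `balance` method are hypothetical names, and the sketch assumes attribute-free tags and does no escaping of text. It only shows how a stack can close tags in the right order and drop stray closing tags.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only; this is not how HtmlCleaner is implemented.
public class TagBalancerSketch {
    // Matches simple opening or closing tags without attributes, e.g. <b> or </b>.
    private static final Pattern TAG = Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)>");

    /** Returns a copy of the input in which every tag is properly nested and closed. */
    public static String balance(String html) {
        StringBuilder out = new StringBuilder();
        Deque<String> open = new ArrayDeque<>(); // currently open tags, innermost first
        Matcher m = TAG.matcher(html);
        int pos = 0;
        while (m.find()) {
            out.append(html, pos, m.start()); // copy the text between tags unchanged
            pos = m.end();
            String name = m.group(2).toLowerCase();
            if (m.group(1).isEmpty()) {
                // Opening tag: remember it and emit it.
                open.push(name);
                out.append('<').append(name).append('>');
            } else if (open.contains(name)) {
                // Closing tag with a matching opener: first close everything
                // that was opened after it, then close the tag itself.
                String top;
                do {
                    top = open.pop();
                    out.append("</").append(top).append('>');
                } while (!top.equals(name));
            }
            // A closing tag with no matching opener is silently dropped.
        }
        out.append(html.substring(pos));
        // Close whatever is still open at the end of the input.
        while (!open.isEmpty()) {
            out.append("</").append(open.pop()).append('>');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(balance("<p><b>bold <i>both</b> text</i>"));
    }
}
```

A real cleaner also has to handle attributes, entities, comments, and per-tag rules (for example, which elements may nest inside which), which is where configurable tag and rule sets come in.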

jsoup (updated 11 Nov 2013)

Pop 189.07
Vit 14.30

jsoup is a Java library for working with real-world HTML. It can parse HTML from a URL, file, or string. It can find and extract data, using DOM traversal or CSS selectors. The HTML elements, attributes, and text can be manipulated. It can clean user-submitted content against a safe white-list. jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag-soup; jsoup will create a sensible parse tree.
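The features described above could be exercised roughly as follows. This is a sketch, assuming a recent jsoup release on the classpath; the class name `JsoupSketch`, its method names, and the sample HTML are made up for illustration. Current releases name the allow-list class `Safelist`, while releases from the period of this listing called it `Whitelist`.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.safety.Safelist;

import java.util.ArrayList;
import java.util.List;

// Hypothetical wrapper class; only the org.jsoup calls come from the library.
public class JsoupSketch {
    /** Parses an HTML string and collects the href of every link, using a CSS selector. */
    public static List<String> extractLinks(String html) {
        Document doc = Jsoup.parse(html); // tolerant of unclosed or misnested tags
        List<String> links = new ArrayList<>();
        for (Element a : doc.select("a[href]")) {
            links.add(a.attr("href"));
        }
        return links;
    }

    /** Strips user-submitted HTML down to the tags and attributes on a basic allow-list. */
    public static String sanitize(String untrustedHtml) {
        // On older jsoup releases, replace Safelist with Whitelist.
        return Jsoup.clean(untrustedHtml, Safelist.basic());
    }

    public static void main(String[] args) {
        String soup = "<p>See <a href=\"https://example.org/docs\">the docs</a> and <a>no link";
        System.out.println(extractLinks(soup));

        String submitted = "<p>Hi <script>alert(1)</script>"
                + "<a href=\"http://example.com\" onclick=\"x()\">link</a></p>";
        System.out.println(sanitize(submitted));
    }
}
```

`Jsoup.parse` also has overloads for files, and `Jsoup.connect(url).get()` fetches and parses a page over the network. The string form is used here so the sketch runs offline.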
