jsoup

jsoup is a Java library for working with real-world HTML. It can parse HTML from a URL, file, or string; find and extract data using DOM traversal or CSS selectors; manipulate HTML elements, attributes, and text; and clean user-submitted content against a safe whitelist. jsoup is designed to deal with all varieties of HTML found in the wild, from pristine and validating to invalid tag soup, and it creates a sensible parse tree in each case.
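As a rough illustration of the parse, select, and manipulate steps described above, the following sketch parses an HTML string with an unclosed paragraph, collects link targets with a CSS selector, and sets an attribute on each link. The class name JsoupSketch, its two helper methods, and the sample markup are made up for this example; only Jsoup.parse, select, attr, and body().html() are jsoup calls, and the jsoup jar is assumed to be on the classpath.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration; not part of jsoup.
public class JsoupSketch {

    // Parse an HTML string and collect the href of every link via a CSS selector.
    public static List<String> linkTargets(String html) {
        Document doc = Jsoup.parse(html);
        List<String> hrefs = new ArrayList<>();
        for (Element link : doc.select("a[href]")) {
            hrefs.add(link.attr("href"));
        }
        return hrefs;
    }

    // Parse an HTML string, then set rel="nofollow" on every link.
    public static Document markLinksNoFollow(String html) {
        Document doc = Jsoup.parse(html);
        for (Element link : doc.select("a[href]")) {
            link.attr("rel", "nofollow");
        }
        return doc;
    }

    public static void main(String[] args) {
        // Sample input (assumed): no html/body wrapper, second <p> never closed.
        String html = "<p>See <a href=\"http://example.com/\">an <b>example</b></a><p>Unclosed";
        System.out.println(linkTargets(html));
        System.out.println(markLinksNoFollow(html).body().html());
    }
}
```

Parsing from a URL or a file follows the same pattern once a Document has been obtained; the selector and attribute calls do not change.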

Recent releases

  •  29 May 2012 01:03

Release Notes: This release adds a number of improvements and bugfixes, including renewed support for the Google App Engine and parsing fixes.

  •  28 Mar 2012 16:37

Release Notes: This release adds many improvements, including a relaxed XML parser, a lighter memory footprint, and a range of bugfixes.

  •  02 Jul 2011 09:30

Release Notes: This release includes a new HTML5-compliant parser and fixes for Java 1.5 and Android 2.2 compatibility.

  •  13 Jun 2011 10:19

Release Notes: This version of jsoup includes a brand new HTML5-conformant parser, which ensures HTML is parsed just as modern browsers do. It also improves parse time, lowers memory usage, and adds new convenience methods, including Element.unwrap(), Node.after(), and Node.before().

  •  27 Feb 2011 08:34

Release Notes: This release primarily corrects a regression bug where the content-type of a document retrieved using Jsoup.connect(String url) may not be correctly detected if specified in a meta tag.
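The convenience methods named in the 13 Jun 2011 notes might be used roughly as follows. This is a minimal sketch, assuming the jsoup jar is on the classpath; the class name ConvenienceSketch, its helper methods, and the sample fragments are invented for illustration.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class for illustration; not part of jsoup.
public class ConvenienceSketch {

    // Remove every <span> wrapper but keep its children in place.
    public static Document unwrapSpans(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        for (Element span : doc.select("span")) {
            span.unwrap();
        }
        return doc;
    }

    // Insert sibling markup before and after the first paragraph, if there is one.
    public static Document frameFirstParagraph(String fragment) {
        Document doc = Jsoup.parseBodyFragment(fragment);
        Element p = doc.select("p").first();
        if (p != null) {
            p.before("<h1>Title</h1>");
            p.after("<hr>");
        }
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(unwrapSpans("<p>One <span>two</span> three</p>").body().html());
        System.out.println(frameFirstParagraph("<p>Body text</p>").body().html());
    }
}
```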
