All releases of HTML Parser


Release Notes: The license has been changed to the CPL. Maven2 is now used as the build environment. Subversion is used for the source repository. A new Web site was created. <<tag> is now correctly parsed as text. A method to render the start of a tag in HTML was added. CssSelectorNodeFilter does not accept [attr|=val].


Release Notes: Support was added for commonly requested composite tags. Several enhancements were made to the filtering functionality. Additions were made to the HTTP connection processing subsystem. Other user-requested features and bugfixes were also included.


Release Notes: This is the first candidate for the final 1.6 release. All outstanding bugs have been fixed. A new XorFilter rounds out the logical node filters.
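
To illustrate what an exclusive-or combinator adds to a set of logical filters, here is a minimal sketch built on the JDK's `java.util.function.Predicate` rather than on the library's own filter classes. The class name `LogicalFilterSketch` and the method `xor` are hypothetical, and the "accept when an odd number of predicates match" behavior for more than two operands is an assumption, not something these release notes state.

```java
import java.util.function.Predicate;

// Hypothetical helper, not part of HTML Parser: combines predicates with XOR.
final class LogicalFilterSketch {

    // Returns a predicate that accepts an item when an odd number of the
    // given predicates accept it (assumed semantics for three or more operands).
    @SafeVarargs
    static <T> Predicate<T> xor(Predicate<T>... predicates) {
        return item -> {
            boolean result = false;
            for (Predicate<T> p : predicates) {
                result ^= p.test(item); // toggle on every match
            }
            return result;
        };
    }
}
```

With two operands this accepts an item matched by exactly one of them, which is the case AND, OR, and NOT cannot express in a single step.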


Release Notes: NodeTreeWalker, a utility class to traverse a tree of Node objects using either depth-first or breadth-first tree order, has been added. Several other bugfixes and patches have been incorporated.
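
The difference between the two traversal orders mentioned above can be sketched on a tiny tree. This is a self-contained illustration, not NodeTreeWalker itself: `SketchNode` and `TraversalSketch` are hypothetical stand-ins for the library's Node type and walker.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for a parsed node; not the library's Node interface.
final class SketchNode {
    final String name;
    final List<SketchNode> children = new ArrayList<>();

    SketchNode(String name, SketchNode... kids) {
        this.name = name;
        for (SketchNode kid : kids) {
            children.add(kid);
        }
    }
}

final class TraversalSketch {

    // Depth-first (pre-order): visit a node, then each subtree from left to right.
    static List<String> depthFirst(SketchNode root) {
        List<String> visited = new ArrayList<>();
        Deque<SketchNode> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            SketchNode node = stack.pop();
            visited.add(node.name);
            // Push children in reverse so the leftmost child is popped first.
            for (int i = node.children.size() - 1; i >= 0; i--) {
                stack.push(node.children.get(i));
            }
        }
        return visited;
    }

    // Breadth-first: visit all nodes at one depth before descending further.
    static List<String> breadthFirst(SketchNode root) {
        List<String> visited = new ArrayList<>();
        Deque<SketchNode> queue = new ArrayDeque<>();
        queue.add(root);
        while (!queue.isEmpty()) {
            SketchNode node = queue.remove();
            visited.add(node.name);
            queue.addAll(node.children);
        }
        return visited;
    }
}
```

For a tree html(head(title), body(p)), the depth-first order is html, head, title, body, p, while the breadth-first order is html, head, body, title, p.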


Release Notes: Support has been added for commonly requested composite tags, P, H1-H6, and definition list tags (DL, DT, DD). The node interface has been augmented with get first/last child and get previous/next sibling methods to ease traversing the HTML document.


Release Notes: A minor update that applies a patch to fix bug #1227213 ("Particular SCRIPT tags close too late"), adds changes to FilterBean, and adds a remove(Node) method to the NodeList class.


Release Notes: Significant new APIs have been added since 1.4 was released, such as ConnectionManager, SAX parsing, new filters, and new interfaces. Most notably, a new FilterBuilder allows you to interactively generate a Java class that extracts information from a Web page.


Release Notes: This is a bugfix release that should be considered the first candidate for a version 1.5 final. This release addresses a partial parse issue for pages that contain characters that cannot be represented in the page encoding. Other bugfixes include two null pointer exception fixes, one when an encoding change exception is handled by the StringBean, and another when a cookie with no expiry date is encountered when cookie handling is enabled.


Release Notes: This is a bugfix release that specifically addresses a long-standing script and style parsing problem. The Lexer class now adheres to appendix B.3.2, "Specifying non-HTML data", of the HTML specification regarding recognizing the ETAGO (</) at the end of script and style CDATA. Other bugs addressed include wrapping InputStreams to get around mark()/reset() issues, providing a better error message while a Java bug regarding Byte Order Marks is pending, and implementing a change to handle null ContentType.
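
The ETAGO rule can be sketched as a simple scan. This is a rough illustration of the rule in HTML 4.01 appendix B.3.2, under which script or style data ends at the first "</" followed by a name start character ([a-zA-Z]). It is not the Lexer's actual code; `EtagoSketch` and `endOfCdata` are hypothetical names.

```java
// Hypothetical illustration of ETAGO recognition; not the library's Lexer.
final class EtagoSketch {

    // Returns the index of the first "</" that is followed by an ASCII letter,
    // searching from 'start', or -1 if the data contains no such delimiter.
    static int endOfCdata(String text, int start) {
        for (int i = Math.max(start, 0); i + 2 < text.length(); i++) {
            if (text.charAt(i) == '<'
                    && text.charAt(i + 1) == '/'
                    && isNameStart(text.charAt(i + 2))) {
                return i;
            }
        }
        return -1;
    }

    // Name start characters per the rule above: ASCII letters only.
    static boolean isNameStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
}
```

Under this rule a script containing `document.write('</b>')` ends at the `</b>` inside the string literal, not at `</script>`, which is why the specification advises authors to escape such sequences.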


Release Notes: This long overdue release adds two main enhancements: ConnectionManager and FilterBuilder. The ConnectionManager is part of the org.htmlparser.http package that handles proxies, passwords and cookies. The FilterBuilder is a GUI application to assist programmers in constructing filters. Filters are a great tool for dealing with the "I just need this little piece of information from this Web page" use-case. The FilterBuilder creates Java source code for inclusion in other programs and allows interactive testing and refining of filters.