CyberNeko HTML Parser

NekoHTML is a simple HTML scanner and tag balancer that enables Java application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make when writing HTML documents. NekoHTML is written using the Xerces Native Interface (XNI), which is the foundation of the Xerces2 implementation. This lets application programmers use the NekoHTML parser with existing XNI tools without modifying or rewriting code.
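To illustrate what "standard XML interfaces" means for application code, here is a minimal sketch of a consumer written purely against org.w3c.dom. The class name LinkCollector and its methods are hypothetical, and the JDK's built-in XML parser stands in for NekoHTML so the example runs without extra jars. That stand-in only accepts well-formed markup. The assumption is that a Document produced by an HTML parser could be passed to the same traversal unchanged.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class LinkCollector {

    /** Walks any DOM tree and collects the href value of every "a" element. */
    public static List<String> collectLinks(Node node, List<String> out) {
        if (node.getNodeType() == Node.ELEMENT_NODE) {
            Element e = (Element) node;
            // Compare case-insensitively: an HTML parser may report upper-case names.
            if ("a".equalsIgnoreCase(e.getTagName()) && e.hasAttribute("href")) {
                out.add(e.getAttribute("href"));
            }
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collectLinks(c, out);
        }
        return out;
    }

    /** Stand-in parser (assumption): the JDK XML parser, which needs well-formed input. */
    public static Document parseWellFormed(String markup) throws Exception {
        return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(markup)));
    }

    public static void main(String[] args) throws Exception {
        String page = "<html><body><p><a href=\"one.html\">One</a></p>"
                + "<p><A href=\"two.html\">Two</A></p></body></html>";
        System.out.println(collectLinks(parseWellFormed(page), new ArrayList<>()));
    }
}
```

The traversal depends only on the DOM interfaces, so the choice of parser is isolated in one method.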

Recent releases

  •  24 Jan 2008 09:54

    Release Notes: A charset regression was fixed.

  •  15 Dec 2007 11:50

    Release Notes: The license was changed to Apache 2.0 and the version number was boosted to reflect the maturity of the project. Project files were reorganized to decouple them from the rest of the CyberNeko Tools for XNI. xercesMinimal.jar and source were updated so that NekoHTML compiles using Xerces-J 2.9.1. The default behavior was changed to not normalize attribute values and a new feature was added to allow users to turn on normalization. The build was modified to target compilation for Java 1.3. Suggested paragraph tag balancing was adjusted and various reported bugs were fixed.

  •  19 Jun 2005 10:23

    Release Notes: A feature to allow a scanner to fix character entity references for Microsoft Windows characters was added. The nekohtmlXni.jar file is no longer built by default. Tag-balancing was changed to allow headers inside of links. Handling of the blockquote tag, a tag-balancing bug for unknown elements, the mapping of the encoding name in meta tags, various namespace binding bugs, and a no-such-method exception when using the augmentations feature with older versions of Xerces2 were fixed.

  •  18 Nov 2004 10:24

    Release Notes: This release added features for stripping CDATA delimiters from script and style tags, made augmentations, bugfixes, and performance enhancements, and fixed some tag balancing issues.

  •  30 Jun 2004 10:52

    Release Notes: This version implements scanning of XML declaration, fixes a script tag scanning bug, and adds version class and manifest entries to query product information.
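The 19 Jun 2005 notes mention fixing character references for Microsoft Windows characters. As a rough sketch of the general idea (not NekoHTML's actual implementation), numeric references in the range 128 to 159 are often written by Windows tools to mean Windows-1252 characters, although those code points are control characters in Unicode. The hypothetical WindowsRefFixer class below rewrites such references to the Unicode code point that Windows-1252 assigns to the same byte. The class name, the regular expression, and the choice to emit a corrected numeric reference are all assumptions made for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WindowsRefFixer {

    // Windows-1252 meanings of bytes 0x80..0x9F; 0 marks bytes the code page leaves undefined.
    private static final char[] CP1252 = {
        0x20AC, 0,      0x201A, 0x0192, 0x201E, 0x2026, 0x2020, 0x2021,
        0x02C6, 0x2030, 0x0160, 0x2039, 0x0152, 0,      0x017D, 0,
        0,      0x2018, 0x2019, 0x201C, 0x201D, 0x2022, 0x2013, 0x2014,
        0x02DC, 0x2122, 0x0161, 0x203A, 0x0153, 0,      0x017E, 0x0178
    };

    // Matches decimal numeric character references such as &#150;
    private static final Pattern NUMERIC_REF = Pattern.compile("&#(\\d+);");

    /** Rewrites references in 128..159 to the matching Unicode code point. */
    public static String fix(String text) {
        Matcher m = NUMERIC_REF.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(); // default: keep the reference unchanged
            if (m.group(1).length() <= 7) { // short enough to parse without overflow
                int code = Integer.parseInt(m.group(1));
                if (code >= 0x80 && code <= 0x9F && CP1252[code - 0x80] != 0) {
                    replacement = "&#" + (int) CP1252[code - 0x80] + ";";
                }
            }
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fix("&#147;quoted&#148; &#150; done"));
    }
}
```

References outside that range, and the five byte values Windows-1252 leaves undefined, pass through untouched.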

Recent comments

14 Jan 2004 05:21 kreiger

Re: Apache license or Cyberneko license?

If anyone is interested, i sent this question to licensing AT gnu DOT org:

> My special exception uses the wording "the Apache license".
> Would i have to change this special exception, and what wording would you recommend to allow for "Apache-style" licenses?

And i got this answer:

> Something like "any license with terms identical to the Apache license version 1.1 but for names" ought to do it.

22 Dec 2003 10:12 andyc2

Re: Apache license or Cyberneko license?

> To be more specific, i'm working for a
> company which is looking to GPL our
> software. We are using a couple of
> Apache libraries, which are under the
> Apache license.
> Therefore we include a GPL "special
> exception" which allows linking with
> software licensed under "The Apache
> License".
> Since CyberNeko is under an Apache-style
> license, but not "The" Apache License,
> this special exception would not include
> the CyberNeko License, right?

Here are a few relevant links for you:

http://www.gnu.org/philosophy/license-list.html
http://www.apache.org/foundation/licence-FAQ.html#GPL

Note: The CyberNeko license is based on the Apache version 1.1 license.

13 Dec 2003 04:02 kreiger

Re: Apache license or Cyberneko license?

To be more specific, i'm working for a company which is looking to GPL our software. We are using a couple of Apache libraries, which are under the Apache license.
Therefore we include a GPL "special exception" which allows linking with software licensed under "The Apache License".
Since CyberNeko is under an Apache-style license, but not "The" Apache License, this special exception would not include the CyberNeko License, right?

13 Dec 2003 01:16 andyc2

Re: Apache license or Cyberneko license?

> Which is it? The Apache license or the
> Cyberneko license? My GPL project with
> special exception for the Apache license
> can't use the Cyberneko license, right?

The CyberNeko license is an Apache-style license. In other words, the wording is exactly the same but the project is not associated with the Apache Software Foundation. So you can use NekoHTML with the same freedom that you use Apache-based software.

12 Dec 2003 16:42 kreiger

Apache license or Cyberneko license?

Which is it? The Apache license or the Cyberneko license? My GPL project with special exception for the Apache license can't use the Cyberneko license, right?
