NekoHTML is a simple HTML scanner and tag balancer that enables Java application programmers to parse HTML documents and access the information using standard XML interfaces. The parser can scan HTML files and "fix up" many common mistakes that human (and computer) authors make in writing HTML documents. NekoHTML is written using the Xerces Native Interface (XNI), which is the foundation of the Xerces2 implementation. This enables application programmers to use the NekoHTML parser with existing XNI tools without modifying or rewriting code.
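To illustrate what "tag balancing" means, here is a minimal Python sketch. It is not NekoHTML's Java API. It uses the standard library's `html.parser` to close elements that were left open and to drop stray end tags. The names `TagBalancer` and `balance_tags` and the small set of repair rules are assumptions made for illustration. A real balancer such as NekoHTML also knows which elements implicitly close others (for example, a new `<p>` ending the previous one).

```python
import html
from html.parser import HTMLParser

# Elements that never have content or an end tag in HTML.
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class TagBalancer(HTMLParser):
    """Hypothetical minimal tag balancer: re-emits markup with every
    non-void element properly nested and closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of the repaired document
        self.stack = []  # currently open elements, innermost last

    def handle_starttag(self, tag, attrs):
        attr_text = "".join(
            f' {name}="{html.escape(value)}"' if value is not None else f" {name}"
            for name, value in attrs
        )
        if tag in VOID_ELEMENTS:
            self.out.append(f"<{tag}{attr_text}/>")
        else:
            self.out.append(f"<{tag}{attr_text}>")
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag with no matching start tag: drop it
        # Close any elements left open inside this one, then the element itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        # Re-escape text, since convert_charrefs=True decoded entities like &amp;.
        self.out.append(html.escape(data, quote=False))


def balance_tags(markup: str) -> str:
    parser = TagBalancer()
    parser.feed(markup)
    parser.close()
    # Close anything still open at the end of input, innermost first.
    while parser.stack:
        parser.out.append(f"</{parser.stack.pop()}>")
    return "".join(parser.out)
```

For example, `balance_tags("<p><b>bold<i>both</b> text")` closes the misnested `<i>` before `</b>` and closes the dangling `<p>` at the end.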
XML parser class is a PHP class that parses arbitrary XML input and builds an array with the structure of all tag and data elements. Optionally, it can keep track of the position of each element so that elements that may be contextually in error can be located. It supports a cache of parsed files to minimize the overhead of parsing the same file repeatedly, and it offers optimized parsing of simplified XML (SML) formats that ignore tag attributes.
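A rough Python analogue of the same idea follows. The original is a PHP class, and none of its method names are reproduced here. The sketch parses with the standard `xml.parsers.expat` bindings, builds a nested structure of tag and data elements, records the line and column where each element starts, and caches results per file keyed on modification time. The names `parse_xml` and `parse_file_cached` and the dictionary layout are hypothetical.

```python
import os
import xml.parsers.expat


def parse_xml(text: str) -> dict:
    """Build a nested dict tree from XML text, recording where each element starts."""
    parser = xml.parsers.expat.ParserCreate()
    root = {"tag": None, "children": []}  # synthetic container for the document element
    stack = [root]

    def start(name, attrs):
        node = {
            "tag": name,
            "attrs": attrs,
            "children": [],
            # Position of the start tag: 1-based line, 0-based column.
            "line": parser.CurrentLineNumber,
            "column": parser.CurrentColumnNumber,
        }
        stack[-1]["children"].append(node)
        stack.append(node)

    def end(name):
        stack.pop()

    def data(chunk):
        children = stack[-1]["children"]
        if children and isinstance(children[-1], str):
            children[-1] += chunk  # expat may split one text run across callbacks
        else:
            children.append(chunk)

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = data
    parser.Parse(text, True)
    return next(c for c in root["children"] if isinstance(c, dict))


_cache = {}  # path -> (modification time, parsed tree)


def parse_file_cached(path: str) -> dict:
    """Re-parse a file only when its modification time has changed."""
    mtime = os.path.getmtime(path)
    hit = _cache.get(path)
    if hit is not None and hit[0] == mtime:
        return hit[1]
    with open(path, encoding="utf-8") as f:
        tree = parse_xml(f.read())
    _cache[path] = (mtime, tree)
    return tree
```

Keying the cache on modification time keeps the check cheap (one `stat` call) at the cost of missing edits made within the file system's timestamp resolution.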
Staroffice/Openoffice to HTML Converter converts a StarOffice/OpenOffice.org document to HTML, using xsltproc for the XML conversion and ImageMagick's convert to convert the images. It creates a table of contents with links and handles tables, styles, spans, and many other XML elements from a Writer file.
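As a hedged illustration of the table-of-contents step only (this is not the converter's XSLT), the Python sketch below assumes an OpenDocument text file (.odt). Such a file is a ZIP archive whose body lives in `content.xml`, and the sketch turns its `text:h` headings into a linked HTML list. Older StarOffice/OpenOffice.org 1.x files (.sxw) use a different XML namespace. The function names and the `#toc-N` anchor scheme are assumptions.

```python
import html
import io
import zipfile
import xml.etree.ElementTree as ET

# OpenDocument text namespace (ODF 1.x); .sxw files use a different one.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def odf_headings(odt_bytes: bytes):
    """Yield (outline level, heading text) for each text:h in document order."""
    with zipfile.ZipFile(io.BytesIO(odt_bytes)) as archive:
        root = ET.fromstring(archive.read("content.xml"))
    for h in root.iter(f"{{{TEXT_NS}}}h"):
        level = int(h.get(f"{{{TEXT_NS}}}outline-level", "1"))
        yield level, "".join(h.itertext())


def toc_html(odt_bytes: bytes) -> str:
    """Render a flat HTML table of contents with hypothetical #toc-N anchors."""
    items = []
    for n, (level, text) in enumerate(odf_headings(odt_bytes), start=1):
        items.append(
            f'<li class="level{level}"><a href="#toc-{n}">{html.escape(text)}</a></li>'
        )
    return "<ul>\n" + "\n".join(items) + "\n</ul>"
```

A complete converter would also emit matching `id` attributes on the headings themselves so that the links resolve.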
Xmldiff is a Python tool that finds the differences between two similar XML files in the same way the diff utility does for text files. A description of the changes found can be displayed using Xmldiff's syntax or as an XUpdate script that can be used to "patch" the original document.
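The following deliberately naive Python sketch makes the idea concrete. It is not Xmldiff's algorithm, which computes a proper edit script. It walks two ElementTree trees in parallel and reports differences in tags, text, and attributes by path. The function name and the output format are hypothetical. Children are matched only by position (the `*[i]` steps are XPath-style positional indexes), so a single insertion shifts every later comparison, and tail text is ignored.

```python
import xml.etree.ElementTree as ET


def naive_xml_diff(a: ET.Element, b: ET.Element, path: str = "") -> list:
    """Compare two element trees position by position and list the differences."""
    path = path or f"/{a.tag}"
    if a.tag != b.tag:
        return [f"{path}: tag {a.tag} -> {b.tag}"]
    changes = []
    text_a, text_b = (a.text or "").strip(), (b.text or "").strip()
    if text_a != text_b:
        changes.append(f"{path}: text {text_a!r} -> {text_b!r}")
    for name in sorted(set(a.attrib) | set(b.attrib)):
        if a.get(name) != b.get(name):
            changes.append(f"{path}/@{name}: {a.get(name)!r} -> {b.get(name)!r}")
    kids_a, kids_b = list(a), list(b)
    # Recurse into children that exist on both sides, matched by position.
    for i, (ca, cb) in enumerate(zip(kids_a, kids_b), start=1):
        changes.extend(naive_xml_diff(ca, cb, f"{path}/*[{i}]"))
    # Leftover children exist on only one side.
    for extra in kids_a[len(kids_b):]:
        changes.append(f"{path}: removed <{extra.tag}>")
    for extra in kids_b[len(kids_a):]:
        changes.append(f"{path}: inserted <{extra.tag}>")
    return changes
```

For `<doc><p id="1">hello</p><p>x</p></doc>` against `<doc><p id="2">hello</p><p>y</p><note/></doc>`, it reports the changed `id`, the changed text of the second `p`, and the inserted `note`.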
The POI project contains several components for working with popular OLE 2 formats in Java. POIFS is a pure Java implementation of the OLE 2 Compound Document format and can be used to read any OLE 2 stream; full documentation of the POIFS file format is included. HSSF is a pure Java implementation of the Excel 97-2003 XLS file format based on POIFS, and the HSSF Serializer is a pure Java serializer for Cocoon 2 that uses the Gnumeric XML format to output XLS. These components are useful if you wish to output reports in the Excel file format, or if you have existing XML documents that you need to get into Excel. HSLF provides initial support for PowerPoint 97-2003 files, and HWPF provides limited support for Word 97-2003 files.
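POIFS itself is Java. The Python sketch below only illustrates what an OLE 2 compound document looks like at the byte level. It checks the 8-byte signature and reads the fixed version and sector-size fields from the 512-byte header. The function name and the returned dictionary are hypothetical, and unlike a real reader such as POIFS, it does not walk the FAT or the directory entries.

```python
import struct

# Every OLE 2 compound file starts with this 8-byte signature.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def read_ole2_header(data: bytes) -> dict:
    """Parse the fixed fields near the start of an OLE 2 compound file header."""
    if len(data) < 512 or data[:8] != OLE2_SIGNATURE:
        raise ValueError("not an OLE 2 compound document")
    # Offsets 0x18..0x21: minor version, major version, byte-order mark,
    # sector shift and mini-sector shift, all little-endian 16-bit values.
    minor, major, byte_order, sector_shift, mini_shift = struct.unpack_from("<5H", data, 0x18)
    if byte_order != 0xFFFE:
        raise ValueError("unexpected byte-order mark")
    return {
        "major_version": major,
        "minor_version": minor,
        "sector_size": 1 << sector_shift,     # 512 for version 3, 4096 for version 4
        "mini_sector_size": 1 << mini_shift,  # normally 64
    }
```

Because the signature sits at offset 0, the same check distinguishes a legacy binary `.xls`/`.ppt`/`.doc` file from the ZIP-based Office formats before any heavier parsing.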
References for TeX and Friends is an ongoing project which provides a help file for LaTeX (and its friends like ConTeXt, Metapost, Metafont, etc.) using a state-of-the-art source format, DocBook/XML. Various output formats can be generated from the source file. Anyone can write a converter for any desired output format. Because the source file is XML, the easiest way to do this might be to use XSLT.
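XSLT is the natural tool for such a converter, but any XML library will do. As a hedged alternative, here is a tiny Python sketch using ElementTree that turns DocBook `title` and `para` elements into plain text with Markdown-style headings, nesting one level deeper per `section`. It handles only those few elements, and it strips namespaces so that DocBook 4 and DocBook 5 input look alike. The function names and the output format are assumptions.

```python
import xml.etree.ElementTree as ET


def local_name(tag: str) -> str:
    """Drop any '{namespace}' prefix so DocBook 4 and 5 files look alike."""
    return tag.rsplit("}", 1)[-1]


def docbook_to_text(xml_text: str) -> str:
    """Convert titles and paragraphs into Markdown-style plain text."""
    lines = []

    def walk(elem, depth):
        name = local_name(elem.tag)
        if name == "title":
            lines.append("#" * max(depth, 1) + " " + "".join(elem.itertext()).strip())
        elif name == "para":
            # Collapse runs of whitespace inside the paragraph.
            lines.append(" ".join("".join(elem.itertext()).split()))
        else:
            # Sectioning elements nest one heading level deeper; others pass through.
            child_depth = depth + 1 if name in ("article", "chapter", "section") else depth
            for child in elem:
                walk(child, child_depth)

    walk(ET.fromstring(xml_text), 0)
    return "\n\n".join(lines)
```

Supporting another output format mostly means changing the two `lines.append` calls, which is the same separation of source and presentation that DocBook is designed around.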
tracx reads and stores any kind of XML data using a dedicated programming language. The XML structure that has been read can be traced and changed. Unix shell and JDBC database escapes allow you to retrieve data and store it in the XML structure. Implementations based on C++ and Java are available.