XML parser class is a PHP class that parses arbitrary XML input and builds an array reflecting the structure of all tag and data elements. Optionally, it can track the position of each element, which helps locate elements that are contextually in error. It supports a cache of parsed files to minimize the overhead of parsing the same file repeatedly. It also offers optimized parsing of simplified XML (SML) formats, which ignores tag attributes.
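The idea of building a nested structure of tag and data elements with optional position tracking can be sketched in Python. This is a minimal illustration using the standard `xml.parsers.expat` module, not the PHP class's actual API; the function name `parse_to_tree` and the dictionary keys are assumptions made for the example.

```python
import xml.parsers.expat


def parse_to_tree(xml_text, track_positions=False):
    """Parse XML text into nested dicts of tag and data elements.

    Hypothetical helper: the names and dict layout are illustrative only.
    """
    root = {"tag": None, "children": []}
    stack = [root]
    parser = xml.parsers.expat.ParserCreate()
    parser.buffer_text = True  # deliver contiguous character data in one call

    def start(name, attrs):
        node = {"tag": name, "attributes": attrs, "children": []}
        if track_positions:
            # Line of the start tag, useful for reporting contextual errors.
            node["line"] = parser.CurrentLineNumber
        stack[-1]["children"].append(node)
        stack.append(node)

    def end(name):
        stack.pop()

    def data(text):
        # Skip whitespace-only runs between elements.
        if text.strip():
            stack[-1]["children"].append({"data": text})

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = data
    parser.Parse(xml_text, True)
    return root["children"][0]


if __name__ == "__main__":
    doc = "<a>\n  <b x='1'>hi</b>\n</a>"
    print(parse_to_tree(doc, track_positions=True))
```

With `track_positions=True`, each element node carries the line on which its start tag appeared, so a later validation pass can point at the offending element.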
Serene is a validation engine that implements the JAXP 1.3 Validation Framework API for RELAX NG. It is based on an algorithm centered on providing good messages and handling ambiguity and conflicts clearly. It also implements the JAXP Validation Framework API for ISO Schematron and supports Schematron markup embedded in RELAX NG schemas.
SILVERCODERS DocStorage is a utility to improve document management. You can have one database for all invoices, guarantees, protocols, and other documents. DocStorage can extract plain text from documents in DOC, XLS, PPT, PDF, RTF, ODT, ODS, ODP, DOCX, XLSX, PPTX, and many other formats. It can use an OCR engine to extract plain text even from scanned documents. It can perform a global full-text search across all documents regardless of format. It supports document versioning, duplicate detection, document notes, and document signing. It integrates with software suites like Microsoft Office and OpenOffice.
unoconv converts between any document formats that LibreOffice understands. It uses LibreOffice's UNO bindings for non-interactive conversion of documents. Supported document formats include Open Document Format (.odt), MS Word (.doc), MS Office Open/MS OOXML (.xml), Portable Document Format (.pdf), HTML, XHTML, RTF, DocBook (.xml), and more. Image, spreadsheet, and presentation formats are also supported.
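A non-interactive conversion could be driven from a script roughly as follows. This is a sketch that assumes the `unoconv` executable is on the PATH and uses its `-f` (target format) and `-o` (output) options; the helper names `build_unoconv_command` and `convert` are hypothetical.

```python
import shutil
import subprocess


def build_unoconv_command(source, target_format, output=None):
    """Build the argument list for one unoconv run (hypothetical helper)."""
    cmd = ["unoconv", "-f", target_format]
    if output is not None:
        cmd += ["-o", output]
    cmd.append(source)
    return cmd


def convert(source, target_format, output=None):
    """Run the conversion, failing loudly if unoconv is not installed."""
    if shutil.which("unoconv") is None:
        raise RuntimeError("unoconv was not found on PATH")
    subprocess.run(build_unoconv_command(source, target_format, output), check=True)


if __name__ == "__main__":
    # Only prints the command, so it is safe to run without unoconv installed.
    print(build_unoconv_command("report.odt", "pdf"))
```

Keeping the command construction separate from the execution makes it possible to inspect or log the command before running it.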
Virtual Token Descriptor for eXtensible Markup Language (VTD-XML) refers to a collection of efficient XML processing technologies centered around a non-extractive XML parsing technique called Virtual Token Descriptor (VTD). Depending on the perspective, VTD-XML can be viewed as an XML parser, a native XML indexer or file format that uses binary data to enhance the text XML, an incremental XML content modifier, an XML slicer/splitter/assembler, or an XML editor/eraser.
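Non-extractive parsing means that tokens are described by their offset and length in the original document rather than copied out as separate strings. The toy sketch below illustrates that idea only: the 64-bit record layout, the regular expression, and all function names are assumptions for the example, not VTD-XML's actual record format or API, and it ignores comments, CDATA sections, and namespaces.

```python
import re

# Hypothetical layout for one 64-bit record: 8 bits depth, 24 bits length,
# 32 bits offset. Chosen for illustration only.
OFFSET_BITS = 32
LENGTH_BITS = 24

# Matches start, end, and self-closing tags; group 2 is the element name.
TAG = re.compile(rb"<(/?)([A-Za-z_][\w.\-]*)[^>]*?(/?)>")


def pack(offset, length, depth):
    """Combine offset, length, and depth into a single integer record."""
    return (depth << (OFFSET_BITS + LENGTH_BITS)) | (length << OFFSET_BITS) | offset


def unpack(record):
    """Return (offset, length, depth) from a packed record."""
    offset = record & ((1 << OFFSET_BITS) - 1)
    length = (record >> OFFSET_BITS) & ((1 << LENGTH_BITS) - 1)
    depth = record >> (OFFSET_BITS + LENGTH_BITS)
    return offset, length, depth


def index_elements(buf):
    """Record where each element name sits in buf, without copying it."""
    records = []
    depth = 0
    for match in TAG.finditer(buf):
        if match.group(1):  # end tag: step back up one level
            depth -= 1
            continue
        records.append(pack(match.start(2), match.end(2) - match.start(2), depth))
        if not match.group(3):  # not self-closing: children are one level deeper
            depth += 1
    return records


def token_text(buf, record):
    """Slice the token out of the original buffer on demand."""
    offset, length, _ = unpack(record)
    return buf[offset:offset + length]


if __name__ == "__main__":
    document = b"<a><b/><c>text</c></a>"
    for rec in index_elements(document):
        print(unpack(rec), token_text(document, rec))
```

Because the records only point into the original bytes, the document itself stays untouched, which is what makes in-place modification and slicing cheap in this style of processing.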
Aephea is a text-based authoring tool for HTML. It enforces well-formedness with a simpler and stricter TeX-like syntax and provides useful extensions and abstractions with facilities for adding new ones. It emphasizes a single unified approach that stays close to HTML itself and promotes and utilizes CSS extensively. Abstractions such as dictionary stacks, arithmetic, and iteration are part of Aephea.
urlwatch is a script intended to help you watch URLs and get notified (via email) of any changes. The change notification includes the URL that has changed and a unified diff of what has changed. The script works out of a single directory, so there is no need to install anything. State files are kept in the same folder. The script supports stripping parts of a page that are always changing through the use of a filter hook function. It is typically run as a cron job.
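The combination of a filter hook and a unified diff can be sketched with Python's standard `difflib` module. This is an illustration of the technique under stated assumptions, not urlwatch's own code: the function names `strip_volatile` and `describe_change`, and the "Generated at" marker, are hypothetical.

```python
import difflib
import re


def strip_volatile(text):
    """Hypothetical filter hook: drop lines that change on every fetch."""
    kept = [line for line in text.splitlines() if not re.search(r"Generated at", line)]
    return "\n".join(kept)


def describe_change(url, old_text, new_text, filter_hook=strip_volatile):
    """Return a unified diff for url, or None if nothing relevant changed."""
    old_lines = filter_hook(old_text).splitlines()
    new_lines = filter_hook(new_text).splitlines()
    if old_lines == new_lines:
        return None
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=url + " (old)",
        tofile=url + " (new)",
        lineterm="",
    )
    return "\n".join(diff)


if __name__ == "__main__":
    before = "Welcome\nVersion 1.0\nGenerated at 10:00"
    after = "Welcome\nVersion 1.1\nGenerated at 10:05"
    print(describe_change("http://example.com/", before, after))
```

In this sketch, a page whose only difference is the timestamp line yields `None`, so no notification would be sent for it.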