unoconv converts between any of the document formats that LibreOffice understands. It uses LibreOffice's UNO bindings for non-interactive conversion of documents. Supported document formats include Open Document Format (.odt), MS Word (.doc), MS Office Open/MS OOXML (.xml), Portable Document Format (.pdf), HTML, XHTML, RTF, DocBook (.xml), and more. Image, spreadsheet, and presentation formats are also supported.
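As a rough illustration of non-interactive use, the sketch below builds a unoconv command line from Python and runs it. It assumes the `unoconv` executable is on the PATH and relies only on its `-f` (output format) and `-o` (output location) options; the helper names `unoconv_command` and `convert` are hypothetical and not part of unoconv.

```python
import subprocess


def unoconv_command(source, target_format, output=None):
    """Build the argument list for a unoconv run (hypothetical helper)."""
    cmd = ["unoconv", "-f", target_format]
    if output is not None:
        # -o names the output file or directory.
        cmd += ["-o", output]
    cmd.append(source)
    return cmd


def convert(source, target_format, output=None):
    """Run the conversion; raises CalledProcessError if unoconv fails."""
    subprocess.run(unoconv_command(source, target_format, output), check=True)
```

Calling `convert("report.odt", "pdf")` would then ask unoconv to write a PDF version of `report.odt`, provided LibreOffice and unoconv are installed.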
Virtual Token Descriptor for eXtensible Markup Language (VTD-XML) refers to a collection of efficient XML processing technologies centered around a non-extractive XML parsing technique called Virtual Token Descriptor (VTD). Depending on the perspective, VTD-XML can be viewed as an XML parser, a native XML indexer, a file format that uses binary data to enhance the text XML, an incremental XML content modifier, an XML slicer/splitter/assembler, or an XML editor/eraser.
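The idea of non-extractive parsing can be sketched in a few lines: instead of copying token text out of the document, the parser records where each token sits in the original buffer. The example below is only a toy under that assumption. It indexes start-tag names with a regular expression, stores plain `(offset, length)` tuples rather than VTD-XML's actual binary records, and ignores comments and CDATA sections. The function names are hypothetical.

```python
import re
from typing import List, Tuple

# Matches the name of a start tag such as <item or <ns.tag (not </closing tags).
START_TAG = re.compile(r"<([A-Za-z_][\w.\-]*)")


def index_element_names(doc: str) -> List[Tuple[int, int]]:
    """Return (offset, length) descriptors for every start-tag name.

    No token text is copied; each descriptor points into `doc`.
    """
    return [(m.start(1), m.end(1) - m.start(1)) for m in START_TAG.finditer(doc)]


def token_text(doc: str, token: Tuple[int, int]) -> str:
    """Materialize a token's text only when it is actually needed."""
    offset, length = token
    return doc[offset:offset + length]
```

Because the descriptors refer back to the untouched source text, the same index can support slicing or in-place modification of the document without re-serializing it.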
Aephea is a text-based authoring tool for HTML. It enforces well-formedness with a simpler and stricter TeX-like syntax and provides useful extensions and abstractions with facilities for adding new ones. It emphasizes a single unified approach that stays close to HTML itself and promotes and utilizes CSS extensively. Abstractions such as dictionary stacks, arithmetic, and iteration are part of Aephea.
urlwatch is a script intended to help you watch URLs and get notified (via email) of any changes. The change notification includes the URL that has changed and a unified diff of what has changed. The script works out of a single directory, so there is no need to install anything. State files are kept in the same folder. The script supports stripping parts of a page that are always changing through the use of a filter hook function. It is typically run as a cron job.
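The combination of a filter hook and a unified diff can be illustrated with Python's standard `difflib` module. This is a minimal sketch, not urlwatch's own code: the function names, the `Generated at HH:MM` pattern, and the report layout are all assumptions made for the example.

```python
import difflib
import re


def strip_volatile(text):
    """Hypothetical filter hook: drop lines that carry a changing timestamp."""
    return "\n".join(
        line for line in text.splitlines()
        if not re.search(r"Generated at \d{2}:\d{2}", line)
    )


def describe_change(url, old, new):
    """Return a unified diff of the filtered page contents.

    The result is an empty string when nothing relevant changed.
    """
    old_lines = strip_volatile(old).splitlines()
    new_lines = strip_volatile(new).splitlines()
    diff = difflib.unified_diff(
        old_lines, new_lines,
        fromfile=url + " (old)", tofile=url + " (new)",
        lineterm="",
    )
    return "\n".join(diff)
```

A scheduled job could call `describe_change` with the saved state and the freshly fetched page, and send a notification only when the returned string is non-empty.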
libunibreak is an implementation of the line breaking and word breaking algorithms as described in Unicode Standard Annex 14 and Unicode Standard Annex 29. It is a superset of, and supersedes, liblinebreak. It is designed to be used in a generic text renderer. FBReader is one real-world example of a program that uses it.
The Java Wikipedia API (Bliki engine) is a parser library for converting Wikipedia/MediaWiki syntax to HTML. It supports wiki tags for bold, italic, headers, nowiki, source, table of contents, tables, lists, categories, footnotes (references), images, syntax highlighting of source code fragments, templates, and template parser functions.
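To give a feel for the kind of conversion involved, the toy sketch below handles three of the listed constructs (headers, bold, and italic) for a single line of wiki text. It is written in Python for illustration only and does not use or mirror the Bliki API. The function name `wiki_to_html` is hypothetical, and the sketch performs no HTML escaping.

```python
import re


def wiki_to_html(line):
    """Convert one line of wiki markup to HTML (headers, bold, italic only)."""
    # == Title == style headers: the number of '=' signs sets the level.
    heading = re.fullmatch(r"(={1,6})\s*(.+?)\s*\1", line.strip())
    if heading:
        level = len(heading.group(1))
        return f"<h{level}>{heading.group(2)}</h{level}>"
    # Bold must be handled before italic, since ''' contains ''.
    line = re.sub(r"'''(.+?)'''", r"<b>\1</b>", line)
    line = re.sub(r"''(.+?)''", r"<i>\1</i>", line)
    return line
```

A full engine also has to deal with nesting, templates, tables, and references, which is where a dedicated parser library is needed.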