Xidel is a command line tool to download Web pages and extract data from them. It can download files over HTTP/S connections, follow redirections, links, or extracted values, and process local files. The data can be extracted using XPath 2.0, XQuery 1.0, and JSONiq expressions, CSS 3 selectors, and custom, pattern-matching templates that are like an annotated version of the processed page. The extracted values can then be exported as plain text/XML/HTML/JSON, or assigned to variables to be used in other extract expressions or be exported to the shell. There is also an online CGI service for testing.
XHTML indent takes an XHTML file via standard input and outputs an indented version of the XML. It also adds comments to the end of closing tags so that you can quickly identify the matching opening tag without having to jump to the appropriate line. Unlike HTML Tidy, it does not repair bad code. It works with any XML-formatted file, but has features designed for XHTML developers. This program works well as either a standalone tool or an external filter for a text editor.
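The indent-and-annotate behavior described for XHTML indent can be illustrated with a minimal Python sketch. This is not the tool's own code: the function name `indent_with_close_comments`, the two-space indent, and the `tag#id` comment format are all assumptions made for illustration, and the sketch assumes well-formed input without XML namespaces.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr


def indent_with_close_comments(elem, depth=0, step="  "):
    """Return indented lines for elem; multi-line elements get a closing comment.

    Hypothetical helper, not part of XHTML indent itself.
    """
    pad = step * depth
    # Re-serialize attributes with proper quoting and escaping.
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    text = escape((elem.text or "").strip())
    children = list(elem)

    # Empty element: emit a self-closing tag on one line.
    if not children and not text:
        return [f"{pad}<{elem.tag}{attrs}/>"]
    # Text-only element: keep it on one line, no comment needed.
    if not children:
        return [f"{pad}<{elem.tag}{attrs}>{text}</{elem.tag}>"]

    lines = [f"{pad}<{elem.tag}{attrs}>"]
    if text:
        lines.append(f"{pad}{step}{text}")
    for child in children:
        lines.extend(indent_with_close_comments(child, depth + 1, step))
        tail = escape((child.tail or "").strip())
        if tail:
            lines.append(f"{pad}{step}{tail}")

    # Assumed comment format: tag name, plus "#id" when an id attribute exists.
    label = elem.tag + (f"#{elem.attrib['id']}" if "id" in elem.attrib else "")
    lines.append(f"{pad}</{elem.tag}> <!-- {label} -->")
    return lines


sample = '<div id="main"><p>Hello</p><br/></div>'
print("\n".join(indent_with_close_comments(ET.fromstring(sample))))
```

Only elements that span several lines receive a closing comment here, since a one-line element already shows its opening tag; whether the real tool makes the same choice is not stated in the description above.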
Steev's HTML Parser is an HTML parsing library that builds a complete hierarchy for each element and attribute in the supplied HTML file. Each element is its own C++ class, replete with child nodes, allowing for full control and processing. An 'HTML beautifier' example is included.
xhtml2pdf converts HTML/XHTML/XHML to PDF using the ReportLab Toolkit, the HTML5lib, and pyPdf. It supports HTML 5 and CSS 2.1 (and some of CSS 3). The main benefit of this tool is that a user with Web skills like HTML and CSS can generate PDF templates very quickly without learning new technologies.
AldoContent is a lightweight CMS focused on usability and simplicity. It is designed for multilingual Web sites and can have sections with public and private content. It has file management, is extensible with plugins, and uses gettext for localization. It is written in PHP and uses a MySQL backend.
SimplyBibTeX is an application for storing and sharing BibTeX bibliographies. Users can add, edit, and remove bibliography entries online or upload complete collections. The system uses a very simple but effective template mechanism. Users and coworkers can subscribe to RSS 2.0 or Atom feeds to be notified of changes.