webStraktor is a programmable World Wide Web data extraction client. It features a scripting language to facilitate the collection, extraction, and storage of information available on the Web, including images. The scripting language uses elements of regular expression and XPath syntax. The standard webStraktor output format is XML-based and can be encoded in ASCII, UTF-8, or ISO-8859-1 (Latin-1). It adheres to the Robots Exclusion Protocol and can be configured to operate anonymously by connecting through proxy servers. Exhaustive logging and tracing information is provided.
Ding is a lightweight PHP framework that provides dependency injection (by setter, constructor, and method), aspect-oriented programming, event support, and XML, YAML, and some JSR 250/330 annotations as bean definition providers. It can be deployed as a PHAR file and also offers a simple, quick MVC implementation, syslog support, a TCP client and server with non-blocking sockets, timers, custom error, signal, and exception handling, PAGI integration (for the Asterisk Gateway Interface), and PAMI integration (for Asterisk management). It is similar to Java's Seasar and Spring.
Historical Event Markup and Linking Project (Heml) provides an XML schema for historical events and a Java Web application that transforms conforming documents into hyperlinked timelines, maps, and tables. It aims to provide an information-rich interchange format for historical data, and thus to add a historical component to the growing movement for a 'Semantic Web.'
XCC is a tool for building XML format parsers. One way to describe what XCC does is by analogy with a generic parser generator such as yacc or bison. Yacc needs a lexical analyzer to function properly, and that lexical analyzer is usually built with (f)lex. In the XML world, a few packages fill the role of lex (expat and libxml are the best known), but the high-level grammar parsing is usually done by hand-written code, and writing such a parser is a tedious and error-prone task. XCC was created to help developers write reliable, easy-to-understand parsers for handling complex XML-based grammars.
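To illustrate the kind of hand-written grammar code the XCC description refers to, here is a minimal sketch in Python using the standard library's expat binding. It is not XCC output and does not use XCC. The `inventory`, `item`, and `name` elements and the `parse_inventory` function are hypothetical, chosen only for illustration. Expat delivers low-level events, much as a lexer delivers tokens, and the nesting rules have to be tracked by hand.

```python
import xml.parsers.expat


def parse_inventory(xml_text):
    """Hand-written grammar layer on top of expat's low-level events.

    Hypothetical format: <inventory><item id="..."><name>...</name></item></inventory>
    """
    items = []
    stack = []  # names of the currently open elements
    text = []   # character data collected since the last start tag

    def start(name, attrs):
        # Grammar rule enforced by hand: <item> is allowed only directly
        # inside the root <inventory> element.
        if name == "item":
            if stack != ["inventory"]:
                raise ValueError("<item> outside <inventory>")
            items.append({"id": attrs.get("id"), "name": None})
        stack.append(name)
        text.clear()

    def end(name):
        # Only a <name> that is a direct child of <item> is recorded.
        if stack[-2:] == ["item", "name"]:
            items[-1]["name"] = "".join(text).strip()
        stack.pop()

    def chars(data):
        # expat may deliver one text node in several chunks.
        text.append(data)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = chars
    parser.Parse(xml_text, True)
    return items
```

Even for this tiny format, the element stack and the nesting checks are maintained manually, and every additional grammar rule adds another condition to the handlers. That bookkeeping is the tedious, error-prone part the description mentions.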