webStraktor is a programmable World Wide Web data extraction client. It features a scripting language to facilitate the collection, extraction, and storage of information available on the Web, including images. The scripting language uses elements of regular expression and XPath syntax. The standard webStraktor output format is XML-based, encoded in ASCII, UTF-8, or ISO-8859-1 (Latin1). It adheres to the Robots Exclusion Protocol and can be configured to operate anonymously by connecting through proxy servers. It provides exhaustive logging and tracing information.
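To illustrate what adhering to the Robots Exclusion Protocol involves, here is a minimal Python sketch using the standard library's `urllib.robotparser`. It is not webStraktor code or its scripting language; the robots.txt content, the `example-bot` user agent, and the `example.org` URLs are all hypothetical and inlined so the sketch runs offline.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so no network access is needed.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set that can be queried."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
# A polite client checks every URL before requesting it.
print(allowed(parser, "example-bot", "http://example.org/index.html"))         # True
print(allowed(parser, "example-bot", "http://example.org/private/data.html"))  # False
```

A client that honors the protocol would skip any URL for which this check returns False.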
Fileevent is a rules-based utility that matches files based on simple patterns and macros and performs actions on them. These actions are typically used to transfer or rename a file so that it is ready for further processing. This utility is particularly useful for batch processing environments where files to load or process might arrive on an ad hoc basis. Fileevent allows them to be transferred elsewhere, retrieved from elsewhere, or renamed.
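The general idea of matching files against patterns and renaming them can be sketched in Python as follows. This is an illustration of the technique only, not Fileevent's rule syntax: the `Rule` class, the `{stem}`/`{ext}`/`{name}` placeholders, and the file names are assumptions made up for the example.

```python
import fnmatch
import os
from dataclasses import dataclass


@dataclass
class Rule:
    pattern: str  # shell-style glob, e.g. "orders_*.csv"
    target: str   # new-name template using {name}, {stem}, and {ext}


def plan_actions(filenames, rules):
    """Return (old_name, new_name) pairs; the first matching rule wins."""
    actions = []
    for name in filenames:
        for rule in rules:
            if fnmatch.fnmatchcase(name, rule.pattern):
                stem, ext = os.path.splitext(name)
                new_name = rule.target.format(name=name, stem=stem, ext=ext)
                actions.append((name, new_name))
                break  # stop at the first rule that matches this file
    return actions


def apply_actions(directory, actions):
    """Perform the planned renames inside the given directory."""
    for old, new in actions:
        os.replace(os.path.join(directory, old), os.path.join(directory, new))


# Hypothetical rule: mark incoming order files as ready for processing.
rules = [Rule(pattern="orders_*.csv", target="{stem}.ready")]
print(plan_actions(["orders_01.csv", "notes.txt"], rules))
# [('orders_01.csv', 'orders_01.ready')]
```

Separating planning from applying keeps the matching logic testable without touching the file system.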
XMLFoundation provides a foundation for XML support in an application. However, it is more than just another XML parser. It applies a unique approach to handling XML that allows your application code to focus on the application rather than traversing DOM or subscribing to SAX events. The most distinctive feature of XMLFoundation is the object-oriented encapsulation that provides XML support in the application layer. XMLFoundation allows you to easily integrate XML with your GUI or with your server objects, and it natively supports COM, DCOM, and CORBA objects.
BaseX is a light-weight, high-performance, and scalable XML database system and XPath/XQuery processor, including full support for the W3C Update and Full Text extensions. An interactive and user-friendly GUI frontend gives you great insight into large XML data instances. It is platform independent and works out of the box.
LanguageTool is a style and grammar checker that currently supports English, Polish, German, French, Dutch, and other languages to varying degrees. It scans the words and their part-of-speech tags for occurrences of error patterns, which are defined in an XML file. More powerful error rules can be written in Java.
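Scanning words and part-of-speech tags for an error pattern can be sketched in Python as below. This shows the general technique only and does not reproduce LanguageTool's XML rule format or its Java API; the `Token` and `TokenPattern` types, the Penn Treebank-style tags, and the "could of" rule are assumptions chosen for illustration.

```python
from typing import List, NamedTuple, Optional


class Token(NamedTuple):
    word: str
    pos: str  # part-of-speech tag, e.g. "MD" for a modal verb


class TokenPattern(NamedTuple):
    word: Optional[str] = None  # lowercase word to match, or None for any
    pos: Optional[str] = None   # tag to match, or None for any


def token_matches(token: Token, pattern: TokenPattern) -> bool:
    """A token matches when every constraint the pattern sets is satisfied."""
    if pattern.word is not None and token.word.lower() != pattern.word:
        return False
    if pattern.pos is not None and token.pos != pattern.pos:
        return False
    return True


def find_matches(tokens: List[Token], pattern: List[TokenPattern]) -> List[int]:
    """Return the start index of each place the pattern sequence occurs."""
    hits = []
    for start in range(len(tokens) - len(pattern) + 1):
        window = tokens[start:start + len(pattern)]
        if all(token_matches(t, p) for t, p in zip(window, pattern)):
            hits.append(start)
    return hits


# Hypothetical rule: a modal verb followed by "of" (as in "could of gone").
modal_of = [TokenPattern(pos="MD"), TokenPattern(word="of")]
sentence = [Token("I", "PRP"), Token("could", "MD"),
            Token("of", "IN"), Token("gone", "VBN")]
print(find_matches(sentence, modal_of))  # [1]
```

Matching on the tag rather than the word lets one rule cover "could of", "would of", and "should of" alike.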
GroupServer is a Web-based mailing list manager designed for large sites. It provides email interaction like a traditional mailing list manager but also supports reading, searching, and posting of messages and files via the Web. Users have forum-style profiles, and can manage their email addresses and other settings using the same Web interface. It supports features such as Atom feeds, a basic CMS, statistics, multiple verified addresses per user, and bounce detection, and can be heavily customized.
Xidel is a command line tool to download Web pages and extract data from them. It can download files over HTTP/S connections, follow redirections, links, or extracted values, and process local files. The data can be extracted using XPath 2.0, XQuery 1.0, and JSONiq expressions, CSS 3 selectors, and custom, pattern-matching templates that are like an annotated version of the processed page. The extracted values can then be exported as plain text/XML/HTML/JSON, or assigned to variables to be used in other extract expressions or be exported to the shell. There is also an online CGI service for testing.
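The extract-then-export workflow described here can be approximated in Python with the standard library. This is a sketch of the idea, not Xidel itself: `xml.etree.ElementTree` supports only a small subset of XPath 1.0 (not XPath 2.0, XQuery, or JSONiq), and the page content below is hypothetical and inlined so that nothing is downloaded.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical, well-formed page content used in place of a download.
PAGE = """<html><body>
<h1>Releases</h1>
<ul>
<li><a href="/v1.zip">Version 1</a></li>
<li><a href="/v2.zip">Version 2</a></li>
</ul>
</body></html>"""


def extract_links(page: str):
    """Select every link inside a list item and return its text and target."""
    root = ET.fromstring(page)
    return [{"text": a.text, "href": a.get("href")}
            for a in root.findall(".//li/a")]


links = extract_links(PAGE)
# Export the extracted values as JSON, one of several possible output formats.
print(json.dumps(links))
```

The extracted `href` values could then be used as the next set of URLs to fetch, which mirrors the "follow extracted values" behavior mentioned above.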