jsoup is a Java library for working with real-world HTML. It can parse HTML from a URL, a file, or a string; find and extract data using DOM traversal or CSS selectors; manipulate HTML elements, attributes, and text; and clean user-submitted content against a safe whitelist. jsoup is designed to handle all varieties of HTML found in the wild, from pristine and validating markup to invalid tag soup, and it builds a sensible parse tree from it.
Zynaptic Reaction is a flexible asynchronous programming framework for Java that can be used to implement complex event-driven applications. It is heavily influenced by the Twisted framework developed by TwistedMatrix Labs for Python. The Reaction library focuses on the concurrency and callback model and is therefore application-neutral. It can manage large numbers of concurrent I/O operations or farm out compute-intensive tasks to multicore processors. Besides being usable as a plain Java library, Reaction can also run as an independent OSGi service and integrate with any GUI framework you choose.
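To illustrate the Twisted-style callback model that Reaction draws on, here is a minimal sketch of a Deferred-like object in plain Java. It is not Reaction's API: the `Deferred` class and its `addCallback`/`callback` methods are hypothetical names borrowed from Twisted's terminology, and the sketch omits Twisted's errback (error) chain. A producer delivers a result exactly once; each callback receives the previous callback's return value, and callbacks added after the result has arrived run immediately.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.function.Function;

public class DeferredSketch {

    /** Hypothetical minimal Twisted-style Deferred; not Reaction's actual API. */
    static final class Deferred {
        private final Queue<Function<Object, Object>> chain = new ArrayDeque<>();
        private Object result;
        private boolean fired;

        /** Queues a callback; each callback's return value feeds the next one. */
        synchronized Deferred addCallback(Function<Object, Object> cb) {
            chain.add(cb);
            if (fired) {
                runChain(); // result already available: run the new callback now
            }
            return this;
        }

        /** Delivers the result exactly once and runs all queued callbacks. */
        synchronized void callback(Object value) {
            if (fired) {
                throw new IllegalStateException("Deferred already fired");
            }
            fired = true;
            result = value;
            runChain();
        }

        synchronized Object result() {
            return result;
        }

        /** Drains the queue, threading the current result through each callback. */
        private void runChain() {
            Function<Object, Object> cb;
            while ((cb = chain.poll()) != null) {
                result = cb.apply(result);
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        Deferred d = new Deferred();
        d.addCallback(x -> (Integer) x * 2)
         .addCallback(x -> "answer=" + x);

        // Simulate an asynchronous producer completing on another thread.
        Thread producer = new Thread(() -> d.callback(21));
        producer.start();
        producer.join();

        d.addCallback(x -> x + "!");     // added after firing: runs immediately
        System.out.println(d.result());  // prints answer=42!
    }
}
```

In this sketch the callbacks run on whichever thread fires the result; an event-driven framework would typically dispatch them through its own event loop instead.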
Groowiki combines a wiki and a document management system. It is built on Subversion, Groovy, Velocity, and several other existing products. Like any other wiki, it lets you edit pages in a tree structure, but it also gives you SVN access, which makes it easy to add files to the content. Because everything is stored in SVN, all information is versioned and (optionally) accessible offline, and you can upload your modifications in batches. This is especially useful when you work with large files and the wiki pages mainly summarize the contents of those documents.
phoneutria is a multi-threaded, scalable, high-performance, extensible, and polite Web crawler. It can be used to crawl, index, load-test, or even download any Web or enterprise domain, and it is configured through an XML configuration file. Because the level of politeness is configurable, phoneutria can be used either to check the links of a Web site or for load testing. It also provides a plug-in mechanism for further extensions.
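A polite crawler limits how often it contacts any single host. The sketch below shows one way a configurable politeness level could work in a multi-threaded crawler: each worker atomically reserves the next free request slot for a host, with slots spaced by a fixed delay. The `PolitenessGate` class, its methods, and the fixed-delay policy are assumptions for illustration, not phoneutria's implementation or its configuration format.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical per-host politeness gate (not phoneutria's code).
 * Requests to the same host are spaced at least delayMillis apart.
 */
public class PolitenessGate {
    private final long delayMillis; // minimum gap between requests to one host
    private final ConcurrentMap<String, Long> nextAllowed = new ConcurrentHashMap<>();

    public PolitenessGate(long delayMillis) {
        if (delayMillis < 0) {
            throw new IllegalArgumentException("delay must be >= 0");
        }
        this.delayMillis = delayMillis;
    }

    /**
     * Reserves the next request slot for host and returns how many milliseconds
     * the caller must wait, given the current time nowMillis. The update runs
     * atomically inside compute(), so concurrent workers never share a slot.
     */
    public long reserve(String host, long nowMillis) {
        long[] wait = new long[1];
        nextAllowed.compute(host, (h, next) -> {
            long start = (next == null || next <= nowMillis) ? nowMillis : next;
            wait[0] = start - nowMillis;
            return start + delayMillis; // slot after this one
        });
        return wait[0];
    }

    /** Blocks the calling worker thread until its reserved slot arrives. */
    public void acquire(String host) throws InterruptedException {
        long wait = reserve(host, System.currentTimeMillis());
        if (wait > 0) {
            Thread.sleep(wait);
        }
    }

    public static void main(String[] args) {
        PolitenessGate gate = new PolitenessGate(1000); // one request per second per host
        System.out.println(gate.reserve("example.org", 0));   // 0
        System.out.println(gate.reserve("example.org", 0));   // 1000
        System.out.println(gate.reserve("example.org", 200)); // 1800
        System.out.println(gate.reserve("example.com", 200)); // 0 (different host)
    }
}
```

Setting the delay to zero removes the throttling entirely, which is the behavior a load test would want; a larger delay makes a link check gentler on the target site.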
BorderFlow implements a general-purpose graph clustering algorithm. For each cluster, it maximizes the ratio between the flow from the cluster's border into the cluster (inner flow) and the flow from that border to the rest of the graph (outer flow). The main advantage of the algorithm is that it requires no parameterization to compute highly accurate results.
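To make the ratio concrete, the sketch below computes a border flow ratio and grows a cluster greedily from a seed node. The definitions are assumptions based on the description above, not taken from the BorderFlow reference implementation: b(X) is the set of cluster nodes with at least one neighbor outside X, n(X) is the set of outside nodes adjacent to X, flow(A, B) sums the weights of edges from A to B, and bf(X) = flow(b(X), X) / flow(b(X), n(X)). The greedy loop and its tie-breaking (lowest node id wins) are simplifications chosen for illustration. Note that the only input besides the graph is the seed; there is no tuning parameter.

```java
import java.util.*;

/** Simplified BorderFlow-style local clustering sketch (assumed definitions). */
public class BorderFlowSketch {
    private final Map<Integer, Map<Integer, Double>> adj = new HashMap<>();

    /** Adds an undirected weighted edge. */
    public void addEdge(int u, int v, double w) {
        adj.computeIfAbsent(u, k -> new HashMap<>()).put(v, w);
        adj.computeIfAbsent(v, k -> new HashMap<>()).put(u, w);
    }

    private Map<Integer, Double> neighbors(int u) {
        return adj.getOrDefault(u, Collections.emptyMap());
    }

    /** Total weight of edges from nodes in a to nodes in b. */
    private double flow(Set<Integer> a, Set<Integer> b) {
        double sum = 0;
        for (int u : a)
            for (Map.Entry<Integer, Double> e : neighbors(u).entrySet())
                if (b.contains(e.getKey())) sum += e.getValue();
        return sum;
    }

    /** Inner-to-outer flow ratio of cluster x; infinite if no flow leaves x. */
    public double ratio(Set<Integer> x) {
        Set<Integer> border = new HashSet<>();  // b(X)
        Set<Integer> outside = new HashSet<>(); // n(X)
        for (int u : x)
            for (int v : neighbors(u).keySet())
                if (!x.contains(v)) { border.add(u); outside.add(v); }
        double out = flow(border, outside);
        return out == 0 ? Double.POSITIVE_INFINITY : flow(border, x) / out;
    }

    /** Greedily adds the neighbor that most increases the ratio; stops when none does. */
    public Set<Integer> cluster(int seed) {
        Set<Integer> x = new HashSet<>(Set.of(seed));
        double best = ratio(x);
        while (true) {
            Set<Integer> candidates = new TreeSet<>(); // sorted for deterministic ties
            for (int u : x)
                for (int v : neighbors(u).keySet())
                    if (!x.contains(v)) candidates.add(v);
            Integer pick = null;
            for (int v : candidates) {
                Set<Integer> trial = new HashSet<>(x);
                trial.add(v);
                double r = ratio(trial);
                if (r > best) { best = r; pick = v; }
            }
            if (pick == null) return x;
            x.add(pick);
        }
    }

    public static void main(String[] args) {
        BorderFlowSketch g = new BorderFlowSketch();
        // Two unit-weight triangles joined by one weak edge (3-4).
        g.addEdge(1, 2, 1); g.addEdge(2, 3, 1); g.addEdge(1, 3, 1);
        g.addEdge(4, 5, 1); g.addEdge(5, 6, 1); g.addEdge(4, 6, 1);
        g.addEdge(3, 4, 0.1);
        System.out.println(new TreeSet<>(g.cluster(1))); // [1, 2, 3]
        System.out.println(new TreeSet<>(g.cluster(6))); // [4, 5, 6]
    }
}
```

On this toy graph, adding node 4 to the cluster {1, 2, 3} would shrink the ratio from 20 to 0.05, so the greedy expansion stops at the weak bridge edge.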
terp is a modular template engine that integrates tightly into Ant. It provides a portable C++ compiler task (aCC, g++, icc, msvc++, SUN CC, xlC) for many platforms and processor architectures, a collection iterator task, a full-featured expression language with host introspection, formatters, selectors, and transformers for expressions, and much more. It can be embedded into Ant, used as a stand-alone or embedded templating engine, or used as a batch or interactive expression evaluator. It is highly extensible: you can add your own types, overload operators, and add properties and methods to types.