Nutch is highly scalable web search software built on top of Apache Hadoop and Lucene Java. Key features include a web crawler, an indexer, crawl management tools, parsers for HTML, PDF, DOC, and several other document formats, and an extensible architecture that lets you plug in additional functionality such as document parsers, custom scoring algorithms, protocols, and more.
Apache OpenNLP is a machine-learning-based toolkit for processing natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services.
Apache PhotArk is a photo gallery application that includes a content repository for the images, a display component, an access control layer, and upload capabilities. The idea is to pair a rigid design for the content repository with a very flexible display component. The images in the content repository are protected with granular access control.
Apache Qpid is a messaging broker that implements the latest AMQP specification, providing transaction management, queuing, distribution, security, management, clustering, federation, and heterogeneous multi-platform support. It is extremely fast and aims to be 100% AMQP compliant.
BSF4ooRexx is a Java language binding for the scripting language ooRexx. It lets ooRexx programmers use the Java Runtime Environment (JRE) libraries directly, for example by implementing Java methods in ooRexx and by receiving callbacks from Java in ooRexx. It camouflages Java so that it resembles ooRexx: dynamically typed and caseless. BSF4ooRexx comes with built-in support for programming OpenOffice.org/LibreOffice and allows ooRexx to be used as a macro language.