Nutch is highly scalable Web search software built on top of Apache Hadoop and Lucene Java. Key features include a Web crawler, an indexer, crawl management tools, parsers for HTML, PDF, DOC, and several other document formats, and an extensible architecture that lets you plug in additional functionality such as document parsers, custom scoring algorithms, protocols, and more.
Apache OpenNLP is a machine learning-based toolkit for processing natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services.
Apache PhotArk is a photo gallery application including a content repository for the images, a display piece, an access control layer, and upload capabilities. The idea is to have a rigid design for the content repository with a very flexible display piece. The images in the content repository will be protected with granular access control.
Apache XML Graphics Commons is a library that consists of several reusable components used by Apache Batik and Apache FOP. Many of these components can easily be used separately outside the domains of SVG and XSL-FO. You will find components such as a PDF library, an RTF library, Graphics2D implementations that let you generate PDF and PostScript files, and much more.
AtMail is a webmail client. The project aims to provide an elegant client for existing IMAP mail servers, with less bloat and a focus on an intuitive, simple user interface. Features include complete webmail functionality, address book support, video mail, an AJAX interface, drag-and-drop, and more.
BitNami Drupal stack is an easy-to-install distribution of the Drupal CMS software. It includes pre-configured, ready-to-run versions of Apache, MySQL, and PHP, so users can have Drupal installed and running within minutes of answering a few questions. Currently, Linux and Windows are supported.
Citizen Intelligence Agency is a project that aims to increase the surveillance of members of the Swedish parliament. It does this by analyzing the votes of each member of parliament and creating views that show the relations between members. It uses Maven, MySQL, JPA2, Spring 3.x, and Vaadin.
Citrus is a test framework written in Java that enables automated integration testing of message-based enterprise SOA applications. The tool can easily simulate surrounding systems across various transports and protocols (e.g. JMS, SOAP Web services, HTTP, and TCP/IP) in order to perform end-to-end use case testing. Citrus provides strong validation mechanisms for XML message contents and allows you to build complex testing logic, including sending and receiving messages, database validation, automatic retries, variable definitions, dynamic message contents, error simulation, and more.
Daisy is an enterprise content management solution, bridging the gap between classic Web site content management and the Wiki style of information management and discovery. It is ideally suited for intranet knowledge bases, product and/or project documentation, and management of content-rich Web sites. It consists of a repository server with powerful querying and versioning capabilities, and a Wiki-like front-end Web user interface with in-browser rich-text authoring.