Nutch is highly scalable Web search software built on top of Apache Hadoop and Lucene Java. Key features include a Web crawler, an indexer, crawl management tools, parsers for HTML, PDF, DOC, and several other document formats, and an extensible architecture that lets you plug in additional functionality such as document parsers, custom scoring algorithms, protocols, and more.
Apache OpenNLP is a machine learning-based toolkit for processing natural language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text processing services.
Apache PhotArk is a photo gallery application including a content repository for the images, a display piece, an access control layer, and upload capabilities. The idea is to have a rigid design for the content repository with a very flexible display piece. The images in the content repository will be protected with granular access control.
Apache XML Graphics Commons is a library that consists of several reusable components used by Apache Batik and Apache FOP. Many of these components can easily be used separately outside the domains of SVG and XSL-FO. You will find components such as a PDF library, an RTF library, Graphics2D implementations that let you generate PDF and PostScript files, and much more.
AtMail is a webmail client. The project aims to provide an elegant client for existing IMAP mail servers, with less bloat and a focus on an intuitive, simple user interface. Features include complete webmail functionality, address book support, video mail, an AJAX interface, drag-and-drop, and more.
Citizen Intelligence Agency is a project whose goal is to increase the surveillance of members of the Swedish parliament. This will be done by analyzing the votes of each member of parliament and creating views of the relationships between members. It uses Maven, MySQL, JPA2, Spring 3.x, and Vaadin.
Citrus is a test framework written in Java that enables automated integration testing of message-based enterprise SOA applications. The tool can simulate surrounding systems across various transports and protocols (e.g. JMS, SOAP Web services, HTTP, and TCP/IP) in order to perform end-to-end use case testing. Citrus provides strong validation mechanisms for XML message contents and lets you build complex testing logic, including sending and receiving messages, database validation, automatic retries, variable definitions, dynamic message contents, error simulation, and more.
District Builder is a software application designed to give the public transparent, accessible, and easy-to-use mapping tools to draw the boundaries of their communities or to generate redistricting plans for their state and localities. The drawing of electoral districts is among the least transparent processes in democratic governance. All too often, redistricting authorities maintain power by obstructing public participation. The resulting districts embody the goals of politicians to the detriment of the representational interests of communities and the public at large. With District Builder, the public has the capacity to create and submit district plans for municipal, county, and state governments, support redistricting competitions, and keep the entire process open. In addition to legislative redistricting, District Builder's flexibility accommodates school districts, police districts, and many other redistricting needs.