Nutch is highly scalable Web search software built on top of Apache Hadoop and Lucene Java. Key features include a Web crawler, an indexer, crawl management tools, parsers for HTML, PDF, DOC, and several other document formats, and an extensible architecture that lets you plug in additional functionality such as content parsers, custom scoring algorithms, protocols, and more.
Apache OpenNLP is a machine-learning-based toolkit for processing natural-language text. It supports the most common NLP tasks, such as tokenization, sentence segmentation, part-of-speech tagging, named entity extraction, chunking, parsing, and coreference resolution. These tasks are usually required to build more advanced text-processing services.
Apache PhotArk is a photo gallery application that includes a content repository for the images, a display component, an access control layer, and upload capabilities. The design pairs a rigid content repository with a highly flexible display component. Images in the content repository will be protected by fine-grained access control.
AtMail is a webmail client. The project aims to provide an elegant client for existing IMAP mail servers, with less bloat and a focus on a simple, intuitive user interface. Features include full webmail functionality, address-book support, video mail, an AJAX interface, drag-and-drop, and more.
Citizen Intelligence Agency is a project that aims to increase public monitoring of members of the Swedish parliament. It does this by analyzing how each member of parliament votes and building views of the relationships between members. It uses Maven, MySQL, JPA2, Spring 3.x, and Vaadin.
Citrus is a test framework written in Java that enables automated integration testing of message-based enterprise SOA applications. The tool can easily simulate surrounding systems across various transports and protocols (e.g. JMS, SOAP Web services, HTTP, and TCP/IP) in order to perform end-to-end use-case testing. Citrus provides robust validation of XML message contents and lets you build complex test logic such as sending and receiving messages, database validation, automatic retries, variable definitions, dynamic message contents, error simulation, and more.
Daisy is an enterprise content management solution that bridges the gap between classic Web site content management and the Wiki style of information management and discovery. It is well suited to intranet knowledge bases, product or project documentation, and the management of content-rich Web sites. It consists of a repository server with powerful querying and versioning capabilities, and a Wiki-like Web front end with in-browser rich-text authoring.
District Builder is a software application designed to give the public transparent, accessible, and easy-to-use mapping tools to draw the boundaries of their communities or to generate redistricting plans for their state and localities. The drawing of electoral districts is among the least transparent processes in democratic governance. All too often, redistricting authorities maintain power by obstructing public participation, and the resulting districts serve the goals of politicians at the expense of the representational interests of communities and the public at large. With District Builder, members of the public can create and submit district plans for municipal, county, and state governments; the tool also supports redistricting competitions and helps keep the entire process open. Beyond legislative redistricting, District Builder is flexible enough to handle school districts, police districts, and many other redistricting needs.
Echomine Feridian is an easy-to-use Java API that gives you quick access to the XMPP network used by instant-messaging services such as Google Talk. The API lets you communicate with Jabber/XMPP servers to send and receive instant messages, manage presence, and implement custom extensions to the XMPP protocol.
Erudite is an application for training and testing backpropagation neural networks using the ANNeML (Artificial Neural Network Markup Language) XML format. It can train and test networks from CSV files and supports randomized training sets, an optional adaptive learning rate, sigmoid or hyperbolic tangent transfer functions, optional locking of bias and weight adjustments, and more.
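To illustrate what the transfer-function and bias-locking options control, here is a minimal Python sketch of a backpropagation (delta rule) update for a single neuron. It is not Erudite's code and does not read or write ANNeML; the names `TRANSFER`, `predict`, `train_step`, and the `lock_bias` flag are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical table of transfer functions and their derivatives.
# Each derivative is written in terms of the neuron's output y,
# which is the usual shortcut for sigmoid and tanh.
TRANSFER = {
    "sigmoid": (lambda x: 1.0 / (1.0 + math.exp(-x)), lambda y: y * (1.0 - y)),
    "tanh": (math.tanh, lambda y: 1.0 - y * y),
}


def predict(weights, bias, inputs, transfer="sigmoid"):
    """Forward pass: weighted sum plus bias, passed through the transfer function."""
    f, _ = TRANSFER[transfer]
    return f(sum(w * x for w, x in zip(weights, inputs)) + bias)


def train_step(weights, bias, inputs, target, rate=0.5,
               transfer="sigmoid", lock_bias=False):
    """One gradient-descent update on squared error E = 0.5 * (target - y)**2.

    Returns (new_weights, new_bias, output_before_update).
    If lock_bias is True, the bias is left unchanged (hypothetical option).
    """
    _, df = TRANSFER[transfer]
    y = predict(weights, bias, inputs, transfer)
    # Error signal: -dE/dnet = (target - y) * f'(net)
    delta = (target - y) * df(y)
    new_weights = [w + rate * delta * x for w, x in zip(weights, inputs)]
    new_bias = bias if lock_bias else bias + rate * delta
    return new_weights, new_bias, y


if __name__ == "__main__":
    # Learn logical OR with a single sigmoid neuron.
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]
    w, b = [0.0, 0.0], 0.0
    for _ in range(5000):
        for x, t in data:
            w, b, _ = train_step(w, b, x, t)
    for x, t in data:
        print(x, t, round(predict(w, b, x), 3))
```

A single neuron is enough here because OR is linearly separable; a real backpropagation network adds hidden layers and propagates each layer's `delta` backward through the weights.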