Nutch is highly scalable Web search software built on top of Apache Hadoop and Lucene Java. Key features include a Web crawler, an indexer, crawl management tools, and parsers for HTML, PDF, DOC, and several other document formats. Its extensible architecture lets you plug in additional functionality such as document parsers, custom scoring algorithms, protocols, and more.
Apache Rivet is a system for creating dynamic Web content via a programming language integrated with Apache Web Server. It is designed to be fast, powerful, and extensible, to consume few system resources, to be easy to learn, and to provide the user with a platform that can also be used for programming tasks outside the Web (GUIs, system administration tasks, text processing, database manipulation, XML, etc.). It is similar to PHP, except that it uses Tcl, and it provides both HTML/Tcl pages and pure Tcl pages to help the programmer separate logic from presentation when necessary.
Solr is an enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, and rich document (e.g. Word and PDF) handling. Solr is highly scalable, providing distributed search and index replication, and it powers the search and navigation features of many of the world's largest internet sites. Solr is written in Java and runs as a standalone full-text search server within a servlet container such as Tomcat. Solr uses the Lucene Java search library at its core for full-text indexing and search, and has REST-like HTTP/XML and JSON APIs that make it easy to use from virtually any programming language. Solr's powerful external configuration allows it to be tailored to almost any type of application without Java coding, and it has an extensive plugin architecture when more advanced customization is required.
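As a rough illustration of the HTTP and JSON interface mentioned above, the following Python sketch builds a query URL for Solr's select handler and pulls the matching documents out of a JSON response. The host, port, core name (articles), and field names (title, category) are assumptions made up for the example, and the exact URL layout depends on how a given Solr installation is deployed. The sketch only constructs the URL and parses a response string; it does not contact a server.

```python
import json
from urllib.parse import urlencode


def build_select_url(base_url, core, query, rows=10,
                     facet_field=None, highlight_field=None):
    """Build a URL for Solr's select handler (deployment layout is assumed)."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_field:
        # Ask Solr for facet counts on one field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    if highlight_field:
        # Ask Solr to highlight hits in one field.
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"


def extract_docs(response_text):
    """Return the list of matching documents from a Solr JSON response."""
    payload = json.loads(response_text)
    return payload["response"]["docs"]


# Hypothetical host, core, and field names, for illustration only.
url = build_select_url("http://localhost:8983/solr", "articles",
                       "title:lucene", rows=5, facet_field="category")

# A hand-written response in the shape Solr returns for wt=json.
sample = '{"response": {"numFound": 1, "start": 0, "docs": [{"id": "42"}]}}'
docs = extract_docs(sample)
```

Because the interface is plain HTTP, the resulting URL could be fetched with any HTTP client in any language; Python is used here only for brevity.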
Synapse is an ESB engine and XML router built completely on open standards. It is a mediation framework for XML messages and Web services that allows messages flowing through, into, or out of an organization to be mediated, including aspects such as logging, service lookup, performance mediation, versioning, failover, monitoring, fault management, and tracing.
ApacheDS is an LDAP and X.500 experimentation platform. Its backend subsystem and frontend are separable and independently embeddable. It provides a server-side JNDI LDAP provider that interacts directly with the backend storage. It is powered by SEDA (Staged Event-Driven Architecture), which can handle high levels of concurrency.
Aranea is a hierarchical Model-View-Controller Web framework that provides a common, simple approach to building Web application components, reusing custom or general GUI logic, and extending the framework. The framework enforces object-oriented programming with POJOs and provides a JSP tag library that facilitates the programming of Web GUIs without writing HTML. In addition to being a full-fledged Web framework in its own right, it provides a powerful and simple component system that allows the framework to be tailored by configuring the reusable modules and adding modules only for the missing features.