Duke is a fast and flexible record linkage engine. It does not use the traditional blocking (sort by key) approach, but instead relies on Lucene. This makes it high-performance (able to process 1,000,000 records in ~10 minutes). Duke can be run from the command line, but also has an API allowing incremental linking applications to be built easily. It supports reading data from CSV, JDBC, SPARQL, and NTriples, and also supports a number of string comparators and string normalizers.
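To make the contrast with blocking concrete, here is a minimal sketch of index-based candidate lookup. It is not Duke's code: a plain in-memory inverted index stands in for Lucene, `difflib.SequenceMatcher` stands in for Duke's string comparators, and the function names, the `name` field, and the 0.85 threshold are all assumptions made for illustration.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def tokens(value):
    """Lower-case a string and split it into a set of whitespace tokens."""
    return set(value.lower().split())


def build_index(records):
    """Map each token of the 'name' field to the ids of records containing it."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        for tok in tokens(rec["name"]):
            index[tok].add(rec_id)
    return index


def find_matches(record, records, index, threshold=0.85):
    """Return (id, score) pairs for indexed records that resemble `record`."""
    # Candidate lookup: only records sharing at least one token are compared,
    # so no global sort by a blocking key is needed.
    candidates = set()
    for tok in tokens(record["name"]):
        candidates |= index.get(tok, set())

    matches = []
    for rec_id in candidates:
        score = SequenceMatcher(
            None, record["name"].lower(), records[rec_id]["name"].lower()
        ).ratio()
        if score >= threshold:
            matches.append((rec_id, score))
    return sorted(matches, key=lambda m: m[1], reverse=True)
```

Because new records can be added to the index one at a time, the same lookup also suits the incremental linking mentioned above.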
Neddick is part of the Fogcutter suite of tools for building intelligent applications. Neddick provides tools for tagging, ranking, discussing, and discovering various sources of knowledge: Web links, documents, people, etc. At the risk of oversimplifying, think of Neddick as a combination of Reddit, Delicious, and Planet. It also includes a powerful search engine and a recommendations engine, and it lets you categorize, rate, tag, filter, discuss, and discover knowledge in ways that most enterprise search applications can't.
Querydsl is a framework that enables the construction of statically typed SQL-like queries. Instead of being written as inline strings or externalized into XML files, queries are constructed via its fluent DSL/API. It supports JPA, JDO, Java Collections, SQL via JDBC, Lucene, and Hibernate Search.
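The difference between string queries and a fluent, typed query API can be sketched as follows. This is not Querydsl's API: Querydsl is a Java library, and the `Field`, `QCustomer`, and `Query` classes below are hypothetical stand-ins that only mirror the idea. In Python the type hints are checked by an external tool such as mypy rather than by a compiler.

```python
from dataclasses import dataclass
from typing import Any, Callable, Generic, Iterable, List, TypeVar

T = TypeVar("T")
Predicate = Callable[[Any], bool]


@dataclass(frozen=True)
class Customer:
    first_name: str
    age: int


class Field(Generic[T]):
    """A typed handle for one attribute (hypothetical, for illustration)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def eq(self, value: T) -> Predicate:
        return lambda row: getattr(row, self.name) == value

    def gt(self, value: T) -> Predicate:
        return lambda row: getattr(row, self.name) > value


class QCustomer:
    """Query type for Customer: fields are objects, not strings in a query."""

    first_name = Field[str]("first_name")
    age = Field[int]("age")


class Query:
    """Collects predicates fluently and applies them to an in-memory collection."""

    def __init__(self, rows: Iterable[Any]) -> None:
        self._rows = list(rows)
        self._predicates: List[Predicate] = []

    def where(self, *predicates: Predicate) -> "Query":
        self._predicates.extend(predicates)
        return self  # returning self is what makes the calls chainable

    def fetch(self) -> List[Any]:
        return [r for r in self._rows if all(p(r) for p in self._predicates)]
```

A query then reads `Query(customers).where(QCustomer.first_name.eq("Bob"), QCustomer.age.gt(18)).fetch()`. A misspelled field name fails when the attribute is looked up, not when a query string is eventually parsed.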
Puggle is a desktop search engine that provides full text search over files, folders, music, photos, Web pages, and other data stored locally on your computer. Puggle can create many different indices, each with its own configuration. For example, you may keep one index for your music collection and another for your documents. Each index can be used on demand, simply by loading it. Furthermore, Puggle supports indexing of portable devices, like USB flash drives or external hard disks. The index is stored on the device itself, using relative paths, so you can search the data quickly on any computer.
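The reason relative paths make an index portable can be shown with a small sketch. It is not Puggle's implementation: the function names are hypothetical, the index covers file names only (no full text), and POSIX-style paths are assumed.

```python
from pathlib import PurePosixPath


def build_portable_index(files, device_root):
    """Map each lower-cased file name to its path relative to the device root."""
    root = PurePosixPath(device_root)
    index = {}
    for f in files:
        # Storing the path relative to the device keeps the index valid
        # wherever the device is mounted later.
        rel = PurePosixPath(f).relative_to(root)
        index.setdefault(rel.name.lower(), []).append(str(rel))
    return index


def resolve(index, name, mount_point):
    """Turn stored relative paths into absolute ones for the current mount point."""
    mount = PurePosixPath(mount_point)
    return [str(mount / rel) for rel in index.get(name.lower(), [])]
```

An index built while the drive is mounted at `/media/usb0` can be resolved against `/mnt/stick` on another machine without re-indexing. Only the mount point passed to `resolve` changes.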