ItSucks is a web spider that can download files and resume downloads. It is highly customizable through regular expressions and download templates. All backend functionality is also available as a separate library.
mycelium is an information retrieval system. It aspires to be an alternative to Nutch / Lucene. It uses MongoDB as a storage engine.
A programmable packet sniffer.
Java-based object-oriented querying.