1 project tagged "Apache 2.0"

Apache Nutch (updated 29 Mar 2009)

Pop 49.75
Vit 2.56

Nutch is highly scalable Web search software built on top of Apache Hadoop and Lucene Java. Key features include a Web crawler, an indexer, crawl management tools, parsers for HTML, PDF, DOC, and several other document formats, and an extensible architecture that lets you plug in additional functionality such as custom document and content parsers, custom scoring algorithms, protocols, and more.

Project Spotlight

Ingo

A mail filtering manager, supporting Sieve, procmail, maildrop and IMAP filters.

Project Spotlight

pgBadger

A tool that parses PostgreSQL log files and generates fully detailed reports with charts.