Nutch is highly scalable Web search software built on top of Apache Hadoop and Lucene Java. Key features include a Web crawler, an indexer, crawl management tools, parsers for HTML, PDF, DOC, and several other document formats, and an extensible architecture that lets you plug in additional functionality such as document parsers, custom scoring algorithms, protocols, and more.
|Tags||Internet, Web, Indexing/Search|
|Operating Systems||Unix, Windows, Windows Cygwin|
Release Notes: This version contains a number of bug fixes and improvements, including Solr integration, a new indexing framework, and a new scoring framework.
Release Notes: This release includes several critical bug fixes as well as key speedups.
Release Notes: A thread-blocking issue that degraded crawling performance has been fixed. Bugs in scoring have been fixed. Problems with updatedb on Windows/Cygwin have been fixed. A bug in the generator that caused the lowest-scoring pages to be selected instead of the highest-scoring pages has been fixed.
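The generator fix mentioned above concerns which end of the score ordering is kept. The following is only an illustrative sketch of top-N selection by score, not Nutch's actual generator code: the class `TopNSelector`, the `ScoredUrl` type, and the `selectTopN` method are hypothetical names introduced here. It keeps a bounded min-heap so that the lowest-scoring candidate is the one evicted; flipping the comparator would reproduce the kind of bug described.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical illustration; these names are not part of the Nutch API.
public class TopNSelector {

    /** A URL paired with its score (assumed shape, for illustration only). */
    public static final class ScoredUrl {
        public final String url;
        public final float score;

        public ScoredUrl(String url, float score) {
            this.url = url;
            this.score = score;
        }
    }

    /** Returns the URLs of the topN highest-scoring candidates, best first. */
    public static List<String> selectTopN(List<ScoredUrl> candidates, int topN) {
        List<String> selected = new ArrayList<>();
        if (topN <= 0) {
            return selected;
        }
        // Min-heap ordered by score: the head is always the lowest-scoring
        // entry currently kept, so it is the one evicted when the heap overflows.
        PriorityQueue<ScoredUrl> heap = new PriorityQueue<>(
                Comparator.comparingDouble((ScoredUrl s) -> s.score));
        for (ScoredUrl candidate : candidates) {
            heap.offer(candidate);
            if (heap.size() > topN) {
                heap.poll(); // drop the current lowest score
            }
        }
        // The heap drains in ascending order; insert at the front so the
        // result lists the highest score first.
        while (!heap.isEmpty()) {
            selected.add(0, heap.poll().url);
        }
        return selected;
    }
}
```

Calling `selectTopN` with candidates scored 0.1, 0.9, and 0.5 and `topN` set to 2 keeps the 0.9 and 0.5 entries and discards the 0.1 entry.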
No changes have been submitted for this release.