2 projects tagged "Crawl"

OpenSearchServer (updated 02 May 2013)

Popularity 950.05
Vitality 56.62

OpenSearchServer is a stable, high-performance search engine and a suite of powerful full-text search algorithms. Documents can be indexed in sixteen languages. Multilingual analyzers split sentences into words, then run lemmatisation algorithms on those words according to the document's language. Numerous document formats are supported, including XML, HTML/XHTML, PDF, Word, PowerPoint, RTF, OpenOffice, plain text, MP3/4, Ogg, and FLAC. The Web interface, built around the Zkoss framework, provides an easy way to manage OpenSearchServer (OSS). Integration is quick using the PHP client or the API (XML over HTTP). OpenSearchServer's crawlers traverse Web sites, file systems, and databases to build your index quickly and easily.
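To make the analyzer step concrete, here is a minimal sketch of the "split into words, then normalise per language" idea. It is not OpenSearchServer's code or API: the class `ToyAnalyzer`, the method `analyze`, and the suffix tables are all hypothetical, and the crude suffix stripping shown here is only a stand-in for real lemmatisation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class ToyAnalyzer {
    // Hypothetical per-language suffix rules (an assumption for illustration);
    // real analyzers use dictionaries and far richer rules.
    private static final Map<String, List<String>> SUFFIXES = Map.of(
            "en", List.of("ing", "ed", "s"),
            "fr", List.of("es", "s"));

    /** Lower-cases the text, splits it into word tokens, and normalises each token. */
    public static List<String> analyze(String text, String lang) {
        List<String> terms = new ArrayList<>();
        // Split on any run of characters that are neither letters nor digits.
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            if (!token.isEmpty()) {
                terms.add(stem(token, SUFFIXES.getOrDefault(lang, List.of())));
            }
        }
        return terms;
    }

    /** Strips the first matching suffix, keeping a stem of at least three characters. */
    private static String stem(String token, List<String> suffixes) {
        for (String suffix : suffixes) {
            if (token.endsWith(suffix) && token.length() - suffix.length() >= 3) {
                return token.substring(0, token.length() - suffix.length());
            }
        }
        return token;
    }

    public static void main(String[] args) {
        // Prints [the, crawler, index, page]
        System.out.println(analyze("The crawlers indexed pages", "en"));
    }
}
```

The language code selects the rule set, so the same sentence is normalised differently depending on the document's declared language; an unknown language falls back to plain tokenisation.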

Niocchi (updated 13 Jul 2011)

Pop 32.43
Vit 36.16

Niocchi is a Java crawler library that implements synchronous I/O multiplexing. This design allows crawling tens of thousands of hosts in parallel on a single low-end server. Niocchi was designed for large search engines that need to crawl massive amounts of data, but it can also be used to write no-frills crawlers.
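The following is a minimal sketch of synchronous I/O multiplexing using the standard `java.nio` `Selector`, in which one thread drives many non-blocking connections. It does not use or reproduce Niocchi's API: the class `MultiplexSketch`, the method `fetchAll`, and the choice to leave a failed host's result as `null` are assumptions made for illustration.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class MultiplexSketch {
    /** Per-connection state, attached to the connection's SelectionKey. */
    private static final class Conn {
        final int index;
        final ByteBuffer request;
        final StringBuilder response = new StringBuilder();
        Conn(int index, String request) {
            this.index = index;
            this.request = ByteBuffer.wrap(request.getBytes(StandardCharsets.ISO_8859_1));
        }
    }

    /** Sends the same request to every target from one thread; failed or timed-out slots stay null. */
    public static List<String> fetchAll(List<InetSocketAddress> targets, String request,
                                        long timeoutMillis) throws IOException {
        String[] results = new String[targets.size()];
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try (Selector selector = Selector.open()) {
            try {
                for (int i = 0; i < targets.size(); i++) {
                    SocketChannel ch = SocketChannel.open();
                    ch.configureBlocking(false);
                    // connect() may complete immediately (e.g. on loopback).
                    boolean connected = ch.connect(targets.get(i));
                    ch.register(selector,
                            connected ? SelectionKey.OP_WRITE : SelectionKey.OP_CONNECT,
                            new Conn(i, request));
                }
                ByteBuffer buf = ByteBuffer.allocate(8192);
                int open = targets.size();
                while (open > 0) {
                    long remaining = deadline - System.currentTimeMillis();
                    if (remaining <= 0) break;
                    selector.select(remaining); // blocks until some channel is ready
                    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
                    while (it.hasNext()) {
                        SelectionKey key = it.next();
                        it.remove();
                        Conn conn = (Conn) key.attachment();
                        SocketChannel ch = (SocketChannel) key.channel();
                        try {
                            if (key.isConnectable() && ch.finishConnect()) {
                                key.interestOps(SelectionKey.OP_WRITE);
                            } else if (key.isWritable()) {
                                ch.write(conn.request);
                                if (!conn.request.hasRemaining()) key.interestOps(SelectionKey.OP_READ);
                            } else if (key.isReadable()) {
                                buf.clear();
                                if (ch.read(buf) < 0) { // peer closed: response complete
                                    results[conn.index] = conn.response.toString();
                                    ch.close();
                                    open--;
                                } else {
                                    buf.flip();
                                    conn.response.append(StandardCharsets.ISO_8859_1.decode(buf));
                                }
                            }
                        } catch (IOException e) {
                            // Assumption: a failing host is skipped and its slot stays null.
                            key.cancel();
                            try { ch.close(); } catch (IOException ignored) { }
                            open--;
                        }
                    }
                }
            } finally {
                // Close anything still open (timeouts, early errors).
                for (SelectionKey key : selector.keys()) key.channel().close();
            }
        }
        return Arrays.asList(results);
    }

    public static void main(String[] args) throws IOException {
        List<InetSocketAddress> targets = new ArrayList<>();
        // Note: this constructor resolves DNS synchronously, which a real crawler would avoid.
        for (String host : args) targets.add(new InetSocketAddress(host, 80));
        System.out.println(fetchAll(targets, "HEAD / HTTP/1.0\r\n\r\n", 5000));
    }
}
```

Because no connection ever blocks the thread, the number of hosts in flight is bounded by file descriptors and memory rather than by thread count, which is the property that makes this approach attractive on modest hardware.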

Project Spotlight

License4J

A Java library and applications for software licensing.

Project Spotlight

tvpvrd

An analogue TV video recorder daemon, a.k.a. a digital VCR.