5 projects tagged "crawler"

Download Website Updated 31 May 2010 ItSucks

Pop 87.21
Vit 4.89

ItSucks is a Web spider with the ability to download (and resume) files. It is also highly customizable with regular expressions and download templates. All backend functionality is also available in a separate library.
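As a rough illustration of regular-expression-based customization in a spider, the sketch below decides whether a URL should be fetched. The rule lists and the `should_download` helper are hypothetical and are not taken from ItSucks' own configuration format or API.

```python
import re

# Hypothetical rules for illustration only; not ItSucks' actual rule syntax.
ALLOW = [re.compile(r"^https?://example\.org/docs/")]
DENY = [re.compile(r"\.(?:zip|exe)$", re.IGNORECASE)]


def should_download(url, allow=ALLOW, deny=DENY):
    """Return True if url matches at least one allow rule and no deny rule."""
    # Deny rules take precedence over allow rules.
    if any(pattern.search(url) for pattern in deny):
        return False
    return any(pattern.search(url) for pattern in allow)
```

Checking deny rules first is one possible design choice; a real spider might instead evaluate rules in a user-defined order.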

Download Website Updated 02 May 2013 OpenSearchServer

Pop 923.65
Vit 52.42

OpenSearchServer is a stable, high-performance search engine and a suite of full-text search algorithms. Documents can be indexed in sixteen languages. Multilingual analyzers split sentences into words, then run lemmatisation algorithms on those words based on the document's language. Numerous document formats are supported, such as XML, HTML/XHTML, PDF, Word, PowerPoint, RTF, OpenOffice, plain text, MP3/4, Ogg, FLAC, etc. The Web interface, built around the Zkoss framework, provides an easy way to manage OSS. Integration is quick using the PHP client or the API (XML over HTTP). OpenSearchServer's crawlers traverse Web sites, file systems, and databases to build your index rapidly and easily.

No download No website Updated 21 Nov 2010 skipfish

Pop 74.94
Vit 1.59

skipfish is a high-performance, easy-to-use, and sophisticated Web application security testing tool. It features a single-threaded multiplexing HTTP stack, heuristic detection of obscure Web frameworks, and advanced, differential security checks capable of detecting blind injection vulnerabilities, stored XSS, and similar flaws.

Download Website Updated 30 Dec 2010 Ebot

Pop 106.23
Vit 3.38

Ebot is a scalable and distributed Web crawler. The URLs are saved to a NoSQL database (which supports map/reduce queries) that you can query via RESTful HTTP requests or from your preferred programming language. The URLs that need to be analyzed are sent to AMQP queues. In this way, it is possible to run several crawlers in parallel and to stop and start them without losing URLs.
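The idea of keeping pending URLs in a queue so that crawlers can stop and restart without losing work can be sketched with explicit acknowledgements. The `UrlQueue` class below is a hypothetical in-memory stand-in for a message broker, not Ebot's code and not a real AMQP client; it assumes a single consumer, so every unacknowledged URL is requeued at once.

```python
from collections import deque


class UrlQueue:
    """In-memory stand-in for a broker queue with explicit acknowledgements."""

    def __init__(self, urls=()):
        self._ready = deque(urls)   # URLs waiting to be handed out
        self._unacked = {}          # delivery tag -> URL handed out but not confirmed
        self._next_tag = 0

    def get(self):
        """Hand out the next URL with a delivery tag, or None if nothing is ready."""
        if not self._ready:
            return None
        url = self._ready.popleft()
        tag = self._next_tag
        self._next_tag += 1
        self._unacked[tag] = url
        return tag, url

    def ack(self, tag):
        """Confirm that the URL delivered under this tag was fully processed."""
        del self._unacked[tag]

    def requeue_unacked(self):
        """Simulate a crawler stopping: unconfirmed URLs go back on the queue."""
        self._ready.extend(self._unacked.values())
        self._unacked.clear()

    def pending(self):
        """Number of URLs not yet confirmed as processed."""
        return len(self._ready) + len(self._unacked)
```

A URL only leaves the system once `ack` is called, so a crawler that stops between `get` and `ack` does not cause the URL to be lost; it is handed out again after `requeue_unacked`.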

No download No website Updated 02 Apr 2012 mycelium

Pop 19.67
Vit 20.38

mycelium is an information retrieval system. It aspires to be an alternative to Nutch / Lucene. It uses MongoDB as a storage engine.
