3 projects tagged "crawler"

ItSucks (updated 31 May 2010)
Pop 91.48
Vit 4.89

ItSucks is a Web spider that can download files and resume interrupted downloads. It is highly customizable through regular expressions and download templates, and all backend functionality is available in a separate library.
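
As an illustration of what regular-expression customization of a spider can look like, here is a minimal sketch in Python. It is not ItSucks's actual configuration format or API: the rule lists ALLOW and DENY, the function should_download, and the example.org URLs are all hypothetical names chosen for this example.

```python
import re

# Hypothetical rules, not taken from ItSucks: a URL is downloaded only if
# it matches at least one allow pattern and no deny pattern.
ALLOW = [re.compile(r"^https?://example\.org/")]
DENY = [re.compile(r"\.(?:jpe?g|png|gif)$", re.IGNORECASE)]


def should_download(url: str) -> bool:
    """Return True if the URL passes the allow rules and no deny rule."""
    if not any(pattern.search(url) for pattern in ALLOW):
        return False
    return not any(pattern.search(url) for pattern in DENY)


# Filter a list of candidate URLs found while spidering.
candidates = [
    "http://example.org/docs/index.html",
    "http://example.org/logo.PNG",
    "http://other.net/page.html",
]
selected = [url for url in candidates if should_download(url)]
```

Keeping allow and deny rules in separate lists lets a user restrict a crawl to one site while still excluding particular file types.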

OpenSearchServer (updated 02 May 2013)
Pop 951.28
Vit 58.24

OpenSearchServer is a stable, high-performance search engine with a suite of full-text search algorithms. Documents can be indexed in sixteen languages. Multilingual analyzers split sentences into words, then run lemmatisation algorithms on those words based on the document's language. Numerous document formats are supported, including XML, HTML/XHTML, PDF, Word, PowerPoint, RTF, OpenOffice, plain text, MP3/4, Ogg, and FLAC. The Web interface, built around the Zkoss framework, provides an easy way to manage OpenSearchServer. Integration is quick through the PHP client or the API (XML over HTTP). OpenSearchServer's crawlers traverse Web sites, file systems, and databases to build your index quickly and easily.

Ex-Crawler (updated 11 Jun 2010; no download available)
Pop 25.46
Vit 1.04

The Ex-Crawler Project is divided into three subprojects. The main part is the Ex-Crawler daemon server, a highly configurable and flexible Web crawler written in Java. It comes with its own socket server, through which you can manage the server, users, distributed grid/volunteer computing, and much more. Crawled information is stored in a database (currently MySQL, PostgreSQL, and MSSQL are supported). The second part is a graphical (Java Swing) distributed grid/volunteer computing client, including user computer state detection, based on the JADIF Project. The third part, the Web search engine, is written in PHP. It comes with a Content Management System, user language detection, multi-language support, and Smarty templates, plus an application framework that is partly forked from Joomla 1.5 so that Joomla components can be adapted quickly.
