7 projects tagged "crawler"

Download Website Updated 31 May 2010 ItSucks

Pop 84.23
Vit 4.57

ItSucks is a Web spider with the ability to download (and resume) files. It is also highly customizable with regular expressions and download templates. All backend functionality is also available in a separate library.

Download Website Updated 06 Apr 2014 OpenSearchServer

Pop 558.82
Vit 40.89

OpenSearchServer is a powerful, enterprise-class search engine program. Using its Web user interface, crawlers (Web, file, database, etc.), and RESTful API, you can integrate advanced full-text search capabilities into your application.

No download Website Updated 11 Jun 2010 Ex-Crawler

Pop 23.96
Vit 1.03

The Ex-Crawler Project is divided into three subprojects. The main part is the Ex-Crawler daemon server, a highly configurable and flexible Web crawler written in Java. It comes with its own socket server, with which you can manage the server, users, distributed grid/volunteer computing, and much more. Crawled information is stored in a database (currently MySQL, PostgreSQL, and MSSQL are supported). The second part is a graphical (Java Swing) distributed grid/volunteer computing client, including user computer state detection, based on the JADIF Project. The third part is the Web search engine, written in PHP. It comes with a Content Management System, user language detection and multi-language support, and templates using Smarty, including an application framework that is partly forked from Joomla 1.5, so that Joomla components can be adapted quickly.

No download No website Updated 21 Nov 2010 skipfish

Pop 62.55
Vit 1.54

skipfish is a high-performance, easy, and sophisticated Web application security testing tool. It features a single-threaded multiplexing HTTP stack, heuristic detection of obscure Web frameworks, and advanced, differential security checks capable of detecting blind injection vulnerabilities, stored XSS, and so forth.

Download Website Updated 30 Dec 2010 Ebot

Pop 97.95
Vit 3.27

Ebot is a scalable and distributed Web crawler. The URLs are saved to a NoSQL database (which supports map/reduce queries) that you can query via RESTful HTTP requests or using your preferred programming language. The URLs that need to be analyzed are sent to AMQP queues. In this way, it is possible to run several crawlers in parallel and stop and start them without losing URLs.
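
The queue-based design described for Ebot can be illustrated with a minimal sketch. This is not Ebot's code: it assumes an in-memory `queue.Queue` as a stand-in for an AMQP queue, and the function name `run_workers` and the example URLs are hypothetical. Unlike a real message broker, an in-memory queue does not survive a restart, so the sketch only shows how several workers can share one pool of pending URLs.

```python
import queue
import threading


def run_workers(seed_urls, num_workers=3):
    """Drain a shared URL queue with several workers.

    Returns a dict mapping each worker id to the URLs it handled.
    """
    pending = queue.Queue()  # assumption: stand-in for an AMQP queue
    for url in seed_urls:
        pending.put(url)

    handled = {i: [] for i in range(num_workers)}

    def worker(worker_id):
        while True:
            try:
                url = pending.get_nowait()
            except queue.Empty:
                return  # nothing left to analyze
            # A real crawler would fetch and parse the page here.
            handled[worker_id].append(url)
            pending.task_done()

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return handled
```

Because every worker takes URLs from the same queue, each URL is handled by exactly one worker, and adding or removing workers does not change which URLs get processed.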

No download No website Updated 02 Apr 2012 mycelium

Pop 15.13
Vit 28.03

mycelium is an information retrieval system. It aspires to be an alternative to Nutch / Lucene. It uses MongoDB as a storage engine.

Download No website Updated 22 Apr 2014 webStraktor

Pop 127.86
Vit 1.00

webStraktor is a programmable World Wide Web data extraction client. It features a scripting language to facilitate the collection, extraction, and storage of information available on the Web, including images. The scripting language uses elements of regular expression and XPath syntax. The standard webStraktor output format is XML-based, encoded in ASCII, UTF-8, or ISO-8859-1 (Latin-1). It adheres to the Robots Exclusion Protocol and can be configured to operate anonymously by connecting through proxy servers. Exhaustive logging and tracing information is provided.
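
To show what adhering to the Robots Exclusion Protocol means in practice, here is a minimal sketch using Python's standard `urllib.robotparser` module. It is not webStraktor's implementation or scripting language; the robots.txt content, the user agent name `example-bot`, and the helper `is_allowed` are assumptions made up for illustration.

```python
from urllib import robotparser

# Hypothetical robots.txt content, for illustration only.
EXAMPLE_ROBOTS = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, url, agent="example-bot"):
    """Return True if the given robots.txt text permits `agent` to fetch `url`."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

A crawler that honors the protocol would call a check like `is_allowed` before requesting each URL and skip any URL for which it returns `False`.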
