6 projects tagged "spider"

ItSucks (updated 31 May 2010; download and website links listed)

Popularity: 84.68
Vitality: 4.57

ItSucks is a Web spider that can download files and resume interrupted downloads. It is highly customizable with regular expressions and download templates, and all of its backend functionality is also available as a separate library.
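Resuming a download typically relies on a standard HTTP mechanism: a client that already holds part of a file asks the server for only the remaining bytes with a `Range` header. The Python sketch below illustrates that idea together with a regex-based URL filter. It is not ItSucks' own API (ItSucks is a separate application); the names `wanted`, `resume_offset`, `build_resume_request`, and `resume_download` are hypothetical.

```python
import os
import re
import urllib.request


def wanted(url, include_patterns):
    """Return True if the URL matches any of the (hypothetical) include regexes."""
    return any(re.search(p, url) for p in include_patterns)


def resume_offset(dest_path):
    """Number of bytes already on disk, i.e. where the download should resume."""
    return os.path.getsize(dest_path) if os.path.exists(dest_path) else 0


def build_resume_request(url, offset):
    """Build a GET request that asks only for the bytes we do not have yet."""
    request = urllib.request.Request(url)
    if offset > 0:
        # HTTP Range request: "bytes=N-" means "from offset N to the end of the file".
        request.add_header("Range", f"bytes={offset}-")
    return request


def resume_download(url, dest_path, chunk_size=64 * 1024):
    """Download url into dest_path, appending if the server honors the Range header."""
    request = build_resume_request(url, resume_offset(dest_path))
    with urllib.request.urlopen(request) as response:
        # 206 Partial Content: only the missing tail was sent, so append to the file.
        # Any other success code (normally 200): the whole file was sent, so overwrite.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest_path, mode) as out:
            while True:
                chunk = response.read(chunk_size)
                if not chunk:
                    break
                out.write(chunk)
```

A server that ignores `Range` answers 200 with the full file, which the sketch handles by overwriting. If the local file is already complete, the server will usually reply 416 (Range Not Satisfiable), and `urlopen` raises `HTTPError`.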

Methanol (updated 25 Jun 2009; no download or website link listed)

Popularity: 30.10
Vitality: 1.01

Methanol is a modular, customizable Web crawling system with crawlers optimized for speed. It is designed to allow the administrator to set up any kind of filetype handling, parsing, and indexing rules.
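One common way to make file-type handling configurable is a registry that maps each content type to a handler function. The following is a minimal Python sketch of that pattern, assuming MIME types as the lookup keys; `HandlerRegistry`, `extract_links`, and `extract_words` are hypothetical names, not Methanol's API.

```python
import re
from typing import Callable, Dict, List

# A handler turns a raw response body into a list of extracted items.
Handler = Callable[[bytes], List[str]]


class HandlerRegistry:
    """Maps a content type to the handler that should process it (hypothetical design)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Handler] = {}

    def register(self, content_type: str, handler: Handler) -> None:
        self._handlers[content_type.lower()] = handler

    def handle(self, content_type: str, body: bytes) -> List[str]:
        # Drop parameters such as "; charset=utf-8" before looking up the rule.
        base = content_type.split(";", 1)[0].strip().lower()
        handler = self._handlers.get(base)
        if handler is None:
            return []  # no rule configured for this type: skip it
        return handler(body)


def extract_links(body: bytes) -> List[str]:
    """Very rough link extraction for HTML (double-quoted href values only)."""
    html = body.decode("utf-8", errors="replace")
    return re.findall(r'href="([^"]+)"', html)


def extract_words(body: bytes) -> List[str]:
    """Whitespace tokenization for plain text, e.g. as input to an indexer."""
    return body.decode("utf-8", errors="replace").split()
```

Adding support for a new file type then means registering one more handler, for example `registry.register("text/plain", extract_words)`, without touching the crawl loop.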

adv-samba (updated 07 Dec 2009; no download or website link listed)

Popularity: 22.45
Vitality: 1.00

adv-samba is a PHP class for batch auditing Samba (SMB) shares on remote hosts or across large LANs, which makes it useful during network audits. For example, on a LAN with 500 workstations where you want to find illegal MP3 files on company machines, the tool can recursively dump the directory structure of every share. It also works with Active Directory authentication.
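The recursive dump described above can be approximated with a directory walk. This Python sketch assumes the remote share has already been mounted locally (for example under a hypothetical path such as `/mnt/host/share`); it does not speak the SMB protocol itself and is not adv-samba's PHP interface. `dump_tree`, `find_by_extension`, and `make_demo_share` are illustrative names.

```python
import os
import tempfile
from typing import Iterator, List, Tuple


def dump_tree(root: str) -> Iterator[Tuple[str, int]]:
    """Yield (path, size) for every file under root, walking subdirectories recursively."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk visit subdirectories in a stable order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                yield path, os.path.getsize(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it


def find_by_extension(root: str, extension: str = ".mp3") -> List[str]:
    """Return paths under root whose names end with extension (case-insensitive)."""
    ext = extension.lower()
    return [path for path, _size in dump_tree(root) if path.lower().endswith(ext)]


def make_demo_share(root: str) -> None:
    """Create a tiny fake share tree for demonstration (hypothetical layout)."""
    os.makedirs(os.path.join(root, "users", "bob"), exist_ok=True)
    for rel in ("users/bob/song.MP3", "report.doc"):
        with open(os.path.join(root, *rel.split("/")), "wb") as fh:
            fh.write(b"x")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as share:
        make_demo_share(share)
        for hit in find_by_extension(share):
            print(hit)
```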

Ex-Crawler (updated 11 Jun 2010; website link listed, no download)

Popularity: 23.87
Vitality: 1.03

The Ex-Crawler project is divided into three subprojects.

The main part is the Ex-Crawler daemon server, a highly configurable and flexible Web crawler written in Java. It comes with its own socket server, through which you can manage the server, users, distributed grid/volunteer computing, and more. Crawled information is stored in a database (MySQL, PostgreSQL, and MSSQL are currently supported).

The second part is a graphical (Java Swing) client for distributed grid/volunteer computing, including detection of the user's computer state, based on the JADIF project.

The third part is a Web search engine written in PHP. It includes a content management system, user language detection, multi-language support, and Smarty-based templates, along with an application framework partly forked from Joomla 1.5 so that Joomla components can be adapted quickly.
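Ex-Crawler stores crawled information in MySQL, PostgreSQL, or MSSQL. To stay self-contained, the sketch below uses SQLite through Python's built-in `sqlite3` module instead; the `pages` table, its columns, and the helper names are a hypothetical schema for illustration, not Ex-Crawler's.

```python
import sqlite3
from typing import List, Optional

# Hypothetical schema: one row per crawled URL.
SCHEMA = """
CREATE TABLE IF NOT EXISTS pages (
    url        TEXT PRIMARY KEY,
    status     INTEGER NOT NULL,
    title      TEXT,
    fetched_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
)
"""


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the crawl database and make sure the table exists."""
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def save_page(conn: sqlite3.Connection, url: str, status: int, title: Optional[str]) -> None:
    # INSERT OR REPLACE deletes any existing row with the same url and inserts the new one,
    # so re-crawling a page refreshes its row instead of failing on the primary key.
    conn.execute(
        "INSERT OR REPLACE INTO pages (url, status, title) VALUES (?, ?, ?)",
        (url, status, title),
    )
    conn.commit()


def pages_with_status(conn: sqlite3.Connection, status: int) -> List[str]:
    """List crawled URLs that returned the given HTTP status, sorted by URL."""
    rows = conn.execute("SELECT url FROM pages WHERE status = ? ORDER BY url", (status,))
    return [row[0] for row in rows]
```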

SpiderBot (updated 18 Nov 2012; download link listed, no website)

Popularity: 16.97
Vitality: 23.64

SpiderBot crawls the Web, retrieves content, and performs actions on the content. It is an effort to design and develop a truly pipelined distributed Web crawler.
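A pipelined crawler splits the work into stages (fetch, retrieve/parse, act) that run concurrently and hand items to each other through queues. The Python sketch below shows that structure on a single machine with one thread per stage. It is not SpiderBot's code, it leaves out distribution across hosts, and it takes caller-supplied `fetch`, `parse`, and `act` functions so it can run without network access; `stage` and `run_pipeline` are hypothetical names.

```python
import queue
import threading
from typing import Any, Callable, Iterable, List

_DONE = object()  # sentinel telling a stage that its input is exhausted


def stage(work: Callable[[Any], Any], inbox: queue.Queue, outbox: queue.Queue) -> threading.Thread:
    """Start a thread that applies work to each item from inbox and forwards the result."""

    def run() -> None:
        while True:
            item = inbox.get()
            if item is _DONE:
                outbox.put(_DONE)  # pass the shutdown signal downstream
                return
            outbox.put(work(item))

    thread = threading.Thread(target=run, daemon=True)
    thread.start()
    return thread


def run_pipeline(urls: Iterable[str], fetch, parse, act) -> List[Any]:
    """Run fetch -> parse -> act as three concurrent stages connected by queues."""
    q_urls, q_pages, q_parsed, q_results = (queue.Queue() for _ in range(4))
    threads = [
        stage(fetch, q_urls, q_pages),
        stage(parse, q_pages, q_parsed),
        stage(act, q_parsed, q_results),
    ]
    for url in urls:
        q_urls.put(url)
    q_urls.put(_DONE)

    results = []
    while True:
        item = q_results.get()
        if item is _DONE:
            break
        results.append(item)
    for thread in threads:
        thread.join()
    return results
```

Because each stage has a single worker, output order matches input order. The sketch has no error handling: an exception inside a stage ends that thread without forwarding the sentinel, so `run_pipeline` would block.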

webStraktor (updated 22 Apr 2014; download link listed, no website)

Popularity: 128.65
Vitality: 1.00

webStraktor is a programmable World Wide Web data extraction client. It provides a scripting language for collecting, extracting, and storing information available on the Web, including images; the language borrows elements of regular expression and XPath syntax. Its standard output format is XML-based, encoded in ASCII, UTF-8, or ISO-8859-1 (Latin-1). It adheres to the Robots Exclusion Protocol and can be configured to operate anonymously by connecting through proxy servers. Exhaustive logging and tracing information is provided.
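Several of the behaviors listed above have close counterparts in Python's standard library: `urllib.robotparser` evaluates robots.txt rules, `urllib.request.ProxyHandler` routes requests through a proxy, and ElementTree supports a limited XPath subset for extraction. The sketch below illustrates those pieces under that assumption; it is not webStraktor's scripting language, the helper names are hypothetical, and any proxy address you pass in is a placeholder.

```python
import urllib.request
import urllib.robotparser
import xml.etree.ElementTree as ET
from typing import List


def robots_allows(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check url against an already-downloaded robots.txt body."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


def proxied_opener(proxy_url: str) -> urllib.request.OpenerDirector:
    """Build an opener that sends http and https traffic through proxy_url."""
    handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(handler)


def extract_image_sources(xhtml: str) -> List[str]:
    """Return the src attribute of every <img>, in document order.

    ElementTree only understands a subset of XPath (".//img" = all descendant
    img elements), and the input must be well-formed XML such as XHTML.
    """
    root = ET.fromstring(xhtml)
    return [img.get("src") for img in root.findall(".//img")]
```

A crawler built this way would call `robots_allows` before each request and issue the request through `proxied_opener(...).open(url)` when anonymity is wanted.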
