The Ex-Crawler project is divided into three subprojects. The main part is the Ex-Crawler daemon server, a highly configurable and flexible Web crawler written in Java. It includes its own socket server, through which you can manage the server, users, distributed grid/volunteer computing, and much more. Crawled information is stored in a database (currently MySQL, PostgreSQL, and MS SQL are supported). The second part is a graphical (Java Swing) distributed grid/volunteer computing client, based on the JADIF project, which includes detection of the user's computer state. The third part, the Web search engine, is written in PHP. It comes with a content management system, user language detection, multi-language support, and Smarty-based templates, and it includes an application framework partly forked from Joomla 1.5 so that Joomla components can be adapted quickly.
|Tags||crawler spider searchengine webcrawler Engine Distributed Computing search engine search|
|Operating Systems||Linux OS X platform independent (Java) Solaris Windows|
|Implementation||Java 6 MySQL HTML jsoup MS SQL PostgreSQL PHP JADIF AJAX|
|Translations||English German Greek|
Release Notes: This release features a complete database rework, many speed improvements (up to 60% faster), PDF crawling, language detection, a URL filter, and hundreds of other improvements, bugfixes, and updates. Ex-Crawler can now run as a daemon; startup scripts and a process watcher are included. Setup has been simplified: a utility that creates the required database tables was added, and an automatic performance benchmark test was implemented so that you no longer need to tune the number of threads manually.
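As a rough illustration of what a crawler-side URL filter does, here is a minimal, hypothetical sketch in Java. The `UrlFilter` class and its regex-based allow/deny approach are assumptions for illustration only, not Ex-Crawler's actual implementation:

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: a URL passes the filter if it matches at least
// one allow pattern (or the allow list is empty) and no deny pattern.
public class UrlFilter {
    private final List<Pattern> allow;
    private final List<Pattern> deny;

    public UrlFilter(List<String> allowRegexes, List<String> denyRegexes) {
        this.allow = allowRegexes.stream().map(Pattern::compile).collect(Collectors.toList());
        this.deny = denyRegexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    public boolean accepts(String url) {
        boolean allowed = allow.isEmpty()
                || allow.stream().anyMatch(p -> p.matcher(url).matches());
        boolean denied = deny.stream().anyMatch(p -> p.matcher(url).matches());
        return allowed && !denied;
    }
}
```

A crawler would typically apply such a filter to every extracted link before queuing it, so that deny rules (e.g. for binary file types) prune the frontier early.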