438 projects tagged "Indexing/Search"
Grub-client is a distributed crawling client, used to create an infrastructure that provides URL update status information for Web pages on the Internet. Grub's distributed crawler network will enable Web sites, content providers, and individuals to notify others in real time that their content has changed. Clients are ranked by the number of URLs they crawl, both on their own machines and on other servers.
MyHeadlines is a module that adds syndicated headline functionality to any PHP and MySQL-based website. Your users may subscribe to multiple RSS feeds from a fully categorized database of over 1,000 sources. It was previously a PHPNuke/PostNuke add-on, but can now be integrated with any Web site.
Crawl starts a depth-first traversal of the Web at the specified URLs. It stores all JPEG images that match the configured constraints. Crawl is fairly fast and allows for graceful termination. After terminating crawl, it is possible to restart it at exactly the same spot where it was terminated. It also keeps a persistent database that allows multiple crawls without revisiting sites.
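The behaviour described for Crawl (depth-first traversal, a persistent record of visited pages, and restarting where it stopped) can be illustrated with a small sketch. This is not Crawl's actual implementation: the function name `crawl`, the `links` dictionary standing in for fetched pages, the JSON state file, and the file-extension test used as the "configured constraint" are all assumptions made for illustration.

```python
import json
import os


def crawl(start_urls, links, state_path, max_pages=None):
    """Depth-first traversal with resumable, persistent state (illustrative).

    `links` is a hypothetical dict mapping a URL to the URLs it links to;
    it replaces real HTTP fetching so the sketch runs offline.
    `max_pages` simulates a graceful stop after that many pages.
    """
    # Load the state left behind by an earlier run, if there is one.
    if os.path.exists(state_path):
        with open(state_path) as f:
            state = json.load(f)
    else:
        state = {"stack": [], "visited": [], "images": []}

    stack = state["stack"]
    visited = set(state["visited"])
    images = state["images"]

    # Queue start URLs that have not been seen or queued before.
    for url in reversed(start_urls):
        if url not in visited and url not in stack:
            stack.append(url)

    processed = 0
    while stack and (max_pages is None or processed < max_pages):
        url = stack.pop()  # LIFO order gives depth-first traversal
        if url in visited:
            continue
        visited.add(url)
        processed += 1

        # Assumed constraint: keep anything that looks like a JPEG.
        if url.lower().endswith((".jpg", ".jpeg")):
            images.append(url)
            continue

        # Push children in reverse so the first link is explored first.
        for child in reversed(links.get(url, [])):
            if child not in visited:
                stack.append(child)

    # Persist the pending stack and visited set so a later run resumes here.
    with open(state_path, "w") as f:
        json.dump(
            {"stack": stack, "visited": sorted(visited), "images": images}, f
        )
    return images
```

Calling `crawl` once with a small `max_pages` and then again with the same `state_path` continues from the saved stack instead of starting over, and pages recorded as visited are never processed twice.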
Historical Event Markup and Linking Project (Heml) provides an XML schema for historical events and a Java Web app that transforms conforming documents into hyperlinked timelines, maps, and tables. It aims to provide an information-rich interchange format for historical data, and thus add a historical component to the growing movement for a 'Semantic Web.'
Curator is a powerful script that generates Web page image galleries for displaying photographs on the Web, or for CD-ROM presentation and archiving. It generates static Web pages only - no special configuration or running scripts are required on the server. The script supports many file formats, hierarchical directories, thumbnail generation and update, per-image description files with many fields, and 'tracks' of images spanning multiple directories. The templates consist of HTML with embedded Python. Running this script only requires a recent Python interpreter and the ImageMagick tools.