With LinkChecker, you can check HTML documents and Web sites for broken links. It features recursion, robots.txt exclusion protocol support, HTTP proxy support, i18n support, multithreading, regular expression filtering rules for links, and user/password checking for authorized pages. Output can be colored or normal text, HTML, SQL, CSV, or a sitemap graph in DOT, GML, or XML format. Supported link types are HTTP/1.1 and 1.0, HTTPS, FTP, mailto:, news:, nntp:, Telnet, and local files.
GNU Wget is a utility for noninteractive download of files from the Web. It supports HTTP and FTP protocols, as well as retrieval through HTTP proxies. It can follow HTML links, download many pages, and convert the links for local viewing. It can also mirror FTP hierarchies or only those files that have changed. Wget has been designed for robustness over slow network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved.
HTTrack is an easy-to-use offline browser utility. It allows you to download a Web site from the Internet to a local directory, building recursively all directories, getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link-structure. Simply open a page of the mirrored Web site in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site, and resume interrupted downloads. WebHTTrack is a Web-based GUI for HTTrack.
RabbIt is a mutating, caching Web proxy used to speed up surfing over slow links like modems. It does this by removing advertising and background images and scaling down images to low quality JPEGs. RabbIT is written in Java and should be able to run on any platform. It does depend upon an image converter if image scaling is on. The recommended image converter is "convert" from the ImageMagick package.
PHPX is a Web portal system, blog, Content Management System (CMS), forum, and more. It is designed to allow everyone to be able to have feature-rich, interactive websites even if you do not know a bit of programming. Some key features include fully-integrated forums, downloads, an image gallery with slideshow and auto-thumbnailing, support ticket system, a GUI interface for Web page content management, news with topics and instances, and a whole lot more. It allows you to fully customize the look of your site.
urlwatch is a script intended to help you watch URLs and get notified (via email) of any changes. The change notification will include the URL that has changed and a unified diff of what has changed. The script works out of a single directory, so there is no need to install anything. State files are kept in the same folder. The script supports stripping parts of a page that are always changing through the use of a filter hook function. It is typically run as a cronjob.
Hammerhead is a stress testing tool for Web sites. It initiates connections from multiple IP aliases and simulates a user from each alias. It is fully configurable, and there are numerous other options for creating problems with a site. Extensive data collection is also available.
webcheck is a Web site checking tool for Web masters. It crawls a given Web site and generates a number of reports. The whole system is pluggable, allowing extra reports and checks to be added easily. It supports retrieving Web sites over HTTP, file, and FTP protocols and produces reports on site structure, broken links, old Web pages, overviews of external links, and more. The links that webcheck considers external are configurable through regular expressions, and webcheck honors robots.txt.