CCTE is a small utility for running automated tests on Web applications. It acts like a Web browser, sending form data to different URLs with GET requests and logging the output. It has two ways to determine whether a CGI succeeded: the HTTP status, and user-definable strings for both outcomes (success and error).
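The two checks described for CCTE (HTTP status plus success and error strings) can be illustrated with a minimal Python sketch. This is not CCTE's own code: the function name `judge_response`, its parameters, and the rule that an error string overrides a success string are all assumptions made for illustration.

```python
def judge_response(status, body, success_marker=None, error_marker=None):
    """Return True if a response looks successful (hypothetical rule set).

    status         -- numeric HTTP status code of the response
    body           -- response body as text
    success_marker -- optional string that must appear in the body
    error_marker   -- optional string that must NOT appear in the body
    """
    # Check 1: anything outside the 2xx range counts as a failure.
    if not 200 <= status < 300:
        return False
    # Check 2a: an error string in the body overrides everything else.
    if error_marker is not None and error_marker in body:
        return False
    # Check 2b: if a success string was configured, it must be present.
    if success_marker is not None and success_marker not in body:
        return False
    return True
```

A test run would call this once per requested URL and log the result, for example `judge_response(200, page_text, "Order saved", "Error:")`.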
Checklinks is Yet Another HTML Link Checker (in Perl 5). Features include SSI (.shtml) support, direct file reads where possible, HTML 4.0, HTTP 1.1, and aliases and other server options. Other useful but more common features include regular expressions to restrict the URLs searched and the results reported, and a detailed verbose report. Checklinks was written with Apache in mind, and you can feed it your srm.conf file to auto-configure many settings.
Dead Link Check (DLC) is a Perl script designed to check the validity of HTTP references. The script can use or generate a cache file to avoid repeating network requests when the user only wants to check newly added entries. It works by reading entries from a file (or a list of links from the command line) and writing results to one or more files (or STDOUT). DLC was created as an extension to Public Bookmark Generator (PBM), but can be used on its own.
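The idea of a cache file that avoids repeated network requests can be sketched in Python as follows. This is an illustration only and does not reflect DLC's actual file format or code: the JSON cache layout, the helper names, and the `probe` callback (any function that takes a URL and returns a result) are assumptions.

```python
import json
from pathlib import Path


def load_cache(path):
    """Read previously stored results, or start empty if no cache exists."""
    p = Path(path)
    return json.loads(p.read_text()) if p.exists() else {}


def save_cache(path, cache):
    """Write the results back so the next run can reuse them."""
    Path(path).write_text(json.dumps(cache, indent=2))


def check_links(urls, cache, probe):
    """Return {url: result}, calling probe(url) only for uncached URLs."""
    results = {}
    for url in urls:
        if url not in cache:
            # Only new entries cost a network request.
            cache[url] = probe(url)
        results[url] = cache[url]
    return results
```

On a second run over the same list plus a few new links, only the new links reach `probe`; the rest are answered from the cache.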
GNU Wget is a utility for noninteractive download of files from the Web. It supports the HTTP and FTP protocols, as well as retrieval through HTTP proxies. It can follow HTML links, download many pages, and convert the links for local viewing. It can also mirror FTP hierarchies, or retrieve only those files that have changed. Wget has been designed for robustness over slow network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved.
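The retry-until-complete behavior can be sketched generically in Python. This is not Wget's implementation: `fetch_from` is a hypothetical callback that returns some bytes starting at a given offset (or raises `ConnectionError`), and the `max_tries` cap is an assumption added so the loop cannot run forever.

```python
def download_with_retries(fetch_from, total_size, max_tries=20):
    """Collect total_size bytes, resuming after each network failure.

    fetch_from(offset) -- hypothetical callback returning bytes that start
                          at `offset`; may raise ConnectionError.
    """
    data = b""
    tries = 0
    while len(data) < total_size:
        if tries >= max_tries:
            raise RuntimeError("gave up after %d attempts" % tries)
        tries += 1
        try:
            # Resume where the previous attempt stopped instead of restarting.
            data += fetch_from(len(data))
        except ConnectionError:
            # Network problem: keep what we have and try again.
            continue
    return data
```

Keeping the bytes already received and asking only for the remainder is what makes this approach tolerable on slow or unreliable links.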
HTML-Tree is a Perl program that recursively descends directories and creates a web-page-based graphical map of the HTML pages on a webserver. A configuration file provides control over the "root" directory for the map, the map page title and header, directories to be excluded, link substitution strings, and the map page background image. This mapper may be run as a cron task to provide an up-to-date roadmap of a webserver. It is primarily useful as a web site development and administration tool, since it shows all pages available to web browsers, and can identify where links are needed.
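The recursive descent with excluded directories can be sketched in Python. HTML-Tree itself is a Perl program, so this is only an illustration of the general technique; the function name `collect_html_pages` and the `excluded` parameter are assumptions, and the sketch returns a plain list of paths rather than a graphical map.

```python
import os


def collect_html_pages(root, excluded=()):
    """Walk `root` recursively and return relative paths of HTML pages.

    excluded -- directory names to skip entirely (hypothetical option,
                standing in for the exclusions in a configuration file).
    """
    pages = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Pruning dirnames in place stops os.walk from descending further.
        dirnames[:] = sorted(d for d in dirnames if d not in excluded)
        for name in sorted(filenames):
            if name.lower().endswith((".html", ".htm")):
                full = os.path.join(dirpath, name)
                pages.append(os.path.relpath(full, root))
    return pages
```

A map page could then be generated by turning each returned path into a link, and a cron job could rerun the script to keep the map current.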
HTTrack is an easy-to-use offline browser utility. It allows you to download a Web site from the Internet to a local directory, recursively building all directories and getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link structure. Simply open a page of the mirrored Web site in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site and resume interrupted downloads. WebHTTrack is a Web-based GUI for HTTrack.