Checklinks is yet another HTML link checker, written in Perl 5. Features include SSI (.shtml) support, direct file reads where possible, HTML 4.0 and HTTP 1.1 support, and handling of aliases and other server options. Other useful but more common features include regular expressions to restrict the URLs searched and the results reported, and a detailed verbose report. Checklinks was written with Apache in mind, and you can feed it your srm.conf file to auto-configure many settings.
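The regular-expression filtering mentioned above can be illustrated with a minimal Python sketch. It is not Checklinks code: the function name should_check, its include and exclude parameters, and the example URLs are all hypothetical and serve only to show the idea of restricting which URLs get searched.

```python
import re


def should_check(url, include=None, exclude=None):
    """Return True if url passes the optional include and exclude patterns.

    Hypothetical helper: include and exclude are regular expressions
    (or None to disable that test).
    """
    if include is not None and not re.search(include, url):
        return False
    if exclude is not None and re.search(exclude, url):
        return False
    return True


# Illustrative URLs only.
urls = [
    "http://example.com/docs/index.html",
    "http://example.com/cgi-bin/search",
    "http://other.example.org/page.html",
]

# Keep URLs on example.com, but skip anything under /cgi-bin/.
selected = [
    u for u in urls
    if should_check(u, include=r"^http://example\.com/", exclude=r"/cgi-bin/")
]
print(selected)
```

The same pair of patterns could just as well be applied to the report instead of the crawl, which is the second use the description mentions.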
Dead Link Check (DLC) is a Perl script designed to check the validity of HTTP references. The script can generate and reuse a cache file, so that network requests are not repeated when the user only wants to check newly added entries. It works by reading entries from a file (or a list of links from the command line) and writing the results to one or more files (or to STDOUT). DLC was created as an extension to Public Bookmark Generator (PBM), but can be used on its own.
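The cache-file idea can be sketched in a few lines of Python. This is not DLC's implementation or its file format: the JSON cache layout and the names http_status and check_links are assumptions made for the illustration.

```python
import json
import os
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None on a network failure."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError):
        return None


def check_links(urls, cache_path, fetch=http_status):
    """Check urls, reusing results stored in a JSON cache file.

    Hypothetical helper: only URLs missing from the cache trigger a
    call to fetch; the updated cache is written back afterwards.
    """
    cache = {}
    if os.path.exists(cache_path):
        with open(cache_path, encoding="utf-8") as handle:
            cache = json.load(handle)
    for url in urls:
        if url not in cache:
            cache[url] = fetch(url)
    with open(cache_path, "w", encoding="utf-8") as handle:
        json.dump(cache, handle, indent=2)
    return {url: cache[url] for url in urls}
```

Calling check_links a second time with a longer list only contacts the network for the entries that were added since the previous run. A real tool would probably also expire old cache entries, which this sketch does not attempt.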
GNU Wget is a utility for noninteractive download of files from the Web. It supports the HTTP and FTP protocols, as well as retrieval through HTTP proxies. It can follow HTML links, download many pages, and convert the links for local viewing. It can also mirror FTP hierarchies, or retrieve only those files that have changed. Wget has been designed for robustness over slow network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved.
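The retry behaviour described above can be modelled with a short Python sketch. It is not Wget's code: the callable fetch_from, the max_tries limit, and the assumption that a transfer can be resumed at a byte offset are all choices made for this illustration.

```python
import time


def retrieve(fetch_from, total_size, max_tries=20, wait_seconds=0.0):
    """Collect total_size bytes, retrying after each network error.

    fetch_from(offset) is a hypothetical callable that returns some bytes
    starting at offset and raises OSError when the connection drops.
    """
    data = bytearray()
    tries = 0
    while len(data) < total_size:
        tries += 1
        if tries > max_tries:
            raise OSError("giving up after %d tries" % max_tries)
        try:
            # Ask only for the part that is still missing.
            data += fetch_from(len(data))
        except OSError:
            # Network problem: pause, then try again from the same offset.
            time.sleep(wait_seconds)
    return bytes(data)
```

Because each attempt starts at len(data), bytes that already arrived are never requested again, which is what makes retrying practical on a slow connection.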
ht://Check is a link checker derived from ht://Dig. It can retrieve information through HTTP/1.1 and store it in a MySQL database, so that after a "crawl" it can report broken links and anchors not found, along with summaries of content types and HTTP status codes. ht://Check also performs accessibility checks in accordance with the principles of the University of Toronto's Open Accessibility Checks (OAC) project, allowing users to discover site-wide barriers such as images without proper alternatives and missing titles. A PHP interface lets the user query and view the results directly via the Web.
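Storing crawl results in a database and querying them afterwards can be sketched as follows. The sketch uses Python's built-in SQLite module in place of MySQL so that it runs without a server, and the link table, its columns, and the sample rows are hypothetical; none of this reflects ht://Check's actual schema.

```python
import sqlite3

# In-memory SQLite database standing in for the MySQL store.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE link (source TEXT, target TEXT, status INTEGER)")

# Made-up crawl results: page, link found on it, HTTP status of the link.
rows = [
    ("/index.html", "/about.html", 200),
    ("/index.html", "/old.html", 404),
    ("/about.html", "/index.html", 200),
    ("/about.html", "/missing.png", 404),
    ("/about.html", "/cgi/report", 500),
]
conn.executemany("INSERT INTO link VALUES (?, ?, ?)", rows)

# Broken links: every link whose status is a client or server error.
broken = conn.execute(
    "SELECT source, target, status FROM link "
    "WHERE status >= 400 ORDER BY source, target"
).fetchall()

# Summary: how many links returned each status code.
summary = conn.execute(
    "SELECT status, COUNT(*) FROM link GROUP BY status ORDER BY status"
).fetchall()

for source, target, status in broken:
    print(source, "->", target, status)
print(summary)
```

Keeping the crawl and the reporting separate in this way means new reports can be produced from the stored data without fetching the site again.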
HTML-Tree is a Perl program that recursively descends directories and creates a web-page-based graphical map of the HTML pages on a webserver. A configuration file provides control over the "root" directory for the map, the map page title and header, directories to be excluded, link substitution strings, and the map page background image. This mapper may be run as a cron task to provide an up-to-date roadmap of a webserver. It is primarily useful as a web site development and administration tool, since it shows all pages available to web browsers and can identify where links are needed.
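The recursive descent with excluded directories can be sketched in Python. This is not HTML-Tree's code: the function build_map and its exclude parameter are hypothetical, and the output is a plain HTML list rather than a graphical map.

```python
import os
from html import escape


def build_map(root, exclude=()):
    """Return an HTML list of every .html/.htm file found under root.

    Hypothetical helper: exclude holds directory names to skip.
    """
    lines = ["<ul>"]
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk does not descend into excluded directories.
        dirnames[:] = sorted(d for d in dirnames if d not in exclude)
        for name in sorted(filenames):
            if name.lower().endswith((".html", ".htm")):
                full = os.path.join(dirpath, name)
                rel = os.path.relpath(full, root).replace(os.sep, "/")
                lines.append(
                    '<li><a href="%s">%s</a></li>'
                    % (escape(rel, quote=True), escape(rel))
                )
    lines.append("</ul>")
    return "\n".join(lines)
```

A call such as build_map("/var/www", exclude=("private",)) (the path is only an example) could be scheduled from cron and its output written to a map page, in the spirit of the description above.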