GNU Wget is a utility for non-interactive download of files from the Web. It supports the HTTP and FTP protocols, as well as retrieval through HTTP proxies. It can follow HTML links, download many pages, and convert the links for local viewing. It can also mirror FTP hierarchies, or fetch only those files that have changed. Wget is designed for robustness over slow network connections; if a download fails due to a network problem, it keeps retrying until the whole file has been retrieved.
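The retry behavior described above can be illustrated with a minimal sketch. This is not Wget's code: `fetch_from` is a hypothetical callable standing in for a network request that returns the bytes from a given offset (possibly only some of them) or raises `ConnectionError`, and `max_tries` is an arbitrary cap chosen for the example.

```python
from typing import Callable


def download_with_retries(
    fetch_from: Callable[[int], bytes],  # hypothetical: returns data starting at offset
    total_size: int,
    max_tries: int = 20,  # illustrative cap, an assumption of this sketch
) -> bytes:
    """Collect total_size bytes, resuming from the current offset after failures."""
    data = bytearray()
    tries = 0
    while len(data) < total_size:
        if tries >= max_tries:
            raise ConnectionError(f"gave up after {max_tries} tries")
        tries += 1
        try:
            chunk = fetch_from(len(data))
        except ConnectionError:
            # Network problem: keep what we already have and ask again
            # for the part that is still missing.
            continue
        data.extend(chunk)
    return bytes(data)
```

Resuming from `len(data)` rather than from zero is the design choice that matters on a slow link: bytes already received are never requested again.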
With LinkChecker, you can check HTML documents and Web sites for broken links. It features recursion, robots.txt exclusion protocol support, HTTP proxy support, i18n support, multithreading, regular expression filtering rules for links, and username/password authentication for protected pages. Output can be colored or plain text, HTML, SQL, CSV, or a sitemap graph in DOT, GML, or XML format. Supported link types are HTTP/1.1 and 1.0, HTTPS, FTP, mailto:, news:, nntp:, Telnet, and local files.
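As a rough illustration of regular expression filtering rules for links, the sketch below collects the links in an HTML page and drops those matching any ignore pattern. It uses only the Python standard library and is not LinkChecker's implementation; the names `LinkCollector`, `links_to_check`, and `ignore_patterns` are hypothetical.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Gather absolute URLs from <a href> and <img src> attributes."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            is_link = (tag == "a" and name == "href") or (tag == "img" and name == "src")
            if value and is_link:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def links_to_check(html, base_url, ignore_patterns=()):
    """Return the page's links, minus any that match an ignore pattern."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    ignored = [re.compile(p) for p in ignore_patterns]
    return [u for u in collector.links if not any(r.search(u) for r in ignored)]
```

A checker would then request each remaining URL and report the ones that fail; that network step is left out here so the example stays self-contained.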
HTTrack is an easy-to-use offline browser utility. It allows you to download a Web site from the Internet to a local directory, recursively building all directories and fetching HTML, images, and other files from the server to your computer. HTTrack preserves the original site's relative link structure. Simply open a page of the mirrored Web site in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site and resume interrupted downloads. WebHTTrack is a Web-based GUI for HTTrack.
mod_sphinx is an Apache module that intercepts output bound for the client, whether it comes from files served by the same Web server, files sent via reverse proxying, or files delivered by transparent forward proxying (outbound proxying, as in your browser's proxy setting). It scans the text and replaces a given word with another word or phrase. In regular proxy mode, it cannot modify SSL content, since Apache cannot decrypt it (doing so would be a protocol violation).
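The word replacement itself can be sketched in a few lines. This is a simplified illustration in Python, not the module's code (an Apache module would do this inside an output filter); `replace_word` is a hypothetical helper, and matching on whole words only is an assumption of this sketch.

```python
import re


def replace_word(body: str, word: str, replacement: str) -> str:
    """Replace whole-word occurrences of `word` in an outgoing text body."""
    # \b keeps the match from firing inside longer words.
    pattern = re.compile(r"\b" + re.escape(word) + r"\b")
    # A function is used as the replacement so that backslashes in
    # `replacement` are inserted literally.
    return pattern.sub(lambda _match: replacement, body)
```

Because the replacement may differ in length from the original word, anything that rewrites a response body this way also has to make sure a stated content length still matches what is sent.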
CCTE is a small utility for running automated tests on Web applications. It acts like a Web browser, submitting form data to different URLs with GET requests and logging the output. It has two ways to determine whether a CGI succeeded: the HTTP status, and user-definable strings for both success and error.
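The two checks can be combined as in the sketch below. It is an illustration only, not CCTE's code: `classify_response` is a hypothetical function, and the order in which the checks are applied (status first, then the error string, then the success string) is an assumption.

```python
def classify_response(
    status: int,
    body: str,
    success_text: str = "",  # user-defined string expected on success
    error_text: str = "",    # user-defined string that signals an error
) -> str:
    """Return "ok" or "fail" for one logged response."""
    if status >= 400:
        return "fail"  # the HTTP status alone reports a failure
    if error_text and error_text in body:
        return "fail"  # the page contains the error marker
    if success_text and success_text not in body:
        return "fail"  # the expected success marker is missing
    return "ok"
```

The string checks matter because a CGI can return status 200 while printing an error page, which the status check alone would miss.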