GNU Wget is a utility for noninteractive download of files from the Web. It supports the HTTP and FTP protocols, as well as retrieval through HTTP proxies. It can follow HTML links, download many pages, and convert the links for local viewing. It can also mirror FTP hierarchies, optionally retrieving only those files that have changed. Wget has been designed for robustness over slow network connections; if a download fails due to a network problem, it will keep retrying until the whole file has been retrieved.
With LinkChecker, you can check HTML documents and Web sites for broken links. It features recursion, robots.txt exclusion protocol support, HTTP proxy support, i18n support, multithreading, regular expression filtering rules for links, and user/password checking for authorized pages. Output can be colored or normal text, HTML, SQL, CSV, or a sitemap graph in DOT, GML, or XML format. Supported link types are HTTP/1.1 and 1.0, HTTPS, FTP, mailto:, news:, nntp:, Telnet, and local files.
RabbIT is a mutating, caching Web proxy used to speed up surfing over slow links like modems. It does this by removing advertising and background images and scaling down images to low-quality JPEGs. RabbIT is written in Java and should be able to run on any platform. It depends on an external image converter if image scaling is turned on. The recommended image converter is "convert" from the ImageMagick package.
PHPX is a Web portal system, blog, Content Management System (CMS), forum, and more. It is designed to let anyone run a feature-rich, interactive website without knowing any programming. Key features include fully integrated forums, downloads, an image gallery with slideshow and auto-thumbnailing, a support ticket system, a GUI for Web page content management, news with topics and instances, and more. It allows you to fully customize the look of your site.
urlwatch is a script intended to help you watch URLs and get notified (via email) of any changes. The change notification will include the URL that has changed and a unified diff of what has changed. The script works out of a single directory, so there is no need to install anything. State files are kept in the same folder. The script supports stripping parts of a page that are always changing through the use of a filter hook function. It is typically run as a cronjob.
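The change report described above (a unified diff, computed after a filter hook has stripped the always-changing parts of a page) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not urlwatch's own code: the names `strip_volatile` and `describe_change`, the "Generated at" marker, and the example URL are all hypothetical.

```python
import difflib


def strip_volatile(text):
    """Hypothetical filter hook: drop lines that change on every fetch."""
    return "\n".join(
        line for line in text.splitlines()
        if not line.startswith("Generated at")
    )


def describe_change(url, old_text, new_text, filter_hook=strip_volatile):
    """Return a unified diff for url, or None if nothing relevant changed."""
    old_lines = filter_hook(old_text).splitlines()
    new_lines = filter_hook(new_text).splitlines()
    if old_lines == new_lines:
        return None
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=url + " (previous)",
        tofile=url + " (current)",
        lineterm="",
    )
    return "\n".join(diff)


# Two snapshots of the same (made-up) page: the timestamp line is filtered
# out, so only the price change shows up in the report.
old = "Generated at 10:00\nPrice: 10\n"
new = "Generated at 11:00\nPrice: 12\n"
report = describe_change("http://example.com/", old, new)
print(report)
```

Passing the hook as a parameter keeps the page-specific cleanup separate from the comparison itself, which is the design the description suggests.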
Web Secretary is Web page change notification (monitoring) software. It goes beyond the normal functions offered by such software by detecting changes based on content analysis, making sure that it's not just the HTML that changed, but the actual content. You can tell it what to ignore in the page (hit counters and such), and it can mail you the document with the changes highlighted or load the highlighted page in a browser.
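One simple way to approximate content-based change detection is to compare only the visible text of two snapshots, after removing anything the user asked to ignore. The Python sketch below shows that idea and is not Web Secretary's actual algorithm; `TextExtractor`, `visible_text`, `content_changed`, and the "Visits:" hit-counter pattern are assumptions made for illustration.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text between tags, discarding the markup itself."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html):
    """Return the page text with tags removed and whitespace normalized."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def content_changed(old_html, new_html, ignore=()):
    """Report a change only if the text differs once ignored parts are removed."""
    old_text = visible_text(old_html)
    new_text = visible_text(new_html)
    for pattern in ignore:
        old_text = re.sub(pattern, "", old_text)
        new_text = re.sub(pattern, "", new_text)
    return old_text != new_text


# A markup-only edit is not a content change; a hit counter can be ignored.
markup_only = content_changed('<p class="a">Hello</p>', '<p class="b">Hello</p>')
counter_only = content_changed(
    "<p>News</p><p>Visits: 10</p>",
    "<p>News</p><p>Visits: 11</p>",
    ignore=[r"Visits: \d+"],
)
print(markup_only, counter_only)
```

This sketch treats script and style bodies as text too; a fuller version would likely skip them.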
Hammerhead is a stress testing tool for Web sites. It initiates connections from multiple IP aliases and simulates a user from each alias. It is fully configurable, and there are numerous other options for creating problems with a site. Extensive data collection is also available.
webcheck is a Web site checking tool for webmasters. It crawls a given Web site and generates a number of reports. The whole system is pluggable, allowing extra reports and checks to be added easily. It supports retrieving Web sites over the HTTP, file, and FTP protocols and produces reports on site structure, broken links, old Web pages, overviews of external links, and more. The links that webcheck considers external are configurable through regular expressions, and webcheck honors robots.txt.
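Regular-expression control over which links count as external could look roughly like the Python sketch below. It is not webcheck's configuration format or code; the function `is_external`, its parameters, and the `/cgi-bin/` pattern are hypothetical and only illustrate the idea.

```python
import re
from urllib.parse import urlparse


def is_external(url, site_host, external_patterns=()):
    """Classify a link as external to the site being checked.

    A link is external if it matches any user-supplied pattern, or if it
    names a host other than site_host. Relative links carry no host and are
    treated as internal.
    """
    if any(re.search(pattern, url) for pattern in external_patterns):
        return True
    host = urlparse(url).hostname
    return host is not None and host != site_host


# With a pattern for /cgi-bin/, even a same-host link is treated as external.
links = [
    "http://example.org/about.html",
    "http://other.example.net/",
    "http://example.org/cgi-bin/search",
    "/contact.html",
]
results = [is_external(link, "example.org", [r"/cgi-bin/"]) for link in links]
print(results)
```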