With LinkChecker, you can check HTML documents and Web sites for broken links. It features recursion, robots.txt exclusion protocol support, HTTP proxy support, i18n support, multithreading, regular expression filtering rules for links, and user/password checking for authorized pages. Output can be colored or normal text, HTML, SQL, CSV, or a sitemap graph in DOT, GML, or XML format. Supported link types are HTTP/1.1 and 1.0, HTTPS, FTP, mailto:, news:, nntp:, Telnet, and local files.
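As a rough illustration of the regular expression filtering of links mentioned above, here is a minimal Python sketch. It is not LinkChecker's own code: the names `LinkCollector`, `links_to_check`, and `ignore_patterns` are hypothetical. It only gathers and filters candidate URLs from one HTML document and does no network checking.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect href/src attribute values, resolved against a base URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                # Resolve relative references against the page URL.
                self.links.append(urljoin(self.base_url, value))


def links_to_check(html, base_url, ignore_patterns=()):
    """Return the links found in html, minus those matching any ignore pattern."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    compiled = [re.compile(p) for p in ignore_patterns]
    return [u for u in collector.links
            if not any(r.search(u) for r in compiled)]
```

A real checker would then request each remaining URL and, when recursing, feed fetched pages back through the same step.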
urlwatch is a script intended to help you watch URLs and get notified (via email) of any changes. The change notification includes the URL that has changed and a unified diff of what has changed. The script works out of a single directory, so there is no need to install anything. State files are kept in the same folder. The script supports stripping parts of a page that are always changing through the use of a filter hook function. It is typically run as a cron job.
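The combination of a filter hook and a unified diff could look roughly like the following Python sketch. This is an illustration built on the standard `difflib` module, not urlwatch's actual implementation. The function names and the "Generated at HH:MM:SS" pattern are assumptions made for the example.

```python
import difflib
import re


def strip_volatile(text):
    """Hypothetical filter hook: drop lines that change on every fetch."""
    return "\n".join(
        line for line in text.splitlines()
        if not re.search(r"Generated at \d{2}:\d{2}:\d{2}", line)
    )


def change_report(url, old, new, filter_hook=strip_volatile):
    """Return a unified diff for url, or an empty string if nothing changed."""
    old_lines = filter_hook(old).splitlines()
    new_lines = filter_hook(new).splitlines()
    diff = difflib.unified_diff(
        old_lines, new_lines,
        fromfile=url + " (previous)", tofile=url + " (current)",
        lineterm="",
    )
    return "\n".join(diff)
```

Here `old` would come from a saved state file and `new` from a fresh fetch. A non-empty report is what would be mailed out.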
RabbIT is a mutating, caching Web proxy used to speed up surfing over slow links like modems. It does this by removing advertising and background images and scaling down images to low-quality JPEGs. RabbIT is written in Java and should be able to run on any platform. It depends on an image converter if image scaling is on. The recommended image converter is "convert" from the ImageMagick package.
Websitary is a script that monitors Web pages, RSS feeds, and podcasts and reports what's new. For many tasks, it reuses other programs (such as w3m, diff, and webdiff) to do the actual work. By default, it works on an ASCII basis, i.e. with the output of text-based Web browsers. With the help of some friends, it can also work with HTML.
safox is a simple PHP API for XML handling. It merges the DOM approach with XML, and it provides a simple, object-oriented API for PHP-based XML generation, parsing, manipulation, and traversal. SAFOX provides a generation package and a package that parses XML documents and returns objects.
Web Secretary is Web page change notification (monitoring) software. It goes beyond the normal functions offered by such software by detecting changes based on content analysis, making sure that it's not just the HTML that changed, but the actual content. You can tell it what to ignore in the page (hit counters and such), and it can mail you the document with the changes highlighted or load the highlighted page in a browser.
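One simple way to tell a content change from a markup-only change is to compare the visible text of two page versions. The Python sketch below does that with the standard `html.parser` module. It is an assumption-laden illustration, not Web Secretary's actual analysis. The names `TextExtractor`, `visible_text`, and `content_changed` are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of script and style."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.chunks.append(data)


def visible_text(html):
    """Return the page text with tags removed and whitespace collapsed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def content_changed(old_html, new_html):
    """True only if the visible text differs, not merely the markup."""
    return visible_text(old_html) != visible_text(new_html)
```

Ignoring things like hit counters would need an extra filtering step on the extracted text before the comparison.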