GNU Wget is a utility for noninteractive download of files from the Web. It supports the HTTP and FTP protocols, as well as retrieval through HTTP proxies. It can follow HTML links, download many pages, and convert the links for local viewing. It can also mirror FTP hierarchies, or fetch only those files that have changed. Wget has been designed for robustness over slow network connections; if a download fails due to a network problem, Wget will keep retrying until the whole file has been retrieved.
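The retry behavior described above can be illustrated with a small sketch. This is not Wget's code: `download_with_retries`, `fetch_from`, and `max_tries` are hypothetical names, and resuming from the bytes already received (rather than starting over) is an assumption of this sketch.

```python
from typing import Callable


def download_with_retries(
    fetch_from: Callable[[int], bytes],
    total_size: int,
    max_tries: int = 20,
) -> bytes:
    """Request the missing tail of a file until all bytes have arrived.

    fetch_from(offset) is a hypothetical callable that returns bytes
    starting at `offset`. It may return only part of the remainder, or
    raise ConnectionError to simulate a network problem.
    """
    data = bytearray()
    tries = 0
    while len(data) < total_size:
        if tries >= max_tries:
            raise RuntimeError(f"gave up after {max_tries} tries")
        tries += 1
        try:
            # Ask only for what is still missing, starting at the current offset.
            data += fetch_from(len(data))
        except ConnectionError:
            # Network problem: keep what we have and try again.
            continue
    return bytes(data)
```

Passing the transfer in as a callable keeps the loop independent of any particular protocol, so the same logic could wrap an HTTP range request or an FTP restart.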
With LinkChecker, you can check HTML documents and Web sites for broken links. It features recursion, robots.txt exclusion protocol support, HTTP proxy support, i18n support, multithreading, regular expression filtering rules for links, and user/password checking for authorized pages. Output can be colored or normal text, HTML, SQL, CSV, or a sitemap graph in DOT, GML, or XML format. Supported link types are HTTP/1.1 and 1.0, HTTPS, FTP, mailto:, news:, nntp:, Telnet, and local files.
HTTrack is an easy-to-use offline browser utility. It allows you to download a Web site from the Internet to a local directory, recursively building all directories and getting HTML, images, and other files from the server to your computer. HTTrack arranges the original site's relative link structure. Simply open a page of the mirrored Web site in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site and resume interrupted downloads. WebHTTrack is a Web-based GUI for HTTrack.
urlwatch is a script intended to help you watch URLs and get notified (via email) of any changes. The change notification includes the URL that has changed and a unified diff of what has changed. The script works out of a single directory, so there is no need to install anything. State files are kept in the same directory. The script supports stripping parts of a page that are always changing through the use of a filter hook function. It is typically run as a cron job.
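The combination of a filter hook and a unified diff can be sketched with Python's standard `difflib` module. This is a minimal illustration, not urlwatch's own code: `strip_volatile` and `change_report` are hypothetical names, and the "Generated at" marker stands in for whatever part of a page changes on every fetch.

```python
import difflib


def strip_volatile(text: str) -> str:
    """Hypothetical filter hook: drop lines that change on every fetch."""
    kept = [line for line in text.splitlines() if "Generated at" not in line]
    return "\n".join(kept)


def change_report(url: str, old: str, new: str, filter_hook=strip_volatile) -> str:
    """Return a unified diff of the filtered pages, or "" if nothing changed."""
    old_lines = filter_hook(old).splitlines()
    new_lines = filter_hook(new).splitlines()
    diff = difflib.unified_diff(
        old_lines,
        new_lines,
        fromfile=f"{url} (old)",
        tofile=f"{url} (new)",
        lineterm="",
    )
    return "\n".join(diff)
```

Because the filter runs before the comparison, a page whose only difference is the stripped region yields an empty report, so no notification would be needed.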
RabbIT is a mutating, caching Web proxy used to speed up surfing over slow links such as modems. It does this by removing advertising and background images and by scaling images down to low-quality JPEGs. RabbIT is written in Java and should be able to run on any platform. It requires an image converter if image scaling is enabled. The recommended image converter is "convert" from the ImageMagick package.
SEO SpyGlass Professional is feature-rich backlink analysis software for webmasters and SEOs who want to outrank their competition in the major search engines. SEO SpyGlass uncovers your competitors' link-building strategies and lets you reuse their most effective optimization techniques. Based on this competition research, the software produces a step-by-step optimization plan aimed at improving your website's ranking in Google and other search engines. This software is also available within SEO PowerSuite, which also includes WebSite Auditor, LinkAssistant, and Rank Tracker.
SEO SpyGlass Enterprise is feature-rich backlink analysis software for professional SEOs and webmasters. The tool provides detailed insight into your competitors' link-building strategies. It lets you identify your competitors' most SEO-productive backlink sources and use these sources in your own SEO campaign. You can check all backlinks for Google PageRank and Alexa Rank, examine the anchor texts and URLs used, and discover how many backlinks come from forums and blogs, from homepages, from DMOZ-listed sites, and much more. The software generates five types of reports that can be printed, sent to your clients by e-mail, or made available on a Web site. You can brand your reports with a company logo and set their color schemes and data layout. This software is also available within SEO PowerSuite, which also includes WebSite Auditor, LinkAssistant, and Rank Tracker.
webcheck is a Web site checking tool for webmasters. It crawls a given Web site and generates a number of reports. The whole system is pluggable, allowing extra reports and checks to be added easily. It supports retrieving Web sites over the HTTP, file, and FTP protocols and produces reports on site structure, broken links, old Web pages, overviews of external links, and more. The links that webcheck considers external are configurable through regular expressions, and webcheck honors robots.txt.
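Classifying links as external by regular expression might look like the sketch below. It does not reproduce webcheck's actual rules or options: `is_external`, `site_host`, and `external_patterns` are hypothetical names, and treating any URL on a different host as external is an assumption made for illustration.

```python
import re
from urllib.parse import urlparse


def is_external(url: str, site_host: str, external_patterns=()) -> bool:
    """Decide whether a crawler should treat `url` as external.

    A URL is external if it matches any user-supplied regular expression,
    or if its host differs from the host of the site being checked.
    """
    if any(re.search(pattern, url) for pattern in external_patterns):
        return True
    return urlparse(url).hostname != site_host
```

For example, passing `[r"/cgi-bin/"]` as `external_patterns` would mark script URLs on the same host as external, so a crawler built on this check would report them without descending into them.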