HTTrack is an easy-to-use offline browser utility. It lets you download a Web site from the Internet to a local directory, recursively building all directories and fetching HTML, images, and other files from the server to your computer. HTTrack preserves the original site's relative link structure. Simply open a page of the mirrored Web site in your browser, and you can browse the site from link to link, as if you were viewing it online. HTTrack can also update an existing mirrored site and resume interrupted downloads. WebHTTrack is a Web-based GUI for HTTrack.
libextractor is a library used to extract meta-data from files of arbitrary type. It is designed to use helper-libraries to perform the actual extraction, and to be trivially extendable by linking against external extractors for additional file types. The goal is to provide developers of file-sharing networks, file managers, and WWW-indexing bots with a universal library to obtain meta-data about files. It includes a shell-command and bindings for Java (JNI) and Python.
Alexandria is a GNOME application to help manage a book collection. It retrieves book information (including cover pictures) from several online libraries and lets you search for a book by EAN/ISBN, title, authors, or keyword. It can import and export data in ONIX, Tellico, and EAN/ISBN-list formats, generate Web pages from your libraries, and mark books as loaned. It saves data in the YAML format, features an HIG-compliant user interface, shows books in different views that can be filtered or sorted, and handles book ratings and notes.
Pinot is a D-Bus service that crawls and indexes your documents and monitors them for changes. It is also a GTK-based user interface that enables you to query the index built by the service or your favorite Web engine, and to display and analyze the results. It makes full use of the advanced indexing and search facilities offered by Xapian, and features language detection, dynamic document summaries, easy labelling of documents, and internal support for common file types. The D-Bus interface allows easy integration with other applications.
Invenio (formerly CDSware) is a suite of applications that provides the framework and tools for building and managing an autonomous digital library server. It complies with the Open Archives Initiative metadata harvesting protocol (OAI-PMH) and uses MARC 21 as its underlying bibliographic standard. Its flexibility and performance make it a comprehensive solution for the management of document repositories of moderate to large size.
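As a rough illustration of what OAI-PMH compliance means for a client, the sketch below builds a harvesting request URL. The six verb names and the `oai_dc` metadata prefix come from the OAI-PMH specification; the endpoint `https://repository.example.org/oai` and the helper name `build_oai_request` are hypothetical and are not taken from Invenio.

```python
from urllib.parse import urlencode

# The six request verbs defined by OAI-PMH 2.0.
OAI_VERBS = {
    "Identify", "ListMetadataFormats", "ListSets",
    "ListIdentifiers", "ListRecords", "GetRecord",
}

def build_oai_request(base_url, verb, **arguments):
    """Return an OAI-PMH GET request URL (hypothetical helper)."""
    if verb not in OAI_VERBS:
        raise ValueError(f"unknown OAI-PMH verb: {verb}")
    query = {"verb": verb}
    query.update(arguments)
    return f"{base_url}?{urlencode(query)}"

# Assumed endpoint; a real repository publishes its own base URL.
url = build_oai_request(
    "https://repository.example.org/oai",
    "ListRecords",
    metadataPrefix="oai_dc",  # Dublin Core, which every OAI-PMH repository must offer
)
print(url)
```

A harvester would fetch this URL and parse the XML response; that step is left out to keep the sketch free of network access.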
Urd is a Web-based Usenet binary download manager. It stores the newsgroup information in a MySQL database and aggregates the articles into sets, each corresponding to a single download (e.g. one album or movie). The Web interface can be used to search with regular expressions. It uses its own downloading daemon that has support for scheduling downloads and updating databases. Urd can also download directly from NZB files and even create NZB files. Further features include custom scripts, multiple languages, a template-based Web interface, support for multiple servers, automatic par2 and unrar support, and an intuitive user interface.
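The idea of aggregating articles into sets and searching them with regular expressions can be sketched as follows. This is not Urd's actual algorithm or schema: the subject-line format, the `PART_COUNTER` pattern, and the function names are assumptions made for illustration, and the data is held in memory rather than in MySQL.

```python
import re
from collections import defaultdict

# Assumed convention: subjects end with a part counter such as "(1/2)" or "[1/2]".
PART_COUNTER = re.compile(r"\s*[\(\[](\d+)/(\d+)[\)\]]\s*$")

def group_into_sets(subjects):
    """Group article subjects by their name with the part counter removed."""
    sets = defaultdict(list)
    for subject in subjects:
        key = PART_COUNTER.sub("", subject).strip()
        sets[key].append(subject)
    return dict(sets)

def search_sets(sets, pattern):
    """Return the sorted set names that match a case-insensitive regex."""
    regex = re.compile(pattern, re.IGNORECASE)
    return sorted(name for name in sets if regex.search(name))

# Made-up subject lines for the example.
subjects = [
    "example-album.rar (1/2)",
    "example-album.rar (2/2)",
    "other-file.zip [1/1]",
]
sets = group_into_sets(subjects)
matches = search_sets(sets, r"album")
```

Here `sets` maps each set name to the articles that belong to it, and `matches` holds the names selected by the search pattern.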
LogicalDOC is a Web-based document management system that is easy to use and learn. Its architecture builds on established Java technology to provide a powerful and flexible solution. It offers a search engine (Lucene), a Web service interface (JAX-WS via CXF) compatible with .NET and PHP, versioning, annotation of documents, a WebDAV interface, and import from and export to .zip files. Documents can be organized into hierarchical folders, searched using the integrated search engine, or browsed by tag. The system is extensible thanks to the technologies used (Spring and Hibernate) and its plugin architecture.