MKSearch is a metadata search engine that indexes structured metadata in Web documents instead of free text in the document body. The data acquisition system conforms to the Dublin Core metadata in HTML recommendations, and supports other application profiles, such as the UK e-Government Metadata Standard. It also indexes native RDF formats, including RSS 1.0. The system has five major components: a Web crawler, an HTML document validator and formatter, a set of custom indexers, an RDF storage and query system, and a public query interface, provided through a standard servlet container.
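As a rough illustration of what indexing Dublin Core metadata in HTML involves, the following minimal Python sketch collects `<meta>` elements whose names use the conventional "DC." prefix. It is not MKSearch's code: the class name DublinCoreExtractor and the helper extract_dc are hypothetical, and the sketch assumes the prefix is literally "DC." rather than resolving it from a `<link rel="schema.DC">` declaration.

```python
from html.parser import HTMLParser


class DublinCoreExtractor(HTMLParser):
    """Collects <meta name="DC.*" content="..."> pairs (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        # Maps a lowercased element name (e.g. "title") to a list of values,
        # because Dublin Core elements may be repeated.
        self.metadata = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        name = attr.get("name") or ""
        content = attr.get("content")
        # Assumption: the document uses the conventional "DC." prefix.
        if name.lower().startswith("dc.") and content is not None:
            self.metadata.setdefault(name[3:].lower(), []).append(content)


def extract_dc(html):
    """Returns the Dublin Core metadata found in an HTML string."""
    parser = DublinCoreExtractor()
    parser.feed(html)
    parser.close()
    return parser.metadata


SAMPLE = """
<html><head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">
<meta name="DC.title" content="Annual Report">
<meta name="DC.creator" content="A. Author">
<meta name="DC.creator" content="B. Author">
<meta name="keywords" content="ignored, not Dublin Core">
</head><body><p>Body text is not indexed.</p></body></html>
"""

if __name__ == "__main__":
    print(extract_dc(SAMPLE))
```

Free text in the body and non-Dublin Core `<meta>` elements are ignored, which mirrors the metadata-only indexing described above. A real indexer would also need to handle qualified names, schemes, and other application profiles.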
Nabu is a simple framework that extracts chunks of various types of information from plain text documents (written using reStructuredText conventions and parsed with docutils) and stores this information, along with the document itself, in a remote database for later retrieval. Document processing and extraction are handled on a server; a small, simple client pushes the files to the server for processing and storage. The client requires only Python to work. The presentation layer is left unspecified: you can use whichever Web application framework you like.
Generating new objects for the Query Object Framework (QOF) is repetitive, tedious, and time consuming. Qof Generator is a PHP tool that automates this process and builds a working test program linked against QOF. Objects are created from an HTML form using a temporary MySQL cache, and are exported as a tarball, built by the PHP code, that contains a Makefile, ./autogen.sh, ChangeLog, README, C source code, and doxygen mark-up comments.
Tripoli is a Python implementation of a "triple space": that is, a triple store with tuple space semantics. It supports the synchronization of concurrent processes via a shared data structure. Processes can add triples to the store, and read or take triples from the store using pattern matching. If a triple matching a pattern is not yet in the store, a query will block until a suitable triple is added by some other process. Many synchronization patterns can be expressed using these primitives. Tripoli extends the semantics of tuple spaces with two additional operations, copy_graph and copy_collect_graph. These copy or move the graph of all triples that are connected to a given subject to a new triple space, and can be used together with the other pattern matching operations to express procedural queries over triple data.
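The blocking add, read, and take primitives described above can be sketched in a few lines of Python. This is a toy in-memory illustration of tuple space semantics over triples, not Tripoli's actual API: the class TripleSpace, its method signatures, the use of None as a wildcard, and the timeout parameter are all assumptions made for the example, and the graph-copying operations are not covered.

```python
import threading


class TripleSpace:
    """Toy in-memory triple space; None in a pattern acts as a wildcard."""

    def __init__(self):
        self._triples = []
        self._cond = threading.Condition()

    def add(self, triple):
        """Adds a triple and wakes any process blocked on a query."""
        with self._cond:
            self._triples.append(tuple(triple))
            self._cond.notify_all()

    def _find(self, pattern):
        # Returns the first stored triple matching the pattern, or None.
        for triple in self._triples:
            if all(p is None or p == v for p, v in zip(pattern, triple)):
                return triple
        return None

    def _wait(self, pattern, remove, timeout):
        with self._cond:
            # Blocks until a matching triple exists or the timeout expires.
            found = self._cond.wait_for(
                lambda: self._find(pattern) is not None, timeout
            )
            if not found:
                return None
            triple = self._find(pattern)
            if remove:
                self._triples.remove(triple)
            return triple

    def read(self, pattern, timeout=None):
        """Returns a matching triple and leaves it in the store."""
        return self._wait(pattern, remove=False, timeout=timeout)

    def take(self, pattern, timeout=None):
        """Returns a matching triple and removes it from the store."""
        return self._wait(pattern, remove=True, timeout=timeout)


def demo():
    space = TripleSpace()
    # A second thread adds the triple shortly after the consumer starts waiting.
    producer = threading.Timer(0.05, space.add, args=[("job1", "status", "done")])
    producer.start()
    taken = space.take(("job1", "status", None), timeout=2.0)
    producer.join()
    # The triple was removed by take, so a later read finds nothing.
    leftover = space.read(("job1", None, None), timeout=0.01)
    return taken, leftover


if __name__ == "__main__":
    print(demo())
```

In demo, the consumer blocks in take until the producer thread adds a matching triple, which is the basic synchronization pattern that the other patterns build on. The timeout is only a convenience for the example; a query without one waits indefinitely.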
Annotatio is an implementation of a client and a server based on the W3C Annotea protocol. It allows you to create and share annotations on various types of documents. Most people will use this program to save comments on HTML documents, but it also supports other types of annotations and XML-like documents. Unlike the current W3C implementation, Annotatio saves enhanced information about the annotation's position within the document, which allows the annotation to be positioned even if the location or structure of the original document has changed. In local mode, it saves all annotations locally; in remote mode, it saves the annotations on a central Annotea-compatible server, such as Annotatio Server.
libiptcdata is a C library for manipulating the International Press Telecommunications Council (IPTC) metadata stored within multimedia files such as images. This metadata can include captions and keywords, often used by popular photo management applications. The library provides routines for parsing, viewing, modifying, and saving this metadata. The libiptcdata package also includes Python bindings and a command-line utility, iptc, for viewing and editing IPTC data in JPEG files.
Osprey is a peer-to-peer enabled content distribution system. It is a metadata management system for software and document collections that enables local and distributed searching of materials. Items are available for download directly via a URL or indirectly via the BitTorrent peer-to-peer protocol. Two components exist: the Osprey Web application and permaseed (permanent seed). The Web application includes metadata management for finding and exploring available content, as well as a BitTorrent tracker.
Red-Piranha is a search system that learns what you are looking for. It can be used through a Web page, the command line, or an XML Web service, so it works with most languages, including Java, Perl, C#/.NET, and PHP. Its learning abilities apply to both desktop and Internet search. All feedback from the user is stored in (editable) XML and RDF, and is used by the system to improve the quality of searches.