5 projects tagged "Indexing"
libextractor is a library for extracting metadata from files of arbitrary type. It is designed to use helper libraries to perform the actual extraction, and to be trivially extensible by linking against external extractors for additional file types. The goal is to provide developers of file-sharing networks, file managers, and WWW-indexing bots with a universal library for obtaining metadata about files. It includes a shell command and bindings for Java (JNI) and Python.
Marko is a simple toolset that lets you build Markov chain databases from a corpus (or two) of text and then compare unknown texts against these databases. For any two Marko databases, you can calculate the probability that the unknown text is related to one rather than the other. Possible applications include intelligent mail filtering, plagiarism detection, and historical research.
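The comparison idea can be illustrated with a minimal sketch. This is not Marko's actual implementation or database format: the word-level bigram model, the add-one smoothing, and the function names train, log_likelihood, and relative_probability are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def train(text):
    """Count word-to-word transitions in a corpus (hypothetical model format)."""
    words = text.lower().split()
    counts = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        counts[prev][nxt] += 1
    return counts


def log_likelihood(model, text, vocab_size):
    """Log-probability of the text's transitions under the model.

    Add-one smoothing (an assumption of this sketch) keeps unseen
    transitions from producing log(0).
    """
    words = text.lower().split()
    total = 0.0
    for prev, nxt in zip(words, words[1:]):
        row = model.get(prev, Counter())
        total += math.log((row[nxt] + 1) / (sum(row.values()) + vocab_size))
    return total


def relative_probability(model_a, model_b, text, vocab_size):
    """Probability that the text belongs to A rather than B, assuming equal priors."""
    diff = log_likelihood(model_b, text, vocab_size) - log_likelihood(
        model_a, text, vocab_size
    )
    # Clamp extreme differences so math.exp cannot overflow.
    if diff > 700:
        return 0.0
    if diff < -700:
        return 1.0
    return 1.0 / (1.0 + math.exp(diff))
```

For mail filtering, one would train one model on known spam and one on legitimate mail, then classify a new message by whether relative_probability exceeds some threshold such as 0.5.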
Comment is a command-line note taker that ties notes to the directory they were written in. Notes are stored both in the local directory and in each user's home directory. It was developed as a low-impact tool for retaining flyaway information that is often needed at a later date. The dual storage system provides convenient access to prior notes, and all notes are stored in plain-text format.
nmzmail is a tool for using the namazu2 search engine from within the Mutt email reader to search email stored in maildir folders. Based on the result of the namazu query, nmzmail generates a maildir folder containing symbolic links to the messages matching the query. A simple Mutt macro makes it easy to use from within Mutt.
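The symbolic-link step can be sketched as follows. This is an illustration only, not nmzmail's code: it assumes the search has already produced a list of matching message paths, and the function name build_result_maildir and the numeric link-name prefix are hypothetical.

```python
import os


def build_result_maildir(result_dir, message_paths):
    """Create a maildir whose cur/ holds symlinks to the matching messages."""
    # A maildir is a directory containing cur/, new/ and tmp/.
    for sub in ("cur", "new", "tmp"):
        os.makedirs(os.path.join(result_dir, sub), exist_ok=True)

    links = []
    for i, path in enumerate(message_paths):
        # The index prefix (an assumption of this sketch) avoids name
        # collisions between messages from different source folders.
        name = f"{i}.{os.path.basename(path)}"
        link = os.path.join(result_dir, "cur", name)
        if os.path.lexists(link):
            os.remove(link)  # replace links left over from an earlier search
        os.symlink(os.path.abspath(path), link)
        links.append(link)
    return links
```

Because the result folder holds only links, opening it in a mail reader shows the original messages without copying them.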