libextractor is a library used to extract meta-data from files of arbitrary type. It is designed to use helper libraries to perform the actual extraction, and to be trivially extensible by linking against external extractors for additional file types. The goal is to provide developers of file-sharing networks, file managers, and WWW-indexing bots with a universal library for obtaining meta-data about files. It includes a shell command and bindings for Java (JNI) and Python.
| Tags | Software Development, Libraries, Internet, Web, Indexing/Search, Communications, File Sharing, Text Processing, Indexing |
|---|---|
| Licenses | GPL |
| Operating Systems | POSIX, BSD, Unix, Linux, Windows, Mac OS X |
| Implementation | C |
Recent releases


Release Notes: This release adds support for Matroska, fixes some minor bugs (leaks on error-handling paths), and does some minor code cleanup (fixing compiler warnings about dead code).


Release Notes: This release fixes various minor bugs, in particular better handling of malloc failures and more robust handling of malformed inputs in various plugins.


Release Notes: This release fixes a problem with libextractor not finding its plugins under certain conditions. It also fixes an IPC issue under FreeBSD that prevented some plugins from working.


Release Notes: This release adds out-of-process execution for plugins and improves the quality and quantity of the extracted meta-data for many formats. It breaks API compatibility.


Release Notes: This release adds support for librpm 4.7 and uses an external copy of libexiv2 for improved and more up-to-date Exiv2 support.
Recent comments
02 Feb 2008 05:00
Re: online demo not working
There are two PDF plugins: a fairly simplistic one and another based on code from xpdf (which has a bad security track record). Depending on which one I happen to have enabled on the website (this is chosen via configure options), you get more or less information for PDF files.
> When I upload dmca.pdf all it gives me
> is mimetype. Am I missing something?
24 Jan 2008 15:40
online demo not working
When I upload dmca.pdf, all it gives me is the mimetype. Am I missing something?
14 Aug 2005 21:25
Re: Also Requires gobject-2.0
Note that as of 0.5.3, libextractor still needs gobject-2.0, but the ordinary shared version now works fine.
27 Jan 2005 10:15
Re: Also Requires gobject-2.0
Well, gobject-2.0 is part of GLib, so it is listed as a dependency. What is trickier is that we need the static, relocatable version of the library; but try to specify that on freshmeat :-).
27 Jan 2005 10:07
Also Requires gobject-2.0
Can't seem to get the OLE2 libraries to compile, make complains:
/usr/lib/gcc-lib/i686-pc-linux-gnu/3.3.3/../../../../i686-pc-linux-gnu/bin/ld: cannot find -lgobject-2.0
Oh, and you may want to list these dependencies in either the README or the INSTALL file.