Juta (Java Usenet Traffic Analyzer) is a Java (1.2 or higher) command-line program that reads a text file containing Usenet postings and creates an HTML file with statistical information about those postings. This includes a list of the longest threads, the authors with the most postings, the operating systems and news clients used, and more. The inclusion of person-related information can be turned off to protect privacy. The program is distributed as an executable JAR that also contains the source code. The distribution also includes several tools for processing news postings: they sort and count postings, remove duplicates, list information about postings, and grep and filter them.
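To illustrate the kind of tally behind an "authors with the most postings" list, here is a minimal Java sketch. It is not Juta's code. It assumes the input file has already been split into postings, each a list of lines whose header block ends at the first blank line, and it counts the raw From: header value as the author. The class name AuthorTally is hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch, not Juta's parser: counts one "From:" header per
// posting and ranks authors by their number of postings.
final class AuthorTally {

    /**
     * Each posting is a list of lines. Its header block ends at the first
     * blank line, so "From:" lines in the message body are ignored.
     */
    static List<Map.Entry<String, Integer>> topAuthors(List<List<String>> postings, int limit) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> posting : postings) {
            for (String line : posting) {
                if (line.isEmpty()) {
                    break; // end of the header block
                }
                // Header names are case-insensitive, so match "From:" ignoring case.
                if (line.regionMatches(true, 0, "From:", 0, 5)) {
                    counts.merge(line.substring(5).trim(), 1, Integer::sum);
                    break; // count each posting at most once
                }
            }
        }
        // Highest count first; ties broken alphabetically for stable output.
        return counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Integer>comparingByKey()))
                .limit(limit)
                .collect(Collectors.toList());
    }
}
```

Real newsgroups often show the same person under several From: variants, so a more complete tool would probably normalize addresses before counting.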
ImageInfo is a single Java class that examines InputStream and RandomAccessFile objects. It checks whether the stream or file is in one of the supported image formats and determines the image width, height, and color depth. ImageInfo does not depend on the AWT or on any additional libraries.
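The general technique (recognize a format by its leading "magic" bytes, then read the dimensions stored in the fixed-layout header, without decoding any pixels or touching the AWT) can be sketched in plain Java. This is not ImageInfo's source or API. It covers only PNG and GIF, and the class HeaderProbe and its nested Info holder are hypothetical names.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch: identify PNG or GIF from magic bytes and read the
// header fields directly from the stream. No AWT classes are used.
final class HeaderProbe {

    /** Simple result holder (illustrative only). */
    static final class Info {
        final String format;
        final int width, height, bitsPerPixel;

        Info(String format, int width, int height, int bitsPerPixel) {
            this.format = format;
            this.width = width;
            this.height = height;
            this.bitsPerPixel = bitsPerPixel;
        }
    }

    /** Returns null if the stream starts with neither a PNG nor a GIF header. */
    static Info probe(InputStream in) throws IOException {
        byte[] h = new byte[26];
        int n = 0;
        while (n < h.length) { // read() may return fewer bytes than requested
            int r = in.read(h, n, h.length - n);
            if (r < 0) break; // end of stream: header may be short
            n += r;
        }
        // PNG: 8-byte signature, then the IHDR chunk (length, type, data).
        if (n >= 26 && (h[0] & 0xFF) == 0x89 && h[1] == 'P' && h[2] == 'N' && h[3] == 'G'
                && h[4] == 0x0D && h[5] == 0x0A && h[6] == 0x1A && h[7] == 0x0A
                && h[12] == 'I' && h[13] == 'H' && h[14] == 'D' && h[15] == 'R') {
            int width = be32(h, 16);     // big-endian
            int height = be32(h, 20);
            int bitDepth = h[24] & 0xFF; // bits per sample
            int samples;
            switch (h[25]) {             // color type
                case 0: samples = 1; break; // greyscale
                case 2: samples = 3; break; // RGB
                case 3: samples = 1; break; // palette index
                case 4: samples = 2; break; // greyscale + alpha
                case 6: samples = 4; break; // RGB + alpha
                default: return null;       // invalid color type
            }
            return new Info("PNG", width, height, bitDepth * samples);
        }
        // GIF: "GIF87a" or "GIF89a", then the logical screen descriptor.
        if (n >= 11 && h[0] == 'G' && h[1] == 'I' && h[2] == 'F' && h[3] == '8'
                && (h[4] == '7' || h[4] == '9') && h[5] == 'a') {
            int width = le16(h, 6);  // little-endian
            int height = le16(h, 8);
            // Low 3 bits of the packed field: global color table has 2^(N+1)
            // entries, i.e. N+1 bits per pixel (meaningful if bit 7 is set).
            int bits = (h[10] & 0x07) + 1;
            return new Info("GIF", width, height, bits);
        }
        return null;
    }

    private static int be32(byte[] b, int off) {
        return ((b[off] & 0xFF) << 24) | ((b[off + 1] & 0xFF) << 16)
                | ((b[off + 2] & 0xFF) << 8) | (b[off + 3] & 0xFF);
    }

    private static int le16(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8);
    }
}
```

Only the first 26 bytes are read, which is why this approach stays fast even for very large files.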
Re: Indexing PDFs
> Since the authors of PDF files do not
> always include metadata in the file,
> design a tool that handles metadata both
> ways: putting metadata into a PDF from
> a file, or pulling the metadata out of
> the PDF into a file. If an extensible
> standard for the separate metadata files
> were made, a whole set of tools could
> be made that handle putting and
> pulling metadata for almost any file
> type that includes metadata: PDF, MP3,
> Ogg, MPEG, AVI, and probably others.
Adobe's XMP (http://www.adobe.com/products/xmp/main.html) is a common metadata framework. I think it is already used with some PDFs as well.
I agree that a multi-level approach that tries different ways to access metadata is preferable.
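One way to picture such a multi-level approach is a chain of metadata sources consulted in order of preference, for example embedded XMP first and a separate sidecar file second. The Java sketch below is only an illustration under that assumption: MetadataChain and MetadataSource are hypothetical names, not part of ImageInfo or of any existing library, and each source is expected to return an empty map when it finds nothing.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a multi-level metadata lookup: several sources are
// tried in order, and for each key the first source that provides it wins.
final class MetadataChain {

    /** One way of obtaining metadata, e.g. embedded XMP or a sidecar file. */
    interface MetadataSource {
        /** Returns the metadata found for the file, or an empty map. */
        Map<String, String> read(String path);
    }

    private final List<MetadataSource> sources;

    /** Sources are listed from most to least preferred. */
    MetadataChain(List<MetadataSource> sources) {
        this.sources = new ArrayList<>(sources);
    }

    /** Merges all sources; earlier sources take precedence for duplicate keys. */
    Map<String, String> lookup(String path) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (MetadataSource source : sources) {
            for (Map.Entry<String, String> e : source.read(path).entrySet()) {
                merged.putIfAbsent(e.getKey(), e.getValue());
            }
        }
        return merged;
    }
}
```

Merging rather than stopping at the first successful source lets a later level fill in fields (say, an author) that an earlier level lacks, while still letting the preferred level win any conflict.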
> ImageInfo IS a very handy utility.
> Have you ever thought about refactoring
> it so it will be easier to use and
> maintain? Use a factory/builder
> pattern to return an info class; if the
> format is not supported, return null or
> throw an exception.
ImageInfo has grown quite a bit. It started out as a class to extract width and height from GIF, PNG and JPG files. Now it can extract more information from more formats, and in order to grow even further it would certainly have to be refactored. However, I don't intend to write a 'real' metadata extraction library right now, as useful as that would be. Someone would have to start a new project for that. ImageInfo might be a good basis for such a library.
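The factory suggestion quoted above could be sketched roughly as follows, with one reader class per format and a factory that returns Optional.empty() instead of null for unsupported data. This is an illustration only, not ImageInfo's design: ImageInfoFactory, FormatReader and BmpReader are hypothetical names, and BMP was picked merely as a compact example format. The BMP reader assumes a BITMAPINFOHEADER or one of its larger successors, where width, height and bit count sit at fixed offsets.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of the factory idea: each supported format is a
// FormatReader; the factory asks them in turn and returns the first result.
final class ImageInfoFactory {

    /** Immutable result object. */
    static final class Info {
        final String format;
        final int width, height, bitsPerPixel;

        Info(String format, int width, int height, int bitsPerPixel) {
            this.format = format;
            this.width = width;
            this.height = height;
            this.bitsPerPixel = bitsPerPixel;
        }
    }

    /** Adding a format means registering another implementation. */
    interface FormatReader {
        boolean matches(byte[] header);
        Info read(byte[] header);
    }

    /** Example reader: Windows BMP with a DIB header of at least 40 bytes. */
    static final class BmpReader implements FormatReader {
        public boolean matches(byte[] h) {
            return h.length >= 30 && h[0] == 'B' && h[1] == 'M' && le32(h, 14) >= 40;
        }

        public Info read(byte[] h) {
            int width = le32(h, 18);
            int height = Math.abs(le32(h, 22)); // negative height = top-down rows
            int bpp = (h[28] & 0xFF) | ((h[29] & 0xFF) << 8);
            return new Info("BMP", width, height, bpp);
        }
    }

    private final List<FormatReader> readers = new ArrayList<>();

    ImageInfoFactory register(FormatReader reader) {
        readers.add(reader);
        return this; // allows chained registration
    }

    /** Info from the first matching reader, or empty if the format is unsupported. */
    Optional<Info> create(byte[] header) {
        for (FormatReader reader : readers) {
            if (reader.matches(header)) {
                return Optional.of(reader.read(header));
            }
        }
        return Optional.empty();
    }

    static int le32(byte[] b, int off) {
        return (b[off] & 0xFF) | ((b[off + 1] & 0xFF) << 8)
                | ((b[off + 2] & 0xFF) << 16) | ((b[off + 3] & 0xFF) << 24);
    }
}
```

Returning an Optional makes the "format not supported" case explicit at the call site, which covers both options raised in the question (null or an exception) without forcing either on the caller.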