71 projects tagged "Text Processing"
DocBook Doclet creates DocBook XML and class diagrams from Javadoc comments, converts HTML to DocBook, and transforms DocBook XML into various output formats. It includes a complete DocBook distribution containing schemas and the DocBook XSL stylesheets. It also integrates Apache FOP as the XSL-FO processor. A Swing application is used to customize the doclet and most of the DocBook XSL parameters and to start the transformations.
pstotext extracts text (in the ISO 8859-1 character set) from a PostScript or PDF (Portable Document Format) file. In this respect, pstotext is similar to the ps2ascii program that comes with Ghostscript. The output of pstotext is better than that of ps2ascii because pstotext deals better with punctuation and ligatures.
Jerry's Music Review System generates an HTML page from a simple flat file containing the band name, album name, date purchased, and a review of the album. It arranges the information and cross-indexes on band name and album name. It also displays a list of the 10 most recently purchased albums. Examples can be found on the homepage.
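The flat-file-to-page idea described above can be sketched roughly as follows. This is only an illustration: the pipe-delimited `band|album|YYYY-MM-DD|review` layout and every function name here are assumptions, since the project's actual file format is not given in this listing.

```python
from collections import defaultdict
from datetime import date
from html import escape


def parse_reviews(text):
    """Parse lines of 'band|album|YYYY-MM-DD|review' (a hypothetical layout)."""
    reviews = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        band, album, purchased, review = line.split("|", 3)
        reviews.append({
            "band": band.strip(),
            "album": album.strip(),
            "purchased": date.fromisoformat(purchased.strip()),
            "review": review.strip(),
        })
    return reviews


def index_by(reviews, key):
    """Group reviews under a field ('band' or 'album'), sorted by that field."""
    index = defaultdict(list)
    for r in reviews:
        index[r[key]].append(r)
    return dict(sorted(index.items()))


def most_recent(reviews, n=10):
    """Return the n most recently purchased albums, newest first."""
    return sorted(reviews, key=lambda r: r["purchased"], reverse=True)[:n]


def render_html(reviews):
    """Build a minimal page: a recent-purchases list, then reviews by band."""
    parts = ["<html><body>", "<h1>Recently purchased</h1>", "<ul>"]
    for r in most_recent(reviews):
        parts.append(
            f"<li>{escape(r['band'])}: {escape(r['album'])} ({r['purchased']})</li>"
        )
    parts.append("</ul>")
    parts.append("<h1>By band</h1>")
    for band, items in index_by(reviews, "band").items():
        parts.append(f"<h2>{escape(band)}</h2>")
        for r in items:
            parts.append(
                f"<p><b>{escape(r['album'])}</b>: {escape(r['review'])}</p>"
            )
    parts.append("</body></html>")
    return "\n".join(parts)
```

Calling `index_by(reviews, "album")` would give the second cross-index in the same way.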
MpFot allows you to create MetaPost files from images in JPEG or GIF format. It uses Java 2 and can optionally process the images before rendering to MetaPost files. MpFot can currently modify brilliance, color balance, and saturation, and it can also perform posterization, inversion of colors, and manual cropping. All the changes can be compared with the original image through a curtain-like facility.
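For readers unfamiliar with two of the operations named above, here is a minimal sketch of color inversion and posterization on 8-bit pixel tuples. It is not MpFot's code (MpFot is written in Java); the function names and the list-of-rows image representation are assumptions made for illustration.

```python
def invert(pixel):
    """Invert each 8-bit channel: 0 becomes 255, 255 becomes 0."""
    return tuple(255 - c for c in pixel)


def posterize(pixel, levels=4):
    """Reduce each 8-bit channel to `levels` evenly spaced values."""
    if levels < 2:
        raise ValueError("levels must be at least 2")
    out = []
    for c in pixel:
        bucket = c * levels // 256                 # which of the `levels` bands c falls in
        out.append(bucket * 255 // (levels - 1))   # spread the bands back over 0..255
    return tuple(out)


def apply_to_image(image, fn):
    """Apply a per-pixel function to an image stored as a list of rows."""
    return [[fn(p) for p in row] for row in image]
```

With `levels=2`, every channel collapses to either 0 or 255, which is the harshest form of posterization.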
Java Search Engine is a server-side search engine program for Web sites, written completely in Java. It features HTML and PDF indexing, a built-in Web crawler, support for international encodings, word and phrase search, and results returned as quotations with highlighted words (like Google). It is available as EJB, JSP, servlet, or Java API library. For non-Java environments, it is available as an XML server with XSLT support.
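Returning results as quotations with highlighted words can be pictured with a small sketch like the one below. It is a generic illustration, not the product's implementation or API: the function name, the `width` parameter, and the use of `<b>` tags are all assumptions.

```python
import re
from html import escape


def highlight_snippet(text, query, width=40):
    """Return an HTML excerpt around the first hit, with query words in <b>.

    Matching is case-insensitive and matches substrings, not whole words.
    Returns None when no query word occurs in the text.
    """
    words = query.split()
    if not words:
        return None
    pattern = re.compile("|".join(re.escape(w) for w in words), re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        return None

    # Cut a window of `width` characters on each side of the first hit.
    start = max(0, first.start() - width)
    end = min(len(text), first.end() + width)
    excerpt = text[start:end]

    # Escape the plain parts and wrap each hit, so markup in the source
    # text cannot leak into the result page.
    out = []
    pos = 0
    for hit in pattern.finditer(excerpt):
        out.append(escape(excerpt[pos:hit.start()]))
        out.append("<b>" + escape(hit.group(0)) + "</b>")
        pos = hit.end()
    out.append(escape(excerpt[pos:]))

    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + "".join(out) + suffix
```

A real engine would take hit positions from its index rather than rescanning the document, but the excerpt-and-wrap step would look much the same.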