11 projects tagged "Text Processing"
docx2txt is a tool that attempts to generate equivalent (ASCII) text files from Microsoft .docx documents. It preserves some formatting and document information that Microsoft's own text conversion drops, and applies appropriate character conversions for readable (ASCII) text output. It is a platform-independent solution consisting of core Perl and wrapper Unix/Windows shell scripts, plus a configuration file that controls the appearance of the output text to a fair extent. It can conveniently be used to build a Web-based docx conversion service. Makefiles and Windows batch files are provided for easy installation of the scripts. With unzippers such as CakeCmd that can deal with corrupt Zip archives, this tool can in many cases extract text from corrupt docx documents that Microsoft Word fails to even open.
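docx2txt itself is written in Perl; the following is only a minimal Python sketch of the underlying idea, assuming the standard .docx layout in which the body text lives in the ZIP member word/document.xml, with paragraphs stored as w:p elements and text runs as w:t elements. The function name docx_to_text is hypothetical, and none of docx2txt's formatting or character-conversion rules are reproduced.

```python
import sys
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(source):
    """Return the plain text of a .docx file, one line per paragraph.

    `source` may be a filesystem path or a binary file-like object.
    """
    # A .docx file is a ZIP archive; the main body is one XML member.
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        parts = []
        # Walk the paragraph in document order, keeping text, tabs and breaks.
        for node in para.iter():
            if node.tag == W_NS + "t":
                parts.append(node.text or "")
            elif node.tag == W_NS + "tab":
                parts.append("\t")
            elif node.tag in (W_NS + "br", W_NS + "cr"):
                parts.append("\n")
        paragraphs.append("".join(parts))
    return "\n".join(paragraphs)


if __name__ == "__main__":
    # Convert each .docx named on the command line and print its text.
    for path in sys.argv[1:]:
        print(docx_to_text(path))
```

Headers, footers, and footnotes are stored in other ZIP members and are ignored here, and nested paragraphs (for example, text-box content) may be repeated by this simple walk.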
Vee is a command-line blog tool that is highly portable across Unix systems. It provides both an interactive and a batch interface for maintaining a log of entries. Formatting is handled by a modular architecture that allows a high degree of customization. It uses a minimal set of flags and requires no setup.
uWiki is a minimalistic wiki engine. All actions are implemented in external scripts. These scripts are themselves wikified, so the wiki can be extended from within itself. All dynamic access is protected through ACLs. Wiki content and Web content can be mixed in the same directory hierarchy. Markup engines and revision control are pluggable; currently, asciidoc is provided as the markup engine and git as the revision control backend. Subdirectories can form independent sub-wikis with their own revision control. Planned features include distributed pages that synchronize between wikis, spam protection, and batch jobs that schedule mirroring of other content (via bittorrent, git, rsync, and wget).
smupcheck, which stands for Smart Update Checker, automatically checks Web sites for updates, even if they don't offer an RSS feed. It is a very basic tool and does not offer advanced features such as checking password-protected Web sites, highlighting changes, or filtering results.
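The description does not say how smupcheck detects updates. One common approach for pages without an RSS feed, sketched below in Python under that assumption, is to store a hash of each page and compare it on the next run. The names fingerprint, check_for_update, and fetch, and the state file smupcheck-state.json, are all hypothetical.

```python
import hashlib
import json
import sys
import urllib.request
from pathlib import Path


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 digest of a page body."""
    return hashlib.sha256(content).hexdigest()


def check_for_update(url: str, content: bytes, state: dict) -> bool:
    """Record the page's fingerprint in `state`; return True if it changed.

    The first time a URL is seen it is only recorded, not reported.
    """
    new = fingerprint(content)
    old = state.get(url)
    state[url] = new
    return old is not None and old != new


def fetch(url: str) -> bytes:
    """Download a page body (no authentication, matching the tool's limits)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical state file holding the last seen fingerprint per URL.
    state_file = Path("smupcheck-state.json")
    state = json.loads(state_file.read_text()) if state_file.exists() else {}
    for url in sys.argv[1:]:
        if check_for_update(url, fetch(url), state):
            print("updated:", url)
    state_file.write_text(json.dumps(state))
```

Because the whole page body is hashed, any change, including a rotating ad or a timestamp, is reported as an update, since this sketch does no filtering.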
TYM (Typo Manager) is software for managing fonts in formats such as .OTF (OpenType), .TTF (TrueType), and .PFA/.PFB (Type 1). It lets you add or "link" fonts, activate or deactivate them, and delete them. It also handles the "group font" function, storing several fonts inside one file.
Texpipeline allows (La)TeX documents to be converted with simple filter pipes such as "cat input.tex | tex2pdf > output.pdf". It removes the hassle of running (La)TeX again and again and of leaving lots of auxiliary files in the current directory. This is especially useful for programs that generate LaTeX files automatically and only need to present the resulting PDF to the user.
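A minimal Python sketch of such a filter, assuming pdflatex is on the PATH: it compiles the input in a temporary directory, reruns while the log asks for another pass, and returns only the PDF, so no auxiliary files are left behind. This is not Texpipeline's code; tex_to_pdf, needs_rerun, run_filter, and the four-pass cap are illustrative choices.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

MAX_PASSES = 4  # arbitrary cap on repeated LaTeX runs


def needs_rerun(log_text: str) -> bool:
    """Heuristic: LaTeX asks for another pass when cross-references changed."""
    return "Rerun to get" in log_text


def tex_to_pdf(source: bytes, engine: str = "pdflatex") -> bytes:
    """Compile TeX source in a throwaway directory and return the PDF bytes.

    Auxiliary files (.aux, .log, ...) disappear with the temporary directory.
    """
    with tempfile.TemporaryDirectory() as workdir:
        work = Path(workdir)
        (work / "input.tex").write_bytes(source)
        for _ in range(MAX_PASSES):
            subprocess.run(
                [engine, "-interaction=nonstopmode", "-halt-on-error", "input.tex"],
                cwd=work,
                stdin=subprocess.DEVNULL,
                stdout=subprocess.DEVNULL,  # keep our own stdout clean for the PDF
                check=True,
            )
            log = (work / "input.log").read_text(errors="replace")
            if not needs_rerun(log):
                break
        return (work / "input.pdf").read_bytes()


def run_filter() -> None:
    """Read TeX from stdin and write the PDF to stdout, like `tex2pdf`."""
    sys.stdout.buffer.write(tex_to_pdf(sys.stdin.buffer.read()))
```

A script that calls run_filter() from its `if __name__ == "__main__":` block could then be used as `cat input.tex | python tex2pdf.py > output.pdf` (the script name is hypothetical). Documents with bibliographies or indexes would also need bibtex or makeindex passes, which this sketch omits.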
WriteTarget is a universal text generator based on Bash text substitution. It can be used to generate text in any programming or markup language. The generator does not define its own language; instead, it defines several functions that make it possible to use Bash itself for creating simple or sophisticated templates.
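WriteTarget works in Bash, so the following is only an analogy in Python: string.Template uses $name and ${name} placeholders much like shell parameter expansion, and here it generates C getter functions from a single template. render_getters and the GETTER template are hypothetical examples, not part of WriteTarget.

```python
from string import Template

# Template for a C getter function; placeholders use shell-like $name syntax.
GETTER = Template(
    "$type get_$field(const struct $struct *self)\n"
    "{\n"
    "    return self->$field;\n"
    "}\n"
)


def render_getters(struct: str, fields: dict) -> str:
    """Generate one C getter per (field name -> C type) entry."""
    return "\n".join(
        GETTER.substitute(type=ctype, field=name, struct=struct)
        for name, ctype in fields.items()
    )


if __name__ == "__main__":
    print(render_getters("point", {"x": "int", "y": "int"}))
```

The template knows nothing about C; swapping the template text is enough to target another language, which is the property the description attributes to WriteTarget.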
dvdmenuauthor makes it easy and efficient to author a DVD with menus in an indirect (non-WYSIWYG) way. DVD authoring is driven by an XML project file, from which both the menus and a dvdauthor XML file are generated; dvdauthor and spumux are then used to author the DVD filesystem. Menu items (buttons and static items such as images and text) can be specified concisely in the project XML file using LaTeX markup, which is processed by pdfLaTeX and rendered by xpdf.
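dvdmenuauthor's project-file format is not described here, so the Python sketch below assumes a hypothetical one: a <project> element with dest and menu attributes and one <title file="..."/> child per video. It shows only the generation of a dvdauthor XML document with one menu whose buttons jump to each title; rendering the menu graphics (pdfLaTeX, xpdf) and adding the button highlights with spumux are not covered. The dvdauthor element names follow its XML format as commonly used and should be checked against the dvdauthor documentation; project_to_dvdauthor is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def project_to_dvdauthor(project_xml: str) -> str:
    """Build a dvdauthor XML document from a hypothetical project file."""
    project = ET.fromstring(project_xml)
    titles = [t.get("file") for t in project.findall("title")]

    root = ET.Element("dvdauthor", dest=project.get("dest", "DVD"))
    ET.SubElement(root, "vmgm")
    titleset = ET.SubElement(root, "titleset")

    # One menu program chain whose buttons jump to the titles in order.
    menu_pgc = ET.SubElement(ET.SubElement(titleset, "menus"), "pgc")
    for number in range(1, len(titles) + 1):
        ET.SubElement(menu_pgc, "button").text = f"jump title {number};"
    # The menu video is required; a missing attribute raises KeyError.
    ET.SubElement(menu_pgc, "vob", file=project.attrib["menu"])

    # One program chain per title video.
    titles_el = ET.SubElement(titleset, "titles")
    for path in titles:
        ET.SubElement(ET.SubElement(titles_el, "pgc"), "vob", file=path)

    return ET.tostring(root, encoding="unicode")
```

Generating the dvdauthor file from a higher-level description keeps button numbering and title references consistent automatically, which is the main benefit of the indirect approach.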