Version 0.4 of Docx to Text Converter (docx2txt)

Release Notes: Display of hyperlinks is configurable. TOC related cleanup was done. Many new character conversions were implemented. Character conversion tables were added. Currency characters are converted to full currency names. Code tweaks were done to speed up the conversion process.

    Other releases

    •  07 Apr 2014 23:01

    Release Notes: Adds support for handling lists (bullet, decimal, letter, and roman) along with (an attempt at) indentation. Adds the configuration variable config_twipsPerChar, and removes the configuration variables config_listIndent and config_exp_extra_deEscape. Text output now omits deleted text, which matters when changes are being tracked in a docx document.

    •  15 Jan 2012 02:10

    Release Notes: The Perl script can now take input from stdin, and also works with input/output redirection. Script files and the configuration file can now be installed in separate directories on (non-Windows) systems when the Makefile is used for installation. The configuration file is now uniformly looked for in the current directory, the user configuration directory, and the system configuration directory, in that order. Handling of special (non-text) characters has been improved, along with support for more non-text characters, like fractions.

    Release Notes: Minor non-extraction feature enhancements and bugfixes, based on the feedback/input received from users. A check for the existence of the unzip command has been added. The configuration file is looked for in $HOME as well. Configuration variables now begin with config_. Bugs #3003903, #3082018, and #3082035 have been fixed. The null device for Cygwin has been fixed. Superscripted cross-references are now placed within [...].

    •  05 Oct 2009 09:21

    Release Notes: This release focuses mainly on user interaction aspects. The new features are a Windows installation script, a Windows wrapper script, support for using CakeCmd in addition to Unzip, a configuration file, and support for working with a directory holding the unzipped content of a .docx file. Handling of short line justification has been improved; many cases that the earlier approach missed are now captured. Path names containing spaces are now handled.

    •  06 Sep 2009 07:43

      Release Notes: Display of hyperlinks is configurable. TOC related cleanup was done. Many new character conversions were implemented. Character conversion tables were added. Currency characters are converted to full currency names. Code tweaks were done to speed up the conversion process.
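
The configuration file lookup order mentioned in the 15 Jan 2012 notes above (current directory, then user configuration directory, then system configuration directory) can be illustrated with a short sketch. This is not the script's actual code, which is Perl; it is a minimal Python illustration of a "first match wins" search. The function names, the file name `docx2txt.config`, and the directory choices in `default_search_dirs` are assumptions made for the example.

```python
import os


def find_config(filename, search_dirs):
    """Return the path of the first directory in search_dirs that
    contains filename, or None if no directory contains it."""
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if os.path.isfile(candidate):
            return candidate
    return None


def default_search_dirs():
    """Hypothetical search order; the real locations depend on the
    platform and on how the tool was installed."""
    return [
        os.getcwd(),              # current directory
        os.path.expanduser("~"),  # user configuration directory (assumed)
        "/etc",                   # system configuration directory (assumed)
    ]
```

Calling `find_config("docx2txt.config", default_search_dirs())` would return the first matching path, so a file in the current directory takes precedence over a user-level or system-level one.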
