Release Notes: This release adds the configuration variable config_unzip_opts. This removes the dependency on the unzip program and allows users to use other unzipping programs, such as 7z, pkzipc, and winzip. This release also fixes list numbering, improves list/paragraph indentation and the corresponding code, and updates the README with brief guidance on how this utility can be used to recover text from a corrupted docx file.
Release Notes: Adds support for handling lists (bullet, decimal, letter, and roman) along with (an attempt at) indentation. Adds the configuration variable config_twipsPerChar, and removes the configuration variables config_listIndent and config_exp_extra_deEscape. Text output now omits deleted text, which matters when changes are being tracked in a docx document.
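The four list styles named above can be illustrated with a minimal sketch. This is not docx2txt's own code (the tool is a Perl script); the function names, the marker format, and the two-space indent per level are assumptions made for illustration, and continuing letter lists past "z" in spreadsheet style ("aa", "ab", ...) is an arbitrary choice that may differ from what a word processor displays.

```python
# Hypothetical sketch: names, marker format, and indent width are assumptions.

def to_roman(n: int) -> str:
    """Convert a positive integer to a lowercase roman numeral."""
    pairs = [
        (1000, "m"), (900, "cm"), (500, "d"), (400, "cd"),
        (100, "c"), (90, "xc"), (50, "l"), (40, "xl"),
        (10, "x"), (9, "ix"), (5, "v"), (4, "iv"), (1, "i"),
    ]
    out = []
    for value, symbol in pairs:
        while n >= value:
            out.append(symbol)
            n -= value
    return "".join(out)


def to_letter(n: int) -> str:
    """Convert 1 -> 'a', 26 -> 'z', 27 -> 'aa' (spreadsheet-style, assumed)."""
    out = ""
    while n > 0:
        n, rem = divmod(n - 1, 26)
        out = chr(ord("a") + rem) + out
    return out


def list_marker(kind: str, n: int, level: int = 0, indent_width: int = 2) -> str:
    """Return the indented marker for item number n of a list of the given kind."""
    if kind == "bullet":
        marker = "*"
    elif kind == "decimal":
        marker = f"{n}."
    elif kind == "letter":
        marker = f"{to_letter(n)}."
    elif kind == "roman":
        marker = f"{to_roman(n)}."
    else:
        raise ValueError(f"unknown list kind: {kind}")
    return " " * (level * indent_width) + marker + " "
```

For example, `list_marker("roman", 4, level=1)` yields the marker `iv.` indented by one level.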
Release Notes: The Perl script can now take input from stdin, and also works with input/output redirection. Script files and the configuration file can now be installed in separate directories on (non-Windows) systems that use the Makefile for installation. The configuration file is now uniformly looked for in the current directory, the user configuration directory, and the system configuration directory, in that order. Handling of special (non-text) characters has been improved, and more non-text characters, such as fractions, are supported.
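The lookup order described above (current directory, then user configuration directory, then system configuration directory) amounts to a first-match search. The sketch below is only an illustration in Python, not the script's Perl code; the function name, the file name, and the directory paths in the usage note are assumptions.

```python
# Hypothetical sketch of a first-match configuration lookup.
import os
from typing import Callable, Optional, Sequence


def find_config(
    filename: str,
    search_dirs: Sequence[str],
    exists: Callable[[str], bool] = os.path.isfile,
) -> Optional[str]:
    """Return the first existing candidate path, or None if none is found.

    search_dirs is ordered from highest to lowest priority.
    """
    for directory in search_dirs:
        candidate = os.path.join(directory, filename)
        if exists(candidate):
            return candidate
    return None
```

A call might look like `find_config("docx2txt.config", [".", os.path.expanduser("~"), "/etc"])`, where the file name and all three directories are placeholders chosen for the example.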
Release Notes: Minor non-extraction feature enhancements and bugfixes, based on feedback received from users. The script now checks for the existence of the unzip command. The configuration file is now looked for in $HOME as well. Configuration variables now begin with config_. Bugs #3003903, #3082018, and #3082035 have been fixed. The null device for Cygwin has been fixed. Superscripted cross-references are now placed within [...].
Release Notes: This release focuses mainly on user interaction. The new features are a Windows installation script, a Windows wrapper script, support for using CakeCmd as an alternative to Unzip, a configuration file, and support for working with a directory holding the unzipped content of a .docx file. Handling of short line justification has improved: many cases missed by the earlier approach are now captured. Path names containing spaces are now handled.
Release Notes: Display of hyperlinks is configurable. TOC related cleanup was done. Many new character conversions were implemented. Character conversion tables were added. Currency characters are converted to full currency names. Code tweaks were done to speed up the conversion process.
Release Notes: Center and right justification of text fitting in a line of (adjustable) 80 columns. Indication of hyperlinked text along with the hyperlink. A BSD makefile. Some suggestions on how Windows users can use this tool and more documentation. docx2txt.pl invocation has been changed a little. User involvement during installation is reduced.
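As an illustration of justifying text that fits within an adjustable line width, here is a minimal Python sketch. It is not the tool's Perl implementation; the function name and the choice to leave over-long text unchanged are assumptions.

```python
# Hypothetical sketch: justify a short line within a fixed column width.

def justify(text: str, align: str, width: int = 80) -> str:
    """Left-pad text so it appears centered or right-aligned in width columns.

    Text that does not fit in the line is returned unchanged (an assumption).
    No trailing spaces are added.
    """
    if len(text) >= width:
        return text
    free = width - len(text)
    if align == "right":
        return " " * free + text
    if align == "center":
        return " " * (free // 2) + text
    if align == "left":
        return text
    raise ValueError(f"unknown alignment: {align}")
```

With `width=10`, `justify("abc", "right", 10)` pads with seven leading spaces and `justify("abc", "center", 10)` with three.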
Release Notes: Docx text extraction can now be done in two ways (check the version README for further details): "docx2txt.sh file.docx", or "docx2txt.pl infile.docx outfile.txt".
Release Notes: This initial release attempts to handle the following features during text extraction: horizontal ruler, line breaks, paragraph separation, and tabs; nested list formatting (naive); capitalization of text blocks; and character conversion (" ' < & > - etc.).
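The character conversion mentioned above can be pictured as replacing XML escape sequences with the characters they stand for. The following Python sketch is an assumption about the general technique, not the script's actual conversion table; the names are hypothetical. The ampersand is replaced last so that a sequence such as `&amp;lt;` is unescaped only once.

```python
# Hypothetical sketch: unescape the five predefined XML entities.

ENTITY_TABLE = [
    ("&quot;", '"'),
    ("&apos;", "'"),
    ("&lt;", "<"),
    ("&gt;", ">"),
    ("&amp;", "&"),  # must come last to avoid double unescaping
]


def de_escape(text: str) -> str:
    """Replace each XML entity in text with its literal character."""
    for entity, char in ENTITY_TABLE:
        text = text.replace(entity, char)
    return text
```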