A clever tool that works as advertised
Definitely give this app a try. You'll need to do some initial
configuration to teach it, for example, which elements in your XML
files are inline elements and which are block elements, which
elements need to be handled as "verbatim" elements, and which you
want it to whitespace-normalize. But once you've done that initial
configuration, I think you'll find it works as expected --
including handling mixed content correctly.
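To show why a formatter has to be told which elements are inline, here is a minimal Python sketch. It does not use xmlformat; it uses the standard library's xml.dom.minidom pretty printer as a stand-in for a formatter that has no such configuration. The sample document and the helper text_of are made up for the illustration.

```python
from xml.dom import minidom

# Hypothetical sample: <p> has mixed content (text plus an inline <b>).
SOURCE = "<p>Hello <b>bold</b> world</p>"


def text_of(xml_string):
    """Return the concatenated character data of a document."""
    doc = minidom.parseString(xml_string)
    parts = []

    def walk(node):
        for child in node.childNodes:
            if child.nodeType == child.TEXT_NODE:
                parts.append(child.data)
            else:
                walk(child)

    walk(doc.documentElement)
    return "".join(parts)


# minidom knows nothing about inline vs. block elements, so it puts
# every child node of <p> on its own indented line.
naive = minidom.parseString(SOURCE).toprettyxml(indent="  ")

print(naive)
print(repr(text_of(SOURCE)))  # text as the author wrote it
print(repr(text_of(naive)))   # text after naive pretty-printing
```

The two repr lines differ because line breaks and indentation have been inserted into the running text of the paragraph. A formatter that treats b as inline and leaves mixed content alone would not introduce them.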
In testing it with a number of files of 15,000+ lines, I never
found a single instance of it adding whitespace where it shouldn't
have been added or deleting whitespace where it should have been
preserved, or wrapping or indenting anything I didn't ask it to.
For a few more details, see the
xmlhack item (xmlhack.com/read.php?i...) I
wrote about it.
Re: don't pretty print
Hmm. I'm uncertain how to respond to this. It seems to be a comment
such as one might write without having examined the software in
question or read any of its documentation.
Please cite your reference for saying that any and all whitespace in
XML is significant.
The XML spec (www.w3.org/TR/REC-xml) doesn't appear to forbid the idea
of optional whitespace. For example, see
"2.10 White Space Handling
In editing XML documents, it is often convenient to use "white space"
(spaces, tabs, and blank lines) to set apart the markup for greater
readability. Such white space is typically not intended for inclusion
in the delivered version of the document. On the other hand,
"significant" white space that should be preserved in the delivered
version is common, for example in poetry and source code."
In addition, the spec for Canonical XML (http://
deals with various types of "reformatting" and appears to reject the
notion that whitespace *must* be preserved. In particular, see
Still, all that is irrelevant for my purposes. If a document is mine,
it's my call which whitespace should be preserved and which may be
transformed. If I want to reformat my documents, I will. I fail to see
the value of telling people that they should not do with their
documents as they see fit.
Sure, you can reformat a document in such a way that it becomes
unsuitable for some purposes. I assume that users are intelligent
enough to know what is permissible for their purposes and what is not.
It's true that some editors decide to reformat things. That's a case
of some program performing reformatting without consulting you.
xmlformat doesn't reformat anything unless you ask it to. It doesn't
sneak up on you and work its will on you unbeckoned. The two
situations are quite different.
It's entirely irrelevant what an editor might do, except in the sense
that xmlformat can in fact be used to compensate for the formatting
imposed on you by an editor: As it happens, one of the motivations for
xmlformat was to have a way to put XML files in a standard format
before checking them into a revision control system. If different
people work on the files using different editors with different
formatting conventions, that artificially balloons the size of diffs
and makes them more difficult to read. xmlformat helps reduce this
problem. I apologize if this was not clear, though in my defense, it's
necessary to read only into the second paragraph of the documentation
to find it out.
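The revision-control point can be sketched in a few lines of Python. This is only an illustration of the principle, not of xmlformat itself: the two sample strings are hypothetical, and the normalize helper is a stand-in that uses xml.etree.ElementTree.canonicalize (Python 3.8 or later) with strip_text=True. That option is far blunter than a configurable formatter and would damage mixed content, so treat it as an assumption made for the demo.

```python
import difflib
import xml.etree.ElementTree as ET

# Two hypothetical saves of the same document by different editors.
EDITOR_A = "<doc>\n  <title>Notes</title>\n  <item>one</item>\n</doc>\n"
EDITOR_B = "<doc>\n\t<title>Notes</title>\n\t<item>one</item></doc>\n"


def diff_lines(old, new):
    """Return the unified diff between two strings as a list of lines."""
    return list(difflib.unified_diff(
        old.splitlines(), new.splitlines(), "a.xml", "b.xml", lineterm=""))


def normalize(xml_string):
    """Stand-in normalizer (NOT xmlformat): drop whitespace around text."""
    return ET.canonicalize(xml_string, strip_text=True)


# Diff of the files as saved: every line differs, yet nothing changed.
raw_diff = diff_lines(EDITOR_A, EDITOR_B)

# Diff after both files are put into one standard format first.
clean_diff = diff_lines(normalize(EDITOR_A), normalize(EDITOR_B))

print("\n".join(raw_diff))
print(len(raw_diff), len(clean_diff))
```

The raw diff is non-empty even though the two files carry the same content. Once both pass through the same normalizer, the diff is empty, so only real edits would show up in a check-in.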
don't pretty print
Doing things like this absolutely breaks the XML spec. Any and all whitespace is significant. Adding whitespace for "pretty printing", assuming XML processors all work like HTML, is very evil.
I've had quite a few apps break because editors decide to be helpful and reformat things, adding spaces which the app interpreted literally (as it *should*, according to the XML spec). This kind of thing shouldn't be encouraged. ;-)
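The kind of breakage described here can be reproduced with a short Python sketch. It assumes an application built on the standard library's xml.dom.minidom, and the two sample documents and the child_summary helper are hypothetical. It shows only that a non-validating parser hands indentation to the application as text nodes.

```python
from xml.dom import minidom

COMPACT = "<list><item>a</item><item>b</item></list>"
INDENTED = "<list>\n  <item>a</item>\n  <item>b</item>\n</list>"


def child_summary(xml_string):
    """Describe each child node of the root element."""
    root = minidom.parseString(xml_string).documentElement
    return [
        repr(node.data) if node.nodeType == node.TEXT_NODE else node.tagName
        for node in root.childNodes
    ]


print(child_summary(COMPACT))   # element children only
print(child_summary(INDENTED))  # whitespace text nodes appear in between

# An application that blindly takes "the first child" of the indented
# document gets a whitespace text node instead of the first <item>.
first = minidom.parseString(INDENTED).documentElement.firstChild
print(first.nodeType == first.TEXT_NODE)
```

Whether that extra whitespace matters depends on the application. Code that skips whitespace-only text nodes between block elements is unaffected, while code that indexes children by position is not.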