6 projects tagged "Text Processing"
Project35 is an application suite that allows users to generate data entry forms from an XML schema. Application designers use a Configuration Tool to associate the records and record fields defined in the schema with application properties, including validation services, controlled vocabulary services, general plugins, and various aspects of look-and-feel.
DeXSS provides a SAX2 parser to help protect against cross-site scripting (XSS) attacks. DeXSS uses TagSoup to parse potentially malformed input, then passes the result through a SAX2 filter pipeline that removes JavaScript from the HTML. You can use the DeXSS parser in place of your existing SAX2 parser, or you can use the DeXSS utility to provide a string-to-string conversion.
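To illustrate the general idea of a SAX2 filter that strips script content, here is a minimal sketch using only the JDK. It is not DeXSS's actual code or API: the class `ScriptStripSketch`, the nested `ScriptStripFilter`, and the `clean` method are hypothetical names, and the JDK's built-in XML parser stands in for TagSoup, so the input must already be well-formed. It drops `script` elements, `on*` event-handler attributes, and attribute values starting with `javascript:`, and should not be read as a complete XSS defense.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Locale;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

public class ScriptStripSketch {

    /** Hypothetical filter: sits between the parser and the consumer of SAX events. */
    static class ScriptStripFilter extends XMLFilterImpl {
        // Greater than zero while we are inside a <script> element.
        private int skipDepth = 0;

        ScriptStripFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts)
                throws SAXException {
            if (skipDepth > 0 || "script".equalsIgnoreCase(qName)) {
                skipDepth++; // swallow the element and everything nested in it
                return;
            }
            // Copy only the attributes that do not look like script carriers.
            AttributesImpl kept = new AttributesImpl();
            for (int i = 0; i < atts.getLength(); i++) {
                String name = atts.getQName(i);
                String value = atts.getValue(i);
                boolean handler = name.toLowerCase(Locale.ROOT).startsWith("on");
                boolean jsUrl = value.trim().toLowerCase(Locale.ROOT).startsWith("javascript:");
                if (!handler && !jsUrl) {
                    kept.addAttribute(atts.getURI(i), atts.getLocalName(i), name,
                            atts.getType(i), value);
                }
            }
            super.startElement(uri, local, qName, kept);
        }

        @Override
        public void endElement(String uri, String local, String qName) throws SAXException {
            if (skipDepth > 0) {
                skipDepth--;
                return;
            }
            super.endElement(uri, local, qName);
        }

        @Override
        public void characters(char[] ch, int start, int length) throws SAXException {
            if (skipDepth == 0) {
                super.characters(ch, start, length);
            }
        }
    }

    /** String-to-string conversion: parse, filter, and serialize again. */
    static String clean(String xhtml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            XMLReader reader = factory.newSAXParser().getXMLReader();
            ScriptStripFilter filter = new ScriptStripFilter(reader);

            // An identity transform serializes the filtered SAX events.
            Transformer serializer = TransformerFactory.newInstance().newTransformer();
            serializer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            serializer.transform(
                    new SAXSource(filter, new InputSource(new StringReader(xhtml))),
                    new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse input", e);
        }
    }
}
```

Because the filter is itself an `XMLReader`, it can be handed to any code that expects a SAX2 parser, which is the same substitution the description above refers to. A real pipeline would put a tolerant HTML parser in front of the filter rather than a strict XML parser.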
TagSoup is a SAX-compliant parser written in Java that, instead of parsing well-formed or valid XML, parses HTML as it is found in the wild: poor, nasty, and brutish, though quite often far from short. TagSoup is designed for people who have to process this stuff using some semblance of a rational application design. By providing a SAX interface, it allows standard XML tools to be applied to even the worst HTML. TagSoup also includes a command line processor that reads HTML files, and can generate either clean HTML or well-formed XML that is a close approximation to XHTML.
Yodl implements a pre-document language and tools to process it. It lets you write a single document, then use a tool (e.g., yodl2html) to convert it to some final document language (HTML, man, LaTeX, etc.). Yodl's document language is easy to use and to extend. Predefined converters are available from Yodl to HTML, LaTeX, groff (manpages), text, and (experimentally) XML, and new converters can be added easily.