cpdetector

cpdetector is a small yet clever framework for codepage detection that integrates several strategies. It can be used as a library by third-party software that accesses textual data over a network. It also includes a best-practice implementation in the form of a command line tool that sorts and transforms large collections of documents based on their codepage. Available strategies include jchardet (exclusion, frequency analysis, and guessing), detection of the HTML charset property, and detection of the XML encoding declaration.
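To make the two declaration-based strategies concrete, here is a minimal sketch that tries an XML encoding declaration first and an HTML meta charset second. It uses only the Java standard library and is not cpdetector's API: the class name `DeclaredCharsetSniffer`, the method `detect`, and both regular expressions are assumptions made for illustration, and the statistical jchardet strategy is left out.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of cpdetector.
final class DeclaredCharsetSniffer {

    // XML declaration, e.g. <?xml version="1.0" encoding="ISO-8859-1"?>
    private static final Pattern XML_DECL = Pattern.compile(
            "<\\?xml[^>]*?\\bencoding\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    // HTML meta tag, covering both <meta charset="..."> and
    // <meta http-equiv="Content-Type" content="text/html; charset=...">
    private static final Pattern HTML_META = Pattern.compile(
            "<meta[^>]*?\\bcharset\\s*=\\s*[\"']?([A-Za-z0-9._:-]+)",
            Pattern.CASE_INSENSITIVE);

    private DeclaredCharsetSniffer() {
    }

    // Returns the first declared charset that this JVM supports, if any.
    static Optional<Charset> detect(byte[] head) {
        // ISO-8859-1 maps every byte to exactly one char, so an ASCII
        // declaration survives decoding whatever the real encoding is.
        String text = new String(head, StandardCharsets.ISO_8859_1);
        for (Pattern strategy : new Pattern[] {XML_DECL, HTML_META}) {
            Matcher m = strategy.matcher(text);
            if (!m.find()) {
                continue;
            }
            try {
                return Optional.of(Charset.forName(m.group(1)));
            } catch (IllegalArgumentException e) {
                // Illegal or unsupported charset name: fall through
                // to the next strategy instead of failing.
            }
        }
        return Optional.empty();
    }
}
```

Only the first few kilobytes of a document need to be passed in, since both kinds of declaration appear near the start. A declared name that is invalid or unsupported is treated as "not detected" rather than as an error.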

Recent releases

  •  04 Dec 2011 15:18

    Release Notes: This release fixes a crash in command line mode when an invalid declared charset (the "" charset) was found. The command line tool (CodepageProcessor) no longer returns 0 when an error occurs. A bug that broke the ability to reset input streams after detection was fixed.

  •  16 Nov 2011 18:04

    Release Notes: This major bugfix release fixes two issues in command line batch mode. The switch to skip moving undetected documents works again. No attempt is made to transcode undetected documents anymore (this previously caused exceptional program flow).

  •  26 Jun 2010 23:27

    Release Notes: This stability release fixes byte order mark detection and an incompatibility with OpenJDK. It now requires Java 1.5.

  •  17 Jun 2008 21:19

    Release Notes: The release structure has been changed: cpdetector.jar no longer contains third-party library files. Public functions that were missing are included again. The ProGuard shrinker has been updated from version 3.8 to 4.2.

  •  15 Jun 2008 09:22

    Release Notes: The ProGuard shrinker is now used, so the cpdetector jar is more than ten times smaller. System.out is no longer used for logging in JChardetFacade. All packages were renamed with the prefix "info.monitorenter".
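Two of the release notes mention byte order mark detection and resetting input streams after detection. The sketch below shows one way the two can fit together using only the Java standard library. It is not cpdetector's implementation: the class name `BomSniffer` and its `detect` method are hypothetical, and only the UTF-8 and UTF-16 marks are checked.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Hypothetical helper, not part of cpdetector.
final class BomSniffer {

    private BomSniffer() {
    }

    // Peeks at up to three bytes, then rewinds the stream so the caller
    // can still read the document from its first byte.
    static Optional<Charset> detect(InputStream in) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("stream must support mark/reset");
        }
        byte[] b = new byte[3];
        in.mark(b.length);
        int n = 0;
        while (n < b.length) {
            int r = in.read(b, n, b.length - n);
            if (r < 0) {
                break; // end of stream before three bytes were read
            }
            n += r;
        }
        in.reset();

        if (n >= 3 && b[0] == (byte) 0xEF && b[1] == (byte) 0xBB && b[2] == (byte) 0xBF) {
            return Optional.of(StandardCharsets.UTF_8);
        }
        if (n >= 2 && b[0] == (byte) 0xFE && b[1] == (byte) 0xFF) {
            return Optional.of(StandardCharsets.UTF_16BE);
        }
        // Simplification: FF FE is also the prefix of the UTF-32LE mark,
        // which this sketch does not distinguish.
        if (n >= 2 && b[0] == (byte) 0xFF && b[1] == (byte) 0xFE) {
            return Optional.of(StandardCharsets.UTF_16LE);
        }
        return Optional.empty();
    }
}
```

Wrapping an arbitrary stream in a `java.io.BufferedInputStream` is the usual way to obtain mark/reset support before calling a helper like this.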
