DataCleaner

DataCleaner is a data quality analysis tool for data profiling, validation, and minor ETL-like tasks. These activities help you administer and monitor your data quality, ensuring that your data is useful and applicable to your business situation. It can be used for master data management (MDM) methodologies, data warehousing projects, statistical research, preparation for extract-transform-load activities, and more.

Last announcement

Community contributor contest 08 Nov 2012

Who will post the best content for use in DataCleaner?

Human Inference is announcing a competition for the DataCleaner community. The goal is to...

Recent releases

Release Notes: You can now compose jobs so that a DataCleaner job invokes another "child" job as a single transformation. Source column handling was improved, and the user can now choose which columns to include in a source query. Repository file locking was implemented to prevent concurrent reads and writes.

  •  24 Sep 2013 23:52

Release Notes: The 'Synonym lookup' transformation now has an option to look up every token of the input. This is useful when replacing synonyms within the values of a long text field. A potential failure was fixed that occurred when blocking execution of DataCleaner jobs through the monitor's Web service. The way jobs and their sequence of components are closed and cleaned up after execution was improved. The Java WebStart version of DataCleaner was affected by a bug in the Java runtime that, under certain circumstances, caused some JAR files not to be recognized by the WebStart launcher.

Release Notes: It is now possible to hide output columns of transformations. Hiding does not affect the processing flow; it only removes the columns from the user interface, which can make interaction with other components cleaner. A new Web service has been added to the monitoring Web application for polling the execution status of a particular job. A bug was fixed that caused the HTML report to fail for certain analysis types when no records had been processed. Six other minor bugs have been addressed.

Release Notes: This release adds a new filter for performing Change Data Capture, queues job execution to avoid concurrency issues, and includes several minor bugfixes and improvements.

Release Notes: This release is a major milestone for the data quality monitoring Web application. It adds connectivity to Salesforce and SugarCRM, wizards and other user experience improvements, clustered execution of jobs, a new data visualization extension, a national identifier validation extension, and Pentaho Data Integration job scheduling and execution.
