An overview of Talend Data Preparation

Feb 9, 2016 | BlogPosts, Talend, Tech Tips

Some time ago we wrote about Talend Open Studio 6.0 while it was still in beta. Now we would like to share another new product from Talend. It’s called Talend Data Preparation, and it could be an excellent offering for a particular audience. Let’s begin!

Talend Data Preparation, as the name suggests, is a tool for data cleansing and visual analysis. Unlike comparable tools from other vendors, it has a clean, simple, user-friendly interface that lets a non-technical user import and prepare data for further processing with no training or explanation necessary.

When you start Talend Data Preparation, you are greeted with the screen shown below:

2016-03 Blog - Talend Data Preparation - Image 1

It’s a minimalistic overview of all the files you currently have imported and includes options to add new files and search the existing files.

The product is currently in beta and some of its features are still in development. What follows describes what the tool can do right now.

The data import feature offers three sources. You can import data from:

  • Local files: CSV, Excel and other delimited files
  • Files from HTTP: Files from a website
  • HDFS files: Files from an HDFS store

Once you select an option and import a file, it will be immediately opened up in a new editing window.

2016-03 Blog - Talend Data Preparation - Image 2

Let’s focus on the first column: ID. Right underneath the column name there is a bar that is mostly green. The orange segment at the end represents the values that are invalid for the column’s type. (The column is defined as integer, yet the value in row 12 contains a non-numeric character.)
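To make the idea behind the green and orange bar concrete, here is a minimal sketch in plain Python. It is not how Talend implements the check; the sample values and the helper names `is_integer` and `quality_summary` are made up for illustration. It simply tests each value against the integer type and reports which rows fail.

```python
def is_integer(value: str) -> bool:
    """Return True if the text parses as a whole number."""
    try:
        int(value.strip())
        return True
    except ValueError:
        return False


def quality_summary(values):
    """Count valid and invalid entries and list the invalid row numbers (1-based)."""
    invalid_rows = [i for i, v in enumerate(values, start=1) if not is_integer(v)]
    total = len(values)
    return {
        "valid": total - len(invalid_rows),
        "invalid": len(invalid_rows),
        "invalid_rows": invalid_rows,
    }


# Hypothetical ID column: row 12 holds "1O" (with a letter O), which is not an integer.
ids = ["1", "2", "3", "4", "5", "6", "7", "8", "9", "10", "11", "1O", "13"]
summary = quality_summary(ids)
print(summary)
```

The ratio of `valid` to `invalid` is what a quality bar like the one in the screenshot would visualise.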

On the right-hand side, because the ID column is selected, the tool offers actions specific to the column’s data type. We can remove the invalid values in the column, quickly view different charts to spot data patterns, check the maximum and minimum values, sum up the values, and so on. All of this takes just a couple of clicks; no Excel formulas, Java, or SQL code is required. This means you do not have to be an Excel or coding guru to check the data for issues, patterns, etc.
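For readers curious about what those clicks replace, the following is a rough hand-written equivalent in Python, assuming a small made-up ID column. The names `parse_int`, `raw_ids` and `stats` are illustrative only and have nothing to do with Talend’s own code.

```python
def parse_int(value: str):
    """Return the value as an int, or None when it is not a valid integer."""
    try:
        return int(value.strip())
    except ValueError:
        return None


# Hypothetical column content; "x9" violates the integer type.
raw_ids = ["4", "17", "x9", "8", "23"]

# "Remove invalid values": keep only the entries that parse as integers.
clean_ids = [n for n in map(parse_int, raw_ids) if n is not None]

# The quick statistics mentioned above: minimum, maximum and sum.
stats = {
    "min": min(clean_ids),
    "max": max(clean_ids),
    "sum": sum(clean_ids),
    "removed": len(raw_ids) - len(clean_ids),
}
print(clean_ids, stats)
```

Even this tiny script assumes some programming knowledge, which is exactly the barrier the point-and-click interface removes.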

The interface buttons are also quite large, so if you have a Windows tablet or a touchscreen laptop such as a Lenovo Yoga, you can use the touch screen with no issues and work with data on the go.

It’s early days for the tool, and Talend is also working on adding extensibility through the Talend Exchange marketplace. This means that developers will be able to create plugins for the tool, extending its functionality for pattern matching and other features.

If you are a Talend Enterprise client, I would highly recommend that you head over to the Talend website and download the tool. I am sure you will not be disappointed.

