Validating address data in Talend Studio

by | Jan 12, 2016 | BlogPosts, Talend, Tech Tips | 0 comments

This article will present you with a crash course in validating address data using the tGoogleAddressRow component in Talend Studio. Incorrect or incomplete address information is a common problem in customer databases but thankfully, Talend offers an out-of-the-box data quality component. This component allows data integration developers to validate and/or enrich address data via the use of the tGoogleAddressRow component. I’ll explain how to setup a Google Places API key and how to create a simple Talend job in order to make use of the tGoogleAddressRow component.

First things first, you’ll need to obtain a Google Places API key to use with the tGoogleAddressRow. The key is free but limited to 100 uses per day. This can be increased to 10,000 for 60 days for free if you register for billing.

1. Navigate to https://developers.google.com/places/web-service/, sign in with your Google account and click ‘Get a Key’.

2016-01 Blog - Validating address data - picture 1

2. Create a project.

2016-01 Blog - Validating address data - picture 2

3. Name the key and click ‘Create’.

2016-01 Blog - Validating address data - picture 3

4. Now you have your Google Places API key and you are ready to create the Talend job.

2016-01 Blog - Validating address data - picture 4

5. Begin with a tFixedFlowInput; create a schema that represents your business’ customer address structure then enter some parts of an address you want to cleanse.

2016-01 Blog - Validating address data - picture 5 2016-01 Blog - Validating address data - picture 6

6. Now connect the output to a tMap…

2016-01 Blog - Validating address data - picture 7

…and create a ‘string’ output in your tMap and concatenate the address fields together. For example:

Address.CompanyName + ” ” + Address.AddressLine1 + ” ” + Address.AddressLine2 + ” ” + Address.City + ” ” + Address.PostalCode

There will be a degree of trial and error involved to find the best fields to use. Google will detect companies’ names which will aid in the verification of an address.

2016-01 Blog - Validating address data - picture 8

7. Connect the output to a tReplace and configure to replace “null” with empty string “”. This is useful when not all of the address fields from the source are populated.

2016-01 Blog - Validating address data - picture 9 2016-01 Blog - Validating address data - picture 10

8. Now connect the output to a tGoogleAddressRow.

2016-01 Blog - Validating address data - picture 11

9. Configure with the API key and the row that contains the concatenated address data.

2016-01 Blog - Validating address data - picture 12

10. Connect the output to a tLogRow. Finally, the completed Talend jobs should look similar to this:

2016-01 Blog - Validating address data - picture 13

Run the job and lo and behold, witness the wonder of the tGoogleAddressValidation component.

2016-01 Blog - Validating address data - picture 14

The Input given was “datalytyx, holborn, London” and the output was Holborn Gate, 330 High Holborn, London WC1V 7QT, United Kingdom.

As you can see it has done a good job of filling in the missing information. As well as the ‘formatted address’, the API returns the individual address lines that can then be mapped back to address structure specified in the input.

Talend also offers a tLoqateAddressRow component that uses the Loqate address cleansing and validation service. This component and service offers more functionality. You can specify the individual address lines on the input, as opposed to supplying one string. Also it returns a comprehensive ‘accuracy code’ which indicates how likely the address API accurately cleansed the various parts of the address. This service is not free, however. But you can try the web-based demo for free at http://www2.loqate.com/demo/register/new/.

I hope this has helped!

 

 

Subscribe now for regular updates on new blog releases:

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *