Welcome to the Datalytyx Tube Series!
This is part 1 of the Datalytyx Tube Series. If you have not already read the introduction, you can find it via this link: Datalytyx Tube Series #1: Exploring the London Underground.
In this blog article, we will take a look at the tRestClient component in Talend, and I’ll show you how I have used it to retrieve the statuses of the London Underground lines using a web service provided by Transport for London (TFL).
Before we get going with Talend, let’s take a look at the Transport for London (TFL) data. The first step to accessing TFL data is to visit: Transport for London: Open Data Users. Here you can register to use the data and explore the various data sets and feeds that TFL make available.
If you go to the Documentation section and scroll to Our data downloads, you’ll see the Tube section. Upon clicking the Tube departure boards, line status and station status link, you will be taken to the following URL: Tube departure boards, line status and station status.
You’ve just sent a request to the TFL RESTful web service and received a response, well done you! The XML document you can see is a live update of the current status of the lines on the TFL network. Even if you’re not familiar with XML, you should be able to glance at the general structure and recognise patterns such as <Line ID="2" Name="Central"/> and <Status ID="GS" CssClass="SevereDelays" Description="Severe Delays" IsActive="true">, to pick an example at random.
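If you would like to see that structure spelled out, here is a minimal Python sketch that lists the elements and attributes in a small sample. Only the Line and Status fragments come from the response quoted above; the wrapper element names and the `describe` helper are assumptions made up for illustration, and the real feed may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: only the <Line> and <Status> fragments are taken from
# the article; the wrapper element names are assumptions for illustration.
SAMPLE = """
<ArrayOfLineStatus>
  <LineStatus>
    <Line ID="2" Name="Central"/>
    <Status ID="GS" CssClass="SevereDelays" Description="Severe Delays" IsActive="true"/>
  </LineStatus>
</ArrayOfLineStatus>
"""


def describe(xml_text):
    """Return (tag, attributes) pairs for every element that carries attributes."""
    root = ET.fromstring(xml_text)
    return [(el.tag, dict(el.attrib)) for el in root.iter() if el.attrib]


# Print each element name alongside its attributes.
for tag, attrs in describe(SAMPLE):
    print(tag, attrs)
```

Each element name is followed by its attributes as key/value pairs, which is the same pattern you can pick out by eye in the browser.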
The Talend-y bit
Now you’ve seen how you can send a request and receive a response in a web browser, let’s take a look at how we can do this in Talend. Below is the Talend job I have created to retrieve the line statuses.
I will now go through the part of the job that retrieves the statuses.
The tRestClient component sends a request to a RESTful web service and outputs the corresponding response. In this case the component needs very little configuration: as the defaults for “HTTP Method” and “Accept Type” are GET and XML respectively, the only changes we need to make are to “URL” and “Relative Path”, as you can see in the screenshot below:
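For readers who think more easily in code, the same four settings can be sketched in plain Python. This is only a rough equivalent and not what Talend runs internally: the `build_request` helper is hypothetical, the URL and relative path are placeholders (not the real TFL endpoint), and I am assuming that the XML accept type corresponds to an `Accept: application/xml` header.

```python
from urllib.request import Request


def build_request(base_url, relative_path):
    """Join URL and relative path into a GET request that asks for XML."""
    full_url = base_url.rstrip("/") + "/" + relative_path.lstrip("/")
    # Assumption: "Accept Type" XML maps to the application/xml media type.
    return Request(full_url, headers={"Accept": "application/xml"}, method="GET")


# Placeholder values for illustration only.
req = build_request("http://example.com/feeds/", "/LineStatus")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```

The request is only built here, not sent. Passing `req` to `urllib.request.urlopen` would send it and return the response.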
The schema of the response from the component is shown below. We are interested in the body that contains the desired XML.
If you connect the tRestClient component to a tLogRow to test what the output looks like, you ought to get the result below:
Again, even with no knowledge of XML, you should be able to spot a structure and pick out some tube lines in the above image. As you can see, Talend interprets this as one row of information. However, we know that, hidden in the body column, there are multiple rows of information relating to the individual lines of the Underground network. So how do we get Talend to recognise this?
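To make the one-row-versus-many-rows problem concrete, here is a small Python sketch of the idea, outside of Talend. Everything in it is illustrative: the `response_row` dictionary, its column names, the wrapper elements, the second line and its status, and the `split_rows` helper are all assumptions, not the actual component output.

```python
import xml.etree.ElementTree as ET

# Hypothetical single-row response. Column names, wrapper elements and the
# second line are made up; the Central line follows the fragment quoted earlier.
response_row = {
    "statusCode": 200,
    "body": (
        "<ArrayOfLineStatus>"
        '<LineStatus><Line ID="2" Name="Central"/>'
        '<Status ID="GS" Description="Severe Delays"/></LineStatus>'
        '<LineStatus><Line ID="99" Name="Example Line"/>'
        '<Status ID="XX" Description="Example Status"/></LineStatus>'
        "</ArrayOfLineStatus>"
    ),
}


def split_rows(body):
    """Turn one XML body into one row (dict) per LineStatus element."""
    root = ET.fromstring(body)
    rows = []
    for item in root.iter("LineStatus"):
        rows.append({
            "line": item.find("Line").get("Name"),
            "status": item.find("Status").get("Description"),
        })
    return rows


# One incoming row becomes several outgoing rows.
for row in split_rows(response_row["body"]):
    print(row)
```

The single incoming row holds everything in its body value, and looping over the repeating element is what turns it into one row per line.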
That leads to my next blog: how to use a tXMLMap!