One of Talend's great strengths is its development flexibility. Talend is also quick off the mark to adopt the latest Big Data technologies, particularly open source Apache projects such as MapReduce and Spark. Talend takes away all the pain of hand-coding in Python, Scala, or even Java: it gives you a visual, Eclipse-based GUI where you build your jobs with drag-and-drop components and a simple configuration of the connection to your Hadoop cluster.
So say you’ve built a job on MapReduce or Spark today… What if a new execution engine emerges in the future? Surely you won’t want to rebuild your jobs from scratch! With Talend, you can switch the execution engine your job runs on with the click of a button, avoiding any redevelopment. It’s this kind of flexibility that makes Talend one of the most powerful ETL tools out there.
In this article, I will build a small job on MapReduce and demonstrate how to convert it to Spark.
First, create a MapReduce job under Big Data Batch.
In this job, we will create a simple flow from a tFixedFlowInput component to a tMap, with a small transformation: the concatenation of two columns. We then display the data with a tLogRow.
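For readers curious what this flow boils down to, here is a minimal sketch in plain Java (the language Talend generates under the hood). The column names firstName and lastName are assumptions for illustration only; in the actual job, the concatenation would simply be a tMap expression such as row1.firstName + " " + row1.lastName.

```java
import java.util.List;
import java.util.stream.Collectors;

public class FixedFlowSketch {

    // Stand-in for the tFixedFlowInput schema.
    // firstName/lastName are assumed column names, not from the article.
    public record Person(String firstName, String lastName) {}

    // The tMap transformation: concatenate two columns into one output field.
    public static String fullName(Person p) {
        return p.firstName() + " " + p.lastName();
    }

    public static void main(String[] args) {
        // tFixedFlowInput: a fixed set of input rows
        List<Person> input = List.of(
                new Person("Ada", "Lovelace"),
                new Person("Alan", "Turing"));

        // tMap: apply the concatenation to every row
        List<String> output = input.stream()
                .map(FixedFlowSketch::fullName)
                .collect(Collectors.toList());

        // tLogRow: print each row to the console
        output.forEach(System.out::println);
    }
}
```

The point of the sketch is only to show how little logic the three components carry; in Talend Studio, all of it is configured visually rather than written by hand.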
Now right-click the job and choose Edit Big Data Batch.
Choose the framework: Spark.
And that’s all done! Now the job should run on Spark. See how simple that was?
Note: in this job, Talend converted the code from MapReduce to Spark without any additional development. In reality, some jobs are more complex and certain components may not fully convert to Spark; in those circumstances, some review and testing is required.
I hope you enjoyed this article. I suggest subscribing to the Datalytyx blog in order to receive a monthly update on all our most recent blogs.