Using tMemorizerows in Talend to aggregate items to a parent

by | Feb 24, 2015 | BlogPosts, Tech Tips | 0 comments

Sometimes, data extracted from systems will contain rows that are actually parent line items with child line items underneath.

For example, you might be working with a travel and expense system that lists a parent item like ‘Travel’ and then child items like ‘Car Mileage’, ‘Train Tickets’ and ‘Subsistence’ all in the same column. By using tMemorizeRows in Talend, you can identify and aggregate all of the child items to the parent level and get a sum of cost based on the dates the expenses were logged.

Business Requirements (Mocked up Travel and Expense data):

  • Identify and aggregate all the child items to its parent level on a daily basis.
  • The resultant data should be a sum of cost, aggregated daily to its parent level.
Expense Type (Child Items) Parent
Car Mileage Travel
Train Tickets
Subsistence

Sample Dataset:

Sample Dataset

 

Talend Job:

Talend Job - T&E

 

Component Details
tFixedFlow (Sample Travel &Expense Data) Contains sample data (see Sample data image) for each day in January.
tMap_1 Map output schema and add an internal column (ParentNum) to build an expression that generates a numeric sequence based on the parent value in the data. Map Output Schema T&EExpression used:("Travel".equals(source_data.Expense_Type))?(Numeric.sequence(source_data.Expense_Type,0,1)):(ParentNum_tMemorizeRows_1[0])The above expression assigns a sequential numeric value every time “Travel” appears in the data.
tMemorizeRows_1 Memorize the column used to determine the parent item.tmemrows1
tAggregateRows_1 This component is used to Group the data by the ParentNum and date.tAggregateRows1 Total Expenses raised is the count of all the child items for the parent item – Travel.Total cost is the sum of all the child items within each ParentNum.Expense ID contains the first ID that is found in the grouped items.
tSortRows (sort rows by date) This component sorts the aggregated data set by date in ascending order

Results

Before Aggregation (tLogRow_2)

The image below is a snap shot of the data before aggregation. The key thing to note here is the column ParentNum which uses the expression (see previous table) and the number it has assigned when “Travel” appears in the Expense_Type column. This is then used in the tAggregateRows component for grouping.

tLogRows before Aggreation

 

After Aggregation (Final Output)

Final Output

The final output snap shot above shows the data after aggregation. For the purpose of this demo, I haven’t deleted ParentNum column as all the data has been grouped using this number assigned in this column.

The result aligns with the requirements which is to aggregate the data on a daily basis summing the costs of child items to its respective parent item (Travel).

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *