Search

Wednesday, 16 October 2019

Talend Row_Number functionality using tMap and Sequence

This post quickly details the steps involved in mimicking the tsql’s row_number() over (partition by…) windows function, all within Talend’s TOS (Talend Open Studio).

Scenario

The starting point is the table of names below.
name
andrew
andrew
andrew
berty
berty
emily
emily
emily
edward
edward
edward
edward
hugo

We want to add a new column, which is the equivalent to the tsql
row_number() over (partition by name order by 1) as rownumber

namerownumber
andrew1
andrew2
andrew3
berty1
berty2
emily1
emily2
emily3
edward1
edward2
edward3
edward4
hugo1


Talend TOS

In Talend Open Studio, create a job with the following components:
tFixedFlowInput > tMap > tLogRow
“>” is a “Row > Main” flow.
(I’ve renamed my tFixedFlowInput component to “table_names”)


tFixedFlowInput

I configure this component to
  • schema is just 1 column “name” of "String” datatype
  • use the inline content below
    andrew
    andrew
    andrew
    berty
    berty
    emily
    emily
    emily
    edward
    edward
    edward
    edward
    hugo


tMap

In this component I declare an output “out1” with columns
  • name, with String type
  • rownumber, with Integer type

Note that I use an expression to populate the “rownumber” column.
Numeric.sequence(row1.name,1,1)

This expression can be explained as:
  • Numeric.sequence : this defines a integer sequence
  • row1.name : this is the column that is “partition by” and “order by” when compared to the tsql windows function
  • 1 : the first “1” is the starting point for the sequence
  • 1 : the second “1” is the sequence increment


tLogRow

Finally, the results are returned back to the execution console for review, using the tLogRow component.


The Results

Below is a screenshot of the final execution results.

1 comment:

  1. Thanks, you just made my day ! I was struggling with the same.

    ReplyDelete