Scenario
The starting point is the table of names below.name |
andrew |
andrew |
andrew |
berty |
berty |
emily |
emily |
emily |
edward |
edward |
edward |
edward |
hugo |
We want to add a new column, which is the equivalent to the tsql
row_number() over (partition by name order by 1) as rownumber
name | rownumber |
andrew | 1 |
andrew | 2 |
andrew | 3 |
berty | 1 |
berty | 2 |
emily | 1 |
emily | 2 |
emily | 3 |
edward | 1 |
edward | 2 |
edward | 3 |
edward | 4 |
hugo | 1 |
Talend TOS
In Talend Open Studio, create a job with the following components:tFixedFlowInput > tMap > tLogRow
“>” is a “Row > Main” flow.

tFixedFlowInput
I configure this component to- schema is just 1 column “name” of "String” datatype
- use the inline content below
andrew
andrew
andrew
berty
berty
emily
emily
emily
edward
edward
edward
edward
hugo
tMap
In this component I declare an output “out1” with columns- name, with String type
- rownumber, with Integer type

Note that I use an expression to populate the “rownumber” column.
Numeric.sequence(row1.name,1,1)
This expression can be explained as:
- Numeric.sequence : this defines a integer sequence
- row1.name : this is the column that is “partition by” and “order by” when compared to the tsql windows function
- 1 : the first “1” is the starting point for the sequence
- 1 : the second “1” is the sequence increment
tLogRow
Finally, the results are returned back to the execution console for review, using the tLogRow component.
The Results
Below is a screenshot of the final execution results.
Thanks, you just made my day ! I was struggling with the same.
ReplyDelete