Search

Wednesday, 16 October 2019

Talend Row_Number functionality using tMap and Sequence

This post quickly details the steps involved in mimicking the tsql’s row_number() over (partition by…) windows function, all within Talend’s TOS (Talend Open Studio).

Scenario

The starting point is the table of names below.
name
andrew
andrew
andrew
berty
berty
emily
emily
emily
edward
edward
edward
edward
hugo

We want to add a new column, which is the equivalent to the tsql
row_number() over (partition by name order by 1) as rownumber

namerownumber
andrew1
andrew2
andrew3
berty1
berty2
emily1
emily2
emily3
edward1
edward2
edward3
edward4
hugo1


Talend TOS

In Talend Open Studio, create a job with the following components:
tFixedFlowInput > tMap > tLogRow
“>” is a “Row > Main” flow.
(I’ve renamed my tFixedFlowInput component to “table_names”)


tFixedFlowInput

I configure this component to
  • schema is just 1 column “name” of "String” datatype
  • use the inline content below
    andrew
    andrew
    andrew
    berty
    berty
    emily
    emily
    emily
    edward
    edward
    edward
    edward
    hugo


tMap

In this component I declare an output “out1” with columns
  • name, with String type
  • rownumber, with Integer type

Note that I use an expression to populate the “rownumber” column.
Numeric.sequence(row1.name,1,1)

This expression can be explained as:
  • Numeric.sequence : this defines a integer sequence
  • row1.name : this is the column that is “partition by” and “order by” when compared to the tsql windows function
  • 1 : the first “1” is the starting point for the sequence
  • 1 : the second “1” is the sequence increment


tLogRow

Finally, the results are returned back to the execution console for review, using the tLogRow component.


The Results

Below is a screenshot of the final execution results.

Tuesday, 1 October 2019

Aginity Pro and Aginity Team

Post seminar questions and answers:

Links:


Hi Fai,

This is George, one of Aginity’s product managers. I wanted to thank you once again for joining our webinar on September 10th.

You had a number of good questions during the webinar that I wanted to make sure I addressed, in case we hadn’t previously.

First, you asked about the ability to look at the distribution of a table in Redshift. I wanted to let you know that we’ve heard that feedback before and our goal is to add that capability soon. We recognize how useful it is.

You also asked for the ability to get a quick look at table sizes for Redshift. This is another request we’ve heard, and we’ve added it to our queue for consideration.

You had a suggestion about raising an alert or warning about a DELETE issued without a WHERE clause – this is an interesting idea, and one that I can raise with our broader team. I can see this being a useful option for individuals; we’d want to make sure we constructed it correctly so that we don’t interrupt workflows among individuals who intend to do this sort of thing regularly.

You asked about limited the return of a query that could result in thousands or millions of rows. We do have in Aginity Pro a configurable limit and we will return no more than that number of rows for any given query.

You asked about where the result data are stored after a query result, in local memory or in a local temporary file. We use memory inside the application to store the results and release the memory when results are no longer needed. Depending on how memory is configured for any user’s particular system, this may include “virtual memory” that is stored on disk.

You asked about the contents of the log in Aginity Pro/Team as well as the Query History. We have an extensive log in Aginity Pro and Team, but we separately maintain a query history for each individual user that is accessible using the Discovery Assistant.

Finally, you asked about the display of JSON results. We don’t currently automatically format such JSON to be “pretty-printable”, but that’s a suggestion I can bring to our team and evaluate.

Thanks Fai for being such an active member of the Aginity community!

Best

George L’Heureux

George C. L'Heureux, Jr. »  Enterprise Product Manager & Director of Solution Architecture »  Aginity
twitter @aginity