How to use a QUALIFY clause in SQL?

Qualify may be one of the most underestimate commands in Teradata SQL, as a Data Analyst, I use it a lot and I really hope after this tutorial you use it too.

What is a Qualify clause?

A Qualify clause is the last filter to your SELECT statement. Which will qualify your rows by an argument of your choice. This means that if a SELECT statement gives you multiple rows and you only want one of them, you can filter it and just display the element that you need.

Where can I use a Qualify clause?

You can use a Qualify clause in any query or any subquery, which means you can filter before or after your SELECT statement do a JOIN and filter the results of a subset of the information you need.

It is actually one of the most useful SQL commands for data analysis and, in my opinion, it’s not as used as it should be.

How to do a Qualify in a Query?

Let’s say you have two tables, the first one has ten rows that contain a primary key, a username, and an email (This would be ID, User, Email).

The second one has one million rows that contain the actions that the user made, using the primary key of the first one as a foreign key and a DateTime data field that tells you when the user did the action (This table would have a primary key called ID, ID_User as a foreign key, date_field as DateTime field and action as for instance, a string). In this second table, data is not necessarily ordered and de primary key is not necessarily an incremental number.

You just need the last action this user made. How would you do it?

Using Qualify with ROW_NUMBER()

One way of achieving this would be using ROW_NUMBER() which assign a number to every row. The code would look like this:

SELECT * FROM first_table A
LEFT JOIN second_table B
ON A.ID = B.ID_User
QUALIFY ROW_NUMBER() OVER (PARTITION BY A.ID ORDER BY B.date_field DESC) = 1

Here you can see an “= 1”, but what does this mean?

This mean that ROW_NUMBER() will assign a number to each row ordered from the last one to the first one and the QUALIFY will get the first one of them.

Now let’s say that instead of the last action, you want the first five actions the user did.

SELECT * FROM first_table A
LEFT JOIN second_table B
ON A.ID = B.ID_User
QUALIFY ROW_NUMBER() OVER (PARTITION BY A.ID ORDER BY B.date_field ASC) <= 5

We only have to change a bit the last part of code, right? Now we got just the first 5 actions the user did.

Using QUALIFY with ROW_NUMBER() in a subquery

We can also use this inside a subquery, let’s say that your second table now has a hundred million rows, and you just need the last action of the previous month. A query with all the data and then filtering would be memory exhausting (You can read about how to make a good query in another of our posts).

Your query in this case should look something like this:

SELECT * FROM first_table A
LEFT JOIN (SELECT * FROM second_table WHERE date_field BETWEEN date-30 and date-60
QUALIFY ROW_NUMBER() OVER (PARTITION BY ID ORDER BY B.date_field DESC) = 1) B
ON A.ID = B.ID_User

Now you have a subquery with a QUALIFY inside it that will decrease your memory usage and will take less time to execute!

Other ways of using QUALIFY

Even though Qualify is most used with ROW_NUMBER(), it can be used with other options, such as RANK or QUANTILE.

Let’s continue with our example and say now you want the ten persons who have done fewer actions

SELECT A.ID, B.Qactions, RANK() OVER (ORDER BY B.Qactions ASC) FROM first_table A
LEFT JOIN (SELECT ID_User, count(*) as Qactions FROM second_table GROUP BY 1) B
ON A.ID = B.ID_User
QUALIFY RANK() OVER (ORDER BY B.Qactions DESC) <=10;

Here we are using RANK to establish the number of the row, RANK will put the rows in order by some parameter, in this case, it will put the rows in order by the quantity of actions.

And last, let’s say you want the 10% of people that have done fewer actions

SELECT A.ID, B.Qactions, QUANTILE(10, B.Qactions) AS percentile
 FROM first_table A
LEFT JOIN (SELECT ID_User, count(*) as Qactions FROM second_table GROUP BY 1) B
ON A.ID = B.ID_User
QUALIFY percentile = 1;

In this last query, we divide the people into 10 quantiles by their Quantity of actions, so the first one of the 10 quantiles would be less than 10% (The second one would be 20%, etc.)

Conclusion

QUALIFY is a powerful clause, that you should know about and use it for analytics purposes. It has saved my life multiple times and I really hope you find this tutorial helpful and use it as much as I do in the future!

Leave a Reply