What is an aggregate function?

An aggregate function where the values are grouped together to form a single response. The more common SQL aggregate functions are SUM, COUNT, AVG, MIN and MAX.

How to use aggregate functions?

Aggregate functions can be grouped or simply the only value, for instance, let’s say we have a table with names and we want to count how many names repeat. We could do something like this:

SELECT name, COUNT(*) from table_with_names group by name;

This will group our results by name and count them, meaning that if the COUNT gives us more than 1, then there would be a repeated name.

Using the HAVING clause

As we saw before, we can group names and count how many of them there are in a table, but what if we only want the names that repeat in more than 5 rows?

We could use having to only get this results. Query would be:

SELECT name, COUNT(*) from table_with_names group by name
HAVING COUNT(name)>5;

Aggregate functions in WHERE clause

We can’t simply use aggregate functions in WHERE clause, this would give us an error:

SELECT name, COUNT(*) as COUNT_A from table_with_names WHERE COUNT_A>5;

As we saw before, this should be in a HAVING clause.

However, we could use a subquery with an Aggregate function in a WHERE clause.

Let’s say we only want the results above average in a table that have numbers:

SELECT number FROM table_with_numbers WHERE number>(SELECT AVG(number) FROM table_with_numbers);

This query would be correct and give use only the numbers above average, although AVG is also an aggregate function.

Using more than one Aggregate Function

You can set more than one aggregate function in a query, let’s say you have a table with name, last name and numbers associated with them and you want the sum of the numbers associated with each name and last name and the average, but only from the people that have above average:

SELECT name, lastname, SUM(number), AVG(number) FROM table_with_names_lastnames_numbers WHERE number>(SELECT AVG(number) FROM table_with_numbers)
GROUP BY 1,2;

As you may have seen with these examples, you HAVE to group by every other column that is not an aggregate function. In the only case, you do not need to group by is when your query will only return the aggregate function.

Using Aggregate functions for Statistical analysis.

Aggregate functions are very useful for statistical analysis, we could use STDDEV/STDEV mostly available in every SQL interpreter to get the standard deviation of a dataset:

SELECT MAX(number), MIN(number), AVG(number), STDEV(number) FROM [Test].[dbo].[test]

This could be great if you are doing some basic machine learning models or need to find thresholds of abnormal situations.

Using other Aggregate functions

Although COUNT, MAX, MIN, SUM, and AVG are pretty standard aggregate functions, every database and SQL derived language will have their own, you can see a list of the aggregate functions available in SQL Server here and a list of the available aggregate functions in OracleDB here.

Conclusion

Aggregate functions used the correct way are useful and powerful functions to analyze our data, make dashboards, generate reports and get statistical information.

As a Data Analyst myself, some of my best and more important queries use them to get useful information, mostly when you are dealing with Big Data.

If you are a data analyst too, maybe this post will be useful for you too.

See you at the next one!

Leave a Reply