As a beginner, maybe you are having some problems using SQL, we are going to make a little tutorial here about the most common mistakes we made when we are coding and try to learn a bit more about this beautiful language called SQL.
1. Never forget semi-colons.
At first, in most SQL interpreters, your code will work without them but in stored procedures or complex queries, using semi-colons is a must.
For instance, let’s say you need to get the users in a table and their addresses and you also need to count them.
These lines will work in almost every interpreter:
SELECT COUNT(*) FROM table_users
SELECT users, addresses FROM table_users
Easy, right? But what if I tell you you could have done the same thing with only one query block? (And you should get used to doing it this way because when you write stored procedures or anything else, there isn’t any other suitable way of doing it)
SELECT COUNT(*) FROM table_users;
SELECT users, addresses FROM table_users;
This will give you two results (In most SQL interpreters they will be in separate tabs).
2. Using spaces in AS statement (And not double quotes)
In most SQL interpreters, using spaces in AS statement will generally lead to an error, UNLESS you use double quotes to define your AS statement.
For instance, this line would be wrong:
SELECT users AS active users FROM table_users where active=1;
This will certainly won’t work, one easy way to solve it would be using an _ between the words active and users
SELECT users AS active_users FROM table_users where active=1;
But another workaround would be using double quotes in order to name the field
SELECT users AS "active users" FROM table_users where active=1;
3. Using double quotes when you need single quotes
Double quotes and single quotes are for different things in SQL, using single quotes would be for strings or other SQL variables, and double quotes, as we stated before, would be for AS statements with special characters.
SELECT 'Hello World' AS predefined_text
As always, it will depend on your SQL interpreter, but this line will output a variable saying ‘Hello World’ in a column called predefined_text.
But the following line will display a column called Hello World with Hello world output as the only line.
SELECT "HELLO WORLD"
4. Using WHERE clause with aggregate functions could save your life
Using WHERE clause with aggregate functions is one of the optimal ways to navigate a table, the problem here is that if a database is too big, will take ages to load your query (If it can even do it). This is how you should navigate extensive databases tables:
For instance, if you have a database with all user actions separated by date and you need the last-day user actions, your query should be something like this:
SELECT * FROM user_actions WHERE date_action=(SELECT MAX(date_action) FROM user_actions)
5. Aggregate functions go hand by hand with GROUP BY clause
Aggregate functions go hand by hand with GROUP BY clause UNLESS the only thing you need is the aggregate function result.
This query will GROUP BY average points by user
SELECT username, AVG(points) FROM users_table GROUP BY 1
We are saying GROUP BY 1 as username is the first data we need from the table, we could also use GROUP BY username in this case.
This query, instead, will bring the average points of all the users of the tables:
SELECT AVG(points) FROM user_table
6. Every SQL interpreter has its peculiarities
Every SQL interpreter has its peculiarities, one query could work in a SQL environment and not in another one.
For instance, QUALIFY clause to filter results are Teradata-specific (Very useful indeed) but to make the same query in OracleDB you should use something different:
# TERADATA SPECIFIC QUERY:
SELECT * FROM table_user_points_actions QUALIFY ROW_NUMBER OVER (PARTITION BY username ORDER BY POINTS)=1
# ORACLE DB EQUIVALENT:
SELECT T.* FROM
(SELECT
RANK() OVER (PARTITION BY username ORDER BY POINTS) my_rank, A.* FROM table_user_points_actions A) T
WHERE T.my_rank=1
Let’s dive into this a bit. Both queries will get all the usernames and the time they make more points, in Teradata we will use QUALIFY to order the values and will get the first one of them (That would be the =1){
In oracle, we should RANK each value by username and points and then get the first one of them.
7. CASE WHEN… ELSE… END – A good ally
The CASE clause is a good ally to understanding when a condition in your query is met.
SELECT username, CASE WHEN SUBSTRING(username,1,2)='A' THEN 'Username starts with A'
WHEN SUBSTRING(username,1,2)='B' THEN 'Username starts with B'
ELSE 'Username doesn't start with A or B' END as username_first_letter_a_or_b_or_else FROM user_tables
This query will tell us if the username starts with A or B, case else, it will tell us it doesn’t start with any of them both.
8. Using subqueries is ALWAYS a good idea
Using subqueries is ALWAYS a good idea, not just because it will execute your code in parts letting you an easier debugging work but also because you sometimes need to know one result in order to get another result (Almost always when you use aggregate functions).
SELECT T.username,
CASE WHEN T.sum_points > 100 THEN 'User have win' ELSE 'User didn't win' END as USER_WIN
FROM (SELECT Username, SUM(points) as sum_points FROM table_user_points_actions GROUP BY 1) T
This will sum up how many points the user has accumulated, if it has been more than 100, will say that the user won, case else that the user didn’t win.
9. Not all join statements work the same way.
Not all join statement work the same way.
For instance, a LEFT JOIN will use the first table as the base table and add the data of the second table to it (And repeat if it is found more than once). Nevertheless, a FULL JOIN will put the two tables together.
# This query will match the first table and the second table, but will not show when there is a username in table_user_points_actions that is not in table_username
SELECT * FROM table_username A
LEFT JOIN table_user_points_actions B
ON A.username=B.username
# This query will match the first table and the second table, but WILL show the registers left "abandoned" by the first table
SELECT * FROM table_username A
FULL JOIN table_user_points_actions B
ON A.username=B.username
10. In doubt, separate your query statement
One good way of debugging your query statement is separate it into smaller statements.
SELECT A.*, SUM(POINTS) FROM table_username A GROUP BY 1,2,3,4,5,6; --Doesn't work
SELECT A.* FROM TABLE_USERNAME A GROUP BY 1,2,3,4,5,6; -- Also doesn't work because there are 5 values on the table and I am grouping by 6
SELECT SUM(POINTS) FROM table_username A; -- Does work
Corrected query:
SELECT A.*, SUM(POINTS) FROM table_username A GROUP BY 1,2,3,4,5,6; -- Does work
Conclusion
SQL is a beautiful and special programming language that you need to know to be a good data analyst, hope these tips help you in the present or future!