6. Sorting Results

Iqra Anwar
13 min readApr 4, 2023

--

4/4 Intermediate SQL

1. Sorting results

Congratulations on making it this far! We’ll now learn how to sort and group results to gain further insight.

SQL sorting

2. Sorting results

Sorting results means we want to put our data in a specific order. It’s another way to make our data easier to understand by quickly seeing it in a sequence.

Let’s say we wanted to extract our three longest coats; if our closet were messy, it would take a long time to find. However, if we sorted our closet by garment type and length, we could quickly grab them!

3. ORDER BY

In SQL, the ORDER BY keyword is used to sort results of one or more fields. When used on its own, it is written after the FROM statement, as shown here.

ORDER BY will sort in ascending order by default. This can mean from smallest to biggest or from A to Z. In this case, we have one query sorting the budget from smallest to biggest and a second query sorting the titles alphabetically.

Our database contains film titles that start with symbols and numbers; these come before the letter A.

4. ASCending

We could also add the ASC keyword to our query to clarify that we are sorting in ascending order. The results are the same, and our code is more readable.

5. DESCending

We can use the DESC keyword to sort the results in descending order. This query gives us the film titles sorted by budget from biggest to smallest.

However, our data contains a lot of null values. We can add a WHERE clause before ORDER BY to filter the budget field for only non-null values and improve our results.

6. Sorting fields

Notice that we don’t have to select the field we are sorting on. For example, here’s a query where we sort by release year and only look at the title.

However, it is a good idea to include the field we are sorting on in the SELECT statement for clarity.

7. ORDER BY multiple fields

ORDER BY can also be used to sort on multiple fields. It will sort by the first field specified, then sort by the next, etc.

To specify multiple fields, we separate the field names with a comma. The second field we sort by can be thought of as a tie-breaker when the first field is not decisive in telling the order.

Here is an example. Let’s say we wanted to find the best movie. In the first query, we are only sorting the films by the number of Oscar wins and getting a tie.

We can break that tie by adding a second sorting field by seeing which film has the most wins and the highest imdb_score.

8. Different orders

We can also select a different order for each field we are sorting. For example, here, we are sorting birthdate in ascending order and name in descending order.

9. Order of execution

ORDER BY falls towards the end of the order of execution we already know, coming in just before limit.

The FROM statement will execute first,

then WHERE, followed by SELECT, ORDER BY,

and finally, LIMIT.

10. Let’s practice!

Time to practice our new sorting skills!

Sorting text

SQL provides you with the ORDER BY keyword to sort one or more fields from your data. It can do this multi-directionally and helps make results easy to interpret.

How does ORDER BY sort a column of text values by default?

Sorting single fields

Now that you understand how ORDER BY works, you'll put it into practice. In this exercise, you'll work on sorting single fields only. This can be helpful to extract quick insights such as the top-grossing or top-scoring film.

The following exercises will help you gain further insights into the film database.

  1. Select the name of each person in the people table, sorted alphabetically.
SELECT name
FROM people
ORDER BY name ASC;

2. Select the title and duration for every film, from longest duration to shortest.

-- Select the title and duration from longest to shortest film
SELECT title, duration
FROM films
ORDER BY duration DESC;

Sorting multiple fields

ORDER BY can also be used to sort on multiple fields. It will sort by the first field specified, then sort by the next, and so on. As an example, you may want to sort the people data by age and keep the names in alphabetical order.

Try using ORDER BY to sort multiple columns.

  1. Select the release_year, duration, and title of films ordered by their release year and duration, in that order.
SELECT release_year, duration, title
FROM films
ORDER by release_year, duration;

2. Select the certification, release_year, and title from films ordered first by certification (alphabetically) and second by release year, starting with the most recent year.

SELECT certification, release_year, title
FROM films
ORDER BY certification, release_year DESC;

. Grouping data

We’ve learned how to sort data. Next, we’ll look at grouping our results.

2. Grouping data

In the real world, we’ll often need to summarize data for a particular group of results. For example, we might want to see the film data grouped by certification and make calculations on those groups, such as the average duration for each certification.

3. GROUP BY single fields

SQL allows us to group with the GROUP BY clause. Here it is used in a query where we have grouped by certification.

GROUP BY is commonly used with aggregate functions to provide summary statistics, particularly when only grouping a single field, certification, and selecting multiple fields, certification and title. This is because the aggregate function will reduce the non-grouped field to one record only, which will need to correspond to one group.

4. Error handling

SQL will return an error if we try to SELECT a field that is not in our GROUP BY clause. We’ll need to correct this by adding an aggregate function around title.

5. GROUP BY multiple fields

We can use GROUP BY on multiple fields similar to ORDER BY. The order in which we write the fields affects how the data is grouped. The query here selects and groups certification and language while aggregating the title.

The result shows that we have five films that have missing values for both certification and language, two films that are unrated and in Japanese, two films that are rated R and in Norwegian, and so on.

6. GROUP BY with ORDER BY

We can combine GROUP BY with ORDER BY to group our results, make a calculation, and then order our results. For example, we can clean up one of our previous queries by sorting the results by the title count in descending order. Here is that query without ORDER BY, and this is the same query with ordering added.

ORDER BY is always written after GROUP BY, and notice that we can refer back to the alias within the query. That is because of the order of execution. It looks like movies rated R are the most common in our database.

7. Order of execution

GROUP BY fits into our order after FROM and before all other clauses. Our updated queries will begin with

FROM, followed by grouping, selecting the data and creating the alias, sorting the results, and limiting them to the desired number.

8. Let’s practice!

In the following exercises, we’ll examine our film database to find out about release year, review, and budget patterns. Let’s practice!

GROUP BY single fields

GROUP BY is a SQL keyword that allows you to group and summarize results with the additional use of aggregate functions. For example, films can be grouped by the certification and language before counting the film titles in each group. This allows you to see how many films had a particular certification and language grouping.

In the following steps, you’ll summarize other groups of films to learn more about the films in your database.

  1. Select the release_year and count of films released in each year aliased as film_count.
SELECT release_year, COUNT(*) AS film_count
FROM films
GROUP BY releas_year;

2. Select the release_year and average duration aliased as avg_duration of all films, grouped by release_year.

SELECT release_year, AVG(duration) AS avg_duration
FROM films
GROUP BY release_year;

GROUP BY multiple fields

GROUP BY becomes more powerful when used across multiple fields or combined with ORDER BY and LIMIT.

Perhaps you’re interested in learning about budget changes throughout the years in individual countries. You’ll use grouping in this exercise to look at the maximum budget for each country in each year there is data available.

  1. Select the release_year, country, and the maximum budget aliased as max_budget for each year and each country; sort your results by release_year and country.
SELECT release_year, country, MAX(budget) AS max_budget
FROM films
GROUP BY release_year, country;

Answering business questions

In the real world, every SQL query starts with a business question. Then it is up to you to decide how to write the query that answers the question. Let’s try this out.

Which release_year had the most language diversity?

Take your time to translate this question into code. We’ll get you started then, it’s up to you to test your queries in the console.

“Most language diversity” can be interpreted as COUNT(DISTINCT ___). Now over to you.

SELECT release_year, COUNT(DISTINCT language)
FROM films
GROUP BY release_year
ORDER BY language;

1. Filtering grouped data

That was excellent work. We’ve combined sorting and grouping; next, we will combine filtering with grouping.

2. HAVING

In SQL, we can’t filter aggregate functions with WHERE clauses. For example, this query attempting to filter the title count is invalid.

That means that if we want to filter based on the result of an aggregate function, we need another way. Groups have their own special filtering word: HAVING.

For example, this query shows only those years in which more than ten films were released.

3. Order of execution

The reason why groups have their own keyword for filtering comes down to the order of execution. We’ve written a query using many of the keywords we have covered here.

This is their written order, starting with SELECT, FROM films, WHERE the certification is G, PG, or PG-13, GROUP BY certification, HAVING the title count be greater than 500, ORDER BY title count, and LIMIT to three. In contrast, the order of execution is:

FROM,

WHERE,

GROUP BY,

HAVING,

SELECT,

ORDER BY,

and LIMIT.

By reviewing this order, we can see WHERE is executed before GROUP BY and before any aggregation occurs. This order is also why we cannot use the alias with HAVING, but we can with ORDER BY.

4. HAVING vs WHERE

WHERE filters individual records while HAVING filters grouped records. We’ll walk through two business questions here to show how to translate them into the correct filter.

The first question is “What films were released in the year 2000?”. This question does not indicate any sort of grouping.

It asks to see only the titles from a specific year and can therefore be written as SELECT title, FROM films, WHERE release year equals 2000.

The second question is, “In what years was the average film duration over two hours?”. Straight away, we can see this question has a few more layers. Let’s break down the question and query into smaller, easier-to-understand steps.

5. HAVING vs WHERE

This question requires us to return information about years, so we select the release year from the films table. Next, it asks for the average film duration, which tells us we need to place AVG(duration) somewhere. Since we do not need to provide any additional information around the duration on its own, it is unlikely we need to perform the aggregation within the SELECT clause, so we’ll try the HAVING clause instead. The last part of the question indicates we need to filter on the duration. Since we can’t filter aggregates with WHERE, this supports our theory about using HAVING! Finally, we need to add a GROUP BY into our query since we have selected a column that has not been aggregated. Recall the aggregate function will convert the duration values into one average value. Going back to the start of our question, we’re interested in knowing the average duration per year, so we group it by release year. And there we have it!

6. Let’s practice!

Let’s improve our confidence with a final round of practice.

Filter with HAVING

Your final keyword is HAVING. It works similarly to WHERE in that it is a filtering clause, with the difference that HAVING filters grouped data.

Filtering grouped data can be especially handy when working with a large dataset. When working with thousands or even millions of rows, HAVING will allow you to filter for just the group of data you want, such as films over two hours in length!

Practice using HAVING to find out which countries (or country) have the most varied film certifications.

  1. Select country from the films table, and get the distinct count of certification aliased as certification_count.
SELECT country, COUNT(DISTINCT certification) AS certification_count
FROM films

2. Group the results by country.

SELECT country, COUNT(DISTINCT certification) AS certification_count
FROM films
GROUP BY country

3. Filter the unique count of certifications to only results greater than 10.

SELECT country, COUNT(DISTINCT certification) AS certification_count
FROM films
GROUP BY country
HAVING COUNT(DISTINCT certification) > 10;

HAVING and sorting

Filtering and sorting go hand in hand and gives you greater interpretability by ordering our results.

Let’s see this magic at work by writing a query showing what countries have the highest average film budgets.

  1. Select the country and the average budget as average_budget, rounded to two decimal, from films.
SELECT country, ROUND(AVG(budget),2) AS average_budget
FROM films
  • Group the results by country.
GROUP BY country
  • Filter the results to countries with an average budget of more than one billion (1000000000).
HAVING AVG(budget) > 1000000000
  • Sort by descending order of the average_budget.
SELECT country, ROUND(AVG(budget),2) AS average_budget
FROM films
GROUP BY country
HAVING AVG(budget) > 1000000000
SORT BY average_budget DESC;

All together now

It’s time to use much of what you’ve learned in one query! This is good preparation for using SQL in the real world where you’ll often be asked to write more complex queries since some of the basic queries can be answered by playing around in spreadsheet applications.

In this exercise, you’ll write a query that returns the average budget and gross earnings for films each year after 1990 if the average budget is greater than 60 million.

This will be a big query, but you can handle it!

  • Select the release_year for each film in the films table, filter for records released after 1990, and group by release_year.
SELECT release_year 
FROM films
WHERE release_year > 1990
GROUP BY release_year;
  • Modify the query to include the average budget aliased as avg_budget and average gross aliased as avg_gross for the results we have so far.
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year;
  • Modify the query once more so that only years with an average budget of greater than 60 million are included.
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year

HAVING AVG(budget) > 60000000;
  • Finally, order the results from the highest average gross and limit to one.
SELECT release_year, AVG(budget) AS avg_budget, AVG(gross) AS avg_gross
FROM films
WHERE release_year > 1990
GROUP BY release_year

HAVING AVG(budget) > 60000000;
ORDER BY avg_gross DESC
LIMIT 1;

This is what we have learned so far!!

Congratulations!

Best of luck, and remember making mistakes is all part of learning!

--

--

Iqra Anwar

Data Engineer @ RC! BSCS graduate from BBSUL Karachi, Pakistan. Ex Technical writer at Omdena Pakistan Chapter | I write articles related to Computer Science