08-Outer Joins, Cross Joins and Self Joins
2/4 LEFT AND RIGHT JOINS
After familiarizing yourself with inner joins, you will come to grips with different kinds of outer joins. Next, you will learn about cross joins. Finally, you will learn about situations in which you might join a table with itself.
1. LEFT and RIGHT JOINs
Congratulations on completing Chapter 1 on inner joins, and welcome to Chapter 2 on outer joins, cross joins and self joins!
2. INNER JOIN diagram
Recall the INNER JOIN diagram from Chapter 1. The only records in the result were those where the id field had matching values in both tables.
3. LEFT JOIN initial diagram
We’ll now compare INNER JOIN with three types of outer joins, beginning with LEFT JOIN.
Outer joins can obtain records from other tables, even if matches are not found for the field being joined on. A LEFT JOIN will return all records in the left_table and those records in the right_table that match on the joining field provided. In the diagram shown, the values of 2 and 3 do not appear in the id field of right_table but will still be retained in the join. Records that are not of interest to a LEFT JOIN on the id field have been faded out.
4. LEFT JOIN diagram
We now look at the result of the LEFT JOIN on id. INNER JOIN returns only records corresponding to ids 1 and 4, whereas LEFT JOIN keeps all records in left_table, as well as null values for right_val where is no match in right_table.
Note that ids 5 and 6 in right_table do not feature in LEFT JOIN in any way.
5. LEFT JOIN syntax
Let’s go back to our world leaders example. Say we want our query to include all countries with prime ministers, presidents if they happen to have them, and missing values if they don’t. LEFT JOIN will give us the results we need! The syntax of LEFT JOIN is very similar to INNER JOIN. Only the word INNER is replaced with LEFT.
Note that LEFT JOIN can also be written as LEFT OUTER JOIN in SQL. The first three records in the result are the same as they were with an INNER JOIN, but from the fourth record the result starts to look different. Since the United Kingdom does not have a president, a corresponding null value is returned in the president field.
6. RIGHT JOIN
On to RIGHT JOIN! RIGHT JOIN is the second type of outer join, and is much less common than LEFT JOIN so we won’t spend as much time on it here. Instead of matching entries in the id column of the left table to the id column of the right table, a RIGHT JOIN does the reverse. All records are retained from right_table, even when id doesn’t find a corresponding match in left_table. Null values are returned for the left_value field in records that do not find a match.
7. RIGHT JOIN syntax
Generic syntax for a RIGHT JOIN is shown. Note that the order of left_table and right_table is the same as in LEFT JOIN. The only change is that we call RIGHT JOIN instead of LEFT JOIN. RIGHT JOIN can also be written as RIGHT OUTER JOIN in SQL.
8. RIGHT JOIN with presidents and prime ministers
Let’s make this concrete with our world leaders example. We perform a right join of prime_ministers on the left and presidents on the right. The only change is from the LEFT JOIN keyword to RIGHT JOIN.
The result contains null values where countries have presidents but no prime ministers.
9. LEFT JOIN or RIGHT JOIN?
Now that you’re familiar with both LEFT and RIGHT JOIN, let’s discuss why RIGHT JOIN is less commonly used.
A key reason for this is that a RIGHT JOIN can always be re-written as a LEFT JOIN.
Because we typically type from left to right, LEFT JOIN feels more intuitive to most users when constructing queries.
10. Let’s practice!
Nice work on navigating some of the most popular types of joins! Let’s practice.
Remembering what is LEFT
To become faster at writing queries, it’s helpful to memorize their structure. In this exercise, you will reconstruct the order of the steps of LEFT JOIN
from memory!
This is a LEFT JOIN, right?
Nice work getting to grips with the structure of joins! In this exercise, you’ll explore the differences between INNER JOIN
and LEFT JOIN
. This will help you decide which type of join to use.
As before, you will be using the cities
and countries
tables.
You’ll begin with an INNER JOIN
with the cities
table (left) and countries
table (right). This helps if you are interested only in records where a country is present in both tables.
You’ll then change to a LEFT JOIN
. This helps if you're interested in returning all countries in the cities
table, whether or not they have a match in the countries
table.
Perform an inner join with
cities AS c1
on the left andcountries as c2
on the right.Use
code
as the field to merge your tables on.
SELECT
c1.name AS city,
code,
c2.name AS country,
region,
city_proper_pop
FROM countries AS c2
INNER JOIN cities AS c1
ON c1.country_code = c2.code
Change the code to perform a
LEFT JOIN
instead of anINNER JOIN
.After executing this query, have a look at how many records the query result contains.
SELECT
c1.name AS city,
code,
c2.name AS country,
region,
city_proper_pop
FROM countries AS c2
LEFT JOIN cities AS c1
ON c1.country_code = c2.code
ORDER BY code DESC;
Building on your LEFT JOIN
You’ll now revisit the use of the AVG()
function introduced in a previous course.
Being able to build more than one SQL function into your query will enable you to write compact, supercharged queries.
You will use AVG()
in combination with a LEFT JOIN
to determine the average gross domestic product (GDP) per capita by region in 2010.
Complete the
LEFT JOIN
with thecountries
table on the left and theeconomies
table on the right on thecode
field.Filter the records from the
year
2010.
SELECT name, region, gdp_percapita
FROM countries AS c
LEFT JOIN economies AS e
USING(code)
WHERE year = 2010;
To calculate per capita GDP per region, begin by grouping by
region
.After your
GROUP BY
, chooseregion
in yourSELECT
statement, followed by average GDP per capita using theAVG()
function, withAS avg_gdp
as your alias.
SELECT region, AVG(gdp_percapita) AS avg_gdp
FROM countries AS c
LEFT JOIN economies AS e
USING(code)
WHERE year = 2010
GROUP BY region;
Order the result set by the average GDP per capita from highest to lowest.
Return only the first 10 records in your result.
SELECT region, AVG(gdp_percapita) AS avg_gdp
FROM countries AS c
LEFT JOIN economies AS e
USING(code)
WHERE year = 2010
GROUP BY region
ORDER BY avg_gdp DESC
LIMIT 10;
Is this, RIGHT?
You learned that right joins are not used as commonly as left joins. A key reason for this is that right joins can always be re-written as left joins, and because joins are typically typed from left to right, joining from the left feels more intuitive when constructing queries.
It can be tricky to wrap one’s head around when left and right joins return equivalent results. You’ll explore this in this exercise!
SELECT countries.name AS country, languages.name AS language, percent
FROM countries
LEFT JOIN languages
USING(code)
ORDER BY language;
Write a new query using
RIGHT JOIN
that produces an identical result to theLEFT JOIN
provided.
1. FULL JOINs
In this video, we’ll introduce FULL JOIN.
FULL JOIN is the last of the three types of outer join. We’ll compare FULL JOIN to the two other types of outer join we have learned (LEFT JOIN and RIGHT JOIN), as well as to INNER JOIN.
2. FULL JOIN initial diagram
A FULL JOIN combines a LEFT JOIN and a RIGHT JOIN. As you can see in this diagram, no values are faded out as they were in earlier diagrams. This is because the FULL JOIN will return all ids, irrespective of whether they have a match in the other table being joined.
3. FULL JOIN diagram
Let’s have a look at the result after FULL JOIN.
We see that it has retained all ids, returning missing values in the form of nulls for fields corresponding to records where id did not find a match. All six ids we have been working with are included in the result after the FULL JOIN. Note that this time, nulls can appear in either left_value or right_value fields.
4. FULL JOIN syntax
Time for some SQL code. In order to produce a FULL JOIN, the general format aligns closely with the SQL syntax we’ve been using for INNER JOIN, LEFT JOIN and RIGHT JOIN. We adapt our join to have the word ‘FULL’ before ‘JOIN’. Note that the keyword FULL OUTER JOIN can also be used to return the same result.
5. FULL JOIN example using leaders database
Let’s make our sample syntax concrete with our example from the leaders database.
Suppose we were interested in all the countries in our database, and wanted to check whether they had a president, a prime minister, or both.
We’ll walk through the code line by line to do this using a FULL JOIN. The SELECT statement starts us off by including the country, as well as the prime_minister and president fields.
Next, we specify prime_ministers as our left table and alias this as p1.
Note that the order of the tables matters here, and if we switched the order, the records would be ordered differently depending on how prime ministers and presidents are ordered in the tables.
We then add the FULL JOIN query and add presidents as the right table, using alias p2.
Lastly, the join is performed using country as the field to join on in both tables. We LIMIT to the first 10 records.
Here is a look at our result.
Note that there are null values in both the prime_minister and president fields. We chose a FULL JOIN because we were interested in all countries, whether they had a prime minister, a president, or both.
10. Let’s practice!
Time for some practice!
Comparing joins
In this exercise, you’ll examine how results can differ when performing a full join compared to a left join and inner join by joining the countries
and currencies
tables. You'll be focusing on the North American region
and records where the name
of the country is missing.
You’ll begin with a full join with countries
on the left and currencies
on the right. Recall the workings of a full join with the diagram below!
Perform a full join with
countries
(left) andcurrencies
(right).Filter for the
North America
region
orNULL
country names.
SELECT name AS country, code, region, basic_unit
FROM countries
FULL JOIN currencies
USING (code)
WHERE region = 'North America' OR name IS NULL
ORDER BY region;
Repeat the same query as before, turning your full join into a left join with the
currencies
table.Have a look at what has changed in the output by comparing it to the full join result.
SELECT name AS country, code, region, basic_unit
FROM countries
LEFT JOIN currencies
USING (code)
WHERE region = 'North America' OR name IS NULL
ORDER BY region;
Repeat the same query again, this time performing an inner join of
countries
withcurrencies
.Have a look at what has changed in the output by comparing it to the full join and left join results!
SELECT name AS country, code, region, basic_unit
FROM countries
INNER JOIN currencies
USING (code)
WHERE region = 'North America' OR name IS NULL
ORDER BY region;
Chaining FULL JOINs
As you have seen in the previous chapter on INNER JOIN
, it is possible to chain joins in SQL, such as when looking to connect data from more than two tables.
Suppose you are doing some research on Melanesia and Micronesia, and are interested in pulling information about languages and currencies into the data we see for these regions in the countries
table. Since languages and currencies exist in separate tables, this will require two consecutive full joins involving the countries
, languages
and currencies
tables.
Complete the
FULL JOIN
withcountries as c1
on the left andlanguages as l
on the right, usingcode
to perform this join.Next, chain this join with another
FULL JOIN
, placingcurrencies
on the right, joining oncode
again.
SELECT
c1.name AS country,
region,
l.name AS language,
basic_unit,
frac_unit
FROM countries as c1
FULL JOIN languages AS l
USING(code)
FULL JOIN currencies AS c2
USING(code)
WHERE region LIKE 'M%esia';
Crossing into CROSS JOIN
Now that we’ve worked with INNER JOINs and three types of outer joins, we will be checking out CROSS JOIN.
1. CROSS JOIN diagram
CROSS JOINs are slightly different than joins we have seen previously: they create all possible combinations of two tables.
Let’s explore the diagram for a CROSS JOIN.
In this diagram, we have two tables named table1 and table2, with one field each: id1 and id2, respectively. The result of the CROSS JOIN is all nine combinations of the id values of 1, 2, and 3 in table1 with the id values of A, B, and C for table2.
2. CROSS JOIN syntax
Let’s have a look at the syntax for CROSS JOIN.
Note that the syntax is very minimal, and we do not specify ON or USING with CROSS JOIN.
3. Pairing prime ministers with presidents
Let’s apply the SQL syntax for a CROSS JOIN to the following problem that uses our database of global leaders.
Suppose that all prime ministers in Asia from our database are scheduled for individual meetings with all presidents in South America from our database, and we are journalists who wish to follow all the meetings that will happen.
We can create a query that returns all these combinations using a CROSS JOIN!
We use a WHERE clause to focus only on prime ministers from Asia and presidents from South America. The results of the query give us all possible pairings of the four prime ministers from Asia in the prime_ministers table, and the two presidents from South America in the presidents table.
4. Let’s practice!
We’ll now dive into some exercises. Good luck!
Histories and languages
Well done getting to know all about CROSS JOIN
! As you have learned, CROSS JOIN
can be incredibly helpful when asking questions that involve looking at all possible combinations or pairings between two sets of data.
Imagine you are a researcher interested in the languages spoken in two countries: Pakistan and India. You are interested in asking:
- What are the languages presently spoken in the two countries?
- Given the shared history between the two countries, what languages could potentially have been spoken in either country over the course of their history?
In this exercise, we will explore how INNER JOIN
and CROSS JOIN
can help us answer these two questions, respectively.
Complete the code to perform an
INNER JOIN
ofcountries AS c
withlanguages AS l
using thecode
field to obtain the languages currently spoken in the two countries.
SELECT c.name AS country, l.name AS language
-- Inner join countries as c with languages as l on code
FROM countries AS c
INNER JOIN languages AS l
USING(code)
WHERE c.code IN ('PAK','IND')
AND l.code in ('PAK','IND');
Change your
INNER JOIN
to a different kind of join to look at possible combinations of languages that could have been spoken in the two countries given their history.Observe the differences in output for both joins.
SELECT c.name AS country, l.name AS language
FROM countries AS c
-- Perform a cross join to languages (alias as l)
CROSS JOIN languages AS l
WHERE c.code in ('PAK','IND')
AND l.code in ('PAK','IND');
Choosing your join
Now that you’re fully equipped to use joins, try a challenge problem to test your knowledge!
You will determine the names of the five countries and their respective regions with the lowest life expectancy for the year 2010. Use your knowledge about joins, filtering, sorting and limiting to create this list!
Complete the join of
countries AS c
withpopulations as p
.Filter on the year 2010.
Sort your results by life expectancy in ascending order.
Limit the result to five countries.
SELECT
c.name AS country,
region,
life_expectancy AS life_exp
FROM countries AS c
-- Join to populations (alias as p) using an appropriate join
LEFT JOIN populations AS p
ON c.code = p.country_code
-- Filter for only results in the year 2010
WHERE p.year = 2010
-- Sort by life_exp
ORDER BY life_exp ASC
-- Limit to five records
LIMIT 5;
1. Self joins
Well done getting to the last lesson of this chapter! We’ll now dive into a special kind of join, where a table is joined with itself. These types of joins are called self joins.
Joining a table to itself may seem like an unusual thing to do. Why would we want to do that? Self joins are used to compare values from part of a table to other values from within the same table.
Recall the prime_ministers table from earlier. Suppose all prime ministers are convening in summits on their own continents. We want to create a new table showing all countries in the same continent as pairs.
2. Prime minister, meet prime minister
Self joins don’t have dedicated syntax as other joins we have seen do. We can’t just write SELF JOIN in SQL code, for example.
In addition, aliasing is required for a self join. Let’s look at a chunk of INNER JOIN code using the prime_ministers table.
The country column is selected twice, and so is the continent column.
The prime_ministers table is on both the left and the right of the JOIN, making this both a self join and an INNER JOIN! The vital step here is setting the joining fields which we use to match the table to itself.
For each country, we will find multiple matched countries in the right table, since we are joining on continent. Each of these matched countries will be returned as pairs. Since this query will return several records, we use LIMIT to return only the first 10 records.
4. Prime minister, meet prime minister
The results are a pairing of each country with every other country in the same continent.
However, note that our join also paired countries with themselves, since they too are in the same continent as themselves. We don’t want to include these, since a leader from Portugal does not need to meet with themselves, for example. Let’s fix this.
5. Prime minister, meet prime minister
Recall the use of the AND clause to ensure multiple conditions are met in the ON clause. In our second condition, we use the not equal to operator to exclude records where the p1-dot-country and p2-dot-country fields are identical.
Here’s a look at our final table
showing combinations of countries in our database that are in the same continent, but excluding records where the two country fields are the same.
7. Let’s practice!
Nice work! You’ll now work on a couple of exercises for practice, then complete a joining challenge!
Comparing a country to itself
Self joins are very useful for comparing data from one part of a table with another part of the same table. Suppose you are interested in finding out how much the populations for each country changed from 2010 to 2015. You can visualize this change by performing a self join.
In this exercise, you’ll work to answer this question by joining the populations
table with itself. Recall that, with self joins, tables must be aliased. Use this as an opportunity to practice your aliasing!
Since you’ll be joining the populations
table to itself, you can alias populations
first as p1
and again as p2
. This is good practice whenever you are aliasing tables with the same first letter.
Perform an inner join of
populations
with itselfON
country_code
, aliasedp1
andp2
respectively.Select the
country_code
fromp1
and thesize
field from bothp1
andp2
, aliasingp1.size
assize2010
andp2.size
assize2015
(in that order).
SELECT p1.country_Code, p1.size AS size2010, p2.size AS size2015
FROM populations AS p1
INNER JOIN populations AS p2
ON p1.country_code = p2.country_code;
Since you want to compare records from 2010 and 2015, eliminate unwanted records by extending the
WHERE
statement to include only records where thep1.year
matchesp2.year - 5
.
SELECT p1.country_Code, p1.size AS size2010, p2.size AS size2015
FROM populations AS p1
INNER JOIN populations AS p2
ON p1.country_code = p2.country_code
WHERE p1.year = p2.year-5;
That’s all for now!!
Don't be afraid of trying and making mistakes. Happy learning