SQL Inner Join: Linking Database Tables

Article Summary

An SQL inner join returns only the rows where matching values exist across two tables. This article covers join types, primary and foreign keys, query syntax, aliases, and common mistakes. You will gain the skills to write accurate, efficient inner join queries with confidence.

A relational database allows you to store information in separate tables — but what happens when you want to link them together? Suppose you had one table, an Order table, and another which lists Product orders. What if you’re going to create a query that gives details of all of the orders, including products?

In this article, you’ll learn more about using inner joins to return rows from both tables — and how to build SQL syntax queries to get the data you need.

What are the different types of JOINs?

Before you begin joining tables, you need to know whether you’re trying to produce an inner join or an outer join. Suppose you have two tables that are being joined. In the below example, one table has some rows with IDs between 3 and 19, whereas the second table has rows with IDs 16 and 29. There is, therefore, an overlap between these two tables — those rows with IDs between 16 and 29.

Recommended Udemy course

70-461, 761: Querying Microsoft SQL Server with Transact-SQL

Phillip Burton • over one million students so far, I Do Data Limited

You can see this example in the Venn diagram below. These two are tables based on ID (identity numbers) and other data. The left-hand circle shows the first table IDs, and the right-hand circle shows the second table IDs. IDs 16, 17, 18 and 19 are shown in the middle, as they are common to both tables.

Venn diagram, tables based on ID (identity numbers) and other data

An inner join produces an intersection of two tables, only including rows that match. This is the light green section in the Venn diagram below, which contains IDs 16, 17, 18 and 19.

Venn diagram, intersection of two tables, only including rows that match

The alternative to this is the outer joins, but there are many different types of outer joins. A full outer join will provide both matching rows and unmatching rows. In this example, it will provide all the IDs from 3 to 29. In the below Venn diagram, both circles are shaded light green, as all of the rows will be returned.

venn diagram, both circles are shaded light green, as all of the rows will be returned

A left outer join (more commonly known as a “left join”) would produce all records in the first (“left”) table whether or not it matched the other table, together with any rows in the second (“right”) table. In the below Venn diagram, the entirety of the left-hand table, including the overlap, has been shaded light green.

venn diagram, left-hand table, including the overlap, has been shaded light green

A more advanced version of this exists called a self join, commonly used for hierarchical relationships, such as an employee to a manager.

If you’re trying to find data that should always match in both tables (with matching rows in both tables), then an inner join is usually the right choice. But if you want to produce all the content of your tables, whether or not the rows match, a left, right or full outer join may be better.

Primary and Foreign Keys for Joins

The concept of linking tables primarily stems from primary and foreign keys. Most tables in your SQL database will have a primary key. For instance, if you have a customer table, the primary key could store your customer IDs. A primary key must have unique values — they cannot be duplicated in other rows — and cannot be null. If you tried to insert a row that contained a duplicate primary key, you would get an error, and the new row would not be stored.

You can use a composite primary key, where multiple table columns make up a single primary key. For example, details of an order may have OrderID and ProductID together making up a primary key.

It is best practice for all tables to have a primary key. However, in most systems, you are able to have tables without primary keys. In Microsoft’s T-SQL, such a table is called a “heap.” However, this is discouraged for all except the smallest table or for tables that need to have a considerable amount of data inserted very quickly, which will then be processed subsequently.

Most relational databases such as Microsoft SQL Server and MySQL have an auto-increment functionality. Auto-increment columns automatically add 1 to the last row or record and insert the new number into a new row. For example, suppose you have an Order table and want each new row to have a unique number. If you used an autonumber, then this can be done automatically.

Foreign keys are the primary key value for a different but related table. For instance, suppose that an eCommerce store has a table for customers and orders. The customer table would contain a primary key that identifies the customer, such as CustomerID, which could be an auto-increment column. The orders table could include this CustomerID column to link each order to a single customer. Therefore, this foreign key would reference the customer table’s primary key value in each record.

This primary and foreign key relationship is generally the basis for your inner join queries. However, you can write JOIN queries without such a formal relationship.

Writing an Inner Join Query

Let’s look at the syntax for SQL inner join.

The following code returns a list of customers from a “customers” table:

SELECT *
FROM Customers;

The above query is probably the most basic query you can write for your SQL database. It has no filters, no ordering, no subqueries, and no joins. To add more usefulness to the query, you use the “where” clause to filter your records. The following SQL code in some SQL dialects gets a list of customers who signed up to the site within the last week:

SELECT *
FROM Customers
WHERE signupdate >= getdate()-7;

Now, this query is great if you just need a list of customers, but suppose you need a list of customers and related orders? You could get a list of orders and link them to customers in your front-end code, but it’s completely inefficient and inaccurate.

That’s where the inner join query comes into play. The following code uses the same “where” clause filter, but it returns a list of orders for your customers as well:

SELECT c.*, o.*
FROM customers c
INNER JOIN orders o
ON c.customerid=o.customerid
WHERE c.signupdate >= getdate()-7;

Incidentally, in many dialects, you can eliminate the “INNER” word, leaving just the word “JOIN,” and SQL will assume you mean the “inner” version.

The record set returned above contains every column in both the customers and orders table. In lines 2 and 3, they have been aliased as “c” and “o.” Aliases make it easier to reference tables. If you didn’t specify these aliases, you’d need to write out the entire table name to join columns.

Using aliases with individual columns may not be necessary if that field or column name only exists in one table. For example, the WHERE clause starts with “WHERE c.signupdate.” If “signupdate” was used only in your customer’s table, then you could omit the “c.” and just have “WHERE signupdate.” If your customer’s table and your orders table both have a “signupdate” column and you use “WHERE signupdate,” the query will return an error that indicates you’ve referenced ambiguous column names.

Line 4 shows the join condition. It returns the rows if “customerid” matches in the two tables. When using an INNER JOIN, if the query can’t find a linked order for the customer, the customer is eliminated from the record set. Therefore, the rows or records you retrieve are only customers who have made orders and (because of the WHERE clause) have a signup date within the last seven days.

If you want all rows from both tables, you should use a FULL OUTER JOIN. In this example, it would return all customer records and all customer orders. If a customer does not have any orders, you will see the customer’s details, together with NULL values for the orders columns. With an INNER JOIN, any such customer would be removed from the query.

The following SQL code queries the SQL server and uses a FULL OUTER JOIN. All customer records within the signup “where” clause return regardless if an order exists:

SELECT c.*, o.*
FROM customers c
FULL OUTER JOIN orders o
ON c.customerid=o.customerid
WHERE c.signupdate >= getdate()-7;

The last few examples use only the customers’ columns to filter records. You can also use the joined table columns. The following SQL code gets all customers who signed up within the last seven days and any orders placed within the last five days:

SELECT c.*, o.*
FROM customers c
INNER JOIN orders o
ON c.customerid=o.customerid
WHERE c.signupdate >= getdate()-7 and o.orderdate >= getdate()-5;

You can make an additional change to the query regarding which columns to return. Right now, the asterisks indicate to the database that it needs to return all columns in both tables, which takes more time and more input/output requirements than if you only required a few columns.

However, you usually do not need all of these columns to be in your record set to work with it on the front end of the application. To fix the issue, you should specify which columns to return. You also need to use your aliases for returned columns if the name of that column is included in multiple tables in the JOIN. The following code is an example:

SELECT c.first_name, c.last_name, orderdate
FROM customers c
INNER JOIN orders o
ON c.customerid=o.customerid
WHERE c.signupdate >= getdate()-7 AND o.orderdate >= getdate()-5;

In the above example, the first name and last name columns are returned. You know they appear in the customers’ table from the “c” alias prefix. The last column returned is the order date. Because an alias has not been used, you do not necessarily know from which table this has been taken, but you do know that it has not been used in both tables.

This is just a start to returning data from a database from multiple tables. You can continue adding inner joins until you link all your tables and get all columns you need for the front-end application.

There are times when databases would be better structured to require fewer joins. If an organization regularly had to join a “customer” table with “customer_contact_information,” it may make more sense for the customer table to include customer contact information. In this case, you might want to create a SELECT statement with the JOIN and encapsulate (save it) as a view.

However, with the example of customer and order tables, it makes sense for these tables to be separate. Because of that, it further makes sense to use a join when creating a query that combines the tables. As a developer might not be interested in finding customers who don’t have orders (and orders who don’t have a valid customer), it would make sense for your query to use a SQL inner join query rather than a SQL outer join query.

Common Mistakes When Using an Inner Join Query

Apart from using some complex syntax, an inner join is fairly straightforward. One of the common mistakes when using an inner join query is accidentally using the wrong primary or foreign keys. Incorrect identifiers can be used, or an identifier could be used that is not actually unique, such as a full name, as there could be multiple customers with the same name. If this happens, you may find that the number of rows retrieved from the query multiplies.

It’s also possible to use the wrong type of join, such as a left outer join or full outer join, rather than an inner join (or the other way round), if that is what you intended. An inner join in a query will probably retrieve fewer rows than if a left or full outer join is used.

Another issue can be comparing items that are not of the compatible data types or accidentally pulling the wrong values from the wrong tables. As you are using multiple tables, it is very important that the developer accurately specifies the correct tables and columns. If both tables have columns that share the same name but contain non-compatible data, this can be confusing.

For instance, when it comes to a “customer,” you might have an email address logged under the customer table, but there might also be an email address logged with an order. These email addresses might not be the same; the customer may have specified an email address specific to the order instead of a more general email address when setting up the data in the customer table. In this case, the developer must be very clear on what email address you are trying to retrieve — either the customer’s email address or the email address attached to the order — both on a conceptual and syntax level. The columns used in the comparison operator are especially important.

Other than this, queries can be complicated when multiple developers are working on the same team. You should leave comments and notes to describe what the inner join query is trying to complete because join queries tend to be less “readable” than other types of queries — it can be challenging to work out what the query is trying to do at a glance.

Major Concerns When Using an Inner Join Query

As with most join queries, an inner join query can potentially use a significant amount of resources in a system. Join queries can be demanding both in terms of CPU and RAM. Because of this, it’s important to take special care when writing these join queries. If many extensive join queries are necessary for an application to work, it may be better to restructure the database schema so they aren’t required.

As the join query becomes more complex, it can become exponentially more of a “resource hog.” The more tables involved in the join query, the harder the system will need to work. Ideally, developers should require as little information returned from their join queries as possible and as little data manipulated. You can join multiple tables as long as the resource usage is guarded.

If the wrong type of join query is chosen (such as an outer when an inner should have been required), the data retrieved will not be what was wanted. So, you should thoroughly study the results from queries to make sure they are what you expect. For documentation reasons, it is also important to name everything to make it easier to read: “join table2” is not descriptive like “join customers” is.

Join queries are usually the best and easiest way to acquire result sets across multiple tables. Join queries are used in applications as complex as Google Maps and as simple as login pages — and once mastered, they are very powerful.

Thank you for reading

If you would like to learn more about queries, why not join me in one of my Udemy courses. If you want to learn Oracle SQL, then please join my Oracle SQL Developer course. If you want to learn Microsoft’s T-SQL, please participate in my 1-hour SELECT statement course, 8-hour Database Fundamentals course, or 29-hour course Querying Microsoft SQL Server with Transact-SQL.

Now that you have the inner join covered let’s look at some additional useful SQL commands. Thank you very much for reading this, and keep learning.

Phillip Burton is a Microsoft Certified Technology Specialist and a Computing Consultant providing expert services in the development of computer systems and data analysis. He has +1M students and over a dozen Bestseller courses.