SQL Inner Join: Linking Database Tables
A relational database lets you use primary and foreign keys to connect your tables. The database manages the relationship, but only one table’s result set returns when you query — unless you link the data. The most common methods used to link data are outer joins and inner joins, although there are different types of joins as well (such as self joins and left joins).
In this article, you’ll learn more about using inner joins to return rows from both tables — and how to build SQL syntax queries to get the data you need.
What is a SQL Inner Join vs. a SQL Outer Join?
Before you begin joining tables, you need to know whether you’re trying to produce an inner join or an outer join. An inner join produces an intersection of two tables, whereas an outer join provides a union of two tables. These will bring dramatically different results.
Inner joins will not include rows that don’t match, while outer joins can have both matching rows and unmatching rows. This is a major difference. You can run an inner join and an outer join to see the data that each generates and identify which one is right for you.
There’s also another type of join: a left join. A left join would produce all records in the left table whether or not it matched the other table, whereas the inner join would (again) not include rows that did not match. But a left join tends to be far more expensive resource-wise than an inner join.
If you’re trying to find data that should always match (producing rows with content in each table), then an inner join is usually the right choice. But if you want to produce all the content of your tables, whether or not the rows match, an outer join is better.
Primary and Foreign Keys for Joins
The concept of linking tables stems from primary and foreign keys. Every table in your SQL database should have a primary key. A primary key must be unique and cannot be null. You can use composite keys, which means multiple table columns make up a primary key.
Whatever your choice, the primary key is the column or group of columns that uniquely identifies your record. For instance, if you have a customer table, you need the primary key to determine your customer IDs.
Modern databases don’t always have primary keys. It’s worth noting that this is a best practice, but it isn’t a practice that the SQL system enforces. It’s possible to have entire systems without a unique ID or a primary key, in which case you will need to create it. It’s also possible to create a primary key ‘on-the-fly’ by linking multiple factors of the database, such as the person’s first and last name — and some primary keys are something like an email address. Either way, there has to be some unique signifier for the table.
Most relational databases such as Microsoft SQL Server and MySQL have auto-increment functionality. Auto-increment columns automatically add 1 to the last record and insert the new number into a column. Some small businesses and database developers use this technique to create a primary key for the tables. Auto-incrementing columns guarantee that your tables always have a unique column. The SQL database will return an error if you attempt to create a duplicate primary key. But again, while it’s best practice to have this type of unique identifier, not everyone uses one.
Foreign keys contain the same primary key value, but they exist in a related table. For instance, an eCommerce store has a table for customers and orders. The customer table contains the primary key that identifies the customer, and the orders table contains a foreign key such as “CustomerId” that contains the customer’s primary key value in each record. This primary and foreign key relationship is the basis for your inner join queries.
Writing an Inner Join Query
Now that we understand primary keys and the different types of join queries, we’ll look at the syntax for SQL inner join. But first, a simple SQL query.
The following code returns a list of customers from a “customers” table:
select * from customers
The above query is probably the most basic query you can write for your SQL database. The above query has no filters, no ordering, no subqueries, and no joins. However, it’s not very useful for most applications. To add a bit of usefulness to the query, you use the “where” clause to filter your records. The following SQL code gets a list of customers who signed up to the site within the last week:
select * from customers where signupdate >= getdate()-7
Now, this query is great if you just need a list of customers, but suppose you need a list of customers and related orders? You could get a list of orders and link them to customers in your front-end code, but it’s completely inefficient and inaccurate.
That’s where the inner join query comes into play. The following code uses the same “where” clause filter, but it returns a list of orders for your customers as well:
select * from customers c inner join orders o where c.customerid=o.customerid where c.signupdate >= getdate()-7
Incidentally, with Microsoft SQL Server, you can eliminate the “inner” part of the join query, and SQL will assume you mean the “inner” phrase. The returned record set contains every column in both the customers and orders table. Notice the “c” and “o” aliases. If you didn’t specify these aliases, you’d need to write out the entire table name to join columns. Aliases make it easier to reference tables. If your customer’s table and your orders table both have a “signupdate,” SQL Server will return an error that indicates you’ve referenced ambiguous column names.
One difference between the inner join and the alternative outer join is the number of records displayed. If the SQL database server can’t find a linked order for the customer, the customer is eliminated from the record set. Therefore, the records you receive are only customers with orders and a signup date within the last seven days. Alternatively, an outer join query will return all customer records and any customer orders. With an outer join, you view the records set, and you see null values for orders where a customer does not have any order records.
For instance, the following SQL code queries the SQL server and uses an outer join to return the same logical record set. All customer records within the signup “where” clause return regardless if an order exists:
select * from customers c outer left join orders o where c.customerid=o.customerid where c.signupdate >= getdate()-7
The last few examples use only the customers’ columns to filter records. You can also use the joined table columns. The following SQL code gets all customers who signed up within the last seven days and any orders placed within the last five days:
select * from customers c inner join orders o where c.customerid=o.customerid where c.signupdate >= getdate()-7 and o.orderdate >= getdate()-5
Notice the aliases used for both columns. The “c” alias tells the SQL database that the column is located in the customers’ table. The “o” indicates that the column is in the orders table. These aliases need to be tables-based.
The final change you need to make to the query is listing the columns to return. Right now, the asterisk indicates to the database to return all columns. It’s an inefficient way to return a data set. You usually only need a few columns in your record set to work with it on the front-end of the application. To fix the issue, you specify columns. You also need to use your aliases for returned columns as well. The following code is an example:
select c.first_name, c.last_name, o.orderdate from customers c inner join orders o where c.customerid=o.customerid where c.signupdate >= getdate()-7 and o.orderdate >= getdate()-5
In the above example, the first name and last name columns are returned. You know they appear in the customers’ table from the “c” alias prefix. The last column returned is the order date. You know this column is present in the orders table from the “o” alias.
This is just a start to returning data from a database from multiple tables. You can continue adding inner joins (and outer joins) until you link all your tables and get all columns you need for the front-end application.
Now, it should be noted that there are times when databases would be better structured so as not to require joins. If an organization was regularly having to join a “customer” table with “customer_contact_information,” for instance, it would make more sense for the customer table to include customer contact information.
But in the above example, the join makes sense. It makes sense for customers and orders to be discrete tables, and because of that, it further makes sense to use a join when comparing the tables. And because the developer is not interested in finding customers who don’t match orders (and orders who don’t match customers), it makes sense for there to be a SQL inner join query rather than a SQL outer join query.
Note also that you can’t just turn an insert into select; you can’t join an insert, and the join condition will not translate.
Common Mistakes When Using an Inner Join Query
Apart from using some complex syntax, an inner join is fairly straightforward. One of the common mistakes when using an inner join query is accidentally using the wrong primary keys. Either incorrect identifiers can be used, or an identifier could be used that is not actually unique, such as a full name — there could be multiple customers with the same name. It’s also possible to use the wrong type of join, such as a left outer join or full outer join, rather than a simpler method.
Another issue can be trying to compare items that are not of the same data types or accidentally pulling the wrong values from the wrong tables. Because there are multiple tables in play, it’s very important that the developer specifies which tables they are trying to pull the data from. If both tables have columns that share the same name but contain different data, this can be confusing.
For instance, when it comes to a “customer,” there might be an email address logged under the customer table (customer.email), but there might also be an email address logged with an order (order.email). These email addresses might not be the same; the customer may have specified an email address specific to the order that’s in progress. In this case, the developer must be very clear on whether they are trying to pull the customer’s email address or the email address attached to the order — both on a conceptual level and a syntax level. The comparison operator is especially important.
Other than this, inner join queries can be complicated when multiple developers are working on the same team. You should leave comments and notes to describe what the inner join query is trying to complete because join queries tend to be less “readable” than other types of query — it can be difficult to derive what the query is trying to do at-a-glance.
Major Concerns When Using an Inner Join Query
As with most join queries, an inner join query can potentially use a significant amount of resources in a system. Join queries tend to be demanding both in terms of CPU and RAM. Because of this, it’s important to take special care when writing these join queries. If many extensive join queries are necessary for an application to work, it may be better to restructure the database so they aren’t required.
Top courses in Development
As the join query becomes more complex, it can become exponentially more of a “resource hog.” The more tables involved in the join query, the harder the system will need to work. Ideally, developers should require as little information returned from their join queries as possible, and as little data manipulated. You can join multiple tables as long as resource usage is guarded.
Syntactically, join queries are complex. If the wrong type of join query is chosen (such as an outer when an inner is required), the data provided will not be accurate. Further, join queries are not always readable. Because they are complicated, it may not always be apparent what the results of the table will be, given the provided query. So, you should thoroughly study queries to make sure they are yielding the right results. It’s also important to name everything properly: “join table2” is not descriptive like “join customers” is.
Regardless, though there are some downsides to using join queries, they are often the best and easiest way to acquire data sets across multiple tables. Join queries are useful in applications as complex as Google Maps and as simple as the login pages — and once mastered, they are very powerful.
Now that you have the inner join covered, let’s take a look at some more useful SQL commands.
Top courses in SQL
SQL students also learn
Empower your team. Lead the industry.
Get a subscription to a library of online courses and digital learning tools for your organization with Udemy for Business.