A relational database lets you use primary and foreign keys to connect your tables. The database manages the relationship, but only one table’s record set is returned when you query unless you link the data. To link data, there are outer and inner joins. This article discusses inner joins to return multiple tables’ rows related to a specific query.
Primary and Foreign Keys
The concept of linking tables stems from primary and foreign keys. Every table in your SQL database should have a primary key. A primary key must be unique and cannot be null. You can use composite keys, which means multiple table columns make up a primary key. Whatever your choice, the primary key is the column or group of columns that uniquely identifies your record. For instance, if you have a customer table, you need the primary key to determine your customer IDs.
Additionally, most relational databases such as Microsoft SQL Server and MySQL have auto-increment functionality. Auto-increment columns automatically add 1 to the last record and insert the new number into a column. Some small businesses and database developers use this technique to create a primary key for the tables. Auto-incrementing columns guarantee that your tables always have a unique column. The SQL database will return an error if you attempt to create a duplicate primary key.
Foreign keys contain the same primary key value, but they are located in a related table. For instance, an ecommerce store has a table for customers and orders. The customer table contains the primary key that identifies the customer, and the orders table contains a foreign key such as “CustomerId” that contains the customer’s primary key value in each record. This primary and foreign key relationship is the basis for your inner join queries.
Writing an Inner Join Query
First, let’s take a very simple SQL query. The following code returns a list of customers from a “customers” table:
select * from customers
The above query is probably the most basic query you can write for your SQL database. The above query has no filters, no ordering, no subqueries and no joins. However, it’s not very useful for most applications. To add a bit of usefulness to the query, you use the “where” clause to filter your records. The following SQL code gets a list of customers who signed up to the site within the last week:
select * from customers where signupdate >= getdate()-7
Now, this query is great if you just need a list of customers, but suppose you need a list of customers and related orders? You could get a list of orders and link them to customers in your front-end code, but it’s completely inefficient and inaccurate.
That’s where the inner join query comes into play. The following code uses the same “where” clause filter, but it returns a list of orders for your customers as well:
select * from customers c inner join orders o where c.customerid=o.customerid where c.signupdate >= getdate()-7
Incidentally, with Microsoft SQL Server, you can eliminate the “inner” part of the join query, and SQL will assume you mean the “inner” phrase. The returned record set contains every column in both the customers and orders table. Notice the “c” and “o” aliases. If you didn’t specify these aliases, you’d need to write out the entire table name to join columns. Aliases make it easier to reference tables. If your customers table and your orders table both have a “signupdate,” SQL Server will return an error that indicates you’ve referenced ambiguous column names.
One difference between the inner join and the alternative outer join is the number of records displayed. If the SQL database server can’t find a linked order for the customer, the customer is eliminated from the record set. Therefore, the records you receive are only customers with orders and with a signup date within the last 7 days. Alternatively, an outer join query will return all customer records and any customer orders. With an outer join, you view the records set and you see null values for orders where a customer does not have any order records.
For instance, the following SQL code uses an outer join to return the same logical record set, but all customer records within the signup “where” clause are returned regardless if an order exists:
select * from customers c outer left join orders o where c.customerid=o.customerid where c.signupdate >= getdate()-7
In the last few examples use only the customers columns to filter records. You can also use the joined table columns. The following SQL code gets all customers who signed up within the last 7 days and any orders placed within the last 5 days:
select * from customers c inner join orders o where c.customerid=o.customerid where c.signupdate >= getdate()-7 and o.orderdate >= getdate()-5
Notice the aliases used for both columns. The “c” alias tells the SQL database that the column is located in the customers table. The “o” indicates that the column is in the orders table.
The final change you need to make to the query is listing the columns to return. Right now, the asterisk indicates to the database to return all columns. It’s an inefficient way to return a data set. You usually only need a few columns in your record set to work with it on the front-end of the application. To fix the issue, you specify columns. You also need to use your aliases for returned columns as well. The following code is an example:
select c.first_name, c.last_name, o.orderdate from customers c inner join orders o where c.customerid=o.customerid where c.signupdate >= getdate()-7 and o.orderdate >= getdate()-5
In the above example, the first name and last name columns are returned. You know they are located in the customers table from the “c” alias prefix. The last column returned is the order date. You know this column is located in the orders table from the “o” alias.
This is just a start to returning data from a database from multiple tables. You can continue adding inner joins (and outer joins) until you link all your tables and get all columns you need for the front-end application.