How to Combine Record Sets with the SQL UNION Operator
SQL (Structured Query Language) is an industry-standard programming language for managing data in structured, relational databases. This flexible and versatile language operates by executing a series of statements that return the requested data from the database. SQL statements consist of strings of clauses that refine the search, and operators that specify how to manipulate the resulting data in a relational database. The SQL UNION operator is one of these—and it might be one of the most underused operators in SQL.
As the name implies, the UNION operator combines the result set of two or more SQL SELECT statements— the basic statement that finds and identifies data sets for all kinds of SQL operations. However, other well-structured query statements can perform the functionality of the UNION operator as well, so some database administrators and programmers consider it to be a little sloppy.
Using a UNION statement instead of these other options can avoid running an overly complex query, or even multiple queries. In those circumstances, it might be clearer and more efficient to use UNION. In some cases, UNION could even provide better performance than other approaches to managing data from multiple databases.
In this article, we’ll take a brief look at the UNION operator in SQL, with examples of this operator performing various tasks.
To practice using this and other SQL clauses, you should know the basics of relational databases and programming with SQL, including how to create a suitable database for testing. MySQL is an open-source package that many use to learn database design with server software. Or, for databases on a desktop computer, Microsoft Access can also provide the background knowledge and a suitable environment for practicing the use of SQL’s UNION operator.
The Basics of the UNION Operator
At the most basic level, the UNION operator joins two separate database queries into a single set of results. These queries can refer to different columns and tables within the database, providing that the following conditions apply:
- Both queries must return the same number of columns.
- Both queries must return “compatible” data types for each column in the results, such as text or numerals.
In the example below, we see a simple UNION operation that behaves like OR in a WHERE clause so that it selects rows when either condition applies:
SELECT [id], [name], [country] FROM Suppliers WHERE Country = 'United Kingdom' UNION SELECT [id], [name], [country] FROM Suppliers WHERE Country = 'Japan'
In this example, the first SELECT statement retrieves data from the Supplier table for the United Kingdom, while the second SELECT statement pulls the same data from the Supplier table for Japan. In both statements, the data sets meant for return are the same.
You can also select and unify results from two different tables by referring to another table in the second query, like this:
SELECT [id], [name], [country] FROM Suppliers WHERE Country = 'United Kingdom' UNION SELECT [id], [name], [country] FROM Customers WHERE Country = 'United Kingdom'
In this case, the first SELECT statement retrieves data from the Suppliers table while the second SELECT statement retrieves data from the Customers table. Here, both tables relate to the United Kingdom.
There’s no reason you can’t unify the results of more than two queries, as below:
SELECT [id], [name], [country] FROM Suppliers WHERE Country = 'United Kingdom' UNION SELECT [id], [name], [country] FROM Customers WHERE Country = 'United Kingdom' UNION SELECT [id], [name], [country] FROM Inspectors WHERE Country = 'United Kingdom'
This query looks complex, but it’s similar to the two previous examples. In this case, we’ve added a third SELECT statement. Here, all tables relate to the same country, the UK, but the three separate SELECT statements include FROM clauses that identify three different data sets: “Suppliers,” “Customers,” and “Inspectors.”
In some situations, the column names might differ between structures. When that happens, the results data set will typically use the column names for the first query.
Using UNION ALL
In general, there’s no need for duplicate data sets. SQL databases consolidate the query results so that rows that match both queries in a UNION only appear once in the result data set. Note that rows that appear identical, but are in different tables, DO NOT count as duplicate sets.
In most cases, the standard operation of UNION is all that’s needed to achieve the desired results. But for situations where you would want duplicate rows, you can use the UNION ALL operator – a variation on UNION that offers more options for expanding the results of a SELECT query and allows for the return of duplicate data sets from multiple tables.
UNION ALL is nearly identical to UNION. It differs only in that all rows from both queries return, even if that causes a duplication of the records. But even if you know that there’s no possibility of a duplication of records, using UNION ALL instead of UNION can slightly improve query performance. As we’ve noted, with the UNION operator, SQL consolidates query results to filter out duplicates. But when one uses UNION ALL, most database systems will avoid trying to filter duplicated records. Here’s an example:
SELECT [id], [name], [country] FROM Suppliers WHERE Country = 'United Kingdom' UNION ALL SELECT [id], [name], [country] FROM Suppliers WHERE [Name] = 'Test Corp.'
If the record for “Test Corp” also had the country “United Kingdom,” it would appear twice in the results set under UNION ALL. But since the query doesn’t state that, there’s no need to filter for duplication. In this case, only a single data set returns from the query.
If you’re combining three or more queries with a mix of UNION and UNION ALL operations, be sure to use parentheses to make sure that the queries combine in the way you intend, not as the database system chooses. Since a UNION is a combination of two queries, you’ll need to control the pairing when using multiple UNIONs.
Ordering the results of multiple queries using UNION may not be a straightforward process. When both queries return the same column names, you can order the combined result set by adding an ORDER BY clause in the last query, like this:
SELECT [id], [name], [country], [telephone] FROM Suppliers UNION SELECT [id], [name], [country], [cellphone] FROM Customers ORDER BY [telephone]
Here, the column names are not identical. The first SELECT statement specified “telephone,” while the second SELECT statement names it “cellphone.” In cases where the column names are different, the statement uses the column names of the first query. Then you can either create an ORDER BY clause using those column names or by using the number of the column instead of its name. The SQL query below performs an operation that’s identical to the one above, but it resolves the problem of different column names by ordering them numerically instead.
SELECT [id], [name], [country], [telephone] FROM Suppliers UNION SELECT [id], [name], [country], [cellphone] FROM Customers ORDER BY 4
By using parentheses and separate ORDER BY clauses with each query, you can sort each one independently of the other. That means that the ordered results of the second query append to the first rather than mixed in the ordering. It looks like this:
(SELECT [id], [name], [country] FROM Suppliers ORDER BY [Name]) UNION (SELECT [id], [name], [country] FROM Customers ORDER BY [Country])
This query has reordered the syntax to include a separate ORDER BY clause in each SELECT statement, which further clarifies how to sort the data.
Renaming Columns for UNION Operations
It isn’t strictly necessary for the columns of the two queries to have the same names, but that can certainly be useful when you’re constructing ORDER BY clauses. In those cases, you can use the SQL operator AS to rename the columns. Here’s how that would look:
SELECT [id], [name], [cellphone] AS telephone FROM Customers UNION SELECT [id], [name], [telephone] FROM Suppliers ORDER BY telephone
In this example, the “cellphone” column of the Customer’s table becomes AS “telephone.” Without renaming this column as “telephone,” the ORDER BY clause would result in an error because “telephone” is not the name of a column in the first query.
Casting and Converting Columns for UNION Operations
Remember that the second requirement for using UNION requires that the columns of each query should be “compatible.” This means that the database system can automatically convert the data in the results of the second query to match the data types used in the first query. But in some situations, you may want to combine different data structures in a way that results in incompatible data types.
In those cases, you can use the CAST and CONVERT functions. These functions require more discussion than we can cover here, but here’s a brief introduction.
CAST and CONVERT are often used interchangeably in SQL. Both functions provide a way to change a data type and convert it to another whenever needed, but they aren’t quite the same. CAST is an ANSI (American National Standards Institute) standard, which means it can function across multiple database systems, while CONVERT is a function that’s specific to the Microsoft SQL Server.
CAST can remove or change formatting in many different databases, but it’s less flexible than CONVERT. The CONVERT function can format data relating to date, time, money signifiers, and traditional numbers. Both CAST and CONVERT can combine in SQL databases to achieve certain results. Without using either of them, an implicit conversion can occur—a situation in which the SQL server has to automatically convert one data type to another when moving data or combining values from different sources.
If you’re not familiar with SQL Server, you can learn all you need to know about this popular piece of software in a very short time.
The example below selects multiple columns containing varying data types from a table. These results combine with a second query that uses a UNION operator. Here, “CRM No” is a number in the Suppliers table, but a text value in the Customers table. The operation works because the database can generally convert numerical values to text values automatically. Here’s what that looks like:
SELECT [id], [name], [CRM_No] FROM Customers UNION SELECT [id], [name], [CRM_No] FROM Suppliers ORDER BY [CRM_No]
But if we reverse the order of these two queries, the database cannot automatically convert a text value to a number. That generates an error that resolution before SQL can execute the correct result.
SELECT [id], [name], [CRM_No] FROM Suppliers UNION SELECT [id], [name], [CRM_No] FROM Customers ORDER BY [CRM_No]
To solve this problem, you can use a specific CAST operation that treats the “CRM No” in the Suppliers table as text, with a maximum length of 30 characters:
SELECT [id], [name], CAST ([CRM_No] AS VARCHAR (30)) FROM Suppliers UNION SELECT [id], [name], [CRM_No] FROM Customers ORDER BY [CRM_No]
Now, the two values will be treated identically as text.
Here’s how the same example will look using CONVERT for SQL Server:
SELECT [id], [name], CONVERT(VARCHAR(30), [CRM_No]) FROM Suppliers UNION SELECT [id], [name], [CRM_No] FROM Customers ORDER BY [CRM_No]
Note: the CAST and CONVERT functions are not available in Microsoft Access. Because Access has its own unique set of conversion operators, CSTR([CRM No]) would be used in this case instead of either CAST or CONVERT.UNION is an underutilized but effective way to join queries and results from different tables in SQL databases, but it’s only one of many available options. Get an overview of the most essential SQL commands in our blog. Or if you’re ready for a deeper dive into the database language, join us in learning all about the SQL IN statement.
Top courses in SQL
SQL students also learn
Empower your team. Lead the industry.
Get a subscription to a library of online courses and digital learning tools for your organization with Udemy for Business.