The UNION Operator in SQL

union sqlIf a basic structured query language (SQL) operator can ever be said to be under-utilized, then the UNION might fit that description. Its functionality is often easily duplicated by a well-structured query – leading many database administrators and programmers to regard UNION as a little sloppy. However, if using it avoids an overly-complicated query (or several queries) being run then it may be clearer to use it. In certain circumstances it can even provide better performance.

The examples presented offer a short overview of the UNION operator in SQL. As such, you are expected to know the very basics of relational databases and SQL-programming – including how to create a suitable database for testing. As an open-source package, MySQL is often used for learning database design with server software. For databases on a desktop computer, learning Microsoft Access would provide a suitable environment and background knowledge with which to practice use of the UNION operator.

The Basic UNION

At its simplest level, the UNION operator joins two database queries into a single set of results. These queries can refer to different columns and tables providing that the following conditions are met:

  • Both queries must return the same number of columns.
  • For each column in the results, both queries must return “compatible” data types.

Given the table structures shown below, the following code shows a simple UNION operation that behaves like OR in a WHERE clause, selecting rows when either set of conditions is met.

union-operator1

 

SELECT [id], [name], [country] FROM Suppliers WHERE Country = 'United Kingdom'
UNION
SELECT [id], [name], [country] FROM Suppliers WHERE Country = 'Japan'

To select and unify results from two different tables, you can simply refer to a different table in the second query:

SELECT [id], [name], [country] FROM Suppliers WHERE Country = 'United Kingdom'
UNION
SELECT [id], [name], [country] FROM Customers WHERE Country = 'United Kingdom'

And there is no reason you can’t unify the results of more than two queries, simply by repeating the use of UNION:

SELECT [id], [name], [country] FROM Suppliers WHERE Country = 'United Kingdom'
UNION
SELECT [id], [name], [country] FROM Customers WHERE Country = 'United Kingdom'
UNION
SELECT [id], [name], [country] FROM Inspectors WHERE Country = 'United Kingdom'

In situations where the column names differ between structures, the results data set will typically use the column names from the first query.

SQL databases consolidate the results so that rows which match both queries in a UNION only appear once in the results set. You should note that rows that appear identical but are in different tables, do NOT count as being a duplicate.

In most cases, the standard action of UNION is what is required. For situations where duplicate rows would be desired, you can use the UNION ALL operator.

UNION ALL

UNION ALL differs from UNION in only that all rows from both queries are returned, even when this causes a duplication. If you know that no duplication of records is possible, using UNION ALL can give slightly better performance because most database systems will avoid trying to filter duplicated records when UNION ALL is used.

With the following example, if the record for “Test Corp.” had the country “United Kingdom”, it would now appear twice in the results set.

SELECT [id], [name], [country] FROM Suppliers WHERE Country = 'United Kingdom'
UNION ALL
SELECT [id], [name], [country] FROM Suppliers WHERE [Name] = 'Test Corp.'

If combining three or more queries with a mixture of UNION and UNION ALL operations, you should use parenthesis to ensure that the queries are combined in the way you intend, and not the way the database system decides to do it. Remember, a UNION is a combination of two queries so when using multiple UNIONS, you need to control the pairing.

Ordering Results

Ordering the results of multiple queries combined by UNION is not always a straightforward process. It situations where both queries return the same column names, it is possible to order the combined result set by adding an ORDER BY clause to the last query.

SELECT [id], [name], [country], [telephone] FROM Suppliers
UNION
SELECT [id], [name], [country], [cellphone] FROM Customers
ORDER BY [telephone]

In situations where the column names are different, the column names of the first query are used. So you can either create an ORDER BY clause using those column names, or by using the number of the column instead of its name. The following SQL query performs an identical operation to the one above:

SELECT [id], [name], [country], [telephone] FROM Suppliers
UNION
SELECT [id], [name], [country], [cellphone] FROM Customers
ORDER BY 4

Using parenthesis and separate ORDER BY clauses with each query allows you to sort each one independently of the other. The ordered results of the second query are merely appended to the first, not mixed in the ordering.

(
SELECT [id], [name], [country] FROM Suppliers
ORDER BY [Name]
)
UNION
(
SELECT [id], [name], [country] FROM Customers
ORDER BY [Country]
)

 

Renaming Columns for UNION Operations

While it is not strictly necessary that the columns of the two queries have the same name, it can certainly be useful when constructing ORDER BY clauses. The SQL operator AS can be used to rename columns. This is shown in the example below:

SELECT [id], [name], [cellphone] AS telephone FROM Customers
UNION
SELECT [id], [name], [telephone] FROM Suppliers
ORDER BY telephone

Without renaming the cellphone column of the Customer’s table, the ORDER BY clause would result in an error as telephone is not the name of a column in the first query.

Casting and Converting Columns for UNION Operations

Remember that the second requirement stated earlier for using UNION requires that the columns of each query are “compatible” – meaning that the database system can automatically convert the data in the results of the second query to match the data types used in the first query.

In situations where you want to combine different data structures in such a way that leaves you with incompatible data types, you can use the CAST and CONVERT functions. This can be a lengthy topic, one that cannot be covered fully here. However, as an introduction…

CAST and CONVERT are two functions for converting one data type to another. Generally speaking, database systems have adopted the more-standard CAST, however, Microsoft SQL Server uses the CONVERT function instead. If you’re entirely unfamiliar with this extremely popular piece of software, it is possible to learn what you need to know about SQL Server in a very short period of time.

The example below selects multiple columns from a table, of varying data types. These results are combined with a second query using a UNION operator. Despite “CRM_No” being a number in the Suppliers’ table and a text value in the Customer’s table, this example works because the database can usually convert numbers to text values automatically.

SELECT [id], [name], [CRM_No] FROM Customers
UNION
SELECT [id], [name], [CRM_No] FROM Suppliers
ORDER BY [CRM_No]

However, when the order of the queries is reversed, an error is generated because the database cannot automatically convert a text value to a number.

SELECT [id], [name], [CRM_No] FROM Suppliers
UNION
SELECT [id], [name], [CRM_No] FROM Customers
ORDER BY [CRM_No]

To solve this, you can use a specific CAST operation to force the “CRM_No” in the Suppliers’ table to be treated as text (with a maximum length of 30):

SELECT [id], [name], CAST([CRM_No] AS VARCHAR(30)) FROM Suppliers
UNION
SELECT [id], [name], [CRM_No] FROM Customers
ORDER BY [CRM_No]

Here is the same example, this time using CONVERT for SQL Server:

SELECT [id], [name], CONVERT(VARCHAR(30), [CRM_No]) FROM Suppliers
UNION
SELECT [id], [name], [CRM_No] FROM Customers
ORDER BY [CRM_No]

Note that the CAST and CONVERT functions are not available to Microsoft Access users. Access has its own, unique, set of conversion routines and in this case CSTR([CRM_No]) would be used instead.

Further Reading

UNION is only one of many ways of joining queries and results from different tables. Other similar SQL operators that should be explored are INTERSECT and MINUS. Many of these kinds of fundamental techniques are described in detail in Practical SQL Skills from Beginner to Intermediate.