Make Your SQL Queries Compact With SQL IN

Article Summary

The SQL `IN` operator is a logical operator that filters rows by checking whether a column value matches any item in a specified list. This article covers basic syntax, replacing multiple OR conditions, subqueries, NOT IN, and common errors. You'll write cleaner, more readable SQL queries.

The SQL IN keyword, which we consider to be one of the most useful SQL commands, allows you to specify multiple values in the WHERE clause. It is shorthand for adding multiple OR conditions. Because the SQL IN operator supports a list, there is a lot it can do that you may not realize when you first encounter it.

First, let’s look at a normal SQL query to get an idea of how we can use the IN operator to create more advanced queries.

Recommended Udemy course

The Complete SQL Bootcamp: Go from Zero to Hero

Jose Portilla, Pierian Training

A standard SQL query

Let’s say we have a table of users that looks something like this:

user_id	first_name	last_name
1	John	Smith
2	John	Jones
3	John	Roberts
4	Roger	Miller
5	John	Wilson

If you have some SQL experience, you’ll recognize a standard SQL query. The following is a standard select statement: t any relational database with this specific table:

SELECT first_name, last_name FROM users

The above statement selects the columns “first_name” and “last_name” from the SQL table “users.” This will return the first name and last name from every record from our table above. We are limiting the columns that are being retrieved but are not filtering the results with a WHERE clause. This statement is pretty universal in all major SQL platforms such as Oracle, SQL Server, and MySQL. When you start to create more complex statements is when the language changes.

The above statement displays all accounts from the user’s table. The following SQL statement filters users based on the WHERE clause:

SELECT first_name, last_name FROM users WHERE first_name = 'john'

The above statement adds the WHERE filter and only returns accounts where the user’s first name is “John.” Notice the lowercase “j” in the query. SQL is not case sensitive, so the above statement will return any user with “John,” “john,” or “JoHn” in the first_name column.

Filtering record sets is normal in a SQL database. Your tables can grow to millions of records. Returning all of these records to your code and then filtering is a mistake. You should always filter your record set at the database-level to improve performance on your web applications.

But what if you only wanted the first three “John” records. This table may not be realistic, but you will run into similar SQL problems, so bear with me. There are multiple ways you can do this with SQL, but let’s look at one that uses multiple OR clauses like the one below:

SELECT first_name, last_name FROM users WHERE (first_name = 'john' AND userid = 1) OR (first_name = 'john' AND userid = 2) OR (first_name = 'john' AND userid = 3)

I get that for our table, you could get the same results with the following query:

SELECT first_name, last_name FROM users WHERE userid > 4

But, let’s just say you had to run a complicated OR query like the one above. You will run into the situation. When you do, there is a much better way to handle it with the IN operator. Let’s look at what the IN operator does.

What is the SQL IN operator?

The SQL IN operator is a logical operator that tests whether a specific value in the WHERE clause of an SQL query matches any value in the list contained by it. Here is the official SQL syntax of the IN operator:

column | expression IN ( v1, v2, v3, ...)

What this means is that you must first specify the column or value to test and then specify the list of values to test against. All the values in the list must have the same type as the column or expression the values are being compared against.

This is a logical operator, so if any comparison of the original value against a value in the IN clause returns true, then the result of the IN operator is also true.

Like we said in the beginning, the IN operator is equal to multiple OR operators, replacing an OR clause for each item in the list. We separated the list of items with commas, so the following two SQL predicates are equal:

userid IN (1,2,3)

userid = 1 OR userid = 2 OR userid = 3

Now that we have looked at a basic SQL query and how the IN operator looks, let’s look at how we can use the IN operator for more advanced SQL queries, starting with using a simple list of values instead of multiple ORs. We will rewrite that complicated nested OR statement version of the SQL with a much more streamlined version.

Replacing multiple ORs with an IN

Before we get to creating a complete subquery using the IN keyword, let’s start with a list of values. That is what the IN keyword is looking for. A list of values that are of the same type as the column in the WHERE clause. So the initial search for our users used this query:

SELECT first_name, last_name FROM users WHERE first_name = 'john'

And here is a cleaned-up version of the SQL we wrote to limit the records by the userid. SQL is pretty flexible and will just let us use a complicated query as we did with all the duplication as long as it can execute it. It turns out that we didn’t have to do that. The following query will return the same result set as the one before, with much less typing.

SELECT first_name, last_name FROM users WHERE first_name = 'john' AND (userid = 1 OR userid = 2 OR userid = 3)

The above statement says to return all users whose first name is john, and the userid column is 1, 2, or 3. But we can do much better than that. It is an improvement on the original query, but using all those ORs still makes the SQL redundant and hard to read and understand.

The IN statement makes it much easier than creating an or statement in the WHERE clause. We can clean this up a lot with the IN statement, like in the following query:

SELECT first_name, last_name FROM users WHERE first_name = 'john' AND userid IN (1, 2, 3)

Obviously, the IN statement is much more efficient and easier to use. SQL is flexible and will let you write any SQL statement that doesn’t throw a syntax error, but that doesn’t mean your query can’t be better. The IN operator is something that will have you rethinking how you can rewrite your SQL queries.

You also have the option of using a full subquery in the IN statement. This use of the IN statement works if you know the list of items you want to add before you start. But sometimes, you may have to query for the list of items you will use in the IN statement. Let’s take a look at how we can do that.

Using the SQL IN statement with a subquery

Subqueries are queries within a query. When you do a lot of heavy-duty SQL queries, there will be times when you need to use a subquery. The IN statement can also be used to tell the database to find records that are in the subquery by returning a list of results that match the type of the column in the WHERE clause. While it’s not always the best for performance, you might need to rewrite, maintain, or add to an existing subquery that uses the IN statement.

When you run a SQL statement with a subquery, the system first executes the subquery. The subquery returns a record set, and then the main query runs a statement against the subquery’s record set. So plan the subquery you use in the IN clause to match the type of the column you are trying to match. Subqueries will add convenience but will also add complexity.

The following code shows you how to create a full subquery in the SQL that returns the same results as the last two queries we looked at:

SELECT first_name, last_name FROM users WHERE first_name = 'john' AND userid IN (SELECT userid from users WHERE userid = 1 OR userid = 2 OR userid = 3)

The SQL statements we just reviewed all return the same data set. However, the last one uses a subquery to return a list of users with an id of 1, 2, or 3. The last one is the most inefficient way to create a query. The first two do not run a subquery against the database tables. The last one first runs the inner SELECT statement, returns a list of users with an id of 1, 2, or 3, and then the main query (outside query) runs. This means that the database must execute two queries instead of one. With too many of these types of statements, you can technically slow down your application. While one might not create a noticeable difference, too many of these statements will slow down the system. At some point, you will have to refactor your SQL queries.

The last query was not efficient because we were still manually searching for the userids anyway in our subquery. The subquery was useless and just involved a table in the query that did not need to be included. But if we need to find all the users named john who are admins on the system and we need to retrieve those userids from another table, the IN clause gives this a way to do this dynamically. So even though it would be more efficient to just have a list of ids in the IN clause, it is not feasible in this case because users that are admins may change.

SELECT first_name, last_name FROM users WHERE first_name = 'john' AND userid IN (SELECT userid from roles WHERE role = 'admin')

Notice that everywhere we use a subquery, it returns the “userid” column. This is important when using the IN statement. The statement says “WHERE userid IN,” which means that the SQL database server will look for all matching user IDs in the subquery. If you don’t return the userid in the subquery, the SQL statement will return an error. For example, the following code will throw an error if you try to run it in SQL:

SELECT first_name, last_name FROM users WHERE first_name = 'john' AND userid IN (SELECT userid FROM users WHERE userid = 1 OR userid = 2 OR userid = 3)

In the examples above, we used integers in the IN clauses that we were using integers in the IN clause. You can also use SQL strings in the IN statement. This query is similar to the userid subquery, but it uses strings instead of integer values. The same rules apply to strings as they do in other SQL queries. You will most likely have to surround the strings in your list with single quotes, depending on the SQL engine. The following code is an example of using the IN statement with string values:

SELECT first_name, last_name FROM users WHERE first_name = 'john' AND last_name IN ('smith', 'jones')

In the above statement, all records with a first name of “John” are returned but only if the user’s last name is “Smith” or “Jones.”

While it’s not technically efficient, you can also embed an IN statement within the subquery. The following query returns the same records as the above statements, except the IN statement is in the subquery:

SELECT first_name, last_name FROM users WHERE first_name = 'john' and userid in (SELECT userid FROM users WHERE userid IN (1,2,3))

Using NOT with IN

You don’t have to use IN all by itself. We have already seen how you can nest an IN statement inside of another IN statement if you need to, though it is not recommended for performance if you can avoid it. Now we are going to look at how you can use the negate operator, NOT, with the IN operator in the NOT IN combination.

Here is the syntax of using the NOT operator with the IN operator:

column | expression NOT IN ( v1, v2, v3, …)

The NOT IN operator will return TRUE if the column or an expression representing a column is not equal to any of the values in the list. An example, with the users table that we have been using for our examples, would be the following query:

SELECT first_name, last_name FROM users WHERE first_name = 'john' AND userid NOT IN (4, 5)

That just used a list of values, but we can also use a full subquery to replace this list of values.

SELECT first_name, last_name FROM users WHERE first_name = 'john' AND userid NOT IN (SELECT userid from roles WHERE role = 'admin')

IN statement errors

In Microsoft SQL Server, using an extremely large list of values in your IN statement — or a lot of values separated by commas — can use up a lot of resources. This will sometimes cause an 8623 or 8632 error.

The 8623 error will look like this:

The query processor ran out of internal resources and could not produce a query plan. This is a rare event and only expected for extremely complex queries or queries that reference a very large number of tables or partitions. Please simplify the query. If you believe you have received this message in error, contact Customer Support Services for more information.

An 8632 error will look like this:

Internal error: An expression services limit has been reached. Please look for potentially complex expressions in your query, and try to simplify them.

The suggested fix for this is to retrieve the values you will use with the IN clause from a database table by using a SELECT subquery in the IN clause, as we saw in the examples above.

Also, having NULL values in the list used in an IN clause return UNKNOWN rather than TRUE or FALSE when you test them. This can have unexpected results, so it is best to clean your list of values of any NULL values.

The secret of joins in many ORMs

An object-relational mapper (ORM) is a library in a specific programming language that makes connecting to and querying relational databases much more simple. You don’t have to write SQL queries; just use objects in the programming language. An example of this using our users table above would be if users had posts, so we had a post table related to our user table. An example query in SQL would be:

SELECT content FROM posts JOIN user ON users.userid = post.user_id WHERE users.first_name = 'john'

An ORM would allow us to do something like this:

user->posts();

But behind the scenes, many ORMs don’t write the query we would write, but something like this:

SELECT content FROM posts WHERE userid IN (SELECT userid FROM users WHERE first_name = 'john')

Conclusion

While the IN statement isn’t the most preferred way to query the database, it’s still useful in some scenarios. In most cases, you can rewrite a query to be more efficient. If you have code review from peers or even a database administrator, they might even have issues with an IN statement. However, this type of statement is more efficient than having several OR statements because it’s easier to read, and programmers prefer it over long, hard-to-read statements. Even if it’s too hard to read, it still doesn’t mean it’s not worthwhile to reprogram. However, it’s always beneficial to understand how to read and write IN statements.Now that you understand IN statements, let’s move on to the SQL count command.

SQL IN: Make Your SQL Queries Compact With the IN Operator