Subqueries are “queries within a query.” When you do a lot of heavy-duty SQL queries, there will be times when you need to use a subquery. One keyword used to search for records within a recordset (subquery) is using the SQL “in” statement. The “in” statement basically says “find records that are ‘in’ this subquery.” While it’s not always the best for performance, you might need to rewrite, maintain or add to an existing subquery that uses the “in” statement.
A Normal SQL Query
If you have a little bit of SQL experience, you’ll recognize a “normal” query. The following is a standard select statement:
select first_name, last_name from users
The above statement selects the columns “first_name” and “last_name” from the SQL table “users.” The above statement is pretty universal in all major SQL platforms such as Oracle, SQL Server and MySQL. When you start to create more complex statements is when the language changes.
The above statement displays all accounts from the users table. The following SQL statement filters users based on the “where” clause:
select first_name, last_name from users where first_name=’john’
The above statement adds the where filter and only returns accounts where the user’s first name is “John.” Notice the lowercase “J” in the query. SQL is not case sensitive, so the above statement will return any user with “John,” “john,” or “JoHn” in the first name column. Filtering record sets is normal in a SQL database. Your tables can grow to millions of records. Returning all of these records to your code and then filtering is a mistake. You should always filter your record set at the database-level to improve performance on your web applications.
Using the SQL “In” Statement
Before you write an “in” statement, you should understand the concept of subqueries. Subqueries, as statement before, are “queries within a query.” When you run a SQL statement with a subquery, the system first executes the subquery. The subquery returns a record set, and then the main query runs a statement against the subquery’s record set.
The following is an example of the SQL “in” statement:
select first_name, last_name from users where first_name=’john’ and userid in (1, 2, 3)
The above statement says “return all users where the first name is ‘john’ and the userid column is either 1, 2 or 3.” The “in” statement makes it much easier than creating an or statement in the where clause. The following SQL statement is an alternative way to write the above statement:
select first_name, last_name from users where first_name=’john’ and (userid = 1 or userid = 2 or userid = 3)
Obviously, the “in” statement is much more efficient and easier to use. You also have the option of using a full subquery in the “in” statement. The following code shows you how to create a full subquery in the SQL in statement:
select first_name, last_name from users where first_name=’john’ and userid in (select userid from users where userid = 1 or userid = 2 or userid = 3)
The above three statements all return the same data set. However, the last one uses a subquery to return a list of users with an id of 1, 2 or 3. The last one is the most inefficient way to create a query. The first two do not run a subquery against the database tables. The last one first runs the inner select statement, returns a list of users with an id of 1, 2 or 3 and then the main query (outside query) runs. This means that the database must execute two queries instead of one. With too many of these types of statements, you can technically slow down your application. While one might not create a noticeable difference, too many of these statements will slow down the system to where you will need to reprogram your SQL statements.
Notice that the subquery returns the “userid” column. This is important when using the “in” statement. The statement says “where userid in,” which means that the SQL database server will look for all matching user IDs in the subquery. If you don’t return the userid in the subquery, the SQL statement will return an error. For example, the following code will throw an error if you try to run it in SQL:
select first_name, last_name from users where first_name=’john’ and userid in (select first_name from users where userid = 1 or userid = 2 or userid = 3)
You can also use SQL strings in the “in” statement. This is similar to the userid subquery, but it uses strings instead of integer values. The following code is an example of using the “in” statement with string values:
select first_name, last_name from users where first_name=’john’ and last_name in (‘smith’, ‘jones’)
In the above statement, all records with a first name of “John” is returned but only if the users’ last name is “Smith” or “Jones.”
While it’s not technically efficient, you can also embed an “in” statement within the subquery. The following query returns the same records as the above statements, except the “in” statement is in the subquery:
select first_name, last_name from users where first_name=’john’ and userid in (select user from users where userid in (1,2,3))
While the “in” statement isn’t the most preferred way to query the database, it’s still useful in some scenarios. In most cases, you can rewrite a query to be more efficient. If you have code review from peers or even a database administrator, you might have issues with an “in” statement. However, this type of statement is better than having several “or” statements, because it’s easier to read and programmers prefer it over long, hard-to-read statements. Even if it’s too hard to read, it still doesn’t mean it’s not worthwhile to reprogram. However, it’s still beneficial to understand how to read and write “in” statements.