SQL NOT EXISTS: Find Unmatched Records

Article Summary

SQL NOT EXISTS is a clause that returns rows where no matching record exists in a subquery. This article covers syntax, real-world use cases, comparisons with JOIN and NOT IN, and optimization tips. You'll leave knowing when and how to apply SQL NOT EXISTS effectively.

Programming code displayed on two computer monitors.

SQL is a logical language, so everything you do in SQL will perform in a predictable way — including SQL NOT EXISTS. You query the database with a logical determinant. The database returns the data.

Today, we’ll be looking at the EXISTS operator and the SQL NOT EXISTS function.

You’ll likely find that the SQL NOT EXISTS function is actually pretty simple once you get used to formatting an EXISTS subquery. When you use SQL NOT EXISTS, queries will return all the documents that don’t match the given “exists” subquery.

But the easiest way to understand this type of SQL query is to see how it’s used to select distinct data throughout the database.

When would you use SQL NOT EXISTS?

Let’s draw out a simple example. We have a customer database that includes a list of customers. We also have another table that includes orders. We use the following SQL:

select * from customers where NOT EXISTS (select customerID from orders)

Recommended Udemy course

The Complete SQL Bootcamp: Go from Zero to Hero

Jose Portilla, Pierian Training

What does the above do?

It selects all data from the “customers” table where the customers.customerID does not exist in the orders.customerID. It has a couple dependencies, of course: It requires that your customerID in your customers’ table is intended to match up with your customerID in your orders table. If you need to use another foreign key, it gets a little complicated.

In short, the above query and subquery return any rows in which a customer has not made an order.

The most important thing to recognize is that SQL NOT EXISTS involves two parts:

The primary query, which is the “select * from customers where.”
The secondary query, which is the (“select customerID from orders”)

NOT EXISTS goes after the “WHERE” condition.

The SQL-savvy will notice that, in terms of resources, this is usually a resource-intensive job. Not only does the query need to see which turns true and false (which rows don’t exist), but you’re essentially creating a join. A join, whether left join, outer join, or inner join, takes power.

In fact, you’ve discovered the reason why we use EXISTS and NOT EXISTS. It’s less resource intensive than performing regular joins. Because it checks first to see whether the record even exists, it doesn’t do a full outer join or full inner join.

So, we use EXISTS and NOT EXISTS for clarity and when we want to preserve the system’s resources. This is incredibly important for applications that need to be able to scale.

What if we wanted to use EXISTS instead?

select * from customers where EXISTS (select customerID from orders)

Now we’re selecting all customers who have ordered from us in the past. That’s a great thing for a remarketing campaign. To make it a little more useful, we could also:

select customers.totalSpend where EXISTS (select customerID from orders) order by totalSpend desc

The above would produce a list of customers by the amount of revenue they generated for us. You can still use things like “order by” — even with the exists subquery — and the queries will return as normal from the SQL server.

So, the general syntax is:

select [selected information] from [table] where NOT EXISTS [subquery]

It’s the subquery that’s the important part, as this is the logical operator that either returns null or true. With NOT EXISTS, it’s true if the subquery is not met. With EXISTS, it’s true if the query is met. So, EXISTS and NOT EXISTS can really do the exact same things — it’s just a matter of writing which is clearer at the time.

Because EXISTS and NOT EXISTS are a little complicated, you might want to set up a demo database to run queries on, always with the understanding that the debt load is going to be much more significant on larger databases.

Also, note that EXISTS and NOT EXISTS will always be opposing. If you want the exact opposite of a set of returned queries, you just add in the NOT modifier. This is similar to IN and NOT IN.

Can you use NOT EXISTS with other clauses?

What if you want to see all the customers who have ordered recently? NOT EXISTS is just another type of SQL grammar; you can still build your query out the way you would ordinarily. So, consider this:

select * from customers where NOT EXISTS (select customerid from customers where createdate() <= getdate() - 60)

So, this query isn’t selecting customers that don’t exist. Rather, it’s selecting customers that didn’t exist prior to two months ago (60 days ago). This is a fast, easy way to get some information, but as we can see, it’s not always the only way.

That means that the clause that you use inside NOT EXISTS can be rather complex. But you wouldn’t, for instance, want to say something like this:

select * from customers where NOT EXISTS (select customerid from orders order by orderid)

The modifiers to order should go outside, such as:

select * from customers where NOT EXISTS (select customerid from orders) order by orderid

This is because the subquery is only designed to find the rows that do not exist (or in the case of EXISTS, the rows that do exist). It is not intended to do any ordering or formatting that isn’t necessary.

With particularly complicated SQL statements, one of the best ways to understand them is to simply diagram them, much like you would diagram a sentence. This will give you an overview of the results that you can expect.

Using JOIN statements instead of NOT EXISTS

We mentioned that NOT EXISTS is creating a JOIN operation. You can also use join operations, such as:

select * from customers cjoin join orders ojoin on cjoin.customerID=ojoin.customerID where …

This is a perfectly valid way to replace NOT EXISTS and EXISTS. Essentially, this process is replacing the “NOT EXISTS” by outlining exactly what the NOT EXISTS function would do.

There are advantages to using this method. You have more granular control over the completion of the JOIN function and its optimization.

But there are also downsides. JOIN statements and NOT EXISTS/EXISTS statements will inherently handle null values differently. NOT EXIST will err on the side of including these null values, and JOIN will not.

An issue with JOIN commands is that they aren’t extremely readable. Except to the very advanced programmers, JOIN commands tend to be fairly impenetrable. You can improve upon this by using easy-to-read names, but either way, it’s not always going to be immediately intuitive what JOIN is doing.

NOT EXISTS/EXISTS is easier to read and easier to scan. There also exists the compromise of using a NOT EXISTS/EXISTS function when writing code and then optimizing it into a specialty JOIN function later.

How do you optimize NOT EXISTS?

Despite the resource benefits, you should treat NOT EXISTS like other SQL join commands. Essentially, you should be wary of connecting more than two tables at a time, and you should make sure that you aren’t referencing more data than you strictly need.

If possible, you may want to limit your queries to a given row; for instance, you might want to pull only 50 queries in an application and then allow the user to continue browsing. This is a conventional way of reducing load and becomes important, especially as the application starts to scale.

An example:

select * from customers where NOT EXISTS (select customerid from orders) limit 50

This would only pull 50 records at a time. Like ORDER BY, you can also apply LIMIT to a NOT EXISTS query. But again, you would want to put it outside of the parentheses, not inside of them.

What are some alternatives to SQL NOT EXISTS?

There are some ways that you can get around SQL NOT EXISTS. For instance, in the following example:

select * from customers where NOT EXISTS (select customerid from customers where createdate() <= getdate() - 60)

In this example, what you’re trying to do is select customers that are older than 60 days. And you can do that very simply:

select * from customers where createdate()+60 < getdate()

That looks a lot better, doesn’t it?

There are situations in which you can use a simple WHERE clause rather than NOT EXISTS. The issue is, as mentioned before, that it does require additional resources. The NOT EXISTS function is going to be more sustainable.

Of course, it’s not as simple for the other example:

select * from customers where NOT EXISTS (select customerID from orders)

This is an example in which it would be very hard to replicate the NOT EXISTS function without using EXISTS because you need to see which rows do not exist inside the given set.

You might think that you could use:

select * from customers, orders where customers.customerID != orders.customerID

But this isn’t going to give you what you want because it’s going to be giving you each row that does not match up, and every row is going to fail to match up at least once.

Once you understand why and when SQL NOT EXISTS works, you’ll see more examples of when it’s not only the best option, but the only option.

What’s the difference between SQL NOT EXISTS and SQL IN?

If you’ve heard of the SQL IN command, you might wonder what the logical difference is between SQL NOT EXISTS and SQL IN.

In some cases, SQL NOT EXISTS and SQL NOT IN (and SQL EXISTS and SQL IN) can have the same purpose. But with an example, you can see how the emphasis is different.

Here’s an example of using the SQL IN command:

select * from customers where country NOT IN (‘United States’)

The above would select customers from the table that are not in the United States. In short, SQL NOT EXISTS is generally used to detect where a row does not exist. SQL IN is generally used to detect where a value does not exist within a record.

Another difference is in how it treats nulls. And there’s also a question of resource management, as SQL NOT EXISTS has specific ways to handle joining an outer query.

Both SQL NOT EXISTS and SQL NOT IN are excellent tools for what they do — just remember that they serve different purposes.

Learning more about SQL NOT EXISTS

If you’re looking to brush up on your SQL skills, you might want to consider taking a class in SQL. You will definitely want to download a SQL database like MySQL and play around with it — it’s one of the best ways to learn.

SQL NOT EXISTS: Discovering Records That Don’t Match Your Query