The Secrets of the SQL CONTAINS Command
With the right SQL statement, databases are the perfect tool for any data-specific task. Using your SQL queries, you can combine your data into useful pieces of information after populating your database tables. WHERE clauses, for example, allow you to input filters and other search criteria so you can create a customized search solution. Nevertheless, there may be times when you only have a portion of the information you need. Ordinarily, logical and equality statements in SQL only deal with whole objects. You can use the CONTAINS SQL to test your database against substrings.
The SQL CONTAINS function for SQL server
To begin, you must realize that the CONTAINS SQL function isn’t a standard SQL function. Although every database system has this function, they all handle it differently. The function also requires different arguments depending on the database. So, before you use it, you need to know and understand the database system you’re using.
In Microsoft SQL Server, CONTAINS SQL checks to see if one string is a substring of another string. For Microsoft SQL Server and similar systems, CONTAINS lets you do full-text term pattern-matching queries on your tables.
The most common use of the keyword CONTAINS requires two arguments and returns a number with either a true or false result. It will be 1 (true) if it finds a match and 0 (false) if it doesn’t. The first argument is the name of the table column you want to be searched. The second argument is the substring you need. The basic syntax is as follows:
SELECT columnName FROM yourTable WHERE CONTAINS ( columnName, 'yourSubstring' );
From the example above, you should note that the column name isn’t enclosed in quotes, but the arguments for the CONTAINS SQL function are enclosed in parentheses. The substring we are searching for is in quotes, as quotes must surround all strings in SQL Server.
You can use the CONTAINS function in any SQL query that you can use the WHERE clause in, though it is the most useful within a SELECT statement. With SELECT, CONTAINS returns all records where the inputted column contains the substring. For example, the following sample only returns records from New York City that contain the top ten phrases containing words with the letter “a” in them:
SELECT DISTINCT TOP 10 phrase AS Result FROM phrases_table WHERE CITY = 'New York City' AND CONTAINS ( phrase, 'a*' );
|23345||A Group Creation|
|67872||Silver Hut A House of Jewellery|
|36274||DOEACC A Level Course|
|58444||A V Jewellers|
|11234||A One Tent House|
|37098||The Azad A Complete Wedding Planner|
|60636||A One Beauty Parlour|
|74257||A One Band|
|76122||D A International Travels|
The * in the substring argument is a wildcard placeholder representing all the other characters in a word. We do this because, without it, SQL Server wouldn’t return anything. SQL Server treats the substring as a complete word. If we just used “a,” SQL would ignore it, as “a” is one of the noise words SQL Server ignores, along with “the” and “an.” Often called particles, noise words are defined by the specified language. Because they show up so frequently, SQL Server ignores them all, along with punctuation marks. You can check which words SQL Server considers noise by looking at the noise file in your SQL Server’s installation directory.
In the last example, we only searched for a single substring. However, you can include as many substrings in the argument as you need. You have to enclose each additional substring in double quotes and separate it by a logical operator. For example, the following query returns all documents that have either “replacing” or “pedals” in their summary:
SELECT DocID, DocSummary FROM documents WHERE CONTAINS ( DocSummary, ‘"replacing" OR "pedals"’ );
This method searches for substrings of query words in their entirety, very similar to what it did before. Synonyms and inflections are not considered. Because we didn’t include wildcards, it won’t look for these terms inside other words. The above example used the OR operator to return results that contain either word, but substituting AND for OR could be possible if including both words is necessary.
In SQL Server, you can also use the NEAR function in combination with CONTAINS if you want to find records where one word or phrase is near another word or phrase. Here is an example using NEAR:
SELECT student_id, student_name FROM students WHERE CONTAINS( address, 'NEAR( Montana, Billings )' );
This query will return all the students whose address contains the word “Billings” near the word “Montana”.
You can also use the ISABOUT function along with the CONTAINS function to add weight to each term you are searching for. For example:
SELECT * FROM students WHERE CONTAINS( address, 'ISABOUT (Montana weight(.8), Minnesota weight(.4))' );
This query will return results along with a weighting rank based on the weights that are set for specific words in the ISABOUT function. Results with “Montana” in the address column will rank higher than those with “Minnesota” in them, but we will still get all of the results. This type of query works well for writing custom result ranking algorithms.
Another SQL predict that works with CONTAINS in SQL Server is FORMSOF. Using FORMSOF, you can return results that not only contain the term you are searching for but also variations of it. Here is an example of that type of query:
SELECT student_id, student_name FROM students WHERE CONTAINS( hobbies, 'FORMSOF(INFLECTIONAL, photograph)' );
The first parameter of FORMSOF is the type which is always set to INFLECTIONAL. The second parameter is the term you are searching for. In the example above, the query will find all students who have any one of the following words in the hobbies column: “photograph”, “photographs”, “photography”, and “photographer”.
That’s all you need to know about using CONTAINS in SQL statements with SQL Server. There are other databases that use the keyword as well, but each one does so in a unique way.
The SQL CONTAINS function for Oracle database
With the Oracle Database System, CONTAINS works much like the SQL Server version, but you can also use an additional argument if you need. The basic syntax looks like this:
CONTAINS ( column_name, substring, label );
The column_name and substring parameters are the same as they are with SQL Server. Column_name is the column you are searching and substring is the string you are searching for. Label must be a number, and it represents the score for the CONTAINS function. It is optional, except when you have to use CONTAINS more than once in a query. If you use the SCORE function, it must refer to this value.
Here is an example of the query without the optional parameter:
SELECT student_id, student_name FROM students WHERE CONTAINS( address, 'Montana' );
In this database, the whole address is in one text field, and we are trying to find those students with an address in Montana.
You can also use an asterisk in a CONTAINS function in Oracle, like the following:
SELECT student_id, student_name FROM students WHERE CONTAINS( address, 'Mo*' );
This query will return all the addresses that have a word that starts with “Mo”.
In fact, you can modify an Oracle CONTAINS SQL query in many of the ways you can modify the same type of query in SQL Server, including using NEAR, IS_ABOUT, and FORMS_OF. So, for the most part, Oracle’s version of CONTAINS is similar to SQL Server’s.
The big difference is with the optional third label parameter. This parameter works in conjunction with the SCORE function and allows you to add a score to the results of your queries. Here is an example of that type of query:
SELECT SCORE（1), student_id from students WHERE CONTAINS（ address，'Montana'，1）> 0；
This query returns all results that have “Montana” in the address along its score. We set the score in the label parameter of the CONTAINS function.
Another feature that Oracle has but SQL Server doesn’t is the ability to use ABOUT in a CONTAINS query. Here is a query using ABOUT:
SELECT student_id, student_name FROM students WHERE CONTAINS( hobbies, 'ABOUT(sports)' );
An ABOUT query will increase the number of related records returned by it. ABOUT queries used the indexed topic component and return results based on the concept in the query, not just on the literal word. So the example query above would return student records who had hobbies that were sports.
The SQL CONTAINS function for MySQL database
MySQL only recognizes the CONTAINS SQL function when dealing with spatial data. It requires two graphics objects as arguments, and returns a 1 or 0 depending on if the first object completely contains the second. Designed as an implementation of the OpenGIS framework, the MySQL CONTAINS function does not work on ordinary strings and will produce an error if you try it. MySQL only recognizes the LIKE and STRCMP functions when dealing with strings of information.
Alternatives to the SQL CONTAINS function
If you want the same functionality that CONTAINS gives you in a database that doesn’t support the function or, as in the case with MySQL, doesn’t even do the same thing, then you will have to find an alternative. Here are a few SQL functions you can use instead of CONTAINS.
The SQL LIKE operator is a logical operator that can find out if one character string matches a specific pattern. It is commonly used with a WHERE CLAUSE to find a specific pattern in the column of a database. It functions quite similar to the way CONTAINS does and you will find it available in most databases. But its use and syntax are very different.
While CONTAINS can search for words in text content, LIKE searches for patterns. If you are searching for a word in the middle of text with LIKE, you will have to use a wildcard character on each side of the word you are searching for. But the wildcard character for LIKE is the percentage symbol, or “%”, and not an asterisk like it is in a query using CONTAINS. Here is an example of a query using LIKE:
SELECT student_id, student_name FROM students WHERE address LIKE 'Montana';
But this query would only return students whose address column only contained “Montana” and nothing more, so more than likely it would return nothing. Since Montana is likely to show up in the middle of an address, we would have to use a wildcard character on both sides of it in the query, like this:
SELECT student_id, student_name FROM students WHERE address LIKE '%Montana%';
This will return results and is the closest we can get to the functionality of the CONTAINS function. The percentage sign represents any number of characters. Another character you can use in a LIKE query is the underscore character, or “_”. This character represents any one character. So this query will return every student who has a four-letter name starting with “J”:
SELECT student_id, student_name FROM students WHERE student_name LIKE 'J___';
The SQL CHARINDEX function finds the position of one string or expression in another string. Instead of returning a string, it returns the index of where the substring was found in the string. So to use it like the CONTAINS function, you will have to use it differently to get it to work the same way. But first, let’s look at its syntax:
CHARINDEX ( expression_to_find , expression_to_search [ , start_location ] )
The first parameter is the substring we are looking for. The second parameter is the column we are going to search. And the third parameter is optional and is for the position we want to start searching for the substring in the string. If you don’t enter this last parameter, it will default to 0, and the search will begin from the beginning of the searched value.
Here is the same student query now using the CHARINDEX function:
SELECT student_id, student_name FROM students WHERE CHARINDEX( 'Montana', address ) > -1;
Since CHARINDEX returns an index of where the word we are searching for is located, we can use that index to determine if the word exists in the address column. If it is 0 or higher, it does.
LOCATE is a function in MySQL and other databases. LOCATE returns the position of the first occurrence of a string inside of another string. In that way, it is very similar to the CHARINDEX function. Here is how it works:
LOCATE ( substring, string, start)
Also, since this function returns the index or position of the substring we are searching for, we will have to use it differently than the CONTAINS function:
SELECT student_id, student_name FROM students WHERE LOCATE ( 'Montana', address ) > -1;
The INSTR function is specific to Oracle/PL and MySQL that returns the starting position of one string in a given input string. It is very similar to both CHARINDEX and LOCATE. Here is its syntax:
INSTR ( expression_to_search, expression_to_find )
The only difference is that the parameters are in a different order, and the search phrase is the first parameter. Here is an example of a query using INSTR:
SELECT student_id, student_name FROM students WHERE INSTR( address, 'Montana' ) > -1;
The SUBSTRING SQL command is similar to both INSTR and CHARINDEX and is specific to the PostgreSQL database engine, though its syntax is different. Here is that syntax:
SUBSTRING( string [from starting_position] [for length] )
With this function, we would have to know exactly where we are looking for a substring inside of a string. For example, if we wanted results where a student’s first name was John, we could use this query:
SELECT student_id, student_name FROM students WHERE SUBSTRING( student_name, 0, 4 ) = 'John';
While SQL does have a standard, each database vendor decides which features to include in their version, and sometimes they add more features. The SQL CONTAINS function is one of those additional SQL commands. You will find it most commonly used in Oracle and SQL Server databases, though other database engines may also have the function available. We recommend checking to see if your system understands the CONTAINS function before using it in your queries. If you can use CONTAINS, you should use it as often as possible. It’s much faster than LIKE, and self-validating. It’s the perfect solution when you have to search through thousands of records for particular words or phrases, but no worries if it doesn’t use it. There are plenty of alternatives to choose from, including LIKE, CHARINDEX, and LOCATE.
Top courses in SQL
SQL students also learn
Empower your team. Lead the industry.
Get a subscription to a library of online courses and digital learning tools for your organization with Udemy for Business.