Python Substring Tutorial

Have you come to Python after programming in another language? If so, you may have run into a wall when you looked for the substring method found in those other languages. Surprise! Python strings have no substring method. Not to worry: Python provides some powerful ways to work with substrings, and we’ll show you how they work here. For other differences in the Python language, Pythonic Python is a good course for a programmer already familiar with another programming language.

A Quick Review

First, a quick introduction and review of strings in Python. For a more thorough review and introduction to Python, try this course for beginners. Read this article for a tutorial on strings. In Python, a string is a sequence of characters: letters, digits, punctuation, spaces, or any other Unicode characters. The most common ways to write a string are in single or double quotes.

u = 'Udemy courses are great!'

Strings are indexed by an offset from the left, starting with 0 for the first character. The last index is always one less than the length of the string. Our sample string has a length of 24, so the index values go from 0 to 23. For this tutorial we will demonstrate the syntax as if it were entered in the IDLE shell.

>>> u = 'Udemy courses are great!'           #you type at the >>> prompt
>>> print (u[0])
U           # this is the response from Python
>>> print (u[23])
!

As you can see, indexes are specified in square brackets, and the string goes from offset 0 to offset 23. Python also allows a negative offset, which means to count from the right. Offset -1 is the last character in the string, -2 is the one before it, and so on.

>>> u = 'Udemy courses are great!'
>>> print (u[-1])
!
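
To make the relationship between negative and positive offsets concrete, here is a small sketch (the variable names are ours, chosen for illustration). For a string u, the offset -k refers to the same character as len(u) - k:

```python
u = 'Udemy courses are great!'

last = u[-1]               # last character, counted from the right
same_last = u[len(u) - 1]  # the equivalent positive offset (23)
first = u[-len(u)]         # the most negative valid offset is the first character

print(last, same_last, first)  # ! ! U
```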

Python Substrings: Slicing

In Python, substrings are accessed with a form of indexing known as slicing. Two index numbers are provided separated by a colon. If we want the substring ‘courses’, we can use slicing:

>>> u = 'Udemy courses are great!'
>>> print (u[6:13])
courses

Notice that we specify the offset of the first character we want, as expected, but we specify one offset beyond the last character we want. The slice u[a:b] returns the substring from index a to index (b-1), so it contains b-a characters. So, if we want to get all of the string from position 6 to the end, we could specify:

>>> u = 'Udemy courses are great!'
>>> print (u[6:24])
courses are great!

We have to use 24 even though the index of the last character is 23 in this case.
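
If you would rather not hard-code the 24, one option is to let Python compute it with len(). The following is a minimal sketch with illustrative variable names; it also checks that a slice u[a:b] contains b - a characters:

```python
u = 'Udemy courses are great!'

a, b = 6, 13
word = u[a:b]          # 'courses'
tail = u[6:len(u)]     # len(u) is 24, one past the last index

print(word, len(word) == b - a)  # courses True
print(tail)                      # courses are great!
```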

Actually, you would probably use the default notation. The slice u[a:b] has the following default variations:

>>> u = 'Udemy courses are great!'
>>> print (u[:5])           #leaving out the first index will default to the beginning of the string
Udemy
>>> print (u[6:])           #leaving out the second index will default to the end of the string
courses are great!
>>> print (u[:])           #leaving out both indexes gives you the entire string
Udemy courses are great!
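
The defaults combine naturally with negative offsets. As a short sketch (variable names are illustrative), the example below also shows two properties of slices that are easy to rely on: splitting a string at any index and joining the halves gives back the original, and an end index past the end of the string is quietly trimmed rather than raising an error:

```python
u = 'Udemy courses are great!'

last_six = u[-6:]       # from six characters before the end to the end
without_bang = u[:-1]   # everything except the last character

i = 5
rejoined = u[:i] + u[i:]  # the two halves always add up to the original

clamped = u[18:100]     # an out-of-range end index is trimmed to the end

print(last_six)        # great!
print(without_bang)    # Udemy courses are great
print(rejoined == u)   # True
print(clamped)         # great!
```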

There is a third index you can put into a slice: u[a:b:c]. This is the step, or stride. It will give a substring starting at a, stopping before b, and containing every c-th character:

>>> u = 'Udemy courses are great!'
>>> print (u[::3])
Umcrsrga

In this case, a and b were omitted, so it used the whole string and returned every 3rd character.
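
The step can also be combined with explicit start and end indexes, and it can be negative. A few more variations, sketched with illustrative variable names:

```python
u = 'Udemy courses are great!'

every_other = u[::2]    # every 2nd character of the whole string
within_word = u[6:13:3] # every 3rd character of 'courses'
backwards = u[::-1]     # a step of -1 walks the string from right to left

print(every_other)  # Ueycussaeget
print(within_word)  # crs
print(backwards)    # !taerg era sesruoc ymedU
```

The u[::-1] form is a common Python idiom for reversing a string.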

Slicing can be very powerful when reading data from a file where every line has the same structure. Suppose we have a text file where every line looks like this:

xxxxddddxxxxxxddxx

The x’s are arbitrary characters and we just want whatever characters are at the positions of the d’s. We can use slicing to extract these substrings:

>>> dstring = 'xxxxddddxxxxxxddxx'
>>> data1 = dstring[4:8]
>>> data2 = dstring[14:16]
>>> print (data1)
dddd
>>> print (data2)
dd

If those characters were digits, and you need numeric data, simply pass the slice to the built-in float() or int() function:

>>> dstring = 'xxxx1234xxxxxx56xx'
>>> data1 = float(dstring[4:8])           #creates data1 as a floating-point number
>>> data2 = int(dstring[14:16])           # creates data2 as an integer number
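
Applied to several lines at once, the same idea might look like the sketch below. The sample lines and the parse_line helper are hypothetical, invented for illustration; in practice the lines would come from your file. Note that float() and int() raise a ValueError if the slice does not contain a valid number.

```python
# Hypothetical sample data with the same layout as above:
# 4 filler characters, 4 digits, 6 filler characters, 2 digits, 2 filler characters.
lines = ['abcd1234efghij56kl', 'wxyz0042mnopqr07st']

def parse_line(line):
    """Return the two numeric fields of one fixed-width line."""
    return float(line[4:8]), int(line[14:16])

records = [parse_line(line) for line in lines]
print(records)  # [(1234.0, 56), (42.0, 7)]
```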

Finally, it is worth pointing out that none of these slice operations changes the value of the original string.
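
This follows from the fact that Python strings are immutable: a slice always produces a new string, and trying to assign to an index raises a TypeError. A brief sketch, with illustrative variable names:

```python
u = 'Udemy courses are great!'

sub = u[6:13]          # a new string, 'courses'
shouted = sub.upper()  # another new string, 'COURSES'

error = None
try:
    u[0] = 'u'         # strings are immutable, so item assignment fails
except TypeError as exc:
    error = exc

print(u)                     # Udemy courses are great!
print(sub, shouted)          # courses COURSES
print(type(error).__name__)  # TypeError
```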

These samples work when you know the exact layout of the string. What if the data you want could be anywhere in the string? The string find() method will work here. With find(), you specify a substring to be located, and Python returns the index where its first occurrence begins. In our first example, we can find where the substring ‘great’ begins:

>>> u = 'Udemy courses are great!'
>>> u.find('great')
18

OK, so now we know that the substring ‘great’ starts at index 18. Now we can get the substring with a slice. We assign the index returned by the find to a variable. We can use variables in a slice to compute an index. In this case we know that the substring we want is 5 characters long, so (i+5) will take us one place beyond the end of what we want. Remember that the second index in a slice has to be one place higher than where we want to end.

>>> u = 'Udemy courses are great!'
>>> i = u.find('great') # i now contains the value 18
>>> sub1 = u[i:i+5]
>>> print(sub1)
great
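
Rather than counting the characters by hand, you can let len() supply the length of the substring. The sketch below assumes the target is known to be present in the string; the variable names are illustrative:

```python
u = 'Udemy courses are great!'
target = 'courses'

i = u.find(target)             # index where the target begins
sub1 = u[i:i + len(target)]    # the target itself
rest = u[i + len(target):]     # everything after the target

print(i)      # 6
print(sub1)   # courses
print(rest)   #  are great!
```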

What if we don’t know if the substring we want is in our string? There are two choices here. First, find() returns -1 if the substring is not found. In a Python script, we can check this value first and proceed only if -1 is not returned:

u = 'Udemy courses are great!'
i = u.find('great')
if i != -1:
    #this section executes if the substring is found
    sub1 = u[i:i+5]
    print(sub1)
else:
    #this will execute if the substring is not found
    print('Substring not found')
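
The check matters because -1 is itself a valid index, so skipping it does not produce an error, just a wrong answer. One way to package the check is a small helper function. The name substring_after is hypothetical, made up for this sketch, and returning None for a missing marker is just one reasonable convention:

```python
def substring_after(text, marker):
    """Return the part of text that follows marker, or None if marker is absent."""
    i = text.find(marker)
    if i == -1:
        return None
    return text[i + len(marker):]

u = 'Udemy courses are great!'

found = substring_after(u, 'are ')
missing = substring_after(u, 'awful')

# Without the check, a failed find() silently yields an empty slice:
unchecked = u[-1:-1 + 5]   # same as u[-1:4]

print(found)            # great!
print(missing)          # None
print(repr(unchecked))  # ''
```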

The second choice is to use the in operator. The expression evaluates to True or False.

u = 'Udemy courses are great!'
if 'great' in u:           #the expression is evaluated as true or false
    #this section executes if the substring is found: the expression is true
    i = u.find('great')
else:
    #this will execute if the substring is not found: the expression is false
    print('Substring not found')
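
Keep in mind that the in operator, like find(), is case-sensitive. A short sketch with illustrative variable names, including one common way to make the test ignore case by lowercasing both sides:

```python
u = 'Udemy courses are great!'

has_great = 'great' in u       # exact match exists
has_capital = 'Great' in u     # no match: the capital G differs
is_missing = 'awful' not in u  # not in is the negated form

wanted = 'GREAT'
has_any_case = wanted.lower() in u.lower()  # compare lowercase copies

print(has_great, has_capital, is_missing, has_any_case)  # True False True True
```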

For more on if statements, you can view this training course.

This should get you started on Python substrings. Try this ultimate course to master more skills.