Working with Text: C String Concatenation

No matter what programming language you use, when you start to build real-world applications you will inevitably end up dealing with text. A lot of modern languages have built-in facilities for dealing with text data (commonly known as “strings”), but when you’re working with the C language, you have to go through a few more steps and manage memory yourself in order to handle strings properly. (Need a refresher on C? Follow a beginner tutorial.) It’s not that hard, though. You just have to pay attention to what you’re doing.

One of the most common string operations in any language is the act of concatenation. This means being able to add on the contents of one string to the end of another. There are countless practical applications for this, but the basic concept of being able to progressively build up a string of text is very valuable, whether you’re writing a basic console application or a full-fledged GUI.

Overview

We’ll start with an overview of how strings work in C. The language itself has very limited support for strings; it mostly allows you to specify text data in code, and relies on library functions to perform operations on them.

These strings come in the form of ordinary character arrays, that is, contiguous sequences of characters. These sequences of characters are then terminated by the “NULL” character (the character whose value is zero, written '\0' in code), which is what designates it as a string as opposed to a random block of characters. This terminator tells your code where the string of text ends, so that it’s clear where a loop should stop processing, since text data can be pretty much any length.

In code, anything enclosed within double quotes ("like this") is a string literal: an array of characters followed by the NULL terminator. Take this classic example:

"Hello, World!"

This expression in C produces a character array containing the above text message, which is composed of all 13 individual characters, followed by an implicit NULL terminator. In most contexts the array is converted to a pointer to its first character, so you can store the memory location of this message like so:

char *myMessage = "Hello, World!";

If you declare it as above, the contents of the string should be treated as read-only. Any attempt to write to it is undefined behavior: your program may crash or, worse, appear to work while silently corrupting data. And since our goal with string concatenation is to combine two or more strings together, we want to be able to write freely to our string. If, instead, we declare it like so:

char myMessage[] = "Hello, World!";

Then the contents of our string will be writable. The type of “myMessage” in this case is an array of characters with a fixed length of 14 (13 characters plus the NULL terminator), as opposed to a pointer to a character like in the other example above.

Do note that once you declare a string in this fashion, you can later assign a pointer to its memory location and still write to it using pointer operations. This works because the pointer is pointing to a writable location in memory. Before, we were simply telling C to assign a pointer to a fixed set of data. With the array declaration, we’re telling C to allocate some space for our string (on the stack, if the declaration is inside a function) and copy the text into it.

With all of that established, how do we then combine two or more strings together? Another very important question is about memory space–since the array declaration above calls for a fixed length, how can we combine strings without going out of bounds?

The Strcat Function

To answer the out-of-bounds question, we need to know how to declare an array that specifies a certain length instead of letting C count the characters for us.

char myMessage[256] = "Hello, World!";

In this declaration, we’re allocating space for exactly 256 characters in memory, the beginning of which contains the 13 characters of our classic phrase plus the NULL terminator at the 14th position. With this extra space after the NULL terminator, we have room to add more data. And fortunately for us, there’s a function in the C standard library that deals specifically with combining strings together. First, make sure that you #include the appropriate header at the top of your source file:

#include <string.h>

The string.h header contains the function we need plus many other related functions. The one we’re focusing on, however, is the “strcat” function, which is defined in the standard library similarly to this:

char *strcat(char *dst, const char *src);

Here, we can see that strcat is a function that takes in a pointer to the destination string and a pointer to the source string, appends the source to the destination, and returns the destination pointer, which now points to the combined string. Do note that since we’ll likely be combining one or more source strings into the same destination, we won’t need to use the return value of the strcat function. If you do need it, though, you can simply assign its result to a character pointer.

Now let’s say we have two separate messages we want to put together, declared like so:

char firstMessage[256] = "The quick brown fox";
char secondMessage[] = " jumps over the lazy dog.";

Take note that we’ve declared extra space in firstMessage. While we don’t need as much as 256 characters in this case, it’s still good practice to allocate a little more space than you need, so that you have some “breathing room” in case you want to concatenate more strings.

Now let’s fuse them together!

strcat(firstMessage, secondMessage);

Well that wasn’t so bad. Let’s see if it worked:

printf("%s\n", firstMessage);

We should see the following in the console output:

The quick brown fox jumps over the lazy dog.

Success! But wait… since we know there was a NULL terminator at the end of the first message, and a NULL terminator marks where a string ends, why wasn’t the output simply “The quick brown fox”?

This is due to how the strcat function works. It was designed specifically for NULL-terminated strings: it finds the NULL terminator at the end of the destination, overwrites it with the first character of the source, and keeps copying until it has copied the source’s own NULL terminator. The combined string therefore ends with exactly one terminator, in the right place. How convenient!

Security

There’s a problem, though. The strcat function works great when the programmer has done everything properly, but what happens when you have multiple people on a project and someone mistakenly decides to do something like this:

char msg1[] = "Testing, testing, ";
char msg2[] = "1, 2, 3...";
strcat(msg1, msg2);

At first glance, you may not notice anything wrong. But pay attention to how much space we’re allocating for msg1. Since C is counting it for us, we’ll get a total of 19 characters in memory to work with (including the NULL terminator). But the combined message will be quite a bit larger!

This is a simple example of a very common problem in C: the buffer overrun. Writing past the end of an array is undefined behavior. If you’re lucky, the program crashes right away. If you’re not, it silently overwrites whatever sits next to the array in memory, such as other variables or a function’s return address. That is very bad, and is in fact how many security exploits in the past have worked.

To help guard against this, the C standard library also provides a bounded variant of the function:

char *strncat(char *dst, const char *src, size_t count);

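The strncat function works the same way as strcat, except it has an extra parameter for the maximum number of characters to copy from the source. It guarantees that it will never copy more than ‘count’ characters from the source, and it always writes a NULL terminator after them, so the destination needs room for up to ‘count’ plus one more bytes. Note that ‘count’ is not the total size of the destination: the usual safe value is the destination’s size, minus its current length, minus one. While this still wouldn’t prevent a programmer from passing the wrong value for ‘count’, it goes a long way in preventing accidental buffer overruns. These days there’s not much reason to use the regular strcat function over strncat when you can’t be sure the result will fit, but of course it’s still useful in teaching how string concatenation works in C.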

Closing Thoughts

Once you get into writing real-world applications in C, you’ll quickly find things like string concatenation to be extremely important. It’s unfortunate that it’s not as simple as saying "blah" + " blah blah" as in higher-level languages, but at the same time you do have a lot more control over how your memory is used. There’s no magic or mystery; every byte is accounted for. Once you do have a solid C foundation, you can start moving on to C’s bigger brother and begin the process of gaining valuable C++ skills. Strings are a little nicer in C++ as well!

There are a lot more string-related functions in C that you might find useful. String concatenation is a big one, but it’s definitely not the only one. You can learn more about these on your own, or you can complete some more challenging C problems at Udemy once you’ve become more comfortable with the language. Whichever route you choose, what’s most important is that you have fun!