How to Work with C Syntax: Learn the Basics

Article Summary

C syntax is the set of rules that gives the C programming language its structure, governing how keywords, identifiers, constants, string literals, and symbols must be written for code to compile. This article covers tokens, formatting conventions, syntax highlighting, commenting, and common mistakes. You'll gain a clear foundation for writing valid, readable C code.

Just as English has grammar, programming languages have syntax. Syntax is what gives a programming language its structure. Everything that you, the developer, type in has to be understood by the computer — the compiler. When the syntax is correct, the computer understands what you’re trying to do. When the syntax is incorrect, the computer becomes confused — it can’t determine how to compile your code.

The most basic component of the C programming language is the token; each individual token is used by the C compiler in a specific way. Tokens can be considered the basic building blocks of a C program, and understanding the different types of tokens will lead to a deeper understanding of how to create and maintain C code.

C as a language

C is one of the oldest languages still in common use today. First developed in 1973, it has a lower level of abstraction than more recent languages, such as Java, C++, and C#. C, along with Java, is one of the first languages many programmers will learn.

With C, the trade-off is that fewer processes (such as memory allocation and memory management) are automated, but more is under the programmer’s direct control. Programmers can develop their programming discipline by working with C and gain a greater understanding of the basic principles of programming.

As one of the most used languages in the world, C is well supported and well documented — not only are there many tutorials, lessons, bootcamps, and programming examples, but there is also a large development community. If you have any issues with syntax, you can usually ask a question in any development community and get a quick answer.

The basics of the C syntax

Before we discuss tokens, we should discuss basic C syntax. It’s important to know the following:

Every statement is terminated by a semicolon.
White space is (outside of quotations) largely ignored.
White space is still needed to separate adjacent keywords and identifiers, as in “int num”.
Everything is case-sensitive in C.

When you have syntax errors, the cause will very often be a misplaced semicolon or an accidental case issue. Many C interview questions will either explicitly focus on syntax or include syntax errors to catch.

Tokens in C

The fundamental building block of C as a language is the token. For us, a token is readable and understandable. For the compiler, a token is the smallest unit of source code that it recognizes, such as a single word, number, or operator.

In C, a token can be a:

Keyword
Identifier
Constant
String literal
Symbol

Each token is separately read by the compiler, and the compiler understands what to do with each token. If these tokens aren’t written or used correctly, the compiler reports an error and the code won’t compile.

When programming in an IDE (Integrated Development Environment), the IDE will highlight different C tokens with different colors. Keywords may be bright blue, while constants may be red. IDEs make programming a lot easier. Many modern IDEs can also switch between syntax, so you can highlight C, Java, or C++.

It’s not usually necessary to memorize each token, but basic familiarity is a must. Once you understand basic tokens and how they work, you’ll be able to look up anything that’s more specialized.

Keywords in C

A keyword in C is a reserved word with special use within the syntax. You should never use a keyword as a variable name or function name because the code will fail to compile. There are many keywords in C, but here are a few of the most popular:

if/else: These keywords start the “if” … “else” conditional, which is a common method of controlling program flow. “If” can also stand alone.
break: This exits the nearest enclosing loop or switch statement. It does not end the function it is in.
return: This ends the function that it is in and often returns a value, such as “return 0” or “return 1.”
int: This defines an integer variable, which holds a whole number.
char: This defines a character variable, which holds a single character. An array of char holds a string.

You will frequently use these keywords in C and will become extremely familiar with them. The most important thing to remember is that they should never be used elsewhere — you would not want to name a function “break” or an integer “return.”

Identifiers in C

An identifier is a user-defined name for a function, variable, or other item in your program. Identifiers can contain uppercase and lowercase letters A through Z, digits 0 through 9, and underscores, but they cannot begin with a digit. They cannot include special characters such as question marks or exclamation points.

If you wanted to create an integer called “num” in C, you would do so like this:

int num;

In this case, “num” has become an identifier. In some ways, an identifier is just like a keyword you’re allowed to set yourself.

In C, many programmers use something called camel case. Variables will be named like this:

firstNumber
secondNumber
thirdNumber

An alternative is to separate words with underscores (often called snake case), a style that is also widespread in C code:

first_number
second_number
third_number

In fact, there’s no “right or wrong” way to create your identifiers, but you should always be consistent in your code. Otherwise, you’ll have to check every time you want to reference something.

In addition to being consistent, variables should always be descriptive. Good variables include:

username
password
email

Examples of bad variables would be:

thing
secondVariable
string

While general terms like “num” or “string” may be used in examples, they’re usually discouraged in actual code.

Constants in C

In C, there’s something called “scope.” A variable that is defined within a function cannot be accessed outside of that function. This is largely useful because it means that you can use similar variables inside of multiple functions (for instance, a “count” variable). But what happens when you need to reference a single variable throughout the program?

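One solution would be to continually pass the variable to given functions, but that would quickly become unwieldy. C has another solution: declare the variable outside of any function, at the top of the source file, where every function that follows can use it. When the value should never change, mark it with the “const” keyword to make it a constant. (A constant variable is not the same thing as a literal, which is a fixed value such as 42 written directly in the code.) A constant can be any data type.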

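If you wanted to create a constant int called NUM, you would define it as such:

const int NUM = 0;

A constant must be given its value when it is defined, because it cannot be assigned a new one afterward. A declaration without a value, such as “const int NUM;”, is accepted by the compiler but leaves you with no way to set the value later.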

It’s general convention to always name constants in uppercase, so it’s easier to locate them when reading lines of code. Conversely, you shouldn’t name other variables in all uppercase because they will be confused for constants.

A very common example of a constant is DEBUG. Many programmers will set it as such:

const int DEBUG = 1;

In this situation, if the DEBUG constant is “true” (1), debug messages will print while the program runs. If the DEBUG constant is “false” (0), debug messages won’t print. Having a DEBUG constant makes it possible for the developer to turn debugging messages on and off throughout the program with a single toggle.

Constants should be used sparingly — not all variables should be constants. Rather, constants should be used for system-wide variables and settings.

Further, because constants like these are used throughout the program, they should be declared at the beginning of the program, either at the top of the source file or in a header file. This is an easy way of keeping track of all the constants that are used throughout the application. You should not declare a program-wide constant in the middle of your program.

String literals in C

A string literal is pretty much anything that is enclosed within double quotes in C. This can be data that is being stored, being sent to the console, or being printed across the monitor. An example of a string literal would be as follows:

printf("This is a string literal.");

String literals passed to printf() are special because they can include format specifiers, or placeholders that will be replaced with another item. An example of a format specifier would be:

int num = 2;
printf("The number is: %d", num);

In the above situation, %d is just a placeholder for a value of a certain type, here the int variable num. The type of the variable and the type of placeholder must match.

String literals can also contain escape sequences, which begin with a backslash. Two common ones are “\n,” which is a new line break, and “\0,” which is the null character that terminates a string.

A common mistake with string literals is using a single quote rather than a double quote, like so:

printf('This is incorrect.');

When you use single quotes rather than double quotes, the compiler will reject the code or warn about it, because single quotes denote a single character constant, such as 'a'. The double quote is what denotes that a string literal will follow. (This can be confusing for programmers who are familiar with other languages, as there are other languages that allow a string literal to use single quotes.)

String literals may also be used when assigning a string. If you were to create a string, you would write it as:

char greeting[12] = "Hello world";

In this case, “Hello world” is a string literal.

Symbols in C

Symbols in C are usually either mathematical operators or relational operators.

Mathematical operations in C need to be performed on variables of the correct (numerical) type. They include:

a+b. Adding two variables together.
a-b. Subtracting two variables from each other.
a*b. Multiplying two variables by each other.
a/b. Dividing a variable by another variable.

In addition to this, parentheses can be used to prioritize operations, as with other types of mathematics. The code “a*(b+2)” will add b to 2 before multiplying the result by a.

Comparison symbols include:

a<b. Variable a is less than b.
a>b. Variable a is greater than b.
a<=b. Variable a is less than or equal to b.
a>=b. Variable a is greater than or equal to b.
a==b. Variable a is exactly equal to b.
a!=b. Variable a is not equal to b.

These comparisons will always return either true (1) or false (0). They are commonly used in if/else statements:

if (a==b) {
	printf("a is equal to b.");
} else {
	printf("a is not equal to b.");
}

Alternatively:

if (1>2) {
	printf("Something has gone horribly wrong.");
} else {
	printf("One is, of course, not more than two.");
}

Some symbols mean different things in different contexts. One notable example is *, which can be used to multiply (a*b) as well as to declare or dereference a pointer (*b).

Formatting your C code

Formatting code is, in large part, developer-specific. The formatting of your code and the whitespace inside of it doesn’t matter to the C compiler. While some languages, such as Python, do care about whitespace, C does not. But that doesn’t mean there aren’t preferences.

There are two things that you need to consider when formatting your C code: curly brackets and spacing.

First, brackets. Conventionally, C code is written like this:

function (statements) {
	[code]...
}

This is to save space and improve readability. But some programmers prefer the following:

function (statements) 
{
	[code]...
}

This variant uses more space, and some programmers find it “ugly” and unnecessary. The major benefit to this style is that it makes it much clearer when a bracket is missing.

When it comes to spacing, the argument is largely between “tabs” or “spaces.” Neither matters in terms of C syntax. When indents are formed, you can either use a single tab or multiple spaces. Multiple spaces have the advantage of precision (three spaces will always be three spaces, whereas tabs vary in size), whereas tabs have the advantage of efficiency (only a single keypress).

Both brackets and spacing are up to the individual programmer. But again, as with naming conventions, it’s critical that you decide how you want to format your code early on and don’t deviate from it.

Using syntax highlighting for C

Most developers program in a specialized source code editor, such as Notepad++ or vi. Source code editors highlight code so that it’s easier to understand syntax and easier to catch mistakes. An example of a “Hello world” program might look like:

#include <stdio.h>

int main(void) {
    char string[12] = "Hello world";
    printf("%s", string);
    return 0;
}

In an editor with syntax highlighting, it’s notably easy to scan for functions because a function such as printf() is shown in its own color, such as red. Likewise, it’s easy to see string literals because they are highlighted in another color, such as bright green. While talented programmers can program completely in a simple text editor, syntax highlighting makes the process of debugging faster and easier.

Commenting your C code

Comments are remarks that developers can leave in their code that aren’t processed by the compiler but are visible within the source. Without comments, code can be very difficult to read. C allows developers to create comments in two ways:

/* This is a multi-line 
Comment */

//This is a single line comment.

Many developers use multi-line comments as a way to capture the attention of a reader regarding very important information.

/*************************************************************/
/* 		STOP! The constant variable below…*/
/***********************************************************/

Meanwhile, single line comments are usually used for in-line comments, or quick clarifications.

int count = 1; //This sets the limit for the next for() loop.

Thoroughly commenting code is important for readability. The compiler ignores the text of a comment entirely, so while comments bulk up the source code, they add nothing to the compiled program. You can also use comments to comment out sections of code that aren’t working or that you’re currently working on if you’re in the active process of debugging.

The best comments are specific. Your comments should explain both what is happening in the code and the expected results of the code. Part of writing code is making sure it’s going to be easy to maintain later.

Common mistakes with C syntax

C has a more rigid syntax than later languages. This is a double-edged sword. On the one hand, it’s easier to make a mistake in C. On the other, it’s harder to end up with unexpected behavior.

A few of the common mistakes with C syntax include:

Capitalizing the first letter of a function or keyword. C is case sensitive, so “For” is not the same as “for.”
Forgetting to place a semicolon at the end of a statement. Every statement needs a semicolon to end it.
Missing braces. When braces aren’t used properly, chunks of code aren’t closed off; this leads the program to think the code continues.
Not properly declaring variables. In C, every variable and pointer must be declared. This is different from some other languages, in which variables are automatically declared when used.

The C compiler is generally very good at both catching syntax errors and identifying where the syntax error occurred. Missing closing braces and missing closing parentheses tend to be the most enigmatic errors, as the compiler may not know where the “end” of the open block of code was supposed to be.

Digging deeper into C

When it comes to learning a new programming language, understanding syntax is half the journey. The fundamental logic behind programming doesn’t change, just the syntax and the libraries provided by the language. Even those who already have basic knowledge of C can be helped through a C programming course.
