What is SQL? An Introduction to the Industry’s Favorite Database Language
What is SQL?
SQL (pronounced either “es-cue-el” or “sequel”), which stands for Structured Query Language, is a versatile, open-source query language. It is used to write queries for managing data in structured, relational databases. Some call SQL the “lingua franca” of database management for its versatility, simplicity, and ability to organize and manage large amounts of data in a standardized, tabular format.
Despite there being many alternatives, SQL remains the world’s most widely used database language. SQL forms the basis for next-generation database managers such as Oracle, MySQL, PostgreSQL and Microsoft T-SQL. For these reasons, SQL skills continue to be in high demand across almost all industries. It can be a key technology for developers, data architects, and data analysts. Even if they use other programs like Excel for data analysis, SQL skills are useful to retrieve the data from databases.
Last Updated December 2023
From Tables and SELECT queries to advanced SQL. SQL Server 2012, 2014, 2016, 2017, 2019, and 2022 | By Phillip Burton, I Do Data LimitedExplore Course
SQL: The standard for relational databases
Since the early days of the Internet, SQL has been the world’s standard query language for creating and managing relational databases. These are data management systems built on a tabular format containing rows (or records) and columns (or fields) of individual data entries.
SQL is a highly versatile programming language for managing databases and creating the schemas (structure) for structured databases. These databases consist of tables containing data, views, stored procedures containing SQL statements, and more. You can also use primary and foreign keys to create relationships among different tables. It can also be very scalable. Azure SQL Database has a hyperscale tier, which allows the data to increase up to 100 terabytes.
These features make structured relational data models both powerful and versatile. They work across multiple platforms and operating systems, with predictable and reliable operations. This makes it easy for websites to retrieve and display a site and for search engines to recognize and understand the content.
The history of SQL
Structured data models are the most traditional form of data storage. They have been in use continuously since the creation of the earliest versions of database management systems. Since SQL is the query language behind most of those systems, it remains at the core of structured database management even today.
IBM’s Edgar F. Codd defined the relational database model in 1969. Built on a system of “keys,” or common information among different pieces of data, this model led to IBM’s development of its SQL dialect. IBM tested the earliest version of SQL in 1978. Within a few years, this relatively simple language based on familiar English commands became the basis for a number of SQL-based products designed for relational database management.
In the mid-1980s, the American National Standards Institute (ANSI) and the International Organization for Standardization (ISO) recognized SQL as the standard query language for managing any relational database management system, or RDBMS. They established standards for SQL providers to base their dialects on. These two standards have developed over the last few decades.
Different companies have created a variety of proprietary SQL programs. They have not stuck to the SQL standards, but have augmented its functionality. These dialects have similarly evolved over the decades, taking some of the new features in the Standard SQL and introducing additional features.
The two most popular dialects are both owned by Oracle. Oracle SQL is Oracle’s own version of SQL, which is expanded with the features in PL/SQL such as conditions and loops. MySQL is free and can manage SQL databases for a variety of purposes. It was originally created by Sun Microsystems, which Oracle bought in 2010.
PostgreSQL is an advanced database management system operating on an advanced version of SQL. It has a procedural language of its own, called PL/pgSQL. Like other dialects, it adds its own syntax and a range of user-defined functions and query types.
Transact-SQL (or TSQL) from Microsoft and Sybase includes a proprietary group of programming extensions adding new features to SQL. T SQL functions include not only modifications of standard SQL queries and statements but also a variety of user-defined functions and triggers. For more information, have a look at my Udemy article “SQL vs. T-SQL: How Do They Compare?”
How do spreadsheets such as Excel compare to SQL?
In a relational model, data is stored in databases in tables that have rows (or records) and columns (fields). In that way, relational database management systems are similar in structure and function to spreadsheet applications like Microsoft Excel, which organize data in workbooks, spreadsheets, rows, and columns.
Excel can be efficient and easy to use for managing smaller data sets for single users. However, it lacks the scalability and speed for handling larger datasets or allowing multiple users to use it at the same time. Databases can manage far more structured data than standard spreadsheets can. They can scale to handle billions of rows in multiple tables that can connect through sets of common fields. Thousands or millions of users can access this data securely at the same time. For more differences, please see my Udemy article about the differences between Excel and SQL Server.
If you are an Excel expert, then the good news is that you do not have to choose between Excel or SQL Server — you can use them with each other. For more about how they can work together, please see my Udemy article “How to Use SQL in Excel.”
What are the alternatives to traditional database systems?
For years, structured data was the standard form for data storage, and SQL the standard querying language. With the advent of Web 2.0 and the rise of the Internet of Things, unstructured data from a multitude of internet apps and billions of constantly connected devices began to challenge the structured data model.
Unstructured data or semi-structured data is a broad term for information that can’t be organized according to a predefined data model. Unstructured data includes documents, audio, image, and video files. Semi-structured data includes data that has a more defined data structure, but not necessarily one which is constant. This includes formats such as XML, JSON, and data that can be stored in Azure Cosmos DB, MongoDB, Cassandra, and graph format.
For more about different types of database structures, have a look at my course “DP-900: Microsoft Azure Data Fundamentals.” In addition to relational data, this course looks at non-relational data on Azure, including Cosmos DB APIs, Table, Blob and File storage, together with analytics workloads and data visualization in Microsoft Power BI.
Why should you learn SQL?
Even with the proliferation of unstructured or semi-structured data, SQL remains the standard for querying databases. Even if a company uses another type of database management system, it will likely involve SQL syntax or one of its derivatives in some way.
That means demand remains high for professionals with a knowledge of SQL, and salaries for positions involving SQL skills can be higher than the average. A solid understanding of SQL can be valuable for a variety of positions, including:
- Database administration (DBA). This is a job dedicated to ensuring the correct storage and management of an organization’s data. That might involve managing multiple databases and server configurations for specific purposes within the organization.
- Data analysis and data science. This is a field related to processing large amounts of data and extracting insights for use in making decisions and formulating company policies. Data science is a key part of an organization’s IT team, and recent statistics reveal that SQL remains more popular among these data specialists than other languages such as Python.
- Database migration. This is a data management specialty that involves migrating data from various databases to an SQL server, either on-premises or in the cloud, such as Microsoft Azure. Data migration engineers should be familiar with SQL as well as other programming languages or tools used to manage existing databases.
- Data architecture. This field involves building products or constructing frameworks for managing large volumes of data from a variety of sources. This might also include creating databases and integrating them into multiple applications and websites.
- Web development. This field requires knowledge of database creation and management in relation to building and maintaining websites that incorporate SQL databases. Developers typically have skills in a number of programming languages such as C#, and SQL is a useful addition to the list.
Adding SQL knowledge to your resume or CV, in addition to other data-related skills and languages, can enhance your job opportunities across all kinds of industries needing database management or querying data in their daily operations.
Is it easy to learn SQL?
SQL’s straightforward and relatively simple syntax makes it easier to learn than some other programming languages. Mastering SQL involves learning its syntax and structure and practical exercises on writing queries and creating model databases.
SQL has an English-based query structure, so you will encounter familiar words like SELECT, INSERT, and DELETE. It is possible for beginners with little experience to learn the basics of SQL relatively quickly. You may need more time to learn more complex features and how to integrate SQL with other tools and software.
How can you use SQL?
SQL can execute a range of essential database management and querying functions, including:
Executing queries and retrieving data from the database
- The SELECT clause shows what fields you want to extract or aggregate (SUM, COUNT, MIN, MAX) and whether they should have new names (aliases).
- The FROM clause refers to the source data. This can also include JOINs (see below).
- The WHERE clause restricts the number of rows. For example, you may wish to only return rows where a person’s address is in New York.
- Where aggregations are used in the SELECT clause, the GROUP BY clause shows what they should aggregate against. For example, if you are calculating the total hours expended, the GROUP BY clause could show that it should calculate per day, per employee, or per department.
- The HAVING clause can restrict the number of rows following the GROUP BY. For example, if you only want the rows where the total number of hours is greater than 100. This is the least used principal clause.
- The ORDER BY clause shows how to sort the data and whether it should be ascending or descending.
A valid SELECT statement could therefore be:
SELECT substring(name,1,1), count(*) as NumberOfRows
WHERE name LIKE '%f%'
GROUP BY substring(name,1,1)
ORDER BY NumberOfRows ASC, substring(name,1,1) DESC;
It is not necessary to use all six principal clauses in the same statement. All SQL dialects allow valid SELECT statements using just the SELECT and FROM clauses. It is possible in SQL Server for valid SELECT statements to use only the SELECT clause.
Combining tables together
There are two ways to do this: either add new rows or new columns to an existing data source.
You can combine or contrast rows from datasets by using UNION, UNION ALL, INTERSECT and EXCEPT (known as MINUS in some dialects).
Here is an example of a UNION:
This combines the two tables tblEmployeePart1 and tblEmployeeNumber2. They will need to have the same number of columns and compatible data types.
However, the main way to combine tables is usually by adding data from one table into additional columns in another table. You can do this using JOINs, which are part of the FROM clause.
There are several types of JOINs:
- INNER JOINs return only those records or rows that have the same values in key columns. If values from these key columns exist in one table but not the other, they are not returned.
- OUTER LEFT JOINs and OUTER RIGHT JOINs, normally known as LEFT and RIGHT JOINs, return the entirety of one dataset, together with relevant data from the second dataset that match.
- FULL OUTER JOINs return the data from both datasets, combined where appropriate.
This is an example of a LEFT JOIN:
FROM tblTransaction AS T
LEFT JOIN tblEmployee AS E
ON T.EmployeeNumber = E.EmployeeNumber
WHERE E.EmployeeNumber IS NULL;
The above shows all Transactions for all Employees who do not have a matching entry in the Employee table.
Managing data within the database, such as inserting, deleting, or updating individual records
INSERT adds additional records or rows, either using the VALUES keyword to add hard-coded rows or, by using a SELECT statement, incorporating them from another table. The following inserts an Employee’s data into the Employee table:
INSERT INTO tblEmployee (EmployeeNumber, EmployeeFirstName, EmployeeMiddleName, EmployeeLastName)
VALUES (2, 'Jane', NULL, 'Zwilling');
DELETE can delete all of the records or rows in a table. It is usually used with WHERE to limit the deletion to rows that fulfill particular criteria. The following deletes EmployeeNumber 5, if it exists, from the tblEmployee table.
DELETE FROM tblEmployee
WHERE EmployeeNumber = 5;
UPDATE modifies existing rows or records in a table. It could change the status of a set of records from “On leave” to “Active,” for example.
SET Status = 'Active'
WHERE EmployeeNumber = 5;
MERGE combines one table with another table and can take different actions depending on whether key parts of the row are already included in the table or not. The following updates the Amount column for existing transactions:
MERGE INTO tblTransaction as T
USING tblTransactionNew as S
ON T.EmployeeNumber = S.EmployeeNumber AND T.DateOfTransaction = S.DateOfTransaction
WHEN MATCHED THEN
UPDATE SET Amount = T.Amount + S.Amount;
Creating new tables and other objects within a structured database
The keywords CREATE, ALTER, and DROP (which deletes the object) do this.
When defining tables using CREATE TABLE, you also define the columns, the data type, and any constraints (restrictions) that should be imposed. Examples of these constraints include:
- Primary Key. This column (or columns) uniquely identifies that row in the table. For example, if the Primary Key in an Employee table was EmployeeNumber, then there would only be one EmployeeNumber 5 allowed. You can only have one Primary Key per table.
- Foreign Key. This is a reference to a Primary Key in other tables. For example, in a Transaction Table, you might have transactions made by EmployeeNumber 5. You could then create a Primary Key in the Transaction Table, referring to the Employee table. This would mean that you would not be able to add transactions by employees who were not in the Employee table.
- NULL or NOT NULL. NULL is the absence of data. If a column was “NOT NULL” then you would have to add data into that column when you create a new row or update an existing row.
- DEFAULT. This is a default value or formula if the data is NULL for that row. For example, I often have a “DateWhenInserted” column. Instead of adding the values for this column myself, I use a DEFAULT formula of GETDATE() so that it inserts the current date and time.
- CHECK constraint. You may wish to restrict the types of data that can be entered. For example, in a NumberOfHoursWorked column, you may wish to stop negative values, or values above 100 hours per week, from being entered.
An example of a new table is:
CREATE TABLE Lockers
(LockerNumber INT PRIMARY KEY,
Location varchar(20) NULL,
NumberOfTimesUsed INT CHECK (NumberOfTimesUsed>=0),
DateWhenInserted datetime DEFAULT (GETDATE()));
This creates a table called Lockers with four columns. The column “LockerNumber” is the Primary Key, and other tables may reference it. The column “Location” might be NULL (blank). The column “NumberOfTimesUsed” cannot be negative. And the column DateWhenInserted will default to the current date and time unless I specify another value.
How long does it take to learn it?
The good news is that you can learn the basics of SQL very quickly. In fact, you can learn the six principal clauses of the SELECT statement in an hour. This will allow you to retrieve data from databases such as Microsoft SQL Server. If this appeals to you, then please have a look at my course SQL Server Essentials in an hour: The SELECT statement.
If you would like a more in-depth look at SQL, then why not have a look at my eight-hour “Database Fundamentals” course? We’ll look at the SELECT statement, normalisation, constraints, and inserting, updating and deleting data. Additionally, we’ll investigate views, procedures, functions, triggers, indexes, security, and backing up data. With 15 quizzes and a Practice Test, you can be sure that you are learning.
However, if you want more, then have a look at my 29-hour “Querying Microsoft SQL Server with Transact-SQL.” In this, we have a real in-depth look at various aspects of querying data, such as those mentioned above, plus data types, combining datasets, ranking functions, subqueries, XML, JSON, transactions, and Dynamic Management Views.
For more than half a century, SQL has been the industry standard for relational database management. Today, it still tops most other popular programming languages as a desired skill for any data analyst, engineer, or architect. Although the advent of Big Data and the Internet of Things has boosted the rise of other kinds of data management systems, SQL remains an essential tool for data professionals in all sorts of organizations.
I hope you will expand your toolkit with learning SQL commands, including the SELECT statement. I’ll see you in one of my courses!
Top courses in SQL
SQL students also learn
Empower your team. Lead the industry.
Get a subscription to a library of online courses and digital learning tools for your organization with Udemy Business.