SQL SUM: What Does It Do and How Do You Use It?
Businesses today generate useful information from increasingly large data sets, which is why trend analysis, forecasting, and intelligent business reporting dashboards are in high demand. Strategic and tactical planners need this information to make decisions about the future of their businesses, which also causes a reliance on the relational databases that they use to store the data.
So how can you transform the data stored in the tables of a database into this type of knowledge? The analytical functions in SQL allow you to run calculations on the records in a database. Many consider the SUM function as one of the most useful SQL commands, as it helps you add the values in a column or the results from another SQL expression. In this article, we will look at a few of the ways you can SUM.
What Does SQL SUM Do?
SUM is one of the SQL functions that comprise the group called aggregate functions. Many of the functions we commonly use in SQL operate on only single rows. Aggregate functions, or group functions, operate on multiple rows and allow users to manipulate and display data values differently than how they appear in the database. For example, if a programmer uses functions that provide arithmetic operations such as SUM, COUNT, and AVG and then applies them to multiple rows of data, the result will be a value that will calculate but not actually store in the database.
This is an important concept because it means that you don’t simply store and retrieve data from the database. Instead, group functions can apply mathematical operations to groups of rows and batches of data values and derive information previously unknown. The value of this is that programmers can generate information and knowledge from the existing data just by applying SQL’s built-in arithmetic functions.
Aggregate functions work on data that you group using GROUP BY. A multi-group function such as SUM can provide a summary total of the entire dataset or individual group totals. Now that may not sound earth-shattering, but it is the first step in providing business analytics. For instance, a database can provide the business with answers, such as, “what is the total salary overhead or per department.”
You can get answers to business questions such as the above by using basic multi-group functions such as SUM and COUNT. Other single group functions provide the MIN and the MAX values and the variance. From those values, you can calculate the standard deviation, which is the basis for real business intelligence and analysis. A business needs more than static information. It needs analytics. So how can the database provide real-time answers to more detailed questions? What is the running total of sales, what are the sales rankings, what is the deviation in sales per channel from last week, last month, the month before, and this time last year? Those are the questions that businesses need answers to in real-time.
To model those types of complex questions and then implement them into standard SQL queries would take many steps to write. Thankfully, there are already SQL extensions that perform these analytic functions. Using the SQL SUM function is also more efficient than querying the data in a programming language and looping the results to generate the sum. Doing so will also require returning more data to your application when all it needs is a simple value. So let’s look at the syntax of the SUM function.
Syntax of SQL SUM
Here is the basic syntax of the SUM statement:
SELECT SUM(column_name or expression) FROM table_name WHERE condition;
The column must be a numeric column, and if it contains an SQL expression, it must return a list of numeric values. That is the basic syntax, but there are more ways you can use SUM in your queries. One way is using SUM with OVER.
SELECT column_1, column_2, SUM (column_2) OVER (ORDER BY column_1) running_total FROM table_name ORDER BY column_1;
The extension OVER is added to the SUM calculation in the query above. In this case, SUM is working as an analytic function, and the ORDER BY clause indicates the data range (window or rows) that this analytical sum function will be performed on. This query will return three columns. The first column tells you the order in which the data will process, while the second shows you the data that it will sum together. In the third, you’ll find the running total of the second column.
Another SQL statement that can be useful to use with SUM is the GROUP BY statement. This will allow us to sum the values of one column based on the value of another column. Here is the syntax used for that type of query:
SELECT column_1, SUM (column_2) column_one_sum FROM table_name GROUP BY column_1;
Now that you have an idea of how you can use SUM, let’s look at some example data, queries, and results to see how SUM works on real data.
Examples of Using SUM in SQL
For the examples, we are going to assume we have a database table called orders that has data like that below.
Now let’s look at the variety of ways you can use SUM in your SQL queries. It is really flexible.
Basic Use of SQL SUM
So the most basic query using SUM for the data we see above would be this one:
SELECT SUM(pre_order) total FROM orders;
This would return one row of results.
Using SUM with OVER
If we use the OVER statement in our query to generate a running total of the pre_order column along with customer_name, it will look like this.
SELECT customer_name, pre_order, SUM (pre_order) OVER (ORDER BY customer_name) running_total FROM orders ORDER BY customer_name;
This query would return these results for our table:
This query returned a much more useful data set. We have the customer’s name now and a running total of pre-orders.
Using SUM with OVER and PARTITION BY
However, if we want to see all pre-orders per customer and the running pre-order total per department, then the GROUP BY or PARTITION clause is necessary to segregate each group and restrict the SUM function to each group of data (department).
SELECT company_name, department_id, pre_order, SUM(pre_order) OVER (PARTITION BY department_id ORDER BY pre_order) department_total FROM orders ORDER BY department_id, pre-orders, company_name;
In this SQL query example, the function name is SUM, and it applies to the column pre-orders, which is the basic syntax. The OVER tells the database that you want to use SUM as an analytic function, which will perform OVER the data that results from this SQL expression:
PARTITION BY department_id ORDER BY pre_sales
The result of the query is below.
Now we have a running total per department.
Using SUM with GROUP BY
If we wanted only the sum of the departments, we could use GROUP BY in our SQL and write a basic query.
SELECT department_id, SUM(pre_order) department_total FROM orders GROUP BY department_id;
The results we would get from that query would look like this:
Using SUM with DISTINCT
You can also use DISTINCT inside your SUM function. The result is that it will remove all the duplicate values of the column. Here is our first simple query that summed the value of the pre_order column in every record with the DISTINCT clause added:
SELECT SUM(DISTINCT pre_order) distinct_total FROM orders;
Add here is the result from that query:
Notice that the value returned by the query without the DISTINCT clause was 211,000, but now with the DISTINCT clause, it is 131,000 or 80,000 less. That is because 40,000 is duplicated once in the pre_order column, and 20,000 is replicated twice for a total of 80,000.
Using SUM to Calculate Group Totals on More Than One Column
You may have noticed that the orders table has another numeric field. Let’s say that we not only want to calculate the sum of a department’s pre-orders but also the sum of the percentage column. We can use the query below on the orders table to do this.
SELECT department_id, SUM(pre_order) department_order_total, SUM(expense) department_expense_total FROM orders GROUP BY department_id;
This query will return this result:
Using an Expression In a SUM Function
We did say that the value used in a SUM function could be a column or an SQL expression. If we wanted a running total of orders with expenses subtracted, we could do that with an expression. I don’t know if this query is useful, but it is possible, and it shows how we can use an expression with SUM. Here is the query:
SELECT customer_name, pre_order, SUM (pre_order - expense) OVER (ORDER BY customer_name) running_total FROM orders ORDER BY customer_name;
The result of that query would be the following:
Using SQL SUM with SQL COUNT and SQL AVG
You can also combine aggregate functions in your SQL queries to act on the same set of records. If we wanted to return the count of customers in a department along with the sum of the pre-orders and the sum of expenses, we could do that. Here is that query:
SELECT department_id, SUM(pre_order) department_order_total, SUM(expense) department_expense_total, COUNT(customer_name) customer_count FROM orders GROUP BY department_id;
The results of that query would look like this:
What that means is that we have the numbers we can use to calculate averages. That means we can now write a query like this:
SELECT department_id, SUM(pre_order) department_order_total, SUM(expense) department_expense_total, COUNT(customer_name) count, SUM(orders) / COUNT(customer_name) avg_orders, SUM(expense) / COUNT(customer_name) avg_expenses FROM orders GROUP BY department_id;
Which will return the following record set:
But we don’t have to do that because we can also AVG along with COUNT for a simpler query. The following query will return the same results as the last query.
SELECT department_id, SUM(pre_order) department_order_total, SUM(expense) department_expense_total, COUNT(customer_name) count, AVG(orders) avg_orders, AVG(expense) avg_expenses FROM orders GROUP BY department_id;
Setting the Range for SUM
An analytic function may or may not have a window clause — a window in this context being the range of rows that the analytic function uses. The first query that just returned the sum of the pre_order column did not use a window clause. In this case, it will add the entire set of rows.
The second example that returned a running total of every customer used a window by default. The ORDER BY clause we used triggers this default window. This means the current and previous rows in the current partition are the rows to be used in the calculation. Since there was no PARTITION coded in this example, the cumulative SUM function used the whole column.
In the third example, we returned a running total per department. In this example, there was a PARTITION, which we grouped by department. However, we set no windowing parameters. You can set window parameters using the ROWS or RANGE PRECEEDING clause.
Common SQL SUM Errors
One of the most common errors is when you use the SUM function results that aren’t numeric data. Sometimes you may have numeric data in a VARCHAR column. You will first have to cast these values to a numeric data type before you use the SUM function on them.
Another error that can happen when you are using the SUM function is a numeric overflow. It will look like this:
Arithmetic overflow error converting expression to data type int.
Top courses in Development
It may happen with a query that has worked for ages, and then one day, it throws an error. This can occur when the column is an integer type and the result of the SUM function is larger than the limits of that type. The SUM function will try to return the result in the same type that columns are in. To fix this error, you must first convert the values to a big integer. Here is an example:
SELECT department_id, SUM(CONVERT(BIGINT, pre_order)) department_total FROM orders GROUP BY department_id;
Another common error you may run into using the SUM function with the GROUP BY clause is the following:
Column is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
This usually means you have not set up the window for the SUM function. A query that could trigger this type of error in some relational databases would be this one:
SELECT *, SUM(pre_order) total FROM orders;
There are a few ways to fix this query, depending on what you were trying to do with the first query. If you just want a complete sum of the pre_order column, you could just remove the asterisk from the query.
SELECT SUM(pre_order) total FROM orders;
If you are summing based on another column, you could add a GROUP BY. That query would look like this:
SELECT department_id, SUM(pre_order) total FROM orders GROUP BY department_id;
You can also fix the query by using the OVER function to tell the database that your window is the entire table. That query would look like this:
SELECT SUM(pre_order) OVER() total FROM orders;
Analytics is the functionality required for detailed reporting and the source of information for operational and tactical planning and analysis. The SUM function is a powerful analytic command that will add the values of a table column or the result of an SQL expression. When used with the ORDER BY statement, the OVER statement, GROUP BY statement, and PARTITION BY statement, you can use SUM to calculate sums across varying ranges of rows.
Top courses in SQL
SQL students also learn
Empower your team. Lead the industry.
Get a subscription to a library of online courses and digital learning tools for your organization with Udemy for Business.