Businesses today require information and knowledge from large tables of data. Trend analysis, forecasting and intelligent business reporting dashboards are in demand. Strategic and tactical planners need information, and a database is full of it. So how can you transform that data into knowledge? You can look at the analytical functions in SQL, and the SUM function is one SQL function that helps you add up values.
SUM is one of the SQL functions that comprise the group called aggregate functions. Many of the functions we commonly use in SQL operate on only single rows. Aggregate (or group functions as they are also known), operate on multiple rows and enable data values to be manipulated and displayed differently then how they are stored in the database. For example, if a programmer uses functions that provide arithmetic operations such as SUM, COUNT, and AVG and then applies them to multiple rows of data, the result will be a value that is calculated but not actually stored in the database.
This is an important concept, because it means that data is not just stored in the database and then retrieved. Instead, group functions can apply mathematical operations to groups of rows and batches of data values and derive information previously unknown. The value of this is that programmers can generate further information and knowledge from the existing data just by applying SQL’s built in arithmetic functions.
As aggregate functions work on grouping data from rows or groups of rows, using GROUP BY and a multi-group function such as SUM can provide a summary total or individual group totals. Now that may not sound to earth shattering, but it is the first step in providing business analytics. For instance, a database can provide the business with answers such as what is the total salary overhead or per department and in total?
You can get answers to business questions such as the above by utilizing basic multi-group functions such as SUM and COUNT. Other single group functions provide the min and the max values, the variance, and from that the standard deviation, which will provide the basis for real business intelligence and analysis. A business needs more than static information. It needs analytics, so how can the database provide real-time answers to more intricate questions? What is the running total of sales, what are the sales rankings, what is the deviation in sales per channel from last week, last month, the month before, and this time last year?
These are the questions that businesses need answered and in real-time. To model those types of complex questions, and then implement them into standard SQL queries will be difficult to write. Thankfully, there are already SQL extensions that perform these specialist analytic functions.
SUM as an Analytic Function
The SUM function has analytic extensions to its standard syntax;
Syntax: SUM (expression, or column)
OVER (PARTITION, ORDER BY, WINDOW )
As can be seen from the syntax in the example, the extension OVER is added to the SUM calculation. This is added to indicate that, in this case, SUM is working as an analytic function, and the ORDER BY indicates the data range (window or rows) which this analytical sum function will be performed over.
For example, in the code below SUM will be used as an analytic extension over the data in the pre_order column and ordered by customer_name
select customer_name, pre_order,
OVER (ORDER BY customer_name) running_total
order by customer_name;
The SQL query above uses the SUM analytic function to perform a cumulative count of pre_order values row-by-row for each customer.
The results show a list of all customer pre-orders and the cumulative running orders for all customers.
However, if the business requirement is to see all pre-orders per customer and cumulative pre-order value per department, then the group by or partition clause is required to segregate each group and restrict the SUM analytic function to each group of data (department).
|select company_name, department_id, pre_orders,SUM (pre_orders)OVER (PARTITION BY department_id ORDER BY pre_orders) department_totalfrom Sales_Orders|
order by department_id, pre-orders, company_name;
In this SQL query example, the function name is SUM and it is applied to the column pre-orders, which is the basic syntax. The OVER identifies the function SUM as being called as an analytic function, which will perform over the data from the (PARTITION BY department_id ORDER BY pre_sales).
Setting the Range
An analytic function may or may not have a window clause. A window is basically the range of the group of rows within a partition, which are to be evaluated by the analytic function. In the first example, a window is used by default, as it uses an ORDER BY clause. An ORDER BY clause effectively adds a default windows clause, which states that the current and previous rows in the current partition are the rows that should be used in the calculation. When a PARTITION is not stated, then the entire set of rows will be the default partition.
In example one, there was no PARTITION coded so the whole column was used by the cumulative SUM function. However, in example two, there was a PARTITION, which grouped per department, however there were no windowing parameters set. Window parameters can be set using the ROWS or RANGE PRECEEDING clause.
SQL analytic functions are powerful ways to get answers from the data contained in the database. Analytics is the functionality required for detailed reporting, and the source of information for operational and tactical planning and analysis. If you are on a journey towards becoming a SQL guru or business analyst then learning to utilize the analytical functions is an essential part of that tactical vision.