Aggregate Functions in SQL

Functions in SQL enable we to perform feats such as determining the sum of a column or converting all the characters of a string to uppercase.

By the end of the day, we will understand and be able to use all the following:

  • Date and time functions

  • Arithmetic functions

  • Character functions

  • Aggregate functions

  • Conversion functions

  • General Functions

  • Miscellaneous functions

These functions greatly increase our ability to manipulate the information we retrieved using the basic functions of SQL. The first five aggregate functions, COUNT, SUM, AVG, MAX, and MIN, are defined in the ANSI standard. Most implementations of SQL have extensions to these aggregate functions, some of which are covered today. Some implementations may use different names for these functions.

Aggregate Functions

These functions are also referred to as group functions. They return a value based on the values in a column. (After all, we wouldn’t ask for the average of a single field.) The examples in this section use the table TEAMSTATS:

SQL> SELECT * FROM TEAMSTATS;
NAMEPOSABHITSWALKSSINGLESDOUBLESTRIPLESHRSO
JONES1B14545343181510
DONKNOW3B175652350101415
WORLEYLF15749153583316
DAVIDOF187702448401742
HAMHOCKER3B5012101020013
CASEYDH10000001

6 rows selected.

COUNT

The function COUNT returns the number of rows that satisfy the condition in the WHERE clause. Say we wanted to know how many ball players were hitting under 350. We would type

SQL> SELECT COUNT(*) FROM TEAMSTATS WHERE HITS/AB < .35;
COUNT(*)
--------
4

To make the code more readable, try an alias:

SQL> SELECT COUNT(*) NUM_BELOW_350 FROM TEAMSTATS WHERE HITS/AB < .35;
NUM_BELOW_350
-------------
4

Would it make any difference if we tried a column name instead of the asterisk? (Notice the use of parentheses around the column names.) Try this:

SQL> SELECT COUNT(NAME) NUM_BELOW_350 FROM TEAMSTATS WHERE HITS/AB < .35;
NUM_BELOW_350
-------------
4

The answer is no. The NAME column that we selected was not involved in the WHERE statement. If we use COUNT without a WHERE clause, it returns the number of records in the table.

SQL> SELECT COUNT(*) FROM TEAMSTATS;
COUNT(*)
---------
6

SUM

SUM does just that. It returns the sum of all values in a column. To find out how many singles have been hit, type

SQL> SELECT SUM(SINGLES) TOTAL_SINGLES FROM TEAMSTATS;
TOTAL_SINGLES
-------------
174

To get several sums, use

SQL> SELECT SUM(SINGLES) TOTAL_SINGLES, SUM(DOUBLES) TOTAL_DOUBLES,
SUM(TRIPLES) TOTAL_TRIPLES, SUM(HR) TOTAL_HR FROM TEAMSTATS;
TOTAL_SINGLES TOTAL_DOUBLES TOTAL_TRIPLES TOTAL_HR
------------- ------------- ------------- --------
174            32             5       29

To collect similar information on all 300 or better players, type

SQL> SELECT SUM(SINGLES) TOTAL_SINGLES, SUM(DOUBLES) TOTAL_DOUBLES,
SUM(TRIPLES) TOTAL_TRIPLES, SUM(HR) TOTAL_HR FROM TEAMSTATS WHERE HITS/AB
>- .300;
TOTAL_SINGLES TOTAL_DOUBLES TOTAL_TRIPLES TOTAL_HR
------------- ------------- ------------- --------
164            30             5       29

To compute a team batting average, type

SQL> SELECT SUM(HITS)/SUM(AB) TEAM_AVERAGE FROM TEAMSTATS;
TEAM_AVERAGE
------------
.33706294

SUM works only with numbers. If we try it on a non-numerical field, we get

SQL> SELECT SUM(NAME)FROM TEAMSTATS;
***### ERROR:
ORA-01722: invalid number
no rows selected

This error message is logical because we cannot sum a group of names.

AVG

The AVG function computes the average of a column. To find the average number of strike outs, use this:

SQL> SELECT AVG(SO) AVE_STRIKE_OUTS FROM TEAMSTATS;
AVE_STRIKE_OUTS
---------------
16.166667

The following example illustrates the difference between SUM and AVG:

SQL> SELECT AVG(HITS/AB) TEAM_AVERAGE FROM TEAMSTATS;
TEAM_AVERAGE
------------
.26803448

Analysis:

The team was batting over 300 in the previous example! What happened? AVG computed the average of the combined column hits divided by at bats, whereas the example with SUM divided the total number of hits by the number of at bats. For example, player A gets 50 hits in 100 at bats for a .500 average. Player B gets 0 hits in 1 at bat for a 0.0 average. The average of 0.0 and 0.5 is .250. If we compute the combined average of 50 hits in 101 at bats, the answer is a respectable .495. The following statement returns the correct batting average:

SQL> SELECT AVG(HITS)/AVG(AB) TEAM_AVERAGE FROM TEAMSTATS;
TEAM_AVERAGE
------------
.33706294

Like the SUM function, AVG works only with numbers.

MAX

If we want to find the largest value in a column, use MAX. For example, what is the highest number of hits?

SQL> SELECT MAX(HITS)FROM TEAMSTATS;
MAX(HITS)
---------
70

Can we find out who has the most hits?

SQL> SELECT NAME FROM TEAMSTATS WHERE HITS = MAX(HITS);

***### ERROR at line 3:
ORA-00934: group function is not allowed here

Unfortunately, we can’t. The error message is a reminder that this group function ***### (remember that aggregate functions are also called group functions) doesn’t work in WHERE clause. What happens if we try a non-numerical column?

SQL> SELECT MAX(NAME)FROM TEAMSTATS;
MAX(NAME)
---------------
WORLEY

Here’s something new. MAX returns the highest (closest to Z) string. Finally, a function works with both characters and numbers.

MIN

MIN returns the lowest member of a column. To find out the fewest at bats, type

SQL> SELECT MIN(AB)FROM TEAMSTATS;
MIN(AB)
---------
1

The following statement returns the name closest to the beginning of the alphabet:

SQL> SELECT MIN(NAME)FROM TEAMSTATS;
MIN(NAME)
---------------
CASEY

We can combine MIN with MAX to give a range of values. For example:

SQL> SELECT MIN(AB), MAX(AB) FROM TEAMSTATS;
MIN(AB)  MAX(AB)
-------- --------
1      187

This sort of information can be useful when using statistical functions. NOTE: As we mentioned in the introduction, the first five aggregate functions are described in the ANSI standard. The remaining aggregate functions have become de facto standards, present in all important implementations of SQL. We use the Oracle7 names for these functions. Other implementations may use different names.

VARIANCE

VARIANCE produces square of standard deviation, a number vital to many statistical calculations. It works like this:

SQL> SELECT VARIANCE(HITS)FROM TEAMSTATS;
VARIANCE(HITS)
--------------
802.96667

If we try a string

SQL> SELECT VARIANCE(NAME)FROM TEAMSTATS;
ERROR:
ORA-01722: invalid number
no rows selected

We find that VARIANCE is another function that works exclusively with numbers.

sql rdbms

Subscribe For More Content