Aggregate Functions in SQL
Functions in SQL enable we to perform feats such as determining the sum of a column or converting all the characters of a string to uppercase.
On this page
By the end of the day, we will understand and be able to use all the following:
Date and time functions
Arithmetic functions
Character functions
Aggregate functions
Conversion functions
General Functions
Miscellaneous functions
These functions greatly increase our ability to manipulate the information we retrieved using the basic functions of SQL. The first five aggregate functions, COUNT, SUM, AVG, MAX, and MIN, are defined in the ANSI standard. Most implementations of SQL have extensions to these aggregate functions, some of which are covered today. Some implementations may use different names for these functions.
Aggregate Functions
These functions are also referred to as group functions. They return a value based on the values in a column. (After all, we wouldn’t ask for the average of a single field.) The examples in this section use the table TEAMSTATS:
SQL> SELECT * FROM TEAMSTATS;
NAME | POS | AB | HITS | WALKS | SINGLES | DOUBLES | TRIPLES | HR | SO |
---|---|---|---|---|---|---|---|---|---|
JONES | 1B | 145 | 45 | 34 | 31 | 8 | 1 | 5 | 10 |
DONKNOW | 3B | 175 | 65 | 23 | 50 | 10 | 1 | 4 | 15 |
WORLEY | LF | 157 | 49 | 15 | 35 | 8 | 3 | 3 | 16 |
DAVID | OF | 187 | 70 | 24 | 48 | 4 | 0 | 17 | 42 |
HAMHOCKER | 3B | 50 | 12 | 10 | 10 | 2 | 0 | 0 | 13 |
CASEY | DH | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 1 |
6 rows selected.
COUNT
The function COUNT returns the number of rows that satisfy the condition in the WHERE clause. Say we wanted to know how many ball players were hitting under 350. We would type
SQL> SELECT COUNT(*) FROM TEAMSTATS WHERE HITS/AB < .35;
COUNT(*)
--------
4
To make the code more readable, try an alias:
SQL> SELECT COUNT(*) NUM_BELOW_350 FROM TEAMSTATS WHERE HITS/AB < .35;
NUM_BELOW_350
-------------
4
Would it make any difference if we tried a column name instead of the asterisk? (Notice the use of parentheses around the column names.) Try this:
SQL> SELECT COUNT(NAME) NUM_BELOW_350 FROM TEAMSTATS WHERE HITS/AB < .35;
NUM_BELOW_350
-------------
4
The answer is no. The NAME column that we selected was not involved in the WHERE statement. If we use COUNT without a WHERE clause, it returns the number of records in the table.
SQL> SELECT COUNT(*) FROM TEAMSTATS;
COUNT(*)
---------
6
SUM
SUM does just that. It returns the sum of all values in a column. To find out how many singles have been hit, type
SQL> SELECT SUM(SINGLES) TOTAL_SINGLES FROM TEAMSTATS;
TOTAL_SINGLES
-------------
174
To get several sums, use
SQL> SELECT SUM(SINGLES) TOTAL_SINGLES, SUM(DOUBLES) TOTAL_DOUBLES,
SUM(TRIPLES) TOTAL_TRIPLES, SUM(HR) TOTAL_HR FROM TEAMSTATS;
TOTAL_SINGLES TOTAL_DOUBLES TOTAL_TRIPLES TOTAL_HR
------------- ------------- ------------- --------
174 32 5 29
To collect similar information on all 300 or better players, type
SQL> SELECT SUM(SINGLES) TOTAL_SINGLES, SUM(DOUBLES) TOTAL_DOUBLES,
SUM(TRIPLES) TOTAL_TRIPLES, SUM(HR) TOTAL_HR FROM TEAMSTATS WHERE HITS/AB
>- .300;
TOTAL_SINGLES TOTAL_DOUBLES TOTAL_TRIPLES TOTAL_HR
------------- ------------- ------------- --------
164 30 5 29
To compute a team batting average, type
SQL> SELECT SUM(HITS)/SUM(AB) TEAM_AVERAGE FROM TEAMSTATS;
TEAM_AVERAGE
------------
.33706294
SUM works only with numbers. If we try it on a non-numerical field, we get
SQL> SELECT SUM(NAME)FROM TEAMSTATS;
***### ERROR:
ORA-01722: invalid number
no rows selected
This error message is logical because we cannot sum a group of names.
AVG
The AVG function computes the average of a column. To find the average number of strike outs, use this:
SQL> SELECT AVG(SO) AVE_STRIKE_OUTS FROM TEAMSTATS;
AVE_STRIKE_OUTS
---------------
16.166667
The following example illustrates the difference between SUM and AVG:
SQL> SELECT AVG(HITS/AB) TEAM_AVERAGE FROM TEAMSTATS;
TEAM_AVERAGE
------------
.26803448
Analysis:
The team was batting over 300 in the previous example! What happened? AVG computed the average of the combined column hits divided by at bats, whereas the example with SUM divided the total number of hits by the number of at bats. For example, player A gets 50 hits in 100 at bats for a .500 average. Player B gets 0 hits in 1 at bat for a 0.0 average. The average of 0.0 and 0.5 is .250. If we compute the combined average of 50 hits in 101 at bats, the answer is a respectable .495. The following statement returns the correct batting average:
SQL> SELECT AVG(HITS)/AVG(AB) TEAM_AVERAGE FROM TEAMSTATS;
TEAM_AVERAGE
------------
.33706294
Like the SUM function, AVG works only with numbers.
MAX
If we want to find the largest value in a column, use MAX. For example, what is the highest number of hits?
SQL> SELECT MAX(HITS)FROM TEAMSTATS;
MAX(HITS)
---------
70
Can we find out who has the most hits?
SQL> SELECT NAME FROM TEAMSTATS WHERE HITS = MAX(HITS);
***### ERROR at line 3:
ORA-00934: group function is not allowed here
Unfortunately, we can’t. The error message is a reminder that this group function ***### (remember that aggregate functions are also called group functions) doesn’t work in WHERE clause. What happens if we try a non-numerical column?
SQL> SELECT MAX(NAME)FROM TEAMSTATS;
MAX(NAME)
---------------
WORLEY
Here’s something new. MAX returns the highest (closest to Z) string. Finally, a function works with both characters and numbers.
MIN
MIN returns the lowest member of a column. To find out the fewest at bats, type
SQL> SELECT MIN(AB)FROM TEAMSTATS;
MIN(AB)
---------
1
The following statement returns the name closest to the beginning of the alphabet:
SQL> SELECT MIN(NAME)FROM TEAMSTATS;
MIN(NAME)
---------------
CASEY
We can combine MIN with MAX to give a range of values. For example:
SQL> SELECT MIN(AB), MAX(AB) FROM TEAMSTATS;
MIN(AB) MAX(AB)
-------- --------
1 187
This sort of information can be useful when using statistical functions. NOTE: As we mentioned in the introduction, the first five aggregate functions are described in the ANSI standard. The remaining aggregate functions have become de facto standards, present in all important implementations of SQL. We use the Oracle7 names for these functions. Other implementations may use different names.
VARIANCE
VARIANCE produces square of standard deviation, a number vital to many statistical calculations. It works like this:
SQL> SELECT VARIANCE(HITS)FROM TEAMSTATS;
VARIANCE(HITS)
--------------
802.96667
If we try a string
SQL> SELECT VARIANCE(NAME)FROM TEAMSTATS;
ERROR:
ORA-01722: invalid number
no rows selected
We find that VARIANCE is another function that works exclusively with numbers.
⌖ sql rdbms