Inicio‎ > ‎SQL‎ > ‎SQL (english)‎ > ‎

T05 Aggregation

It is possible to use arithmetic expressions not only in the select, clause, to obtain a new column in the result-table, but also in the creation of the rows selection conditions.

Supposing the course is divided in 2 semeters and 3 credits correspond with 1 weekly class-hour, name of the subjects and number of weekly class-hours of each one in a single semester.

select descripcion, (creditos/3)*2 horas
from asignaturas;

descripcion horas
DISEÑO Y GESTION DE BASES DE DATOS 4.00000
FUNDAMENTOS DE LAS BASES DE DATOS 4.00000
FUNDAMENTOS DE LA PROGRAMACION 6.00000
HISTORIA DE LA INFORMATICA 3.00000
PROGRAMACION CONCURRENTE 4.00000

Subject's credit are yearly: divided in 3, we obtain the hours to teach during a school year every week; if we reduce this to a single semester, every week we will have double class-hours.


Description of the subjects and number of weekly hours for the subjects with less than 4 weekly class-hours

select descripcion, creditos
from asignaturas
where (creditos/3)*2 < 4

descripcion creditos
HISTORIA DE LA INFORMATICA 4.5


Functions


The whole list of availabe functions in MySQL can be consulted in (http://dev.mysql.com/doc/refman/5.0/es/functions.htmls/fu). Note that every DBMS can provide its own functions that can be the same as the ones managed by MySQL or not.

Round 15.1297 to two decimals

select round(15.1297,2) from dual;

round(15.1297,2)
15.13

Obviously, to perform and show a calculus like this, no table is required. Dual is a virtual table being used in these cases to shown data not stored or calculated from any table. Actually, it is maintained for compatibility with others DBMS (Orable) because we can also executeselect round(15.1297,2) with the same result.



Aggregate functions

There are a set of aggregate functions that return calculated values over one or more columns.

The difference from the ones in the previous section is that these functions work over a set of values: they return a single value after processing a set of tuples selected through the condition specified in where; clause; in case this is not specified, the calculus is done over the whole column.

COUNT( * ) number of rows
COUNT( DISTINCT expr ) distinct number of values in expr
SUM( [DISTINCT] expr ) sum of all the values of expr
AVG( [DISTINCT] expr ) average of all the values in expr
MIN( expr ) the minimum value in expr
MAX( expr ) the maximum value in expr









Stadistical functions need that the expression being evaluated is built from numeric columns.

The expression expr can contain the name of a column or a calculus over one or more columns.

If the keyword distinct is specified, the expression must to be a column name, and it is assumed that the function is calculated only over distinct values of the expression.


How many teachers are there in our DB?

select count(*) profes from profesores;

profes
3


How many subjects of more that 4 credits we have?

select count(*) from asignaturas
where creditos > 4
;

count(*)
5

How many distinct values for credit we have?

select count(distinct creditos) quecreditos
from asignaturas;

quecreditos
3

There are three different credit values: 6.0, 9.0 y 4.5.

codigo descripcion creditos creditosp
DGBD DISEÑO Y GESTION DE BASES DE DATOS 6.0 3.0
FBD FUNDAMENTOS DE LAS BASES DE DATOS 6.0 1.5
FP FUNDAMENTOS DE LA PROGRAMACION 9.0 4.5
HI HISTORIA DE LA INFORMATICA 4.5  
PC PROGRAMACION CONCURRENTE 6.0 1.5


About DISTINCT modifier

All the aggregating functions (except count(*)) ignore nulls (NULL). Therefore, select count(creditos) from asignaturas should return the number of rows in asignaturas table with no null value for credits.

select count(*) filas, count(creditosp) valores, count(distinct creditosp) distintos from asignaturas;

filas valores distintos
5 4 3

The use of the distinct modifier implies, apart from this, that duplicates are not considered. Suppose a table with 5 rows and a column colx and we want to get the average of the stored values in it (NULL,1,1,1,3): AVG(colx) will return 1.5, whereas AVG(distinct colx) will return 2.

select avg(creditosp) sinDis, avg(distinct creditosp) conDis from asignaturas;

sinDis conDis
2.62500 3.00000

 

Group by

Remember that the use of aggregating functions allows us to perform calculus over the all the rows that fulfill a certain condition (or over all the rows of the table)

Number of credits being teach

select sum(creditos) credsImp
from imparte i, asignaturas a
where i.asignatura = a.codigo

credsImp
18.0

However, we usually need to perform this operation not in a global way, but depending on a particular criteria.

Number of credits taught by each teacher.

select sum(creditos) credsImp
from imparte i, asignaturas a
where i.asignatura = a.codigo
group by dni

credsImp
12.0
6.0

Actually, it would be better understood if we introduce more information in the output

select dni, sum(creditos) credsImp
from imparte i, asignaturas a
where i.asignatura = a.codigo
group by dni

dni credsImp
21111222 12.0
21333444 6.0


The criteria defined by group by (dni) establishs the different groups of rows in which the sum will be performed. The system looks for each different dni values within table imparte, and for all the rows sharing these values performs the sum of credits.

From there, we can add any information we consider important. 

Name of the teachers and credits they teach.

select nombre, sum(creditos) credsPorProf
from imparte i, asignaturas a, profesores p
where i.asignatura = a.codigo
and p.dni = i.dni
group by p.dni

SQL Error: Column 'ejemplo.p.nombre' isn't in GROUP BY

The previous error is due to the fact that is  compulsory adding to the group by all the no calculated columns that will appear in the final result.


On the contrary, it is not necessary that all the columns specified in the group by will appear in the output.

select nombre, sum(creditos) credsPorProf
from imparte i, asignaturas a, profesores p
where i.asignatura = a.codigo
and p.dni = i.dni
group by p.dni, nombre

nombre credsPorProf
EVA GOMEZ 12.0
RAFAEL ROMERO 6.0

Although with "nombre" was enough, we also added "DNI" just in case, I would be possible to have 2 different teachers with the same name.

Where y group by

When the clause where is used, apart from linking tables by common columns, as PROFESORES and IMPARTE using profesores.dni and imparte.dni respectively, it is also possible to use it to eliminate certain rows from the calculus.

Therefore, if we want to calculate how many subjects are taught by each teacher without taking into account FBD, we will write the following sentence:

select nombre, count(*)
from profesores p, imparte i
where p.dni = i.dni
    and asignatura != 'FBD'
group by p.dni, nombre;

nombre count(*)
EVA GOMEZ 1
RAFAEL ROMERO 1
 
The steps are, "first"  the select-from-where sentence is performed with calculus and "then", over the resulted rows, the calculus will be performed following the  group by criteria.

No key attributes

We have been using the primary key of teachers in the group by clauses in order to prevent erroneous operations. The only attributes that guarantee the identification of a tuple respect the others are the ones in the primary key (in general, candidate key).

Therefor, if the primary key of a people table (e.g teachers or students) is D.N.I., we assume that there is no duplicate values for this attribute. However, the name is not a key attribute, and therefore, admits duplicates. That is, it is perfectly possible to find to people with the same name.


Name of the coordinator and how many subjects he/she coordinates.

select nombre, count(*) coordina
from asignaturas a, coordinadores c
where c.asig = a.codigo
group by nombre;

ombre coordina
AGAPITO CIFUENTES 2
ROMUALDO GOMEZ 1

DNI and name of the coordinator and how many subjects he/she coordinates.

select dni, nombre, count(*) coordina
from asignaturas a, coordinadores c
where c.asig = a.codigo
group by dni, nombre;

dni nombre coordina
10111222 AGAPITO CIFUENTES 1
55777666 AGAPITO CIFUENTES 1
66555444 ROMUALDO GOMEZ 1

It seems obvious that the first solution is not correct, moreover taking into account the design of the table and the meaning of their data (a coordinator can not coordinate more than one subject).


Output ordering

It is possible to order the output using the calculated columns.

All these statements have the same output:

  • using the expression itself
select nombre, count( * ) asignaturas
from profesores p, imparte i
where p.dni=i.dni
group by p.dni, nombre
order by count( * );

  • using the column tab
select nombre, count( * ) asignaturas
from profesores p, imparte i
where p.dni=i.dni
group by p.dni, nombre
order by asignaturas;

  • using the ordering of the column
select nombre, count( * ) asignaturas
from profesores p, imparte i
where p.dni=i.dni
group by p.dni, nombre
order by 2;



nombre asignaturas
RAFAEL ROMERO 1
EVA GOMEZ 2