Aggregate Functions

edit

Functions for computing a single result from a set of input values. Elasticsearch SQL supports aggregate functions only alongside grouping (implicit or explicit).

General Purpose

edit

AVG

edit

Synopsis:

AVG(numeric_field) 

Input:

numeric field

Output: double numeric value

Description:Returns the Average (arithmetic mean) of input values.

SELECT AVG(salary) AS avg FROM emp;

      avg:d
---------------
48248.55

COUNT

edit

Synopsis:

COUNT(expression) 

Input:

a field name, wildcard (*) or any numeric value

Output: numeric value

Description:Returns the total number (count) of input values.

In case of COUNT(*) or COUNT(<literal>), all values are considered (including null or missing ones).

In case of COUNT(<field_name>) null values are not considered.

SELECT COUNT(*) AS count FROM emp;

     count
---------------
100

COUNT(ALL)

edit

Synopsis:

COUNT(ALL field_name) 

Input:

a field name

Output: numeric value

Description:Returns the total number (count) of all non-null input values. COUNT(<field_name>) and COUNT(ALL <field_name>) are equivalent.

SELECT COUNT(ALL last_name) AS count_all, COUNT(DISTINCT last_name) count_distinct FROM emp;

   count_all   |  count_distinct
---------------+------------------
100            |96

COUNT(DISTINCT)

edit

Synopsis:

COUNT(DISTINCT field_name) 

Input:

a field name

Output: numeric value

Description:Returns the total number of distinct non-null values in input values.

SELECT COUNT(DISTINCT hire_date) unique_hires, COUNT(hire_date) AS hires FROM emp;

  unique_hires  |     hires
----------------+---------------
99              |100

FIRST/FIRST_VALUE

edit

Synopsis:

FIRST(
    field_name               
    [, ordering_field_name]) 

Input:

target field for the aggregation

optional field used for ordering

Output: same type as the input

Description:Returns the first non-NULL value (if such exists) of the field_name input column sorted by the ordering_field_name column. If ordering_field_name is not provided, only the field_name column is used for the sorting. E.g.:

a b

100

1

200

1

1

2

2

2

10

null

20

null

null

null

SELECT FIRST(a) FROM t

will result in:

FIRST(a)

1

and

SELECT FIRST(a, b) FROM t

will result in:

FIRST(a, b)

100

SELECT FIRST(first_name) FROM emp;

   FIRST(first_name)
--------------------
Alejandro
SELECT gender, FIRST(first_name) FROM emp GROUP BY gender ORDER BY gender;

   gender   |   FIRST(first_name)
------------+--------------------
null        |   Berni
F           |   Alejandro
M           |   Amabile
SELECT FIRST(first_name, birth_date) FROM emp;

   FIRST(first_name, birth_date)
--------------------------------
Remzi
SELECT gender, FIRST(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender;

    gender    |   FIRST(first_name, birth_date)
--------------+--------------------------------
null          |   Lillian
F             |   Sumant
M             |   Remzi

FIRST_VALUE is a name alias and can be used instead of FIRST, e.g.:

SELECT gender, FIRST_VALUE(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender;

    gender    |   FIRST_VALUE(first_name, birth_date)
--------------+--------------------------------------
null          |   Lillian
F             |   Sumant
M             |   Remzi

FIRST cannot be used in a HAVING clause.

FIRST cannot be used with columns of type text unless the field is also saved as a keyword.

LAST/LAST_VALUE

edit

Synopsis:

LAST(
    field_name               
    [, ordering_field_name]) 

Input:

target field for the aggregation

optional field used for ordering

Output: same type as the input

Description:It’s the inverse of FIRST/FIRST_VALUE. Returns the last non-NULL value (if such exists) of the field_name`input column sorted descending by the `ordering_field_name column. If ordering_field_name is not provided, only the field_name column is used for the sorting. E.g.:

a b

10

1

20

1

1

2

2

2

100

null

200

null

null

null

SELECT LAST(a) FROM t

will result in:

LAST(a)

200

and

SELECT LAST(a, b) FROM t

will result in:

LAST(a, b)

2

SELECT LAST(first_name) FROM emp;

   LAST(first_name)
-------------------
Zvonko
SELECT gender, LAST(first_name) FROM emp GROUP BY gender ORDER BY gender;

   gender   |   LAST(first_name)
------------+-------------------
null        |   Patricio
F           |   Xinglin
M           |   Zvonko
SELECT LAST(first_name, birth_date) FROM emp;

   LAST(first_name, birth_date)
-------------------------------
Hilari
SELECT gender, LAST(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender;

   gender  |   LAST(first_name, birth_date)
-----------+-------------------------------
null       |   Eberhardt
F          |   Valdiodio
M          |   Hilari

LAST_VALUE is a name alias and can be used instead of LAST, e.g.:

SELECT gender, LAST_VALUE(first_name, birth_date) FROM emp GROUP BY gender ORDER BY gender;

   gender  |   LAST_VALUE(first_name, birth_date)
-----------+-------------------------------------
null       |   Eberhardt
F          |   Valdiodio
M          |   Hilari

LAST cannot be used in HAVING clause.

LAST cannot be used with columns of type text unless the field is also saved as a keyword.

MAX

edit

Synopsis:

MAX(field_name) 

Input:

a numeric field

Output: same type as the input

Description:Returns the maximum value across input values in the field field_name.

SELECT MAX(salary) AS max FROM emp;

      max
---------------
74999

MAX on a field of type text or keyword is translated into LAST/LAST_VALUE and therefore, it cannot be used in HAVING clause.

MIN

edit

Synopsis:

MIN(field_name) 

Input:

a numeric field

Output: same type as the input

Description:Returns the minimum value across input values in the field field_name.

SELECT MIN(salary) AS min FROM emp;

      min
---------------
25324

MIN on a field of type text or keyword is translated into FIRST/FIRST_VALUE and therefore, it cannot be used in HAVING clause.

SUM

edit

Synopsis:

SUM(field_name) 

Input:

a numeric field

Output: bigint for integer input, double for floating points

Description:Returns the sum of input values in the field field_name.

SELECT SUM(salary) AS sum FROM emp;

      sum
---------------
4824855

Statistics

edit

KURTOSIS

edit

Synopsis:

KURTOSIS(field_name) 

Input:

a numeric field

Output: double numeric value

Description:Quantify the shape of the distribution of input values in the field field_name.

SELECT MIN(salary) AS min, MAX(salary) AS max, KURTOSIS(salary) AS k FROM emp;

      min      |      max      |        k
---------------+---------------+------------------
25324          |74999          |2.0444718929142986

MAD

edit

Synopsis:

MAD(field_name) 

Input:

a numeric field

Output: double numeric value

Description:Measure the variability of the input values in the field field_name.

SELECT MIN(salary) AS min, MAX(salary) AS max, AVG(salary) AS avg, MAD(salary) AS mad FROM emp;

      min      |      max      |      avg      |      mad
---------------+---------------+---------------+---------------
25324          |74999          |48248.55       |10096.5

PERCENTILE

edit

Synopsis:

PERCENTILE(
    field_name,  
    numeric_exp) 

Input:

a numeric field

a numeric expression (must be a constant and not based on a field)

Output: double numeric value

Description:Returns the nth percentile (represented by numeric_exp parameter) of input values in the field field_name.

SELECT languages, PERCENTILE(salary, 95) AS "95th" FROM emp
       GROUP BY languages;

   languages   |      95th
---------------+-----------------
null           |74999.0
1              |72790.5
2              |71924.70000000001
3              |73638.25
4              |72115.59999999999
5              |61071.7

PERCENTILE_RANK

edit

Synopsis:

PERCENTILE_RANK(
    field_name,  
    numeric_exp) 

Input:

a numeric field

a numeric expression (must be a constant and not based on a field)

Output: double numeric value

Description:Returns the nth percentile rank (represented by numeric_exp parameter) of input values in the field field_name.

SELECT languages, PERCENTILE_RANK(salary, 65000) AS rank FROM emp GROUP BY languages;

   languages   |      rank
---------------+-----------------
null           |73.65766569962062
1              |73.7291625157734
2              |88.88005607010643
3              |79.43662623295829
4              |85.70446389643493
5              |100.0

SKEWNESS

edit

Synopsis:

SKEWNESS(field_name) 

Input:

a numeric field

Output: double numeric value

Description:Quantify the asymmetric distribution of input values in the field field_name.

SELECT MIN(salary) AS min, MAX(salary) AS max, SKEWNESS(salary) AS s FROM emp;

      min      |      max      |        s
---------------+---------------+------------------
25324          |74999          |0.2707722118423227

STDDEV_POP

edit

Synopsis:

STDDEV_POP(field_name) 

Input:

a numeric field

Output: double numeric value

Description:Returns the population standard deviation of input values in the field field_name.

SELECT MIN(salary) AS min, MAX(salary) AS max, STDDEV_POP(salary) AS stddev
       FROM emp;

      min      |      max      |      stddev
---------------+---------------+------------------
25324          |74999          |13765.125502787832

SUM_OF_SQUARES

edit

Synopsis:

SUM_OF_SQUARES(field_name) 

Input:

a numeric field

Output: double numeric value

Description:Returns the sum of squares of input values in the field field_name.

SELECT MIN(salary) AS min, MAX(salary) AS max, SUM_OF_SQUARES(salary) AS sumsq
       FROM emp;

      min      |      max      |     sumsq
---------------+---------------+----------------
25324          |74999          |2.51740125721E11

VAR_POP

edit

Synopsis:

VAR_POP(field_name) 

Input:

a numeric field

Output: double numeric value

Description:Returns the population variance of input values in the field field_name.

SELECT MIN(salary) AS min, MAX(salary) AS max, VAR_POP(salary) AS varpop FROM emp;

      min      |      max      |     varpop
---------------+---------------+----------------
25324          |74999          |1.894786801075E8