Terms Facet

The terms facet returns the N most frequent terms for a field. For example:

{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "size" : 10
            }
        }
    }
}

It is preferable to run the terms facet on a non-analyzed field, or on a field that is not broken into a large number of terms.

Accuracy Control

Added in 0.90.6.

The size parameter defines how many top terms are returned out of the overall terms list. By default, the node coordinating the search asks each shard for its own top size terms and, once all shards have responded, reduces the results to the final list that is sent back to the client. This means that if the number of unique terms is greater than size, the returned list may be inaccurate: term counts can be slightly off, and a term that should have been among the top size entries may be missing altogether.

The higher the requested size, the more accurate the results, but also the more expensive they are to compute (both because of the bigger priority queues managed at the shard level and because of the bigger data transfers between the nodes and the client). The shard_size parameter was introduced to minimize the extra work that comes with a bigger requested size. When defined, it determines how many terms the coordinating node requests from each shard. Once all the shards have responded, the coordinating node reduces them to a final result based on the size parameter. This way, one can increase the accuracy of the returned terms while avoiding the overhead of streaming a big list of terms back to the client.

Note that shard_size cannot be smaller than size. If it is, Elasticsearch overrides it and resets it to be equal to size.
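
The effect of shard_size can be illustrated with a small simulation. The following Python sketch is not Elasticsearch code: the functions top_terms and terms_facet, the sample shard data, and the tie-breaking by term are all assumptions made for illustration. It only models the reduce step described above, where each shard reports its own top terms and the coordinating node merges them.

```python
from collections import Counter

def top_terms(counts, n):
    """Return the n most frequent (term, count) pairs; ties broken by term (an assumption)."""
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

def terms_facet(shards, size, shard_size=None):
    """Model the coordinating node's reduce step over per-shard term counts."""
    # shard_size cannot be smaller than size; it is reset to size in that case.
    if shard_size is None or shard_size < size:
        shard_size = size
    merged = Counter()
    for shard in shards:
        # Each shard only reports its own top shard_size terms.
        for term, count in top_terms(shard, shard_size):
            merged[term] += count
    # The final list is cut down to size entries.
    return top_terms(merged, size)

# Hypothetical per-shard term counts. Across both shards, "b" is the
# most frequent term (4 + 4 = 8), but it is never first on any one shard.
shards = [
    {"a": 5, "b": 4},
    {"c": 5, "b": 4},
]

print(terms_facet(shards, size=1))                # "b" is missed
print(terms_facet(shards, size=1, shard_size=2))  # "b" is found
```

With size set to 1 and no shard_size, neither shard reports "b", so the merged result cannot contain it. Raising shard_size to 2 lets both shards report "b" while the client still receives a single entry.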

Ordering

The order parameter controls the ordering of the terms in the facet. It can be set to count, term, reverse_count or reverse_term. The default is count. Here is an example:

{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "size" : 10,
                "order" : "term"
            }
        }
    }
}
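
As a rough illustration of the four orderings, the Python sketch below sorts a list of (term, count) pairs. The function order_entries and the sample data are hypothetical, and two details are assumptions rather than documented behavior: terms are compared lexicographically, and ties on count are broken by term.

```python
def order_entries(entries, order="count"):
    """Sort (term, count) pairs by one of the four terms facet orderings."""
    if order == "count":
        # Highest count first; ties broken by term (an assumption).
        return sorted(entries, key=lambda e: (-e[1], e[0]))
    if order == "reverse_count":
        # Lowest count first; ties broken by term (an assumption).
        return sorted(entries, key=lambda e: (e[1], e[0]))
    if order == "term":
        # Ascending by term.
        return sorted(entries, key=lambda e: e[0])
    if order == "reverse_term":
        # Descending by term.
        return sorted(entries, key=lambda e: e[0], reverse=True)
    raise ValueError("unknown order: %r" % order)

# Hypothetical facet entries.
entries = [("java", 3), ("python", 5), ("go", 3)]

for name in ("count", "reverse_count", "term", "reverse_term"):
    print(name, order_entries(entries, name))
```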

All Terms

Setting all_terms to true returns all the terms in the terms facet. Terms that do not match any hit have a count of 0. Note that this should not be used with fields that have many terms.

{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "all_terms" : true
            }
        }
    }
}
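
The difference that all_terms makes to the result can be sketched in a few lines of Python. The function facet_counts and its inputs are hypothetical: index_terms stands for every term present in the field, and hit_terms for the terms found in documents matching the query.

```python
from collections import Counter

def facet_counts(index_terms, hit_terms, all_terms=False):
    """Count terms seen in hits; with all_terms, add unmatched terms with a count of 0."""
    counts = Counter(hit_terms)
    if all_terms:
        for term in index_terms:
            # Terms that never appear in a hit are still listed, with 0.
            counts.setdefault(term, 0)
    return dict(counts)

# Hypothetical data: "c" exists in the field but matches no hit.
index_terms = {"a", "b", "c"}
hit_terms = ["a", "a", "b"]

print(facet_counts(index_terms, hit_terms))
print(facet_counts(index_terms, hit_terms, all_terms=True))
```

Because every term in the field ends up in the result, its size grows with the number of unique terms, which is why the option is discouraged for fields with many terms.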

Excluding Terms

It is possible to specify a set of terms that should be excluded from the terms facet request result:

{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "exclude" : ["term1", "term2"]
            }
        }
    }
}

Regex Patterns

The terms facet accepts a regular expression that controls which terms are included in the faceted list. Here is an example:

{
    "query" : {
        "match_all" : { }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "regex" : "_regex expression here_",
                "regex_flags" : "DOTALL"
            }
        }
    }
}

See the Java Pattern API for more details about the regex_flags options.
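
To show how a flag such as DOTALL changes which terms pass the filter, here is a Python analogy. It uses Python's re module, not the Java Pattern API, and it assumes that a term must match the whole expression; both points are assumptions made for this sketch. The function filter_terms is hypothetical and also applies an exclude list for comparison.

```python
import re

def filter_terms(terms, regex=None, flags=0, exclude=()):
    """Keep terms that fully match regex (if given) and are not in exclude."""
    pattern = re.compile(regex, flags) if regex is not None else None
    excluded = set(exclude)
    return [
        t for t in terms
        if t not in excluded and (pattern is None or pattern.fullmatch(t))
    ]

# Hypothetical terms.
terms = ["term1", "term2", "tag_a", "other"]
print(filter_terms(terms, regex=r"t.*", exclude=["term1"]))

# Without DOTALL, "." does not match a newline; with it, it does.
print(filter_terms(["a\nb"], regex=r"a.b"))
print(filter_terms(["a\nb"], regex=r"a.b", flags=re.DOTALL))
```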

Term Scripts

A script can be defined for the terms facet to process each term before it is used in the facet collection, and optionally to control whether the term is included at all.

The script can return a boolean value: true includes the term in the facet collection, and false excludes it.

Alternatively, the script can return a string, which is then used as the term to count against. The script is executed with a term variable that holds the current field term.

For example:

{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "size" : 10,
                "script" : "term + 'aaa'"
            }
        }
    }
}

And using the boolean feature:

{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "tag",
                "size" : 10,
                "script" : "term == 'aaa' ? true : false"
            }
        }
    }
}
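
The two return-value conventions can be modeled in Python as follows. This is only a sketch of the semantics described above: the script is represented by an ordinary Python callable, and the function apply_term_script is hypothetical, not part of any Elasticsearch client.

```python
from collections import Counter

def apply_term_script(terms, script):
    """Count terms after passing each one through a script-like callable."""
    counts = Counter()
    for term in terms:
        result = script(term)
        if result is True:
            # Boolean true: include the original term.
            counts[term] += 1
        elif result is False:
            # Boolean false: exclude the term.
            continue
        else:
            # Any other value: use it as the term to count against.
            counts[str(result)] += 1
    return counts

# Hypothetical field terms.
terms = ["aaa", "bbb", "aaa"]

# Equivalent in spirit to "term + 'aaa'".
print(apply_term_script(terms, lambda term: term + "aaa"))

# Equivalent in spirit to "term == 'aaa' ? true : false".
print(apply_term_script(terms, lambda term: term == "aaa"))
```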

Multi Fields

The terms facet can be executed against more than one field, returning the aggregated result across those fields. For example:

{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "fields" : ["tag1", "tag2"],
                "size" : 10
            }
        }
    }
}
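
A minimal Python sketch of aggregating across fields is shown below. The function multi_field_counts and the sample documents are hypothetical. The sketch counts every occurrence of a term in any of the listed fields; whether a term that appears in both fields of the same document is counted once or twice is not stated above, so counting it twice is an assumption.

```python
from collections import Counter

def multi_field_counts(docs, fields):
    """Count terms across several fields of each document."""
    counts = Counter()
    for doc in docs:
        for field in fields:
            value = doc.get(field, [])
            # A field may hold a single value or a list of values.
            values = value if isinstance(value, list) else [value]
            counts.update(values)
    return counts

# Hypothetical documents with tag1 and tag2 fields.
docs = [
    {"tag1": "red", "tag2": "blue"},
    {"tag1": ["blue", "green"]},
    {"tag2": "red"},
]

print(multi_field_counts(docs, ["tag1", "tag2"]))
```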

Script Field

Instead of reading terms from a field, a script can provide the actual terms to be processed for a given document. Set script_field to provide it (script is used for the same purpose when no field or fields are provided).

As an example, a (quite "heavy") search request can use either _source itself or _fields (for stored fields) without loading the terms into memory, at the expense of much slower execution and more IO load:

{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "my_facet" : {
            "terms" : {
                "script_field" : "_source.my_field",
                "size" : 10
            }
        }
    }
}

Or:

{
    "query" : {
        "match_all" : {  }
    },
    "facets" : {
        "my_facet" : {
            "terms" : {
                "script_field" : "_fields['my_field']",
                "size" : 10
            }
        }
    }
}

Note also that the above uses the whole field value as a single term.

_index

The terms facet accepts a special field name, _index. This returns a facet count of hits per _index the search was executed on (relevant when a search request spans more than one index).

Memory Considerations

The terms facet causes the relevant field values to be loaded into memory. This means that, per shard, there should be enough memory to hold them. It is advisable to explicitly set the fields to not_analyzed, or to make sure the number of unique tokens a field can have is not large.