Search APIs

edit

Most search APIs are multi-index, multi-type, with the exception of the Explain API endpoints.

Routing

edit

When executing a search, it will be broadcast to all the index/indices shards (round robin between replicas). Which shards will be searched on can be controlled by providing the routing parameter. For example, when indexing tweets, the routing value can be the user name:

POST /twitter/tweet?routing=kimchy
{
    "user" : "kimchy",
    "postDate" : "2009-11-15T14:12:12",
    "message" : "trying out Elasticsearch"
}

In such a case, if we want to search only on the tweets for a specific user, we can specify it as the routing, resulting in the search hitting only the relevant shard:

POST /twitter/tweet/_search?routing=kimchy
{
    "query": {
        "bool" : {
            "must" : {
                "query_string" : {
                    "query" : "some query string here"
                }
            },
            "filter" : {
                "term" : { "user" : "kimchy" }
            }
        }
    }
}

The routing parameter can be multi valued represented as a comma separated string. This will result in hitting the relevant shards where the routing values match to.

Adaptive Replica Selection

edit

As an alternative to requests being sent to copies of the data in a round robin fashion, you may enable adaptive replica selection. This allows the coordinating node to send the request to the copy deemed "best" based on a number of criteria:

  • Response time of past requests between the coordinating node and the node containing the copy of the data
  • Time past search requests took to execute on the node containing the data
  • The queue size of the search threadpool on the node containing the data

This can be turned on by changing the dynamic cluster setting cluster.routing.use_adaptive_replica_selection from false to true:

PUT /_cluster/settings
{
    "transient": {
        "cluster.routing.use_adaptive_replica_selection": true
    }
}

Stats Groups

edit

A search can be associated with stats groups, which maintains a statistics aggregation per group. It can later be retrieved using the indices stats API specifically. For example, here is a search body request that associate the request with two different groups:

POST /_search
{
    "query" : {
        "match_all" : {}
    },
    "stats" : ["group1", "group2"]
}

Global Search Timeout

edit

Individual searches can have a timeout as part of the Request Body Search. Since search requests can originate from many sources, Elasticsearch has a dynamic cluster-level setting for a global search timeout that applies to all search requests that do not set a timeout in the Request Body Search. The default value is no global timeout. The setting key is search.default_search_timeout and can be set using the Cluster Update Settings endpoints. Setting this value to -1 resets the global search timeout to no timeout.

Search Cancellation

edit

Searches can be cancelled using standard task cancellation mechanism. By default, a running search only checks if it is cancelled or not on segment boundaries, therefore the cancellation can be delayed by large segments. The search cancellation responsiveness can be improved by setting the dynamic cluster-level setting search.low_level_cancellation to true. However, it comes with an additional overhead of more frequent cancellation checks that can be noticeable on large fast running search queries. Changing this setting only affects the searches that start after the change is made.