Query DSL

Elastic Stack Serverless

Query DSL is a full-featured JSON-style query language that enables complex searching, filtering, and aggregations. It is the original and most powerful query language for Elasticsearch today.

The _search endpoint accepts queries written in Query DSL syntax.

Query DSL supports a wide range of search techniques, including the following:

  • Full-text search: Search text that has been analyzed and indexed to support phrase or proximity queries, fuzzy matches, and more.
  • Keyword search: Search for exact matches using keyword fields.
  • Semantic search: Search semantic_text fields using dense or sparse vector search on embeddings generated in your Elasticsearch cluster.
  • Vector search: Search for similar dense vectors using the kNN algorithm for embeddings generated outside of Elasticsearch.
  • Geospatial search: Search for locations and calculate spatial relationships using geospatial queries.
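
As a rough illustration of the first two techniques, the sketch below builds the request bodies for a full-text `match` query and an exact `term` query as plain Python dictionaries. The helper names are hypothetical, the `title` and `status` fields are borrowed from the example later on this page, and sending the body to the `_search` endpoint is left to whatever HTTP client you use.

```python
import json


def match_query(field: str, text: str) -> dict:
    """Full-text search: the query text is analyzed before matching."""
    return {"query": {"match": {field: text}}}


def term_query(field: str, value: str) -> dict:
    """Keyword search: the value must equal the indexed term exactly."""
    return {"query": {"term": {field: value}}}


# Print the JSON bodies that would be sent to the _search endpoint.
print(json.dumps(match_query("title", "search"), indent=2))
print(json.dumps(term_query("status", "published"), indent=2))
```

The `term` query is normally paired with a keyword field, because an analyzed text field may not contain the exact value you pass in.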

You can also filter data using Query DSL. Filters enable you to include or exclude documents by retrieving documents that match specific field-level criteria. A query that uses the filter parameter indicates filter context.

Aggregations are the primary tool for analyzing Elasticsearch data using Query DSL. Aggregations enable you to build complex summaries of your data and gain insight into key metrics, patterns, and trends.

Because aggregations leverage the same data structures used for search, they are also very fast. This enables you to analyze and visualize your data in real time. You can search documents, filter results, and perform analytics at the same time, on the same data, in a single request. That means aggregations are calculated in the context of the search query.

The following aggregation types are available:

  • Metric: Calculate metrics, such as a sum or average, from field values.
  • Bucket: Group documents into buckets based on field values, ranges, or other criteria.
  • Pipeline: Run aggregations on the results of other aggregations.

Run aggregations by specifying the search API's aggs parameter. Learn more in Run an aggregation.
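
The three aggregation types can be combined in one request. The following sketch is only an assumption-laden example: the `author` field is assumed to be a keyword field and `views` a numeric field, and neither is defined on this page. It shows a bucket aggregation with a nested metric aggregation, plus a sibling pipeline aggregation that reads the metric from each bucket.

```python
import json

# Request body for the search API; field names are hypothetical.
search_body = {
    "size": 0,  # skip the hits and return aggregation results only
    "query": {"term": {"status": "published"}},  # aggregations run on matching docs
    "aggs": {
        # Bucket aggregation: one bucket per distinct author value.
        "posts_per_author": {
            "terms": {"field": "author"},
            "aggs": {
                # Metric aggregation computed inside each bucket.
                "avg_views": {"avg": {"field": "views"}}
            },
        },
        # Pipeline aggregation: finds the bucket with the highest avg_views.
        "top_author_by_views": {
            "max_bucket": {"buckets_path": "posts_per_author>avg_views"}
        },
    },
}

print(json.dumps(search_body, indent=2))
```

Because the `query` and `aggs` keys sit in the same body, the aggregations are calculated only over documents that match the query.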

Think of the Query DSL as an AST (Abstract Syntax Tree) of queries, consisting of two types of clauses:

Leaf query clauses: Leaf query clauses look for a particular value in a particular field, such as the match, term or range queries. These queries can be used by themselves.

Compound query clauses: Compound query clauses wrap other leaf or compound queries and are used to combine multiple queries in a logical fashion (such as the bool or dis_max query), or to alter their behavior (such as the constant_score query).
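
To make the tree structure concrete, here is a minimal sketch that walks a query and lists its leaf clauses. It only knows about the three compound queries named above (`bool`, `dis_max`, `constant_score`) and treats everything else as a leaf. The `leaf_clauses` helper and the field names are hypothetical, and real queries have more compound types than this table covers.

```python
# Child-bearing parameters for the compound queries mentioned in the text.
COMPOUND_CHILD_KEYS = {
    "bool": ("must", "filter", "should", "must_not"),
    "dis_max": ("queries",),
    "constant_score": ("filter",),
}


def leaf_clauses(clause: dict) -> list:
    """Return the query types of all leaf clauses, in depth-first order."""
    # A clause is a single-key dict: {query_type: body}.
    ((query_type, body),) = clause.items()
    if query_type not in COMPOUND_CHILD_KEYS:
        return [query_type]  # leaf: match, term, range, ...
    leaves = []
    for key in COMPOUND_CHILD_KEYS[query_type]:
        children = body.get(key, [])
        if isinstance(children, dict):  # a single clause instead of a list
            children = [children]
        for child in children:
            leaves.extend(leaf_clauses(child))
    return leaves


query = {
    "bool": {
        "must": [
            {"match": {"title": "search"}},
            {"dis_max": {"queries": [
                {"term": {"status": "published"}},
                {"range": {"publish_date": {"gte": "2015-01-01"}}},
            ]}},
        ],
        "should": {"constant_score": {"filter": {"term": {"status": "draft"}}}},
    }
}

print(leaf_clauses(query))  # ['match', 'term', 'range', 'term']
```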

Query clauses behave differently depending on whether they are used in query context or filter context.

Allow expensive queries: Certain types of queries generally execute slowly because of the way they are implemented, which can affect the stability of the cluster.

You can prevent such queries from running by setting search.allow_expensive_queries to false (it defaults to true).
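
As a sketch of how that setting might be changed, the snippet below builds a body for the cluster update settings API (`PUT /_cluster/settings`). It assumes the setting can be updated dynamically and that a persistent change is what you want. Neither detail is stated on this page, so check both against your deployment.

```python
import json

# Assumed target: PUT /_cluster/settings (cluster update settings API).
settings_path = "/_cluster/settings"

# Python's False serializes to JSON false.
settings_body = {
    "persistent": {
        "search.allow_expensive_queries": False
    }
}

print("PUT", settings_path)
print(json.dumps(settings_body))
```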

By default, Elasticsearch sorts matching search results by relevance score, which measures how well each document matches a query.

The relevance score is a positive floating point number, returned in the _score metadata field of the search API. The higher the _score, the more relevant the document. While each query type can calculate relevance scores differently, score calculation also depends on whether the query clause is run in a query or filter context.

In the query context, a query clause answers the question “How well does this document match this query clause?” Besides deciding whether or not the document matches, the query clause also calculates a relevance score in the _score metadata field.

Query context is in effect whenever a query clause is passed to a query parameter, such as the query parameter in the search API.

A filter answers the binary question “Does this document match this query clause?” The answer is simply "yes" or "no". Filtering has several benefits:

  1. Simple binary logic: In a filter context, a query clause determines document matches based on a yes/no criterion, without score calculation.
  2. Performance: Because they don’t compute relevance scores, filters execute faster than queries.
  3. Caching: Elasticsearch automatically caches frequently used filters, speeding up subsequent search performance.
  4. Resource efficiency: Filters consume fewer CPU resources than full-text queries.
  5. Query combination: Filters can be combined with scored queries to refine result sets efficiently.

Filters are particularly effective for querying structured data and implementing "must have" criteria in complex searches.

Structured data refers to information that is highly organized and formatted in a predefined manner. In the context of Elasticsearch, this typically includes:

  • Numeric fields (integers, floating-point numbers)
  • Dates and timestamps
  • Boolean values
  • Keyword fields (exact match strings)
  • Geo-points and geo-shapes

Unlike full-text fields, structured data has a consistent, predictable format, making it ideal for precise filtering operations.

Common filter applications include:

  • Date range checks: for example, is the timestamp field between 2015 and 2016?
  • Specific field value checks: for example, is the status field equal to "published", or is the author field equal to "John Doe"?

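Filter context applies when a query clause is passed to a filter parameter, such as the filter or must_not parameters of the bool query or the filter parameter of the constant_score query.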

Filters optimize query performance and efficiency, especially for structured data queries and when combined with full-text searches.
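
The filter applications listed above can be sketched as a query that runs entirely in filter context. The `filter_only_query` helper is hypothetical, and the date bounds are one possible reading of "between 2015 and 2016" (from the start of 2015 up to, but not including, the start of 2017).

```python
import json


def filter_only_query(status: str, start: str, end: str) -> dict:
    """Build a bool query whose clauses all run in filter context."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Exact value check on a structured field.
                    {"term": {"status": status}},
                    # Date range check: start inclusive, end exclusive.
                    {"range": {"timestamp": {"gte": start, "lt": end}}},
                ]
            }
        }
    }


body = filter_only_query("published", "2015-01-01", "2017-01-01")
print(json.dumps(body, indent=2))
```

Since no clause here is in query context, matching documents are not ranked by how well they match. The clauses only decide which documents are returned.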

Below is an example of query clauses being used in query and filter context in the search API. This query will match documents where all of the following conditions are met:

  • The title field contains the word search.
  • The content field contains the word elasticsearch.
  • The status field contains the exact word published.
  • The publish_date field contains a date from 1 Jan 2015 onwards.

GET /_search
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title":   "Search"        }},
        { "match": { "content": "Elasticsearch" }}
      ],
      "filter": [
        { "term":  { "status": "published" }},
        { "range": { "publish_date": { "gte": "2015-01-01" }}}
      ]
    }
  }
}

  1. The query parameter indicates query context.
  2. The bool and two match clauses are used in query context, which means that they are used to score how well each document matches.
  3. The filter parameter indicates filter context. Its term and range clauses are used in filter context. They will filter out documents which do not match, but they will not affect the score for matching documents.
Warning

Scores calculated for queries in query context are represented as single-precision floating point numbers, which have only 24 bits of significand precision. Score calculations that exceed this precision are converted to floats with a loss of precision.
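
The effect of a 24-bit significand can be seen outside Elasticsearch. This sketch uses only Python's standard `struct` module to round a value to single precision. It illustrates the number format and is not a description of how Elasticsearch computes scores.

```python
import struct


def to_float32(value: float) -> float:
    """Round a Python float (double precision) to single precision."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


# 2**24 fits exactly in a 24-bit significand.
print(to_float32(16777216.0))  # 16777216.0

# 2**24 + 1 needs 25 bits, so it is rounded to the nearest representable value.
print(to_float32(16777217.0))  # 16777216.0
```

Two scores that differ only beyond this precision become indistinguishable once stored as floats.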

Tip

Use query clauses in query context for conditions which should affect the score of matching documents (i.e. how well does the document match), and use all other query clauses in filter context.