Retrievers

A retriever is a specification that describes the top documents returned from a search. A retriever replaces other elements of the search API that also return top documents, such as query and knn. A retriever may have child retrievers; a retriever with two or more children is considered a compound retriever. This allows complex behavior to be depicted in a tree-like structure, called the retriever tree, which clarifies the order of operations that occur during a search.
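
As a rough illustration of a retriever tree, the following Python sketch builds a search request body in which a compound rrf retriever has two children, a standard retriever and a knn retriever, and then counts the leaves of the tree. The field names (title, title_vector), the query vector, and the count_leaves helper are hypothetical and exist only for this example.

```python
import json

# Hypothetical request body: a compound rrf retriever with two child retrievers.
request_body = {
    "retriever": {
        "rrf": {
            "retrievers": [
                {"standard": {"query": {"match": {"title": "elasticsearch"}}}},
                {
                    "knn": {
                        "field": "title_vector",       # assumed dense_vector field
                        "query_vector": [0.1, 0.2, 0.3],
                        "k": 10,
                        "num_candidates": 50,
                    }
                },
            ]
        }
    }
}


def count_leaves(retriever: dict) -> int:
    """Count retrievers without children.

    Simplification: only children listed under "retrievers" are followed.
    """
    ((_, body),) = retriever.items()
    children = body.get("retrievers", [])
    if not children:
        return 1
    return sum(count_leaves(child) for child in children)


print(json.dumps(request_body, indent=2))
print(count_leaves(request_body["retriever"]))
```

The nesting of the dict mirrors the retriever tree: the rrf node is the root, and its two children are the leaves.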

Tip

Refer to Retrievers for a high-level overview of the retrievers abstraction. Refer to Retrievers examples for additional examples.

The following retrievers are available:

diversify: The diversify retriever reduces the results from another retriever by applying a diversification strategy to the top-N results.
knn: The knn retriever replaces the functionality of a knn search.
linear: The linear retriever linearly combines the scores of other retrievers for the top documents.
pinned: The pinned retriever always places specified documents at the top of the results, with the remaining hits provided by a secondary retriever.
rescorer: The rescorer retriever replaces the functionality of the query rescorer.
rrf: The rrf retriever produces top documents from reciprocal rank fusion (RRF).
rule: The rule retriever applies query rules to pin or exclude documents for specific queries.
standard: The standard retriever replaces the functionality of a traditional query.
text_similarity_reranker: The text_similarity_reranker retriever enhances search results by re-ranking documents based on semantic similarity to a specified inference text, using a machine learning model.

Common usage guidelines

Using `from` and `size` with a retriever tree

The from and size parameters are provided globally as part of the general search API. They are applied to all retrievers in a retriever tree, unless a specific retriever overrides the size parameter using a different parameter such as rank_window_size. However, the final search hits are always limited to size.
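
The interaction between from, size, and rank_window_size can be sketched in Python. The request body below is hypothetical, and page_of_hits is only an assumed model of the behavior described above (rank inside the window, then cut out the requested page), not Elasticsearch's implementation.

```python
# Hypothetical search request: rank_window_size governs how many documents
# the rrf retriever considers, while from and size select the returned page.
request_body = {
    "from": 0,
    "size": 10,
    "retriever": {
        "rrf": {
            "retrievers": [
                {"standard": {"query": {"match": {"title": "elasticsearch"}}}},
                {"standard": {"query": {"match": {"description": "elasticsearch"}}}},
            ],
            "rank_window_size": 50,
        }
    },
}


def page_of_hits(ranked_ids, rank_window_size, from_, size):
    """Assumed model: keep the ranked window, then slice out the page."""
    window = ranked_ids[:rank_window_size]
    return window[from_:from_ + size]


ranked = [f"doc{i}" for i in range(100)]
first_page = page_of_hits(ranked, 50, request_body["from"], request_body["size"])
last_page = page_of_hits(ranked, 50, 45, 10)
print(len(first_page), len(last_page))
```

In this model a page never holds more than size hits, and a page that reaches past the window is cut short.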

Using aggregations with a retriever tree

Aggregations are specified globally as part of a search request. The query used for an aggregation combines all leaf retrievers as should clauses in a boolean query.
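
A minimal Python sketch of that combination, assuming every leaf is a standard retriever and children are listed under "retrievers". The helper names and the sample tree are hypothetical, and other leaf types are deliberately left out.

```python
def leaf_queries(retriever: dict) -> list:
    """Collect the queries of all standard leaf retrievers in a tree."""
    ((kind, body),) = retriever.items()
    children = body.get("retrievers")
    if children:
        queries = []
        for child in children:
            queries.extend(leaf_queries(child))
        return queries
    if kind == "standard":
        return [body["query"]]
    # Other leaf types are out of scope for this sketch.
    raise ValueError(f"unsupported leaf retriever in sketch: {kind}")


def aggregation_query(retriever: dict) -> dict:
    """Wrap every leaf query as a should clause of one boolean query."""
    return {"bool": {"should": leaf_queries(retriever)}}


tree = {
    "rrf": {
        "retrievers": [
            {"standard": {"query": {"match": {"title": "elasticsearch"}}}},
            {"standard": {"query": {"term": {"genre": "reference"}}}},
        ]
    }
}
print(aggregation_query(tree))
```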

Restrictions on search parameters when specifying a retriever

When a retriever is specified as part of a search, the following elements are not allowed at the top-level:

query
knn
search_after
terminate_after
sort
rescore (use a rescorer retriever instead)
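
If you assemble request bodies programmatically, a small client-side check can catch these combinations before the request is sent. The following Python sketch uses a hypothetical helper, find_conflicts, that only mirrors the list above; it is not part of any Elasticsearch client.

```python
# Top-level elements that the list above says cannot accompany a retriever.
DISALLOWED_WITH_RETRIEVER = (
    "query",
    "knn",
    "search_after",
    "terminate_after",
    "sort",
    "rescore",
)


def find_conflicts(body: dict) -> list:
    """Return the disallowed top-level keys present alongside a retriever."""
    if "retriever" not in body:
        return []
    return [key for key in DISALLOWED_WITH_RETRIEVER if key in body]


body = {
    "retriever": {"standard": {"query": {"match_all": {}}}},
    "sort": [{"price": "asc"}],
    "size": 10,
}
print(find_conflicts(body))
```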

Multi-field query format

The linear and rrf retrievers support a multi-field query format that provides a simplified way to define searches across multiple fields without explicitly specifying inner retrievers. This format automatically generates appropriate inner retrievers based on the field types and query parameters. It is useful for searching an index when you know little or nothing about its schema, and it also handles normalization across lexical and semantic matches.

Field grouping

The multi-field query format groups queried fields into two categories:

Lexical fields: fields that support term queries, such as keyword and text fields.
Semantic fields: semantic_text fields.

Each field group is queried separately and the scores/ranks are normalized such that each contributes 50% to the final score/rank. This balances the importance of lexical and semantic fields. Most indices contain more lexical than semantic fields, and without this grouping the results would often bias towards lexical field matches.

Warning

In the linear retriever, this grouping relies on using a normalizer other than none (i.e., minmax or l2_norm). If you use the none normalizer, the scores across field groups will not be normalized and the results may be biased towards lexical field matches.
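
To see why normalization matters, consider raw scores on different scales. The Python sketch below assumes min-max normalization of the form (score - min) / (max - min); the sample scores are invented, and the handling of the all-equal case is an arbitrary choice for this example.

```python
def minmax(scores):
    """Rescale scores to the range [0, 1] (assumed min-max formula)."""
    low, high = min(scores), max(scores)
    if high == low:
        # Degenerate case; the value chosen here is arbitrary for the sketch.
        return [0.0 for _ in scores]
    return [(score - low) / (high - low) for score in scores]


lexical_scores = [12.0, 8.0, 4.0]    # invented scores on a large scale
semantic_scores = [0.75, 0.5, 0.25]  # invented scores on a small scale

print(minmax(lexical_scores))
print(minmax(semantic_scores))
```

Before normalization, the lexical scores would dominate any sum of the two groups. After normalization, both groups span the same range and can contribute equally.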

Linear retriever field boosting

When using the linear retriever, fields can be boosted using the ^ notation:

GET books/_search
{
  "retriever": {
    "linear": {
      "query": "elasticsearch",
      "fields": [
        "title^3",
        "description^2",
        "title_semantic",
        "description_semantic^2"
      ],
      "normalizer": "minmax"
    }
  }
}

title^3: 3x weight
description^2: 2x weight
title_semantic: 1x weight (default)
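
The ^ notation can be read as a field name followed by an optional boost. The following Python sketch splits such entries apart; parse_field is a hypothetical helper for illustration and not Elasticsearch's own parser.

```python
def parse_field(spec: str):
    """Split 'name^boost' into (name, boost); the boost defaults to 1.0."""
    name, separator, boost = spec.partition("^")
    return name, float(boost) if separator else 1.0


fields = ["title^3", "description^2", "title_semantic", "description_semantic^2"]
for spec in fields:
    print(parse_field(spec))
```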

Due to how the field group scores are normalized, per-field boosts have no effect on the range of the final score. Instead, they affect the importance of the field's score within its group.

For example, if the schema looks like:

PUT /books
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "copy_to": "title_semantic"
      },
      "description": {
        "type": "text",
        "copy_to": "description_semantic"
      },
      "title_semantic": {
        "type": "semantic_text"
      },
      "description_semantic": {
        "type": "semantic_text"
      }
    }
  }
}

And we run this query:

GET books/_search
{
  "retriever": {
    "linear": {
      "query": "elasticsearch",
      "fields": [
        "title",
        "description",
        "title_semantic",
        "description_semantic"
      ],
      "normalizer": "minmax"
    }
  }
}

The score breakdown would be:

Lexical fields (50% of score):
- title: 50% of lexical fields group score, 25% of final score
- description: 50% of lexical fields group score, 25% of final score
Semantic fields (50% of score):
- title_semantic: 50% of semantic fields group score, 25% of final score
- description_semantic: 50% of semantic fields group score, 25% of final score

If we apply per-field boosts like so:

GET books/_search
{
  "retriever": {
    "linear": {
      "query": "elasticsearch",
      "fields": [
        "title^3",
        "description^2",
        "title_semantic",
        "description_semantic^2"
      ],
      "normalizer": "minmax"
    }
  }
}

The score breakdown would change to:

Lexical fields (50% of score):
- title: 60% of lexical fields group score, 30% of final score
- description: 40% of lexical fields group score, 20% of final score
Semantic fields (50% of score):
- title_semantic: 33.3% of semantic fields group score, 16.7% of final score
- description_semantic: 66.7% of semantic fields group score, 33.3% of final score
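
The arithmetic behind this breakdown can be worked through in Python. The sketch assumes that both field groups are present, that each group contributes an equal share of the final score, and that a field's share within its group is its boost divided by the sum of boosts in that group. score_shares is a hypothetical helper, and the results match the breakdown above up to rounding.

```python
def score_shares(groups: dict) -> dict:
    """Map each field to its share of the final score.

    groups: {group_name: {field_name: boost}}
    """
    shares = {}
    group_weight = 1 / len(groups)  # equal split across field groups
    for fields in groups.values():
        total_boost = sum(fields.values())
        for field, boost in fields.items():
            shares[field] = group_weight * boost / total_boost
    return shares


boosted = {
    "lexical": {"title": 3, "description": 2},
    "semantic": {"title_semantic": 1, "description_semantic": 2},
}

for field, share in score_shares(boosted).items():
    print(f"{field}: {share:.1%}")
# title: 30.0%
# description: 20.0%
# title_semantic: 16.7%
# description_semantic: 33.3%
```

Setting every boost to 1 gives each of the four fields 25%, which matches the unboosted breakdown.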

Wildcard field patterns

Field names support the * wildcard character to match multiple fields:

GET books/_search
{
  "retriever": {
    "rrf": {
      "query": "machine learning",
      "fields": [
        "title*",
        "*_text"
      ]
    }
  }
}

title*: Match fields that start with title
*_text: Match fields that end with _text

Note, however, that wildcard field patterns will only resolve to fields that either:

Support term queries, such as keyword and text fields
Are semantic_text fields
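
The resolution rule can be approximated in Python as pattern matching followed by a type filter. The mapping below is hypothetical, and the set of queryable types is simplified to keyword, text, and semantic_text, the types named above; the real set of fields that support term queries may be larger.

```python
import re

# Hypothetical index mapping: field name -> field type.
MAPPING = {
    "title": "text",
    "title_semantic": "semantic_text",
    "title_length": "integer",
    "body_text": "text",
    "price": "float",
}

# Simplification: only the field types named in the text above.
QUERYABLE_TYPES = {"keyword", "text", "semantic_text"}


def resolve(patterns, mapping):
    """Expand * wildcards and keep only fields with a queryable type."""
    resolved = []
    for pattern in patterns:
        # Treat * as "any characters" and everything else literally.
        regex = re.compile(".*".join(re.escape(part) for part in pattern.split("*")))
        for field, field_type in mapping.items():
            if (
                regex.fullmatch(field)
                and field_type in QUERYABLE_TYPES
                and field not in resolved
            ):
                resolved.append(field)
    return resolved


print(resolve(["title*", "*_text"], MAPPING))
```

Here title_length matches the title* pattern by name but is dropped because of its integer type.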

Limitations

Single index: Until 9.2, multi-field queries work only with single-index searches.
CCS (cross-cluster search): Multi-field queries do not support remote cluster searches.