New

The executive guide to generative AI

Read more
IMPORTANT: This documentation is no longer updated. Refer to Elastic's version policy and the latest documentation.

Sparse vector datatype

edit

This functionality is in technical preview and may be changed or removed in a future release. Elastic will work to fix any issues, but features in technical preview are not subject to the support SLA of official GA features.

A sparse_vector field stores sparse vectors of float values. The maximum number of dimensions that can be in a vector should not exceed 500. The number of dimensions can be different across documents. A sparse_vector field is a single-valued field.

These vectors can be used for document scoring. For example, a document score can represent a distance between a given query vector and the indexed document vector.

You represent a sparse vector as an object, where object fields are dimensions, and fields values are values for these dimensions. Dimensions are integer values from 0 to 65535 encoded as strings. Dimensions don’t need to be in order.

PUT my_index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "sparse_vector"
      },
      "my_text" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my_index/_doc/1
{
  "my_text" : "text1",
  "my_vector" : {"1": 0.5, "5": -0.5,  "100": 1}
}

PUT my_index/_doc/2
{
  "my_text" : "text2",
  "my_vector" : {"103": 0.5, "4": -0.5,  "5": 1, "11" : 1.2}
}

Internally, each document’s sparse vector is encoded as a binary doc value. Its size in bytes is equal to 6 * NUMBER_OF_DIMENSIONS, where NUMBER_OF_DIMENSIONS - number of the vector’s dimensions.

Was this helpful?
Feedback