Logs data stream

The Elasticsearch logsdb index mode is generally available in Elastic Cloud Hosted and self-managed Elasticsearch as of version 8.17, and is enabled by default for logs in Elastic Cloud Serverless.
A logs data stream is a data stream type that stores log data more efficiently.
In benchmarks, log data stored in a logs data stream used ~2.5 times less disk space than a regular data stream. The exact impact varies by data set.
Create a logs data stream

To create a logs data stream, set your template index.mode to logsdb:

Python:

resp = client.indices.put_index_template(
    name="my-index-template",
    index_patterns=["logs-*"],
    data_stream={},
    template={
        "settings": {
            "index.mode": "logsdb"
        }
    },
    priority=101,
)
print(resp)

JavaScript:

const response = await client.indices.putIndexTemplate({
  name: "my-index-template",
  index_patterns: ["logs-*"],
  data_stream: {},
  template: {
    settings: {
      "index.mode": "logsdb",
    },
  },
  priority: 101,
});
console.log(response);

Console:

PUT _index_template/my-index-template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "template": {
    "settings": {
      "index.mode": "logsdb"
    }
  },
  "priority": 101
}
index.mode: The index mode setting.

priority: The index template priority. By default, Elasticsearch ships with a logs-*-* index template with a priority of 100, so set a higher priority (such as 101) to make sure your template takes precedence.
After the index template is created, new indices that use the template will be configured as a logs data stream. You can start indexing data and using the data stream.
You can also set the index mode and adjust other template settings in the Elastic UI.
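For example, once the template above exists, indexing a document into any data stream that matches the logs-* pattern creates that data stream in logsdb mode (the data stream name logs-myapp-default and the field values are illustrative):

POST logs-myapp-default/_doc
{
  "@timestamp": "2025-01-01T12:00:00.000Z",
  "host": { "name": "host-1" },
  "message": "Connection established"
}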
Synthetic source

If you have the required subscription, logsdb index mode uses synthetic _source, which omits storing the original _source field. Instead, the document source is synthesized from doc values or stored fields upon document retrieval.

If you don't have the required subscription, logsdb mode uses the original _source field.

Before using synthetic source, make sure to review the restrictions.
When working with multi-value fields, the index.mapping.synthetic_source_keep setting controls how field values are preserved for synthetic source reconstruction. In logsdb, the default value is arrays, which retains both duplicate values and the order of entries. However, the exact structure of array elements and objects is not necessarily retained. Preserving duplicates and ordering can be critical for some log fields, such as DNS A records, HTTP headers, and log entries that represent sequential or repeated events.
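As a sketch, you could keep the arrays default at the index level but preserve the full source structure for a specific field, assuming the mapping-level synthetic_source_keep parameter and its all value are available in your version (the field name is illustrative):

PUT _index_template/my-index-template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "priority": 101,
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.mapping.synthetic_source_keep": "arrays"
    },
    "mappings": {
      "properties": {
        "dns.answers.data": {
          "type": "keyword",
          "synthetic_source_keep": "all"
        }
      }
    }
  }
}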
Index sort settings

In logsdb index mode, indices are sorted by the fields host.name and @timestamp by default.

- If the @timestamp field is not present, it is automatically injected.
- If the host.name field is not present, it is automatically injected as a keyword field, if possible.
  - If host.name can't be injected (for example, host is a keyword field) or can't be used for sorting (for example, its value is an IP address), only @timestamp is used for sorting.
  - If host.name is injected and subobjects is set to true (default), the host field is mapped as an object field named host with a name child field of type keyword. If subobjects is set to false, a single host.name field is mapped as a keyword field.
- To prioritize the latest data, host.name is sorted in ascending order and @timestamp is sorted in descending order.
You can override the default sort settings by manually configuring index.sort.field and index.sort.order. For more details, see Index Sorting.

To modify the sort configuration of an existing data stream, update the data stream's component templates, and then perform or wait for a rollover.

If you apply custom sort settings, the @timestamp field is injected into the mappings but is not automatically added to the list of sort fields. For best results, include it manually as the last sort field, with desc ordering.
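For example, a minimal template sketch that sorts on an assumed service.name field in ascending order and keeps @timestamp as the last sort field in descending order:

PUT _index_template/my-index-template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "priority": 101,
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.sort.field": ["service.name", "@timestamp"],
      "index.sort.order": ["asc", "desc"]
    },
    "mappings": {
      "properties": {
        "service.name": { "type": "keyword" }
      }
    }
  }
}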
Existing data streams

If you're enabling logsdb index mode on a data stream that already exists, make sure to check mappings and sorting. The logsdb mode automatically maps host.name as a keyword if it's included in the sort settings. If a host.name field already exists but has a different type, mapping errors might occur, preventing logsdb mode from being fully applied.

To avoid mapping conflicts, consider these options:

- Adjust mappings: Check your existing mappings to ensure that host.name is mapped as a keyword.
- Change sorting: If needed, you can remove host.name from the sort settings and use a different set of fields. Sorting by @timestamp can be a good fallback.
- Switch to a different index mode: If resolving host.name mapping conflicts is not feasible, you can choose not to use logsdb mode.
On existing data streams, logsdb mode is applied on rollover (automatic or manual).
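For example, after updating the relevant component or index templates, a manual rollover applies the new settings to the next backing index (the data stream name is illustrative):

POST logs-myapp-default/_rollover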
Optimized routing on sort fields

To reduce the storage footprint of logsdb indexes, you can enable routing optimizations. A routing optimization uses the fields in the sort configuration (except for @timestamp) to route documents to shards.

In benchmarks, routing optimizations reduced storage requirements by 20% compared to the default logsdb configuration, with a negligible penalty to ingestion performance (1-4%). Routing optimizations can benefit data streams that are expected to grow substantially over time. Exact results depend on the sort configuration and the nature of the logged data.
To configure a routing optimization:

- Include the index setting index.logsdb.route_on_sort_fields: true in the data stream configuration.
- Configure index sorting with two or more fields, in addition to @timestamp.
- Make sure the _id field is not populated in ingested documents. It should be auto-generated instead.
A custom sort configuration is required to improve storage efficiency and to minimize hotspots from logging spikes that may route documents to a single shard. For best results, use a few sort fields that have a relatively low cardinality and don't co-vary (for example, host.name and host.id are not optimal).
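A minimal sketch of such a configuration, assuming your documents carry a low-cardinality service.name field alongside host.name:

PUT _index_template/my-index-template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "priority": 101,
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.logsdb.route_on_sort_fields": true,
      "index.sort.field": ["host.name", "service.name", "@timestamp"],
      "index.sort.order": ["asc", "asc", "desc"]
    },
    "mappings": {
      "properties": {
        "host.name": { "type": "keyword" },
        "service.name": { "type": "keyword" }
      }
    }
  }
}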
Specialized codecs

By default, logsdb index mode uses the best_compression codec, which applies ZSTD compression to stored fields. You can switch to the default codec for faster compression with a slightly larger storage footprint.
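For example, to opt for the faster LZ4-based default codec instead, set index.codec in your template settings:

PUT _index_template/my-index-template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "priority": 101,
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.codec": "default"
    }
  }
}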
The logsdb index mode also automatically applies specialized codecs for numeric doc values, in order to optimize storage usage. Numeric fields are encoded using the following sequence of codecs:
- Delta encoding: Stores the difference between consecutive values instead of the actual values.
- Offset encoding: Stores the difference from a base value rather than between consecutive values.
- Greatest Common Divisor (GCD) encoding: Finds the greatest common divisor of a set of values and stores the differences as multiples of the GCD.
- Frame Of Reference (FOR) encoding: Determines the smallest number of bits required to encode a block of values and uses bit-packing to fit such values into larger 64-bit blocks.
Each encoding is evaluated according to heuristics determined by the data distribution. For example, the algorithm checks whether the data is monotonically non-decreasing or non-increasing. If so, delta encoding is applied; otherwise, the process continues with the next encoding method (offset).
Encoding is specific to each Lucene segment and is reapplied when segments are merged. The merged Lucene segment might use a different encoding than the original segments, depending on the characteristics of the merged data.
For keyword fields, Run Length Encoding (RLE) is applied to the ordinals, which represent positions in the Lucene segment-level keyword dictionary. This compression is used when multiple consecutive documents share the same keyword.
ignore settings

The logsdb index mode uses the following ignore settings. You can override these settings as needed.

ignore_malformed

By default, logsdb index mode sets ignore_malformed to true. With this setting, documents with malformed fields can be indexed without causing ingestion failures.
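For example, assuming http.response.status_code is mapped as an integer, a document that carries a non-numeric value for it is still indexed; the malformed value is skipped and the field name is recorded in the document's _ignored metadata field (names and values are illustrative):

POST logs-myapp-default/_doc
{
  "@timestamp": "2025-01-01T12:00:00.000Z",
  "message": "request failed",
  "http.response.status_code": "not-a-number"
}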
ignore_above

In logsdb index mode, the index.mapping.ignore_above setting is applied by default at the index level to ensure efficient storage and indexing of large keyword fields. The index-level default for ignore_above is 8191 characters. Because a UTF-8 character can take up to four bytes, this corresponds to a limit of up to 32764 bytes.

The mapping-level ignore_above setting takes precedence. If a specific field has an ignore_above value defined in its mapping, that value overrides the index-level index.mapping.ignore_above value. This default behavior helps to optimize indexing performance by preventing excessively large string values from being indexed.

If you need to customize the limit, you can override it at the mapping level or change the index-level default.
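For example, you can change the index-level default and give a specific field its own, higher limit that takes precedence (the field name and values are illustrative):

PUT _index_template/my-index-template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "priority": 101,
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.mapping.ignore_above": 1024
    },
    "mappings": {
      "properties": {
        "event.original": {
          "type": "keyword",
          "ignore_above": 16384
        }
      }
    }
  }
}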
ignore_dynamic_beyond_limit

In logsdb index mode, the index.mapping.total_fields.ignore_dynamic_beyond_limit setting is set to true by default. This setting allows dynamically mapped fields to be added on top of statically defined fields, even when the total number of fields exceeds index.mapping.total_fields.limit. Instead of triggering an index failure, additional dynamically mapped fields are ignored so that ingestion can continue.

When automatically injected, host.name and @timestamp count toward the limit of mapped fields. If host.name is mapped with subobjects: true, it counts as two fields. When mapped with subobjects: false, host.name counts as only one field.
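For example, a template sketch that lowers the field limit while keeping ingestion tolerant of extra dynamic fields (the limit value is illustrative):

PUT _index_template/my-index-template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "priority": 101,
  "template": {
    "settings": {
      "index.mode": "logsdb",
      "index.mapping.total_fields.limit": 500,
      "index.mapping.total_fields.ignore_dynamic_beyond_limit": true
    }
  }
}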
Fields without doc_values

When the logsdb index mode uses synthetic _source and doc_values are disabled for a field in the mapping, Elasticsearch might set the store setting to true for that field. This ensures that the field's data remains accessible for reconstructing the document's source when using synthetic source.

For example, this adjustment occurs with text fields when store is false and no suitable multi-field is available for reconstructing the original value.
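As a sketch, a mapping like the following, where message is a text field with no keyword multi-field, is the kind of case where Elasticsearch may enable store for the field so that synthetic source can still reconstruct it:

PUT _index_template/my-index-template
{
  "index_patterns": ["logs-*"],
  "data_stream": {},
  "priority": 101,
  "template": {
    "settings": {
      "index.mode": "logsdb"
    },
    "mappings": {
      "properties": {
        "message": { "type": "text" }
      }
    }
  }
}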
Settings reference

The logsdb index mode uses the following settings:

- index.mode: "logsdb"
- index.mapping.synthetic_source_keep: "arrays"
- index.sort.field: ["host.name", "@timestamp"]
- index.sort.order: ["desc", "desc"]
- index.sort.mode: ["min", "min"]
- index.sort.missing: ["_first", "_first"]
- index.codec: "best_compression"
- index.mapping.ignore_malformed: true
- index.mapping.ignore_above: 8191
- index.mapping.total_fields.ignore_dynamic_beyond_limit: true
Notes about upgrading to Logsdb

TODO: add notes.