Datafeed resources
A datafeed resource has the following properties:
-
aggregations
- (object) If set, the datafeed performs aggregation searches. Support for aggregations is limited; use them only with low cardinality data. For more information, see Aggregating data for faster performance.
-
chunking_config
-
(object) Specifies how data searches are split into time chunks.
See Chunking configuration objects.
For example:
{"mode": "manual", "time_span": "3h"}
-
datafeed_id
- (string) A numerical character string that uniquely identifies the datafeed. This property is informational; you cannot change the identifier for existing datafeeds.
-
frequency
-
(time units) The interval at which scheduled queries are made while the
datafeed runs in real time. The default value is either the bucket span for short
bucket spans, or, for longer bucket spans, a sensible fraction of the bucket
span. For example: 150s.
-
indices
-
(array) An array of index names. For example:
["it_ops_metrics"]
-
job_id
- (string) The unique identifier for the job to which the datafeed sends data.
-
query
-
(object) The Elasticsearch query domain-specific language (DSL). This value
corresponds to the query object in an Elasticsearch search POST body. All the
options that are supported by Elasticsearch can be used, as this object is
passed verbatim to Elasticsearch. By default, this property has the following
value: {"match_all": {"boost": 1}}.
-
query_delay
-
(time units) The number of seconds behind real time that data is queried. For
example, if data from 10:04 a.m. might not be searchable in Elasticsearch until
10:06 a.m., set this property to 120 seconds. The default value is randomly
selected between 60s and 120s. This randomness improves query performance when
multiple jobs run on the same node.
-
script_fields
- (object) Specifies scripts that evaluate custom expressions and return script fields to the datafeed. The detector configuration objects in a job can contain functions that use these script fields. For more information, see Transforming data with script fields.
-
scroll_size
-
(unsigned integer) The size parameter that is used in Elasticsearch searches. The default value is 1000.
-
types
-
(array) A list of types to search for within the specified indices. For
example: []. This property is provided for backwards compatibility with releases
earlier than 6.0.0. For more information, see Removal of mapping types.
-
delayed_data_check_config
-
(object) Specifies whether the datafeed checks for missing data and
the size of the window that it checks. For example:
{"enabled": true, "check_window": "1h"}
See Delayed data check configuration objects.
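As a rough illustration of how these properties fit together, the following Python sketch assembles a datafeed resource as a dictionary and serializes it to JSON. The property names and example values come from the list above; the `datafeed_id` and `job_id` values are hypothetical placeholders, and the sketch does not call any Elasticsearch API.

```python
import json

# A datafeed resource expressed as a plain dictionary.
# The identifiers below are hypothetical; substitute your own.
datafeed = {
    "datafeed_id": "datafeed-it-ops",  # hypothetical identifier
    "job_id": "it-ops-job",            # hypothetical identifier
    "indices": ["it_ops_metrics"],
    "query": {"match_all": {"boost": 1}},  # documented default query
    "frequency": "150s",
    "query_delay": "120s",   # data may take up to two minutes to become searchable
    "scroll_size": 1000,     # documented default
    "chunking_config": {"mode": "manual", "time_span": "3h"},
    "delayed_data_check_config": {"enabled": True, "check_window": "1h"},
}

# Serialize the resource so it can be inspected or sent in a request body.
body = json.dumps(datafeed, indent=2)
print(body)
```

Omitting `frequency`, `query_delay`, or `chunking_config` leaves those properties at the defaults described above.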
Chunking configuration objects
Datafeeds might be required to search over long time periods, such as several months or years. The search is split into time chunks to manage the load on Elasticsearch. The chunking configuration controls how the size of these time chunks is calculated and is an advanced configuration option.
A chunking configuration object has the following properties:
-
mode
-
There are three available modes:
-
auto
- The chunk size will be dynamically calculated. This is the default and recommended value.
-
manual
-
Chunking will be applied according to the specified time_span.
-
off
- No chunking will be applied.
-
-
time_span
-
(time units) The time span that each search queries.
This setting is only applicable when the mode is set to manual. For example: 3h.
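To make the effect of the manual and off modes concrete, here is a minimal Python sketch that splits a search range into time chunks. The function name `split_into_chunks` is hypothetical and this is not the datafeed's actual implementation. The auto mode is left out because its calculation is not described here.

```python
from datetime import datetime, timedelta

def split_into_chunks(start, end, mode="manual", time_span=timedelta(hours=3)):
    """Yield (chunk_start, chunk_end) pairs covering the range [start, end).

    Hypothetical helper: only the 'manual' and 'off' modes are sketched.
    """
    if mode == "off":
        # No chunking: the whole range is searched at once.
        yield (start, end)
        return
    if mode != "manual":
        raise ValueError("only 'manual' and 'off' are sketched here")
    if time_span <= timedelta(0):
        raise ValueError("time_span must be positive")
    cursor = start
    while cursor < end:
        # The final chunk is shortened so it never runs past the end.
        chunk_end = min(cursor + time_span, end)
        yield (cursor, chunk_end)
        cursor = chunk_end

range_start = datetime(2024, 1, 1, 0, 0)
range_end = datetime(2024, 1, 1, 8, 0)
chunks = list(split_into_chunks(range_start, range_end))
for chunk_start, chunk_end in chunks:
    print(chunk_start.isoformat(), "->", chunk_end.isoformat())
```

With a `time_span` of three hours, an eight-hour range yields two full chunks followed by a shorter two-hour chunk.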
Delayed data check configuration objects
The datafeed can optionally search over indices that have already been read in
an effort to determine whether any data has subsequently been added to the index.
If missing data is found, it is a good indication that the query_delay option
is set too low and the data is being indexed after the datafeed has passed that
moment in time. See Working with delayed data.
This check runs only on real-time datafeeds.
The configuration object has the following properties:
-
enabled
-
(boolean) Specifies whether the datafeed periodically checks for delayed data.
Defaults to true.
-
check_window
-
(time units) The window of time that is searched for late data. This window of
time ends with the latest finalized bucket. It defaults to null, which causes an
appropriate check_window to be calculated when the real-time datafeed runs. In
particular, the default check_window span calculation is based on the maximum of
2h or 8 * bucket_span.
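The default calculation can be sketched in a few lines of Python. The function name `default_check_window` is hypothetical, and because the text says only that the default is "based on" this maximum, the real calculation may apply further adjustments.

```python
from datetime import timedelta

def default_check_window(bucket_span):
    """Return the larger of 2 hours and 8 bucket spans (hypothetical helper)."""
    return max(timedelta(hours=2), 8 * bucket_span)

# Short bucket spans fall back to the 2h floor.
short_window = default_check_window(timedelta(minutes=5))
# Longer bucket spans scale the window to 8 buckets.
long_window = default_check_window(timedelta(hours=1))
print(short_window, long_window)
```

Under this reading, a 5-minute bucket span gives a 2-hour window (8 buckets would cover only 40 minutes), and a 1-hour bucket span gives an 8-hour window.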
Datafeed counts
The get datafeed statistics API provides information about the operational progress of a datafeed. All of these properties are informational; you cannot update their values:
-
assignment_explanation
- (string) For started datafeeds only, contains messages relating to the selection of a node.
-
datafeed_id
- (string) A numerical character string that uniquely identifies the datafeed.
-
node
-
(object) The node upon which the datafeed is started. The datafeed and job will be on the same node.
-
id
- The unique identifier of the node. For example, "0-o0tOoRTwKFZifatTWKNw".
-
name
-
The node name. For example, 0-o0tOo.
-
ephemeral_id
- The node ephemeral ID.
-
transport_address
-
The host and port where transport connections are
accepted. For example, 127.0.0.1:9300.
-
attributes
-
The node attributes. For example, {"ml.max_open_jobs": "10"}.
-
-
state
-
(string) The status of the datafeed, which can be one of the following values:
-
started
- The datafeed is actively receiving data.
-
stopped
- The datafeed is stopped and will not receive data until it is restarted.
-
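As an illustration of how these properties might be read together, the following Python sketch summarizes one datafeed statistics entry. The `summarize` function and the `datafeed_id` value are hypothetical. The node values reuse the examples given above, and the dictionary is hand-written sample data, not real API output.

```python
def summarize(stats):
    """Build a one-line summary of a datafeed statistics entry (hypothetical helper)."""
    node = stats.get("node")
    if stats["state"] == "started" and node:
        # Node details are only meaningful while the datafeed is started.
        return "{} is started on {} ({})".format(
            stats["datafeed_id"], node["name"], node["transport_address"]
        )
    return "{} is {}".format(stats["datafeed_id"], stats["state"])

# Hand-written sample shaped like the properties listed above.
sample = {
    "datafeed_id": "datafeed-it-ops",  # hypothetical identifier
    "state": "started",
    "node": {
        "id": "0-o0tOoRTwKFZifatTWKNw",
        "name": "0-o0tOo",
        "transport_address": "127.0.0.1:9300",
        "attributes": {"ml.max_open_jobs": "10"},
    },
}

summary = summarize(sample)
stopped_summary = summarize({"datafeed_id": "datafeed-it-ops", "state": "stopped"})
print(summary)
print(stopped_summary)
```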