How a data frame analytics job works

A data frame analytics job is essentially a persistent Elasticsearch task. During its life cycle, it goes through four or five main phases, depending on the analysis type:
- reindexing,
- loading data,
- analyzing,
- writing results,
- inference (regression and classification only).
Let’s take a look at the phases one by one.
Reindexing

During the reindexing phase, the documents from the source index or indices are copied to the destination index. If you want to define custom settings or mappings, create the index before you start the job. Otherwise, the job creates it using default settings.
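For example, to control the field mappings of the destination index, you could create it before starting the job. The index name and fields below are hypothetical, shown only as a sketch:

```console
PUT df-analytics-dest
{
  "mappings": {
    "properties": {
      "price":    { "type": "double" },
      "category": { "type": "keyword" }
    }
  }
}
```

If this index does not exist when the job starts, the job creates it with default settings instead.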
Once the destination index is built, the data frame analytics job task calls the Elasticsearch Reindex API to launch the reindexing task.
Loading data

After the reindexing is finished, the job fetches the needed data from the destination index. It converts the data into the format that the analysis process expects, then sends it to the analysis process.
Analyzing

In this phase, the job generates a machine learning model for analyzing the data. The specific phases of analysis vary depending on the type of data frame analytics job.
Outlier detection jobs have a single analysis phase called computing_outliers, in which they identify outliers in the data.
Regression and classification jobs have four analysis phases:
- feature_selection: Identifies which of the supplied fields are most relevant for predicting the dependent variable.
- coarse_parameter_search: Identifies initial values for undefined hyperparameters.
- fine_tuning_parameters: Identifies final values for undefined hyperparameters. See hyperparameter optimization.
- final_training: Trains the machine learning model.
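As a sketch, a regression job configured like the one below leaves all hyperparameters undefined, so the coarse and fine parameter search phases determine them automatically. The job, index, and field names here are hypothetical:

```console
PUT _ml/data_frame/analytics/house-prices
{
  "source": { "index": "houses" },
  "dest":   { "index": "houses-predictions" },
  "analysis": {
    "regression": {
      "dependent_variable": "price",
      "training_percent": 80
    }
  }
}
```

You can also set hyperparameters explicitly in the analysis object; any you supply are excluded from the search phases.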
Writing results

After the loaded data is analyzed, the analysis process sends back the results. Only the additional fields that the analysis calculated are written back; the fields that were loaded during the loading data phase are not. The data frame analytics job matches the results with the data rows in the destination index, merges them, and indexes them back to the destination index.
Inference

This phase exists only for regression and classification jobs. In this phase, the job validates the trained model against the test split of the data set.
Finally, after all phases are completed, the task is marked as completed and the data frame analytics job stops. Your data is ready to be evaluated.
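While a job is running, you can follow its progress through these phases with the stats API; the response includes a progress array with an entry and completion percentage for each phase. The job name below is hypothetical:

```console
GET _ml/data_frame/analytics/house-prices/_stats
```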