Important Elasticsearch configuration

edit

Elasticsearch requires very little configuration to get started, but there are a number of items which must be considered before using your cluster in production:

Our Elastic Cloud service configures these items automatically, making your cluster production-ready by default.

Path settings

edit

Elasticsearch writes the data you index to indices and data streams to a data directory. Elasticsearch writes its own application logs, which contain information about cluster health and operations, to a logs directory.

For macOS .tar.gz, Linux .tar.gz, and Windows .zip installations, data and logs are subdirectories of $ES_HOME by default. However, files in $ES_HOME risk deletion during an upgrade.

In production, we strongly recommend you set the path.data and path.logs in elasticsearch.yml to locations outside of $ES_HOME. Docker, Debian, and RPM installations write data and log to locations outside of $ES_HOME by default.

Supported path.data and path.logs values vary by platform:

Linux and macOS installations support Unix-style paths:

path:
  data: /var/data/elasticsearch
  logs: /var/log/elasticsearch

Don’t modify anything within the data directory or run processes that might interfere with its contents. If something other than Elasticsearch modifies the contents of the data directory, then Elasticsearch may fail, reporting corruption or other data inconsistencies, or may appear to work correctly having silently lost some of your data. Don’t attempt to take filesystem backups of the data directory; there is no supported way to restore such a backup. Instead, use Snapshot and restore to take backups safely. Don’t run virus scanners on the data directory. A virus scanner can prevent Elasticsearch from working correctly and may modify the contents of the data directory. The data directory contains no executables so a virus scan will only find false positives.

Elasticsearch offers a deprecated setting that allows you to specify multiple paths in path.data. To learn about this setting, and how to migrate away from it, refer to Multiple data paths.

Cluster name setting

edit

A node can only join a cluster when it shares its cluster.name with all the other nodes in the cluster. The default name is elasticsearch, but you should change it to an appropriate name that describes the purpose of the cluster.

cluster.name: logging-prod

Do not reuse the same cluster names in different environments. Otherwise, nodes might join the wrong cluster.

Changing the name of a cluster requires a full cluster restart.

Node name setting

edit

Elasticsearch uses node.name as a human-readable identifier for a particular instance of Elasticsearch. This name is included in the response of many APIs. The node name defaults to the hostname of the machine when Elasticsearch starts, but can be configured explicitly in elasticsearch.yml:

node.name: prod-data-2

Network host setting

edit

By default, Elasticsearch only binds to loopback addresses such as 127.0.0.1 and [::1]. This is sufficient to run a cluster of one or more nodes on a single server for development and testing, but a resilient production cluster must involve nodes on other servers. There are many network settings but usually all you need to configure is network.host:

network.host: 192.168.1.10

When you provide a value for network.host, Elasticsearch assumes that you are moving from development mode to production mode, and upgrades a number of system startup checks from warnings to exceptions. See the differences between development and production modes.

Discovery and cluster formation settings

edit

Configure two important discovery and cluster formation settings before going to production so that nodes in the cluster can discover each other and elect a master node.

discovery.seed_hosts
edit

Out of the box, without any network configuration, Elasticsearch will bind to the available loopback addresses and scan local ports 9300 to 9305 to connect with other nodes running on the same server. This behavior provides an auto-clustering experience without having to do any configuration.

When you want to form a cluster with nodes on other hosts, use the static discovery.seed_hosts setting. This setting provides a list of other nodes in the cluster that are master-eligible and likely to be live and contactable to seed the discovery process. This setting accepts a YAML sequence or array of the addresses of all the master-eligible nodes in the cluster. Each address can be either an IP address or a hostname that resolves to one or more IP addresses via DNS.

discovery.seed_hosts:
   - 192.168.1.10:9300
   - 192.168.1.11 
   - seeds.mydomain.com 
   - [0:0:0:0:0:ffff:c0a8:10c]:9301 

The port is optional and defaults to 9300, but can be overridden.

If a hostname resolves to multiple IP addresses, the node will attempt to discover other nodes at all resolved addresses.

IPv6 addresses must be enclosed in square brackets.

If your master-eligible nodes do not have fixed names or addresses, use an alternative hosts provider to find their addresses dynamically.

cluster.initial_master_nodes
edit

When you start an Elasticsearch cluster for the first time, a cluster bootstrapping step determines the set of master-eligible nodes whose votes are counted in the first election. In development mode, with no discovery settings configured, this step is performed automatically by the nodes themselves.

Because auto-bootstrapping is inherently unsafe, when starting a new cluster in production mode, you must explicitly list the master-eligible nodes whose votes should be counted in the very first election. You set this list using the cluster.initial_master_nodes setting on every master-eligible node. Do not configure this setting on master-ineligible nodes.

After the cluster forms successfully for the first time, remove the cluster.initial_master_nodes setting from each node’s configuration and never set it again for this cluster. Do not configure this setting on nodes joining an existing cluster. Do not configure this setting on nodes which are restarting. Do not configure this setting when performing a full-cluster restart. See Bootstrapping a cluster.

discovery.seed_hosts:
   - 192.168.1.10:9300
   - 192.168.1.11
   - seeds.mydomain.com
   - [0:0:0:0:0:ffff:c0a8:10c]:9301
cluster.initial_master_nodes: 
   - master-node-a
   - master-node-b
   - master-node-c

Identify the initial master nodes by their node.name, which defaults to their hostname. Ensure that the value in cluster.initial_master_nodes matches the node.name exactly. If you use a fully-qualified domain name (FQDN) such as master-node-a.example.com for your node names, then you must use the FQDN in this list. Conversely, if node.name is a bare hostname without any trailing qualifiers, you must also omit the trailing qualifiers in cluster.initial_master_nodes.

See bootstrapping a cluster and discovery and cluster formation settings.

Heap size settings

edit

By default, Elasticsearch automatically sets the JVM heap size based on a node’s roles and total memory. We recommend the default sizing for most production environments.

If needed, you can override the default sizing by manually setting the JVM heap size.

JVM heap dump path setting

edit

By default, Elasticsearch configures the JVM to dump the heap on out of memory exceptions to the default data directory. On RPM and Debian packages, the data directory is /var/lib/elasticsearch. On Linux and MacOS and Windows distributions, the data directory is located under the root of the Elasticsearch installation.

If this path is not suitable for receiving heap dumps, modify the -XX:HeapDumpPath=... entry in jvm.options:

  • If you specify a directory, the JVM will generate a filename for the heap dump based on the PID of the running instance.
  • If you specify a fixed filename instead of a directory, the file must not exist when the JVM needs to perform a heap dump on an out of memory exception. Otherwise, the heap dump will fail.

GC logging settings

edit

By default, Elasticsearch enables garbage collection (GC) logs. These are configured in jvm.options and output to the same default location as the Elasticsearch logs. The default configuration rotates the logs every 64 MB and can consume up to 2 GB of disk space.

You can reconfigure JVM logging using the command line options described in JEP 158: Unified JVM Logging. Unless you change the default jvm.options file directly, the Elasticsearch default configuration is applied in addition to your own settings. To disable the default configuration, first disable logging by supplying the -Xlog:disable option, then supply your own command line options. This disables all JVM logging, so be sure to review the available options and enable everything that you require.

To see further options not contained in the original JEP, see Enable Logging with the JVM Unified Logging Framework.

Examples
edit

Change the default GC log output location to /opt/my-app/gc.log by creating $ES_HOME/config/jvm.options.d/gc.options with some sample options:

# Turn off all previous logging configuratons
-Xlog:disable

# Default settings from JEP 158, but with `utctime` instead of `uptime` to match the next line
-Xlog:all=warning:stderr:utctime,level,tags

# Enable GC logging to a custom location with a variety of options
-Xlog:gc*,gc+age=trace,safepoint:file=/opt/my-app/gc.log:utctime,level,pid,tags:filecount=32,filesize=64m

Configure an Elasticsearch Docker container to send GC debug logs to standard error (stderr). This lets the container orchestrator handle the output. If using the ES_JAVA_OPTS environment variable, specify:

MY_OPTS="-Xlog:disable -Xlog:all=warning:stderr:utctime,level,tags -Xlog:gc=debug:stderr:utctime"
docker run -e ES_JAVA_OPTS="$MY_OPTS" # etc

Temporary directory settings

edit

By default, Elasticsearch uses a private temporary directory that the startup script creates immediately below the system temporary directory.

On some Linux distributions, a system utility will clean files and directories from /tmp if they have not been recently accessed. This behavior can lead to the private temporary directory being removed while Elasticsearch is running if features that require the temporary directory are not used for a long time. Removing the private temporary directory causes problems if a feature that requires this directory is subsequently used.

If you install Elasticsearch using the .deb or .rpm packages and run it under systemd, the private temporary directory that Elasticsearch uses is excluded from periodic cleanup.

If you intend to run the .tar.gz distribution on Linux or MacOS for an extended period, consider creating a dedicated temporary directory for Elasticsearch that is not under a path that will have old files and directories cleaned from it. This directory should have permissions set so that only the user that Elasticsearch runs as can access it. Then, set the $ES_TMPDIR environment variable to point to this directory before starting Elasticsearch.

JVM fatal error log setting

edit

By default, Elasticsearch configures the JVM to write fatal error logs to the default logging directory. On RPM and Debian packages, this directory is /var/log/elasticsearch. On Linux and MacOS and Windows distributions, the logs directory is located under the root of the Elasticsearch installation.

These are logs produced by the JVM when it encounters a fatal error, such as a segmentation fault. If this path is not suitable for receiving logs, modify the -XX:ErrorFile=... entry in jvm.options.

Cluster backups

edit

In a disaster, snapshots can prevent permanent data loss. Snapshot lifecycle management is the easiest way to take regular backups of your cluster. For more information, see Create a snapshot.

Taking a snapshot is the only reliable and supported way to back up a cluster. You cannot back up an Elasticsearch cluster by making copies of the data directories of its nodes. There are no supported methods to restore any data from a filesystem-level backup. If you try to restore a cluster from such a backup, it may fail with reports of corruption or missing files or other data inconsistencies, or it may appear to have succeeded having silently lost some of your data.