Networking

Each Elasticsearch node has two different network interfaces. Clients send requests to Elasticsearch’s REST APIs using its HTTP interface, but nodes communicate with other nodes using the transport interface. The transport interface is also used for communication with remote clusters.

You can configure both of these interfaces at the same time using the network.* settings. If you have a more complicated network, you might need to configure the interfaces independently using the http.* and transport.* settings. Where possible, use the network.* settings that apply to both interfaces to simplify your configuration and reduce duplication.

By default, Elasticsearch binds only to localhost, which means it cannot be accessed remotely. This configuration is sufficient for a local development cluster made of one or more nodes all running on the same host. To form a cluster that spans multiple hosts, or one that is accessible to remote clients, you must adjust some network settings such as network.host.

Be careful with the network configuration!

Never expose an unprotected node to the public internet. If you do, you are permitting anyone in the world to download, modify, or delete any of the data in your cluster.

Configuring Elasticsearch to bind to a non-local address will convert some warnings into fatal exceptions. If a node refuses to start after you configure its network settings, you must address the logged exceptions before proceeding.

Commonly used network settings

Most users will need to configure only the following network settings.

network.host

(Static, string) Sets the address of this node for both HTTP and transport traffic. The node will bind to this address and will also use it as its publish address. Accepts an IP address, a hostname, or a special value.

Defaults to _local_.

http.port

(Static, integer) The port to bind for HTTP client communication. Accepts a single value or a range. If a range is specified, the node will bind to the first available port in the range.

Defaults to 9200-9300.

transport.port

(Static, integer) The port to bind for communication between nodes. Accepts a single value or a range. If a range is specified, the node will bind to the first available port in the range. Set this setting to a single port, not a range, on every master-eligible node.

Defaults to 9300-9400.
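A minimal elasticsearch.yml sketch for a node that must be reachable from other hosts, using only the commonly used settings above. The address 192.168.1.10 is a hypothetical example; substitute an address of your own host. Pinning transport.port to a single port follows the advice given for master-eligible nodes.

```yaml
# elasticsearch.yml (sketch)
# Hypothetical address: replace with an address of this host.
# It is used for both HTTP and transport traffic, and for both binding and publishing.
network.host: 192.168.1.10

# Use single ports instead of the default ranges.
http.port: 9200
transport.port: 9300
```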

Special values for network addresses

You can configure Elasticsearch to automatically determine its addresses by using the following special values. Use these values when configuring network.host, network.bind_host, network.publish_host, and the corresponding settings for the HTTP and transport interfaces.

_local_
Any loopback addresses on the system, for example 127.0.0.1.
_site_
Any site-local addresses on the system, for example 192.168.0.1.
_global_
Any globally-scoped addresses on the system, for example 8.8.8.8.
_[networkInterface]_
Use the addresses of the network interface called [networkInterface]. For example, to use the addresses of an interface called en0, set network.host: _en0_.
0.0.0.0
The addresses of all available network interfaces.

On some systems these special values resolve to multiple addresses. If so, Elasticsearch will select one of them as its publish address and may change its selection on each node restart. Ensure your node is accessible at every possible address.

Any values containing a : (e.g. an IPv6 address or some of the special values) must be quoted because : is a special character in YAML.
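The following elasticsearch.yml sketch shows how the quoting rule applies. The two lines are alternatives, so the second is commented out; the IPv6 address fd00::10 is a hypothetical example.

```yaml
# A special value that contains no ':' can be written unquoted.
network.host: _site_

# Alternative (set only one network.host): an IPv6 address contains ':'
# and must therefore be quoted. Hypothetical address.
# network.host: "fd00::10"
```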

IPv4 vs IPv6

These special values yield both IPv4 and IPv6 addresses by default, but you can also add an :ipv4 or :ipv6 suffix to limit them to just IPv4 or IPv6 addresses respectively. For example, network.host: "_en0:ipv4_" would set this node’s addresses to the IPv4 addresses of interface en0.
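A few more combinations in the same style, as a sketch. The interface name eth0 is hypothetical, and as before only one network.host line should be active; the suffixed values contain ':' and are therefore quoted.

```yaml
# Site-local addresses of this host, restricted to IPv4.
network.host: "_site:ipv4_"

# Alternatives (set only one network.host):
# network.host: "_local:ipv6_"   # loopback addresses, IPv6 only
# network.host: "_eth0:ipv4_"    # IPv4 addresses of the hypothetical interface eth0
```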

Discovery in the Cloud

Additional special values are available when running in the cloud with either the EC2 discovery plugin or the Google Compute Engine discovery plugin installed.

Binding and publishing

Elasticsearch uses network addresses for two distinct purposes known as binding and publishing. Most nodes will use the same address for everything, but more complicated setups may need to configure different addresses for different purposes.

When an application such as Elasticsearch wishes to receive network communications, it must indicate to the operating system the address or addresses whose traffic it should receive. This is known as binding to those addresses. Elasticsearch can bind to more than one address if needed, but most nodes only bind to a single address. Elasticsearch can only bind to an address if it is running on a host that has a network interface with that address. If necessary, you can configure the transport and HTTP interfaces to bind to different addresses.

Each Elasticsearch node has an address at which clients and other nodes can contact it, known as its publish address. Each node has one publish address for its HTTP interface and one for its transport interface. These two addresses can be anything, and don’t need to be addresses of the network interfaces on the host. The only requirements are that each node must be:

  • Accessible at its transport publish address by all other nodes in its cluster, and by any remote clusters that will discover it using Sniff mode.
  • Accessible at its HTTP publish address by all clients that will discover it using sniffing.
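As a sketch of a node whose publish address differs from its bind address, suppose other nodes reach this host through address translation. Both addresses below are hypothetical; the only requirement is that every other node (and every sniffing client) can reach the published address, even though it is not assigned to a local interface. If the translation also changes ports, the transport.publish_port and http.publish_port settings described later apply as well.

```yaml
# Bind to an address of a local network interface (hypothetical)...
network.bind_host: 10.0.0.5
# ...but tell clients and other nodes to connect via a translated address
# that is not assigned to any interface on this host (hypothetical).
network.publish_host: 203.0.113.7
```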

Using a single address

The most common configuration is for Elasticsearch to bind to a single address at which it is accessible to clients and other nodes. In this configuration, set only network.host to that address. Do not separately set any bind or publish addresses, and do not separately configure the addresses for the HTTP or transport interfaces.

Using multiple addresses

Use the advanced network settings if you wish to bind Elasticsearch to multiple addresses, or to publish a different address from the addresses to which you are binding. Set network.bind_host to the bind addresses, and network.publish_host to the address at which this node is exposed. In complex configurations, you can configure these addresses differently for the HTTP and transport interfaces.

Advanced network settings

These advanced settings let you bind to multiple addresses, or use different addresses for binding and publishing. They are not required in most cases, and you should not use them if you can use the commonly used settings instead.

network.bind_host
(Static, string) The network address(es) to which the node should bind in order to listen for incoming connections. Accepts a list of IP addresses, hostnames, and special values. Defaults to the address given by network.host. Use this setting only if binding to multiple addresses or using different addresses for publishing and binding.
network.publish_host
(Static, string) The network address that clients and other nodes can use to contact this node. Accepts an IP address, a hostname, or a special value. Defaults to the address given by network.host. Use this setting only if binding to multiple addresses or using different addresses for publishing and binding.

You can specify a list of addresses for network.host and network.publish_host. You can also specify one or more hostnames or special values that resolve to multiple addresses. If you do this then Elasticsearch chooses one of the addresses for its publish address. This choice uses heuristics based on IPv4/IPv6 stack preference and reachability and may change when the node restarts. Ensure each node is accessible at all possible publish addresses.
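A sketch of binding to several addresses while publishing exactly one, using YAML list syntax. The addresses are hypothetical. Publishing a literal address, rather than a value that resolves to several, avoids the restart-to-restart variation in the chosen publish address.

```yaml
# Listen on the loopback address and on one site-local address (hypothetical).
network.bind_host: ["127.0.0.1", "10.0.0.5"]
# Advertise a single, fixed address instead of letting Elasticsearch choose.
network.publish_host: 10.0.0.5
```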

Advanced TCP settings

Use the following settings to control the low-level parameters of the TCP connections used by the HTTP and transport interfaces.

network.tcp.keep_alive
(Static, boolean) Configures the SO_KEEPALIVE option for network sockets, which determines whether each connection sends TCP keepalive probes. Defaults to true.
network.tcp.keep_idle
(Static, integer) Configures the TCP_KEEPIDLE option for network sockets, which determines the time in seconds that a connection must be idle before starting to send TCP keepalive probes. Defaults to -1, which means to use the system default. This value cannot exceed 300 seconds. Only applicable on Linux and macOS.
network.tcp.keep_interval
(Static, integer) Configures the TCP_KEEPINTVL option for network sockets, which determines the time in seconds between sending TCP keepalive probes. Defaults to -1, which means to use the system default. This value cannot exceed 300 seconds. Only applicable on Linux and macOS.
network.tcp.keep_count
(Static, integer) Configures the TCP_KEEPCNT option for network sockets, which determines the number of unacknowledged TCP keepalive probes that may be sent on a connection before it is dropped. Defaults to -1, which means to use the system default. Only applicable on Linux and macOS.
network.tcp.no_delay
(Static, boolean) Configures the TCP_NODELAY option on network sockets, which determines whether TCP no delay is enabled. Defaults to true.
network.tcp.reuse_address
(Static, boolean) Configures the SO_REUSEADDR option for network sockets, which determines whether the address can be reused or not. Defaults to false on Windows and true otherwise.
network.tcp.send_buffer_size
(Static, byte value) Configures the size of the TCP send buffer for network sockets. Defaults to -1, which means to use the system default.
network.tcp.receive_buffer_size
(Static, byte value) Configures the size of the TCP receive buffer for network sockets. Defaults to -1, which means to use the system default.
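For instance, the following sketch sets explicit keepalive timings for both interfaces. The numbers are illustrative assumptions, not recommendations; they stay within the 300-second limit, and the three timing options only take effect on Linux and macOS. With these values, a peer that stops answering is detected after roughly 60 + 6 × 10 = 120 seconds.

```yaml
network.tcp.keep_alive: true   # the default, shown for clarity
network.tcp.keep_idle: 60      # seconds idle before the first probe (max 300)
network.tcp.keep_interval: 10  # seconds between probes (max 300)
network.tcp.keep_count: 6      # unacknowledged probes before the connection is dropped
```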

Advanced HTTP settings

Use the following advanced settings to configure the HTTP interface independently of the transport interface. You can also configure both interfaces together using the network settings.

http.host

(Static, string) Sets the address of this node for HTTP traffic. The node will bind to this address and will also use it as its HTTP publish address. Accepts an IP address, a hostname, or a special value. Use this setting only if you require different configurations for the transport and HTTP interfaces.

Defaults to the address given by network.host.

http.bind_host
(Static, string) The network address(es) to which the node should bind in order to listen for incoming HTTP connections. Accepts a list of IP addresses, hostnames, and special values. Defaults to the address given by http.host or network.bind_host. Use this setting only if you need to bind to multiple addresses or to use different addresses for publishing and binding, and you also need different binding configurations for the transport and HTTP interfaces.
http.publish_host
(Static, string) The network address for HTTP clients to contact the node using sniffing. Accepts an IP address, a hostname, or a special value. Defaults to the address given by http.host or network.publish_host. Use this setting only if you need to bind to multiple addresses or to use different addresses for publishing and binding, and you also need different configurations for the transport and HTTP interfaces.
http.publish_port
(Static, integer) The port of the HTTP publish address. Configure this setting only if you need the publish port to be different from http.port. Defaults to the port assigned via http.port.
http.max_content_length
(Static, byte value) Maximum size of an HTTP request body. Defaults to 100mb.
http.max_initial_line_length
(Static, byte value) Maximum size of an HTTP URL. Defaults to 4kb.
http.max_header_size
(Static, byte value) Maximum size of allowed headers. Defaults to 8kb.
http.compression

(Static, boolean) Whether to compress HTTP responses when possible, that is, when the client indicates support through the Accept-Encoding request header. If HTTPS is enabled, defaults to false. Otherwise, defaults to true.

Disabling compression for HTTPS mitigates potential security risks, such as a BREACH attack. To compress HTTPS traffic, you must explicitly set http.compression to true.

http.compression_level
(Static, integer) Defines the compression level to use for HTTP responses. Valid values range from 1 (minimum compression) to 9 (maximum compression). Defaults to 3.
http.cors.enabled

(Static, boolean) Enable or disable cross-origin resource sharing, which determines whether a browser on another origin can execute requests against Elasticsearch. Set to true to enable Elasticsearch to process pre-flight CORS requests. Elasticsearch will respond to those requests with the Access-Control-Allow-Origin header if the Origin sent in the request is permitted by the http.cors.allow-origin list. Set to false (the default) to make Elasticsearch ignore the Origin request header, effectively disabling CORS requests because Elasticsearch will never respond with the Access-Control-Allow-Origin response header.

If the client does not send a pre-flight request with an Origin header or it does not check the response headers from the server to validate the Access-Control-Allow-Origin response header, then cross-origin security is compromised. If CORS is not enabled on Elasticsearch, the only way for the client to know is to send a pre-flight request and realize the required response headers are missing.

http.cors.allow-origin

(Static, string) Which origins to allow. If the value begins and ends with a forward slash (/), it is treated as a regular expression, which lets you allow both HTTP and HTTPS origins. For example, using /https?:\/\/localhost(:[0-9]+)?/ would return the Access-Control-Allow-Origin response header appropriately in both cases. Defaults to no origins allowed.

A wildcard (*) is a valid value but is considered a security risk, as your Elasticsearch instance is then open to cross-origin requests from anywhere.

http.cors.max-age
(Static, integer) Browsers send a preflight OPTIONS request to determine CORS settings. max-age defines how long, in seconds, the browser should cache the result. Defaults to 1728000 (20 days).
http.cors.allow-methods
(Static, string) Which methods to allow. Defaults to OPTIONS, HEAD, GET, POST, PUT, DELETE.
http.cors.allow-headers
(Static, string) Which headers to allow. Defaults to X-Requested-With, Content-Type, Content-Length.
http.cors.allow-credentials

(Static, boolean) Whether the Access-Control-Allow-Credentials header should be returned. Defaults to false.

This header is only returned when the setting is set to true.

http.detailed_errors.enabled
(Static, boolean) Configures whether detailed error reporting in HTTP responses is enabled. Defaults to true, which means that HTTP requests that include the ?error_trace parameter will return a detailed error message including a stack trace if they encounter an exception. If set to false, requests with the ?error_trace parameter are rejected.
http.pipelining.max_events
(Static, integer) The maximum number of events to be queued up in memory before an HTTP connection is closed. Defaults to 10000.
http.max_warning_header_count
(Static, integer) The maximum number of warning headers in client HTTP responses. Defaults to -1 which means the number of warning headers is unlimited.
http.max_warning_header_size
(Static, byte value) The maximum total size of warning headers in client HTTP responses. Defaults to -1 which means the size of the warning headers is unlimited.
http.tcp.keep_alive
(Static, boolean) Configures the SO_KEEPALIVE option for HTTP sockets, which determines whether they send TCP keepalive probes. Defaults to network.tcp.keep_alive.
http.tcp.keep_idle
(Static, integer) Configures the TCP_KEEPIDLE option for HTTP sockets, which determines the time in seconds that a connection must be idle before starting to send TCP keepalive probes. Defaults to network.tcp.keep_idle, which uses the system default. This value cannot exceed 300 seconds. Only applicable on Linux and macOS.
http.tcp.keep_interval
(Static, integer) Configures the TCP_KEEPINTVL option for HTTP sockets, which determines the time in seconds between sending TCP keepalive probes. Defaults to network.tcp.keep_interval, which uses the system default. This value cannot exceed 300 seconds. Only applicable on Linux and macOS.
http.tcp.keep_count
(Static, integer) Configures the TCP_KEEPCNT option for HTTP sockets, which determines the number of unacknowledged TCP keepalive probes that may be sent on a connection before it is dropped. Defaults to network.tcp.keep_count, which uses the system default. Only applicable on Linux and macOS.
http.tcp.no_delay
(Static, boolean) Configures the TCP_NODELAY option on HTTP sockets, which determines whether TCP no delay is enabled. Defaults to true.
http.tcp.reuse_address
(Static, boolean) Configures the SO_REUSEADDR option for HTTP sockets, which determines whether the address can be reused or not. Defaults to false on Windows and true otherwise.
http.tcp.send_buffer_size
(Static, byte value) The size of the TCP send buffer for HTTP traffic. Defaults to network.tcp.send_buffer_size.
http.tcp.receive_buffer_size
(Static, byte value) The size of the TCP receive buffer for HTTP traffic. Defaults to network.tcp.receive_buffer_size.
http.client_stats.enabled
(Dynamic, boolean) Enable or disable collection of HTTP client stats. Defaults to true.
http.client_stats.closed_channels.max_count
(Static, integer) When http.client_stats.enabled is true, sets the maximum number of closed HTTP channels for which Elasticsearch reports statistics. Defaults to 10000.
http.client_stats.closed_channels.max_age
(Static, time value) When http.client_stats.enabled is true, sets the maximum length of time after closing an HTTP channel that Elasticsearch will report that channel’s statistics. Defaults to 5m.
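As a sketch of how the CORS settings fit together, the following assumes a browser-based tool served from localhost that calls Elasticsearch with credentials. Adding Authorization to the allowed headers is an assumption about what such a client sends; adjust the list to your client. Single quotes keep the regular expression's backslashes literal in YAML.

```yaml
# Allow cross-origin requests from http://localhost and https://localhost,
# with or without a port.
http.cors.enabled: true
http.cors.allow-origin: '/https?:\/\/localhost(:[0-9]+)?/'

# The default header list plus Authorization (assumed to be needed by the client).
http.cors.allow-headers: "X-Requested-With, Content-Type, Content-Length, Authorization"
# Return Access-Control-Allow-Credentials so the browser may send credentials.
http.cors.allow-credentials: true
```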

Advanced transport settings

Use the following advanced settings to configure the transport interface independently of the HTTP interface. Use the network settings to configure both interfaces together.

transport.host

(Static, string) Sets the address of this node for transport traffic. The node will bind to this address and will also use it as its transport publish address. Accepts an IP address, a hostname, or a special value. Use this setting only if you require different configurations for the transport and HTTP interfaces.

Defaults to the address given by network.host.

transport.bind_host
(Static, string) The network address(es) to which the node should bind in order to listen for incoming transport connections. Accepts a list of IP addresses, hostnames, and special values. Defaults to the address given by transport.host or network.bind_host. Use this setting only if you need to bind to multiple addresses or to use different addresses for publishing and binding, and you also need different binding configurations for the transport and HTTP interfaces.
transport.publish_host
(Static, string) The network address at which the node can be contacted by other nodes. Accepts an IP address, a hostname, or a special value. Defaults to the address given by transport.host or network.publish_host. Use this setting only if you need to bind to multiple addresses or to use different addresses for publishing and binding, and you also need different configurations for the transport and HTTP interfaces.
transport.publish_port
(Static, integer) The port of the transport publish address. Set this parameter only if you need the publish port to be different from transport.port. Defaults to the port assigned via transport.port.
transport.connect_timeout
(Static, time value) The connect timeout for initiating a new connection (in time setting format). Defaults to 30s.
transport.compress
(Static, string) Set to true, indexing_data, or false to configure transport compression between nodes. The option true compresses all data. The option indexing_data compresses only the raw index data sent between nodes during ingest, CCR following (excluding bootstrap), and operations-based shard recovery (excluding the transfer of Lucene files). Defaults to indexing_data.
transport.compression_scheme
(Static, string) Configures the compression scheme for transport.compress. The options are deflate or lz4. If lz4 is configured and the remote node has not been upgraded to a version supporting lz4, the traffic will be sent uncompressed. Defaults to lz4.
transport.tcp.keep_alive
(Static, boolean) Configures the SO_KEEPALIVE option for transport sockets, which determines whether they send TCP keepalive probes. Defaults to network.tcp.keep_alive.
transport.tcp.keep_idle
(Static, integer) Configures the TCP_KEEPIDLE option for transport sockets, which determines the time in seconds that a connection must be idle before starting to send TCP keepalive probes. Defaults to network.tcp.keep_idle if set, or the system default otherwise. This value cannot exceed 300 seconds. In cases where the system default is higher than 300, the value is automatically lowered to 300. Only applicable on Linux and macOS.
transport.tcp.keep_interval
(Static, integer) Configures the TCP_KEEPINTVL option for transport sockets, which determines the time in seconds between sending TCP keepalive probes. Defaults to network.tcp.keep_interval if set, or the system default otherwise. This value cannot exceed 300 seconds. In cases where the system default is higher than 300, the value is automatically lowered to 300. Only applicable on Linux and macOS.
transport.tcp.keep_count
(Static, integer) Configures the TCP_KEEPCNT option for transport sockets, which determines the number of unacknowledged TCP keepalive probes that may be sent on a connection before it is dropped. Defaults to network.tcp.keep_count if set, or the system default otherwise. Only applicable on Linux and macOS.
transport.tcp.no_delay
(Static, boolean) Configures the TCP_NODELAY option on transport sockets, which determines whether TCP no delay is enabled. Defaults to true.
transport.tcp.reuse_address
(Static, boolean) Configures the SO_REUSEADDR option for transport sockets, which determines whether the address can be reused or not. Defaults to network.tcp.reuse_address.
transport.tcp.send_buffer_size
(Static, byte value) The size of the TCP send buffer for transport traffic. Defaults to network.tcp.send_buffer_size.
transport.tcp.receive_buffer_size
(Static, byte value) The size of the TCP receive buffer for transport traffic. Defaults to network.tcp.receive_buffer_size.
transport.ping_schedule
(Static, time value) Configures the time between sending application-level pings on all transport connections to promptly detect when a transport connection has failed. Defaults to -1 meaning that application-level pings are not sent. You should use TCP keepalives (see transport.tcp.keep_alive) instead of application-level pings wherever possible.
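The following sketch configures the transport interface on its own: node-to-node traffic stays on a private address, and compression is extended from indexing data to all transport messages. The address is hypothetical, and whether compressing everything pays off depends on your network bandwidth and CPU headroom.

```yaml
# Node-to-node traffic on a private network address (hypothetical).
transport.host: 10.0.0.5
transport.port: 9300

# Compress all transport messages, not only indexing data (the default).
transport.compress: true
transport.compression_scheme: lz4   # the default scheme, shown for clarity
```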

Transport profiles

Elasticsearch allows you to bind to multiple ports on different interfaces by using transport profiles. See this example configuration:

transport.profiles.default.port: 9300-9400
transport.profiles.default.bind_host: 10.0.0.1
transport.profiles.client.port: 9500-9600
transport.profiles.client.bind_host: 192.168.0.1
transport.profiles.dmz.port: 9700-9800
transport.profiles.dmz.bind_host: 172.16.1.2

The default profile is special. It is used as a fallback for any setting that another profile does not configure explicitly, and it is how this node connects to other nodes in the cluster. Other profiles can have any name and can be used to set up specific endpoints for incoming connections.

The following parameters can be configured on each transport profile, as in the example above:

  • port: The port to which to bind.
  • bind_host: The host to which to bind.
  • publish_host: The host which is published in informational APIs.

Profiles also support all the other transport settings specified in the transport settings section, and use these as defaults. For example, transport.profiles.client.tcp.reuse_address can be explicitly configured, and defaults otherwise to transport.tcp.reuse_address.
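Extending the client profile from the example above, this sketch also sets the profile's publish host and overrides one TCP option for that profile only. The publish address is hypothetical.

```yaml
transport.profiles.client.port: 9500-9600
transport.profiles.client.bind_host: 192.168.0.1
# Address reported for this profile in informational APIs (hypothetical).
transport.profiles.client.publish_host: 198.51.100.20
# Overrides transport.tcp.reuse_address for this profile only.
transport.profiles.client.tcp.reuse_address: false
```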

Long-lived idle connections

A transport connection between two nodes is made up of a number of long-lived TCP connections, some of which may be idle for an extended period of time. Nonetheless, Elasticsearch requires these connections to remain open, and if an external influence such as a firewall closes any inter-node connection, the operation of your cluster can be disrupted. It is important to configure your network to preserve long-lived idle connections between Elasticsearch nodes, for instance by leaving *.tcp.keep_alive enabled and ensuring that the keepalive interval is shorter than any timeout that might cause idle connections to be closed, or by setting transport.ping_schedule if keepalives cannot be configured. Devices that drop connections when they reach a certain age are a common source of problems for Elasticsearch clusters and must not be used.
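As an illustration, suppose a firewall between the nodes closes connections that have been idle for 10 minutes (a hypothetical figure). The sketch below starts keepalive probes after 5 minutes of idleness, the maximum allowed value, which is comfortably inside that timeout. The commented-out ping schedule is the fallback for platforms where keepalives cannot be configured; its 5m value is likewise chosen to stay below the hypothetical timeout.

```yaml
transport.tcp.keep_alive: true
transport.tcp.keep_idle: 300      # first probe after 5 minutes idle (the maximum allowed)
transport.tcp.keep_interval: 60   # seconds between probes

# Only if TCP keepalives cannot be configured:
# transport.ping_schedule: 5m
```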

Request compression

The default transport.compress option, indexing_data, compresses only requests that relate to the transport of raw indexing source data between nodes. This option primarily compresses data sent during ingest, CCR, and shard recovery. This default normally makes sense for local cluster communication, as compressing raw documents tends to significantly reduce inter-node network usage with minimal CPU impact.

The transport.compress setting always configures local cluster request compression and is the fallback setting for remote cluster request compression. If you want to configure remote request compression differently than local request compression, you can set it on a per-remote cluster basis using the cluster.remote.${cluster_alias}.transport.compress setting.
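For example, to keep the default for local traffic but compress everything sent to one remote cluster, set the per-remote override alongside the local setting. The alias my_remote is hypothetical and must match a remote cluster you have configured.

```yaml
# Local cluster traffic: the default behaviour, shown explicitly.
transport.compress: indexing_data

# Traffic to the remote cluster with the hypothetical alias "my_remote":
# compress all requests.
cluster.remote.my_remote.transport.compress: true
```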

Response compression

The compression settings do not configure compression for responses. Elasticsearch will compress a response if the inbound request was compressed, even when compression is not enabled. Similarly, Elasticsearch will not compress a response if the inbound request was uncompressed, even when compression is enabled. The compression scheme used to compress a response will be the same scheme the remote node used to compress the request.

Request tracing

You can trace individual requests made on the HTTP and transport layers.

Tracing can generate extremely high log volumes that can destabilize your cluster. Do not enable request tracing on busy or important clusters.

REST request tracer

The HTTP layer has a dedicated tracer that logs incoming requests and the corresponding outgoing responses. Activate the tracer by setting the level of the org.elasticsearch.http.HttpTracer logger to TRACE:

PUT _cluster/settings
{
   "persistent" : {
      "logger.org.elasticsearch.http.HttpTracer" : "TRACE"
   }
}

You can also control which URIs will be traced, using a set of include and exclude wildcard patterns. By default, every request is traced.

PUT _cluster/settings
{
   "persistent" : {
      "http.tracer.include" : "*",
      "http.tracer.exclude" : ""
   }
}

Transport tracer

The transport layer has a dedicated tracer that logs incoming and outgoing requests and responses. Activate the tracer by setting the level of the org.elasticsearch.transport.TransportService.tracer logger to TRACE:

PUT _cluster/settings
{
   "persistent" : {
      "logger.org.elasticsearch.transport.TransportService.tracer" : "TRACE"
   }
}

You can also control which actions will be traced, using a set of include and exclude wildcard patterns. By default, every request is traced except for fault detection pings:

PUT _cluster/settings
{
   "persistent" : {
      "transport.tracer.include" : "*",
      "transport.tracer.exclude" : "internal:coordination/fault_detection/*"
   }
}

Networking threading model

This section describes the threading model used by the networking subsystem in Elasticsearch. This information isn’t required to use Elasticsearch, but it may be useful to advanced users who are diagnosing network problems in a cluster.

Elasticsearch nodes communicate over a collection of TCP channels that together form a transport connection. Elasticsearch clients communicate with the cluster over HTTP, which also uses one or more TCP channels. Each of these TCP channels is owned by exactly one of the transport_worker threads in the node. This owning thread is chosen when the channel is opened and remains the same for the lifetime of the channel.

Each transport_worker thread has sole responsibility for sending and receiving data over the channels it owns. One of the transport_worker threads is also responsible for accepting new incoming transport connections, and one is responsible for accepting new HTTP connections.

If a thread in Elasticsearch wants to send data over a particular channel, it passes the data to the owning transport_worker thread for the actual transmission.

Normally the transport_worker threads will not completely handle the messages they receive. Instead, they do a small amount of preliminary processing and then dispatch (hand off) each message to a different threadpool for the rest of its handling. For instance, bulk messages are dispatched to the write threadpool, searches are dispatched to one of the search threadpools, and requests for statistics and other management tasks are mostly dispatched to the management threadpool. However, in some cases the processing of a message is expected to be so quick that Elasticsearch does all of the processing on the transport_worker thread rather than incur the overhead of dispatching it elsewhere.

By default, there is one transport_worker thread per CPU. In contrast, there may sometimes be tens of thousands of TCP channels. If data arrives on a TCP channel and its owning transport_worker thread is busy, the data isn’t processed until the thread finishes whatever it is doing. Similarly, outgoing data are not sent over a channel until the owning transport_worker thread is free. This means that every transport_worker thread must be idle frequently. An idle transport_worker looks something like this in a stack dump:

"elasticsearch[instance-0000000004][transport_worker][T#1]" #32 daemon prio=5 os_prio=0 cpu=9645.94ms elapsed=501.63s tid=0x00007fb83b6307f0 nid=0x1c4 runnable  [0x00007fb7b8ffe000]
   java.lang.Thread.State: RUNNABLE
	at sun.nio.ch.EPoll.wait([email protected]/Native Method)
	at sun.nio.ch.EPollSelectorImpl.doSelect([email protected]/EPollSelectorImpl.java:118)
	at sun.nio.ch.SelectorImpl.lockAndDoSelect([email protected]/SelectorImpl.java:129)
	- locked <0x00000000c443c518> (a sun.nio.ch.Util$2)
	- locked <0x00000000c38f7700> (a sun.nio.ch.EPollSelectorImpl)
	at sun.nio.ch.SelectorImpl.select([email protected]/SelectorImpl.java:146)
	at io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:813)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
	at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
	at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
	at java.lang.Thread.run([email protected]/Thread.java:833)

In the Nodes hot threads API, an idle transport_worker thread is reported like this:

   100.0% [cpu=0.0%, other=100.0%] (500ms out of 500ms) cpu usage by thread 'elasticsearch[instance-0000000004][transport_worker][T#1]'
     10/10 snapshots sharing following 9 elements
       [email protected]/sun.nio.ch.EPoll.wait(Native Method)
       [email protected]/sun.nio.ch.EPollSelectorImpl.doSelect(EPollSelectorImpl.java:118)
       [email protected]/sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:129)
       [email protected]/sun.nio.ch.SelectorImpl.select(SelectorImpl.java:146)
       io.netty.channel.nio.NioEventLoop.select(NioEventLoop.java:813)
       io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:460)
       io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:986)
       io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)
       [email protected]/java.lang.Thread.run(Thread.java:833)

Note that transport_worker threads should always be in state RUNNABLE, even when waiting for input, because they block in the native EPoll#wait method. This means the hot threads API will report these threads at 100% overall utilisation. This is normal, and the breakdown of time into cpu= and other= fractions shows how much time the thread spent running and waiting for input respectively.

If a transport_worker thread is not frequently idle, it may build up a backlog of work. This can cause delays in processing messages on the channels that it owns. It’s hard to predict exactly which work will be delayed:

  • There are many more channels than threads. If work related to one channel is causing delays to its worker thread, all other channels owned by that thread will also suffer delays.
  • The mapping from TCP channels to worker threads is fixed but arbitrary. Each channel is assigned an owning thread in a round-robin fashion when the channel is opened. Each worker thread is responsible for many different kinds of channel.
  • There are many channels open between each pair of nodes. For each request, Elasticsearch will choose from the appropriate channels in a round-robin fashion. Some requests may end up on a channel owned by a delayed worker while other identical requests will be sent on a channel that’s working smoothly.

If the backlog builds up too far, some messages may be delayed by many seconds. The node might even fail its health checks and be removed from the cluster. Sometimes, you can find evidence of busy transport_worker threads using the Nodes hot threads API. However, this API itself sends network messages, so it may not work correctly if the transport_worker threads are too busy. It is more reliable to use jstack to obtain stack dumps or use Java Flight Recorder to obtain a profiling trace. These tools are independent of any work the JVM is performing.