Monitor Elastic Agents

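Fleet provides built-in capabilities for monitoring your fleet of Elastic Agents. In Fleet, you can:

  • View agent status overview
  • View details for an agent
  • View agent activity
  • View agent logs
  • Change the logging level
  • Collect Elastic Agent diagnostics
  • View the Elastic Agent metrics dashboard
  • Change Elastic Agent monitoring settings
  • Send Elastic Agent monitoring data to a remote Elasticsearch cluster
  • Enable alerts and ML jobs based on Fleet and Elastic Agent status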

Agent monitoring is turned on by default in the agent policy. Want to turn it off to stop collecting agent logs and metrics? See Change Elastic Agent monitoring settings.

Want to receive an alert when your Elastic Agent health status changes? Refer to Enable alerts and ML jobs based on Fleet and Elastic Agent status and our alerting example.

For more detail about how agents communicate their status to Fleet, refer to Elastic Agent health status.

View agent status overview

To view the overall status of your Fleet-managed agents, in Kibana, go to Management → Fleet → Agents.

Agents tab showing status of each Elastic Agent

The Agents tab in Fleet displays a maximum of 10,000 agents, shown on 500 pages with 20 rows per page. If you have more than 10,000 agents, we recommend using the filtering and sorting options described in this section to narrow the table to fewer than 10,000 rows.

Elastic Agents can have the following statuses:

Healthy

Elastic Agents are enrolled and checked in. There are no agent policy updates or automatic agent binary updates in progress, but the agent binary may still be out of date. Elastic Agents continuously check in to the Fleet Server for required updates.

Unhealthy

Elastic Agents have errors or are running in a degraded state. An agent can be reported as unhealthy because of a configuration problem on the host system. For example, an Elastic Agent may not have the permissions required to run an integration that has been added to the Elastic Agent policy. In this case, you may need to investigate and fix the problem on the host.

Updating

Elastic Agents are updating the agent policy, updating the binary, or enrolling or unenrolling from Fleet.

Offline

Elastic Agents have stayed in an unhealthy status for a period of time. The API keys of offline agents remain valid. You can still see these Elastic Agents in the Fleet UI and investigate them for further diagnosis if required.

Inactive

Elastic Agents have been offline for longer than the time set in your inactivity timeout. These Elastic Agents are valid, but have been removed from the main Fleet UI.

Unenrolled

Elastic Agents have been manually unenrolled and their API keys have been removed from the system. If you determine that an offline Elastic Agent is no longer valid, you can unenroll it using Elastic Agent actions.

These agents need to re-enroll in Fleet to be operational again.

The following diagram shows the flow of Elastic Agent statuses:

Diagram showing the flow of Fleet Agent statuses

To filter the list of agents by status, click the Status dropdown and select one or more statuses.

Agent Status dropdown with multiple statuses selected

For advanced filtering, use the search bar to create structured queries using Kibana Query Language. For example, enter local_metadata.os.family : "darwin" to see only agents running on macOS.
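
If you build these queries in a script (for example, to reuse the same filters across several views), a small helper keeps the quoting consistent. This is a minimal sketch, not part of Fleet: the helper names are hypothetical, it assumes KQL's backslash escaping inside double-quoted values, and the second field, local_metadata.elastic.agent.version, is an assumed field name used only for illustration.

```python
def kql_quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and double quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def kql_and(filters: dict[str, str]) -> str:
    """Combine field/value pairs into one KQL expression joined with `and`."""
    return " and ".join(
        f"{field} : {kql_quote(value)}" for field, value in filters.items()
    )


# The example query from this section.
print(kql_and({"local_metadata.os.family": "darwin"}))
# local_metadata.os.family : "darwin"

# Narrow further by agent version (field name assumed for illustration).
print(kql_and({
    "local_metadata.os.family": "darwin",
    "local_metadata.elastic.agent.version": "8.13.0",
}))
```

Paste the printed expression into the search bar above the agent list.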

You can also sort the list of agents by host, last activity time, or version, by clicking on the table headings for those fields.

To perform a bulk action on more than 10,000 agents, you can select the Select everything on all pages button.

View details for an agent

In Fleet, you can access the detailed status of an individual agent and the integrations that are associated with it through the agent policy.

  1. In Fleet, open the Agents tab.
  2. In the Host column, click the agent’s name.

On the Agent details tab, the Overview pane shows details about the agent and its performance, including its memory and CPU usage, last activity time, and last check-in message. For metrics visualizations, refer to View the Elastic Agent metrics dashboard.

Agent details overview pane with various metrics

The Integrations pane shows the status of the integrations that have been added to the agent policy. Expand any integration to view its health status. Any errors or warnings are displayed as alerts.

Agent details integrations pane with health status

To gather more detail about a particular error or warning, from the Actions menu select View agent JSON. The JSON contains all of the raw agent data tracked by Fleet.
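
If you copy the agent JSON out of Fleet, a short script can list only the entries that are not healthy. This is a sketch under assumptions: it assumes the JSON has a components list whose entries carry id, status, message, and optional nested units, and that HEALTHY is the healthy status value. Check these field names against the JSON your own agent shows. The sample document below is made up.

```python
import json

# Made-up agent JSON, trimmed to the fields this sketch assumes exist.
SAMPLE_AGENT_JSON = """
{
  "id": "agent-1",
  "components": [
    {"id": "log-default", "status": "HEALTHY", "message": "Healthy"},
    {"id": "system/metrics-default", "status": "DEGRADED",
     "message": "Permission denied",
     "units": [{"id": "system/metrics-default-unit", "status": "FAILED",
                "message": "open /proc: permission denied"}]}
  ]
}
"""


def unhealthy_entries(agent: dict) -> list[tuple[str, str, str]]:
    """Return (id, status, message) for every component or unit that is not HEALTHY."""
    problems = []
    for component in agent.get("components", []):
        if component.get("status") != "HEALTHY":
            problems.append(
                (component["id"], component["status"], component.get("message", ""))
            )
        # Units are checked even when their parent component reports HEALTHY.
        for unit in component.get("units", []):
            if unit.get("status") != "HEALTHY":
                problems.append((unit["id"], unit["status"], unit.get("message", "")))
    return problems


for entry in unhealthy_entries(json.loads(SAMPLE_AGENT_JSON)):
    print(entry)
```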

Currently, the Integrations pane shows the health status only for agent inputs. Health status is not yet available for agent outputs.

View agent activity

You can view a chronological list of all operations performed by your Elastic Agents.

On the Agents tab, click Agent activity. All agent operations are shown, starting with the most recent, including any in-progress operations.

Agent activity panel

View agent logs

When Fleet reports an agent status like Offline or Unhealthy, you might want to view the agent logs to diagnose potential causes. If agent monitoring is configured to collect logs (the default), you can view agent logs in Fleet.

  1. In Fleet, open the Agents tab.
  2. In the Host column, click the agent’s name.
  3. On the Agent details tab, verify that Monitor logs is enabled. If it’s not, refer to Change Elastic Agent monitoring settings.
  4. Click the Logs tab.

    View agent logs under agent details

On the Logs tab you can filter, search, and explore the agent logs:

  • Use the search bar to create structured queries using Kibana Query Language.
  • Choose one or more datasets to show logs for specific programs, such as Filebeat or Fleet Server.

    Fleet showing datasets for logging
  • Change the log level to filter the view by log levels. Want to see debugging logs? Refer to Change the logging level.
  • Change the time range to view historical logs.
  • Click Open in Logs to tail agent log files in real time. For more information about logging, refer to Tail log files.

Change the logging level

The logging level for monitored agents is set to info by default. You can change the agent logging level, for example, to turn on debug logging remotely:

  1. After navigating to the Logs tab as described in View agent logs, scroll down to find the Agent logging level setting.

    Logs tab showing the agent logging level setting
  2. Select an Agent logging level:

    error

    Logs errors and critical errors.

    warning

    Logs warnings, errors, and critical errors.

    info

    Logs informational messages, including the number of events that are published. Also logs any warnings, errors, or critical errors.

    debug

    Logs debug messages, including a detailed printout of all events flushed. Also logs informational messages, warnings, errors, and critical errors.

  3. Click Apply changes to apply the updated logging level to the agent.
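
Each level in the list above also records every more severe level. The following sketch models only that relationship in plain Python. It does not call Fleet, and the function name is hypothetical.

```python
# Message severities from most to least severe, as described in the list above.
SEVERITIES = ["critical", "error", "warning", "info", "debug"]

# Levels that can be selected as the Agent logging level.
SELECTABLE_LEVELS = ("error", "warning", "info", "debug")


def logged_severities(agent_level: str) -> list[str]:
    """Return the message severities recorded at a given agent logging level."""
    if agent_level not in SELECTABLE_LEVELS:
        raise ValueError(f"unsupported logging level: {agent_level}")
    # Everything at or above the chosen level's severity is logged.
    return SEVERITIES[: SEVERITIES.index(agent_level) + 1]


print(logged_severities("warning"))  # ['critical', 'error', 'warning']
```

Switching from info to debug therefore adds debug messages without hiding any of the messages you already see.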

Collect Elastic Agent diagnostics

Fleet provides the ability to remotely generate and gather an Elastic Agent’s diagnostics bundle. An agent can gather and upload diagnostics if it is online in a Healthy or Unhealthy state. To download the diagnostics bundle for local viewing:

  1. In Fleet, open the Agents tab.
  2. In the Host column, click the agent’s name.
  3. Select the Diagnostics tab and click the Request diagnostics .zip button.

    Collect agent diagnostics under agent details
  4. In the Request Diagnostics pop-up, select Collect additional CPU metrics if you’d like detailed CPU data.

    Collect agent diagnostics confirmation pop-up
  5. Click the Request diagnostics button.

When it’s ready, the new diagnostics bundle is listed on this page, along with any in-progress or previously collected bundles for the Elastic Agent.

Note that the bundles are stored in Elasticsearch and are removed automatically after 7 days. You can also delete any previously created bundle by clicking the trash can icon.
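
The two rules in this section are that diagnostics can only be requested from an online agent in a Healthy or Unhealthy state, and that bundles are removed after 7 days. The sketch below encodes both. The helper names are hypothetical, and the expiry it computes is the earliest time you should expect a bundle to disappear; the exact cleanup time is not specified here.

```python
from datetime import datetime, timedelta, timezone

# Statuses from which an agent can gather and upload diagnostics.
DIAGNOSTICS_STATUSES = {"healthy", "unhealthy"}
# Bundles are removed automatically after 7 days.
RETENTION = timedelta(days=7)


def can_request_diagnostics(status: str) -> bool:
    """True if an agent with this Fleet status can return a diagnostics bundle."""
    return status.lower() in DIAGNOSTICS_STATUSES


def bundle_expiry(created_at: datetime) -> datetime:
    """Earliest time a bundle created at created_at is expected to be removed."""
    return created_at + RETENTION


created = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
print(can_request_diagnostics("Offline"))  # False
print(bundle_expiry(created).isoformat())  # 2024-03-08T12:00:00+00:00
```

If you need a bundle for longer than that, download it to your local machine before it expires.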

View the Elastic Agent metrics dashboard

When agent monitoring is configured to collect metrics (the default), you can use the [Elastic Agent] Agent metrics dashboard in Kibana to view details about Elastic Agent resource usage, event throughput, and errors. This information can help you identify problems and make decisions about scaling your deployment.

To view agent metrics:

  1. In Fleet, open the Agents tab.
  2. In the Host column, click the agent’s name.
  3. On the Agent details tab, verify that Monitor metrics is enabled. If it’s not, refer to Change Elastic Agent monitoring settings.
  4. Click View more agent metrics to navigate to the [Elastic Agent] Agent metrics dashboard.

    Screen capture showing Elastic Agent metrics

The dashboard uses standard Kibana visualizations that you can extend to meet your needs.

Change Elastic Agent monitoring settings

Agent monitoring is turned on by default in the agent policy. To change agent monitoring settings for all agents enrolled in a specific agent policy:

  1. In Fleet, open the Agent policies tab.
  2. Click the agent policy to edit it, then click Settings.
  3. Under Agent monitoring, deselect (or select) one or both of these settings: Collect agent logs and Collect agent metrics.
  4. Under Advanced monitoring options, you can configure additional settings, including an HTTP monitoring endpoint, diagnostics rate limiting, and diagnostics file upload limits. Refer to Configure agent monitoring for details.
  5. Save your changes.

To turn off agent monitoring when creating a new agent policy:

  1. In the Create agent policy flyout, expand Advanced options.
  2. Under Agent monitoring, deselect Collect agent logs and Collect agent metrics.
  3. When you’re done configuring the agent policy, click Create agent policy.
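
If you manage agent policies through the Fleet API instead of the UI, the two checkboxes correspond to a list of enabled monitoring types. This sketch assumes the agent policy field is named monitoring_enabled and accepts "logs" and "metrics". Verify this in the Fleet API reference for your version. The helpers and the policy name my-policy are hypothetical, and the code only builds a request body; it does not send it.

```python
import json


def monitoring_enabled(collect_logs: bool, collect_metrics: bool) -> list[str]:
    """Translate the Collect agent logs / Collect agent metrics checkboxes into a list."""
    enabled = []
    if collect_logs:
        enabled.append("logs")
    if collect_metrics:
        enabled.append("metrics")
    return enabled


def policy_settings(name: str, namespace: str,
                    collect_logs: bool, collect_metrics: bool) -> dict:
    """Build an agent policy request body (hypothetical helper, not a Fleet client)."""
    return {
        "name": name,
        "namespace": namespace,
        "monitoring_enabled": monitoring_enabled(collect_logs, collect_metrics),
    }


# New policy with both monitoring options deselected.
print(json.dumps(policy_settings("my-policy", "default", False, False)))
# {"name": "my-policy", "namespace": "default", "monitoring_enabled": []}
```

An empty list is the API counterpart of deselecting both options in the Create agent policy flyout.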

Send Elastic Agent monitoring data to a remote Elasticsearch cluster

You may want to store all of the health and status data about your Elastic Agents in a remote Elasticsearch cluster, so that it’s separate and independent from the deployment where you use Fleet to manage the agents.

To do so, follow the steps in Remote Elasticsearch output. After the new output is configured, follow the steps to update the Elastic Agent policy and make sure that the Output for agent monitoring setting is set to the remote output. Elastic Agent monitoring data will then use the remote Elasticsearch output that you configured.
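
In an API-driven workflow, the Output for agent monitoring setting changes only where monitoring data goes. It does not affect where integration data goes. This sketch assumes the agent policy fields are named monitoring_output_id and data_output_id; confirm both in the Fleet API reference. The output IDs shown are hypothetical placeholders.

```python
def route_monitoring_to(policy: dict, output_id: str) -> dict:
    """Return a copy of an agent policy body whose monitoring data uses output_id."""
    updated = dict(policy)  # shallow copy; the input policy is left unchanged
    updated["monitoring_output_id"] = output_id
    return updated


# Integration data keeps its existing output (hypothetical IDs).
policy = {"name": "my-policy", "namespace": "default", "data_output_id": "default-output"}
print(route_monitoring_to(policy, "remote-monitoring-es"))
```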

Enable alerts and ML jobs based on Fleet and Elastic Agent status

You can access the health status of Fleet-managed Elastic Agents and other Fleet settings through internal Fleet indices. This lets other Elastic Stack applications act on that information. For instance, you can create alerts and machine learning (ML) jobs based on these fields. Refer to the Alerting documentation, or see the example on this page, to learn how to define rules that trigger actions when certain conditions are met.

This lets you track agent status and identify cases where an agent has gone offline, is experiencing health issues, or is having problems with its inputs or outputs.

The following data streams and fields are available.

Data stream

metrics-fleet_server.agent_status-default

This data stream publishes the number of Elastic Agents in various states.

Fields

  • @timestamp
  • fleet.agents.total - A count of all agents
  • fleet.agents.enrolled - A count of all agents currently enrolled
  • fleet.agents.unenrolled - A count of agents currently unenrolled
  • fleet.agents.healthy - A count of agents currently healthy
  • fleet.agents.offline - A count of agents currently offline
  • fleet.agents.updating - A count of agents currently in the process of updating
  • fleet.agents.unhealthy - A count of agents currently unhealthy
  • fleet.agents.inactive - A count of agents currently inactive
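
To read the current counts, you can fetch the most recent document from this data stream and sum the statuses you care about. In this sketch the search body uses standard Elasticsearch size and sort parameters and is meant for GET metrics-fleet_server.agent_status-default/_search. The helpers assume the hit's _source nests fields as fleet → agents → healthy and so on; if your documents use flat dotted keys instead, adjust the lookup accordingly. The sample document is invented.

```python
# Search body for the most recent status document.
LATEST_STATUS_SEARCH = {
    "size": 1,
    "sort": [{"@timestamp": {"order": "desc"}}],
}

STATUS_FIELDS = ["healthy", "unhealthy", "updating", "offline", "inactive", "unenrolled"]


def status_counts(source: dict) -> dict[str, int]:
    """Read the fleet.agents.* counters from a document's _source (0 if missing)."""
    agents = source.get("fleet", {}).get("agents", {})
    return {name: int(agents.get(name, 0)) for name in STATUS_FIELDS}


def needing_attention(source: dict) -> int:
    """Count agents that are currently unhealthy or offline."""
    counts = status_counts(source)
    return counts["unhealthy"] + counts["offline"]


# Made-up example document, shaped like a search hit's _source.
sample = {
    "@timestamp": "2024-03-01T12:00:00Z",
    "fleet": {"agents": {"total": 12, "enrolled": 10, "healthy": 7,
                         "unhealthy": 2, "offline": 1, "updating": 0,
                         "inactive": 2, "unenrolled": 0}},
}
print(status_counts(sample))
print(needing_attention(sample))  # 3
```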

Other fields regarding agent status, based on input and output health, are currently under consideration for future development.

Data stream

metrics-fleet_server.agent_versions-default

This data stream publishes a separate document for each agent version number, containing the count of enrolled agents on that version. Unenrolled agents are not counted.

Fields

  • @timestamp
  • fleet.agent.version - A keyword field containing the version number
  • fleet.agent.count - A count of agents on the specified version
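
Because each document holds one version and its agent count, you can answer questions such as "how many agents are still below a target version" by summing the documents from a single collection time. This sketch assumes nested _source fields (fleet → agent → version/count) and plain numeric version strings. Any suffix such as -SNAPSHOT is ignored when comparing versions. The helper names and sample counts are made up.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn "8.13.0" (or "8.13.0-SNAPSHOT") into a comparable tuple (8, 13, 0)."""
    return tuple(int(part) for part in version.split("-")[0].split("."))


def agents_below(docs: list[dict], target: str) -> int:
    """Sum fleet.agent.count over documents whose version is older than target."""
    target_key = parse_version(target)
    return sum(
        doc["fleet"]["agent"]["count"]
        for doc in docs
        if parse_version(doc["fleet"]["agent"]["version"]) < target_key
    )


# Invented documents from one collection time.
snapshot = [
    {"fleet": {"agent": {"version": "8.12.2", "count": 40}}},
    {"fleet": {"agent": {"version": "8.13.0", "count": 55}}},
    {"fleet": {"agent": {"version": "8.11.4", "count": 5}}},
]
print(agents_below(snapshot, "8.13.0"))  # 45
```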

Example: Enable an alert for offline Elastic Agents

You can set up an alert to notify you when one or more Elastic Agents go offline:

  1. In Kibana, navigate to Management > Stack Management > Rules.
  2. Click Create rule.
  3. Select Elasticsearch query as the rule type.
  4. Choose a name for the rule, for example Elastic Agent status.
  5. Select KQL or Lucene as the query type.
  6. Select DATA VIEW metrics-* as the data view.
  7. Define your query, for example: fleet.agents.offline >= 1.
  8. Set the alert group, threshold, and time window. For example:

    • WHEN: count()
    • OVER: all documents
    • IS ABOVE: 0
    • FOR THE LAST 5 minutes

      This generates an alert when any document from the last five minutes reports one or more offline agents in the fleet.agents.offline field.

  9. Set the number of documents to send, for example:

    • SIZE: 100
  10. Set Check every to the frequency at which the rule condition should be evaluated. The default setting is one minute.
  11. Select an action to occur when the rule conditions are met. For example, to set the alert to send an email when an alert occurs, select the Email connector type and specify:

    • Email connector: Elastic-Cloud-SMTP
    • Action frequency: For each alert and On check intervals
    • Run when: Query matched
    • To: <the recipient email address>
    • Subject: <the email subject line>
  12. Click Save.
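
To check what the rule will do before you rely on it, you can reproduce its condition offline. The sketch below shows an Elasticsearch Query DSL filter that is roughly equivalent to the KQL query combined with a five-minute window. In the real rule, Kibana applies the time window itself. It also includes a local function that applies the same WHEN count() / IS ABOVE 0 / FOR THE LAST 5 minutes logic to sample documents. The function names and documents are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Query DSL roughly equivalent to `fleet.agents.offline >= 1` over the last 5 minutes.
OFFLINE_QUERY = {
    "bool": {
        "filter": [
            {"range": {"fleet.agents.offline": {"gte": 1}}},
            {"range": {"@timestamp": {"gte": "now-5m"}}},
        ]
    }
}


def matching_docs(docs: list[dict], now: datetime, window: timedelta) -> list[dict]:
    """Documents inside the time window that report at least one offline agent."""
    start = now - window
    return [
        doc for doc in docs
        if doc["@timestamp"] >= start and doc["fleet"]["agents"]["offline"] >= 1
    ]


def rule_fires(docs: list[dict], now: datetime,
               window: timedelta = timedelta(minutes=5), is_above: int = 0) -> bool:
    """WHEN count() OVER all documents IS ABOVE is_above FOR THE LAST window."""
    return len(matching_docs(docs, now, window)) > is_above


now = datetime(2024, 3, 1, 12, 0, tzinfo=timezone.utc)
docs = [
    {"@timestamp": now - timedelta(minutes=2), "fleet": {"agents": {"offline": 2}}},
    {"@timestamp": now - timedelta(minutes=1), "fleet": {"agents": {"offline": 0}}},
    {"@timestamp": now - timedelta(minutes=10), "fleet": {"agents": {"offline": 3}}},
]
print(rule_fires(docs, now))  # True: one matching document in the last five minutes
```

The document from ten minutes ago reports offline agents but falls outside the window, so only the two-minute-old document counts toward the alert.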

The new rule will be enabled and an email will be sent to the specified recipient when the alert conditions are met.

From the Rules page you can select the rule you created to enable or disable it, and to view the rule details including a list of active alerts and an alert history.

A screen capture showing the details for the new Elastic Agent status rule