How Elastic Infosec Optimizes Defend for Cost and Performance

In Security Operations Centers (SOCs), data is valuable, but too much of it becomes a problem. Collecting every event from every endpoint is expensive and unnecessary, and it can cause performance issues on your workstations and clusters. At Elastic, we treat our own Infosec team as "Customer Zero": we run the latest versions of all Elastic products, which includes deploying Elastic Defend on our entire fleet of workstations and applying updates within 24 hours of each new release.

This article details the internal Elastic Infosec team's process to optimize our endpoint data collection. By leveraging Event Filtering and Advanced Policy Settings in Elastic Defend, we significantly reduced noise, improved cluster performance, and saved on storage costs, all while maintaining a robust security posture. By following these strategies you can significantly reduce your EDR costs with only a few hours of work.

Elastic Defend is a powerful Endpoint Detection and Response agent that provides comprehensive protection against advanced threats. It offers a wide range of capabilities, including prevention, detection, and response, to safeguard your endpoints. In addition to on-host detections and alerting, it collects rich event telemetry directly from the endpoint and sends it to your Elastic Stack: process executions, network connections, DNS events, USB device events, DLL and driver loads, API events, file system changes, and registry modifications. Starting with 8.3.0, Elastic Defend includes default event filtering that automatically filters out known benign system events unless you disable it in the policy advanced settings. In addition to the built-in filters, you can add your own custom event filters to Elastic Defend to reduce your costs even further.

The environment: Worldwide Distributed Workforce

Our environment at Elastic isn't like most traditional enterprises. We are a remote-first, distributed workforce with team members in over 43 countries around the world. Almost half of our employees are developers or engineers who are constantly pushing the boundaries of what an operating system can do. They use Mac, Windows, and Linux workstations to compile software, build custom Linux kernels, run Elasticsearch clusters on Kubernetes on their workstations, and use complex development tools that can generate massive amounts of benign file and process activity.

When we initially rolled out Elastic Defend, our strategy was to deploy first to a small population of workstations from a variety of teams so we could get an idea of the event volume and filter out the noisiest events, and then gradually add more workstations each week. When we first installed Elastic Defend without any event filters, we saw a very large volume of data: an average of 48k events per hour per workstation. A large number of these events were caused by benign but noisy management software such as Qualys, Jamf, and Intune. We needed a strategy to filter out the noise without creating blind spots for our security analysts.

Step 1: Identifying the Noise

When looking for noisy events there are generally two different categories of noise that you should look for:

Software that is installed on the majority of your workstations.
A single host that is creating far more noise than your other hosts.

When adding filters, start with the first category of noise, as that will make a bigger difference in the long run. Common causes of events like this are MDM agents or other applications that constantly repeat the same benign action, such as writing to a log file and making network connections to ship logs to the cluster.

When a single host is creating significantly more events than other hosts, the cause is often a misconfiguration or a bug. In these cases the best solution is to fix the problem on the host. For example, we found a Linux system with a broken script that kept restarting and crashing thousands of times per second. Instead of adding an event filter, we reached out to the system owner, who fixed the script, which also improved the performance of the system. If the events are caused by software that isn't installed on other hosts, event filters can be scoped to individual hosts. This will often be a single server, such as a database or web server, generating far more network or file events than other systems.
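The second category of noise can be spotted by comparing each host against the rest of the fleet. The following is a minimal sketch, assuming you have already exported per-host event counts (for example from the "10 Noisiest Hosts" query later in this step). The function name, the sample host names and counts, and the factor of 10 are illustrative assumptions, not values from our environment.

```python
from statistics import median


def find_outlier_hosts(counts, factor=10.0):
    """Return (host, count) pairs whose count exceeds factor x the fleet median.

    counts: dict mapping a host name to its event count for some time window.
    factor: hypothetical threshold; tune it for your own fleet.
    """
    if not counts:
        return []
    baseline = median(counts.values())
    outliers = [(host, n) for host, n in counts.items() if n > factor * baseline]
    # Noisiest host first, so the worst offender is investigated first.
    return sorted(outliers, key=lambda item: item[1], reverse=True)


# Hypothetical hourly event counts per host.
sample = {
    "mac-001": 11_000,
    "mac-002": 13_500,
    "win-001": 12_200,
    "linux-007": 950_000,
}
print(find_outlier_hosts(sample))
```

Using the median rather than the mean keeps one extremely noisy host from inflating the baseline it is compared against.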

We use the following ES|QL queries to pinpoint high-volume event categories, processes, and file paths. If you are using an older version of Elastic that does not support ES|QL you can use Lens visualizations in a similar way.

In the following ES|QL queries we use the logs-endpoint.events* index pattern. This is the default index pattern created by Elastic Defend for storing streamed events from endpoints. If you are using a custom configuration or cross cluster search this index pattern may be different.

Noisiest Event Categories and Actions: Use this query to find the categories and actions that are creating the most events. This is a good starting point because it shows where filtering will have the biggest impact.

FROM logs-endpoint.events*
| STATS event_count = count(*) BY event.category, event.action
| SORT event_count DESC
| LIMIT 10
| KEEP event.category, event.action, event_count

10 Noisiest Hosts: This query is a good way to find your noisiest workstations or servers.

FROM logs-endpoint.events*
| STATS event_count = count(*) BY host.id, host.name
| SORT event_count DESC
| LIMIT 10
| KEEP host.id, host.name, event_count

Noisiest events on a single host: Once you've identified a noisy host, use this query to drill down and find the specific processes, command lines, or file paths driving that volume. You can add the | WHERE host.id == "{HOST_ID}" filter to any of the following queries to drill down into a single host's events.

FROM logs-endpoint.events*
| WHERE host.id == "{HOST_ID}"
| STATS event_count = count(*) BY event.category, event.action, process.name, process.command_line, file.path
| SORT event_count DESC
| LIMIT 10
| KEEP process.name, process.command_line, event.category, event.action, file.path, event_count

Noisiest Process Names: Use this query to find which applications or system processes are responsible for the highest event volume globally across your fleet.

FROM logs-endpoint.events*
| STATS event_count = count(*) BY process.name
| SORT event_count DESC
| LIMIT 10
| KEEP process.name, event_count

Noisiest File Paths: Use this query to identify specific files or directories that are being accessed or modified frequently, often indicating logging or temporary file activity.

FROM logs-endpoint.events*
| WHERE event.category == "file"
| STATS event_count = count(*) BY file.path, event.action
| SORT event_count DESC
| LIMIT 10
| KEEP file.path, event.action, event_count

Top 10 Network Events by Process Name: Use this query to see which processes are generating the most network connection events, which can help identify chatty agents or services.

FROM logs-endpoint.events*
| WHERE event.category == "network"
| STATS event_count = count(*) BY process.name
| SORT event_count DESC
| LIMIT 10
| KEEP process.name, event_count

Top 10 Process Names by File Events: Use this query to identify which processes are generating the most file system noise, distinguishing them from other categories like network or registry events.

FROM logs-endpoint.events*
| WHERE event.category == "file"
| STATS event_count = count(*) BY process.name
| SORT event_count DESC
| LIMIT 10
| KEEP process.name, event_count
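The queries in this step all share one shape: an optional WHERE clause, a STATS count grouped by one or more fields, then SORT, LIMIT, and KEEP. If you want to generate variations quickly, a small helper can build the query text for you. This is a minimal sketch; the function name top_n_query and its parameters are hypothetical and not part of any Elastic library. It only produces a string and does not escape its inputs, so use it with values you trust.

```python
def top_n_query(group_by, where=None, limit=10, index="logs-endpoint.events*"):
    """Build an ES|QL 'top N by event count' query as a string.

    group_by: list of field names to count events by.
    where:    optional list of ES|QL conditions, each emitted as its own WHERE.
    """
    fields = ", ".join(group_by)
    lines = [f"FROM {index}"]
    for condition in where or []:
        lines.append(f"| WHERE {condition}")
    lines.append(f"| STATS event_count = count(*) BY {fields}")
    lines.append("| SORT event_count DESC")
    lines.append(f"| LIMIT {limit}")
    lines.append(f"| KEEP {fields}, event_count")
    return "\n".join(lines)


# Reproduces the "Top 10 Network Events by Process Name" query.
print(top_n_query(["process.name"], where=['event.category == "network"']))

# The same query narrowed to one host; {HOST_ID} is a placeholder to replace.
print(top_n_query(
    ["process.name"],
    where=['event.category == "network"', 'host.id == "{HOST_ID}"'],
))
```

The printed text can be pasted into the ES|QL editor in Kibana the same way as the handwritten queries.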

Step 2: Precise Event Filtering

Armed with this data, we use Event Filters in Elastic Defend. This feature filters specific events out directly at the endpoint so they are never sent to Elasticsearch. Filtering these events has no impact on the malware and host protections provided by Elastic Defend; it only stops the events from being sent to your cluster. This saves network bandwidth, disk storage, and CPU cycles on the workstations and ingest pipelines.

Filter example 1: Elasticsearch file noise

At Elastic, many users run their own installations of Elasticsearch on their workstations for testing or development. Elasticsearch writes files to disk very often as documents are ingested, which can be quite noisy. Each filter is OS specific, so you may need to create more than one version of some filters. This is an example of our macOS version of this event filter:

Filter example 2: Linux Logfile modifications

On Linux systems log files are being constantly updated. This filter can be used to exclude all modification events when the file.extension is log. We would still receive events if a log file is created or deleted, but not when it is modified.
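To make the effect of this filter concrete, the logic can be modeled as a simple predicate. This is only an illustration of which events would be dropped, not the filter definition itself. It assumes the filter's conditions are combined with AND, that events are represented as flat dictionaries keyed by field name, and that modification events carry the event.action value "modification"; confirm the actual values in your own data with the queries from Step 1.

```python
def is_filtered(event):
    """Return True if the event matches every condition and would be dropped."""
    return (
        event.get("event.category") == "file"
        and event.get("event.action") == "modification"  # assumed action value
        and event.get("file.extension") == "log"
    )


# Hypothetical sample events for illustration.
events = [
    {"event.category": "file", "event.action": "modification", "file.extension": "log"},
    {"event.category": "file", "event.action": "creation", "file.extension": "log"},
    {"event.category": "file", "event.action": "deletion", "file.extension": "log"},
    {"event.category": "file", "event.action": "modification", "file.extension": "conf"},
]

kept = [e for e in events if not is_filtered(e)]
print(f"{len(events) - len(kept)} dropped, {len(kept)} kept")
```

Only the first sample event is dropped. Creation and deletion of log files, and modifications to files with other extensions, are still collected.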

Filter example 3: Docker ps process events on macOS

On macOS systems that have Docker installed, the Docker backend process runs ps regularly to get information about the containers running on the workstation. Across our fleet of workstations we were seeing these events over 153 million times per month. This filter can be used to exclude those events from collection.

Pro Tip: When applying filters, use the "Comments" field in the UI to document why a filter exists and link to the relevant ticket or investigation. This is crucial for long-term maintenance.

Step 3: Optimizing Performance at the Source

Beyond filtering, the advanced settings of an Elastic Defend policy can reduce both the size of each ingested event and the number of events generated, without sacrificing security. Several features help reduce the amount of data created by Elastic Agent.

Elastic Defend can calculate MD5, SHA-1, and SHA-256 hashes for file events and alerts. Prior to 8.18, collecting all three hashes was enabled by default. In 8.18 and newer, the MD5 and SHA-1 hashes are disabled by default. Calculating the MD5 and SHA-1 hashes consumes workstation CPU cycles and cluster storage space, and those hashes are unnecessary when the SHA-256 values are available.

If you are running a version of Elastic Agent prior to 8.18 and you want to disable these hash calculations, this is how to disable MD5 and SHA-1 collection in the integration policy settings:

Navigate to Integration Policies -> Elastic Defend.
Click Show advanced settings.
Under Windows/macOS/Linux event settings, set these values to false:
- windows.advanced.events.hash.md5
- windows.advanced.events.hash.sha1
- linux.advanced.events.hash.md5
- linux.advanced.events.hash.sha1
- mac.advanced.events.hash.md5
- mac.advanced.events.hash.sha1

Event Aggregation

Another effective way to reduce data volume is event aggregation. Elastic Defend automatically merges short-lived process and network events with the same values into a single event document. Without this setting, every process would create three separate events (start, fork, and end). With this setting enabled, these three events are combined into a single document if they happen within a few seconds of each other.

This is particularly useful for environments where processes spin up and shut down rapidly. This feature is enabled by default on 8.18 and newer versions of Elastic Defend, but it can be enabled on older versions using the advanced settings. You can control this behavior using the advanced setting [linux|mac|windows].advanced.events.aggregate_process. We found that keeping these enabled significantly reduced our event count without impacting our ability to investigate incidents.

The impact of disabling the MD5 and SHA-1 hashes:

Reduced CPU Usage: The agent no longer spends cycles calculating three different hashes for every file event; it calculates only SHA-256.
Smaller Event Size: Removing the MD5 and SHA-1 fields slightly reduced the size of every file event JSON document sent to Elasticsearch, compounding into significant storage savings over billions of events.
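To get a rough feel for the per-event saving from dropping the two hash fields, you can compare the serialized size of a document with and without them. This is a minimal sketch using a hypothetical, heavily simplified file event; real Elastic Defend events contain many more fields, and on-disk savings depend on compression and index settings. The hash values are placeholders of the correct length, not real hashes.

```python
import copy
import json


def compact_size(doc):
    """Size in bytes of the document serialized as compact JSON."""
    return len(json.dumps(doc, separators=(",", ":")).encode("utf-8"))


# Hypothetical, simplified file event for illustration only.
event = {
    "event": {"category": "file", "action": "creation"},
    "file": {
        "path": "/tmp/example.txt",
        "hash": {
            "md5": "a" * 32,     # placeholder, MD5 is 32 hex characters
            "sha1": "b" * 40,    # placeholder, SHA-1 is 40 hex characters
            "sha256": "c" * 64,  # placeholder, SHA-256 is 64 hex characters
        },
    },
}

trimmed = copy.deepcopy(event)
del trimmed["file"]["hash"]["md5"]
del trimmed["file"]["hash"]["sha1"]

saved = compact_size(event) - compact_size(trimmed)
print(f"bytes saved per event in raw JSON: {saved}")
```

The saving per document is small, which matches the "slightly reduced" wording above; it only becomes meaningful when multiplied across billions of file events.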

Results

By implementing these changes, we transformed our detection environment:

Volume Reduction: We dropped from an average of ~48k events per host per hour to ~12k events per host per hour, a 75% reduction in noise.
Cost Savings: Assuming an average size of 1 KB per document ingested, reducing event volume by 36,000 documents per host per hour translates to about 3.5 TB less ingested log data per day for our fleet of 4,000 hosts. This results in an estimated reduction of around 100 TB per month in our Elastic cluster, saving our team thousands of dollars every month. The true savings can vary depending on your settings and infrastructure, such as ILM, logsdb, frozen storage, network transfer costs, cloud provider costs, and the hardware used in your cluster.
Improved Signal: Our analysts now see fewer benign events which improves overall search speed and makes it easier to find the signal in the noise when hunting for threats.
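The volume and cost figures above can be reproduced with a few lines of arithmetic, which also makes it easy to plug in your own fleet size and event rates. This sketch uses the numbers from this article and assumes decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes) and a 30-day month; the function name is hypothetical.

```python
def estimate_savings(events_before, events_after, doc_size_bytes, hosts, days=30):
    """Return (percent reduction, TB saved per day, TB saved per period)."""
    saved_docs_per_host_hour = events_before - events_after
    reduction_pct = 100 * saved_docs_per_host_hour / events_before
    bytes_per_day = saved_docs_per_host_hour * doc_size_bytes * 24 * hosts
    tb_per_day = bytes_per_day / 1e12
    return reduction_pct, tb_per_day, tb_per_day * days


# Figures from this article: 48k -> 12k events per host per hour,
# 1 KB per document (assumed average), 4,000 hosts.
pct, tb_day, tb_month = estimate_savings(48_000, 12_000, 1_000, 4_000)
print(f"{pct:.0f}% fewer events, {tb_day:.2f} TB/day, {tb_month:.1f} TB/month")
```

With these inputs the result is a 75% reduction, roughly 3.5 TB per day, and roughly 100 TB per month, in line with the estimates above.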

Conclusion

Automation and configuration tuning are powerful tools for any SOC, and they are essential for managing the rich telemetry provided by modern endpoint security solutions like Elastic Defend. Don't be intimidated by the volume of events collected; this visibility is your greatest asset in detecting advanced threats. By treating our internal security team as Customer Zero, we proved that you can aggressively filter noise and optimize configurations to save money and improve performance without compromising security. These changes not only reduced our storage footprint but also empowered our analysts to focus on what matters most: detecting and responding to real threats.

We encourage you to embrace the full capabilities of Elastic Defend and take control of your endpoint data with event filters. Start by using ES|QL and Lens to identify your noisiest events, implement Event Filters to suppress benign activity, and review your Policy Settings to ensure you're only collecting the data you truly need. Ready to optimize your own environment? Start your free trial of Elastic Security today and experience the power of comprehensive endpoint protection.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.