Large language model performance matrix

This page describes the performance of various large language models (LLMs) for different use cases in Elastic Security, based on our internal testing. To learn more about these use cases, refer to Attack Discovery or AI Assistant.

Excellent is the best rating, followed by Great, then by Good, and finally by Poor.

Proprietary models

Models from third-party LLM providers.

Model                  Assistant - General  Assistant - ES|QL generation  Assistant - Alert questions  Assistant - Knowledge retrieval  Attack Discovery
Claude 3: Opus         Excellent            Excellent                     Excellent                    Good                             Great
Claude 3.5: Sonnet v2  Excellent            Excellent                     Excellent                    Excellent                        Great
Claude 3.5: Sonnet     Excellent            Excellent                     Excellent                    Excellent                        Excellent
Claude 3.5: Haiku      Excellent            Excellent                     Excellent                    Excellent                        Poor
Claude 3: Haiku        Excellent            Excellent                     Excellent                    Excellent                        Poor
GPT-4o                 Excellent            Excellent                     Excellent                    Excellent                        Great
GPT-4o-mini            Excellent            Great                         Great                        Great                            Poor
Gemini 1.5 Pro 002     Excellent            Excellent                     Excellent                    Excellent                        Excellent
Gemini 1.5 Flash 002   Excellent            Poor                          Good                         Excellent                        Poor

Open-source models

Models you can deploy yourself.

Model                  Assistant - General  Assistant - ES|QL generation  Assistant - Alert questions  Assistant - Knowledge retrieval  Attack Discovery
Mistral Nemo           Good                 Good                          Great                        Good                             Poor
Llama 3.2              Good                 Poor                          Good                         Poor                             Poor
Llama 3.1 405b         Good                 Great                         Good                         Good                             Poor
Llama 3.1 70b          Good                 Good                          Poor                         Poor                             Poor