As discussed in the following post, Elastic added instrumentation for OpenAI-based applications to EDOT. The most common application using LLMs is the chatbot. These chatbots not only use large language models (LLMs), but also frameworks such as LangChain, plus search to improve contextual information during a conversation via RAG (Retrieval Augmented Generation). Elastic's sample RAG-based chatbot application showcases how to use Elasticsearch with local data that has embeddings, enabling search to pull out the most contextually relevant information during a query against a chatbot connected to an LLM of your choice. It's a great example of how to build out a RAG-based application with Elasticsearch.
This app is also now instrumented with EDOT, and you can visualize the chatbot's traces to OpenAI, as well as relevant logs and metrics from the application. By running the app with Docker as instructed in the GitHub repo, you can see these traces on a local stack. But how about running it against Serverless, Elastic Cloud, or even on Kubernetes?
In this blog we will walk through how to set up Elastic's RAG-based chatbot application with Elastic Cloud and Kubernetes.
Prerequisites:
In order to follow along, you will need the following:
- An Elastic Cloud account (sign up now) and familiarity with Elastic's OpenTelemetry configuration. Serverless requires no particular version; regular cloud requires 8.17 at a minimum.
- A local clone of the RAG-based chatbot application; go through its tutorial to become familiar with bringing the application up using Docker (clone commands follow this list).
- An account on OpenAI with API keys.
- A Kubernetes cluster to run the RAG-based chatbot app.
- The instructions in this blog are also found in observability-examples on GitHub.
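For reference, a quick sketch of getting both repos locally (assuming their standard GitHub locations; the chatbot app lives inside Elastic's elasticsearch-labs repo, and paths may shift as the repos evolve):
# Clone the sample chatbot app (part of the elasticsearch-labs repo)
git clone https://github.com/elastic/elasticsearch-labs.git
cd elasticsearch-labs/example-apps/chatbot-rag-app
# Clone the observability-examples repo that holds the Kubernetes files used below
git clone https://github.com/elastic/observability-examples.git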
Application OpenTelemetry output in Elastic
Chatbot-rag-app
The first thing you need to get up and running is the chatbot app itself, and once it's up you should see the following:
As you select some of the questions, you will see a response based on the index that was created in Elasticsearch when the app initializes. Additionally, there will be queries made to the LLM.
Traces, logs, and metrics from EDOT in Elastic
Once you have the application running on your K8s cluster or with Docker, and Elastic Cloud up and running, you should see the following:
Logs:
In Discover you will see logs from the chatbot app and be able to analyze the application logs and any specific log patterns, which saves you time in analysis.
Traces:
In Elastic Observability APM, you can also see the chatbot details, which include transactions, dependencies, logs, errors, and more.
When you look at traces, you will be able to see the chatbot interactions in the trace:
- The end-to-end HTTP call
- Individual calls to Elasticsearch
- Specific calls such as invoke actions and calls to the LLM
You can also open individual traces to get their details and look at the logs and metrics related to that trace.
Metrics:
In addition to logs and traces, any instrumented metrics will also be ingested into Elastic.
Setting it all up with Docker
In order to properly set up the chatbot app on Docker with telemetry sent to Elastic, a few things must be configured:
- Git clone the chatbot-rag-app.
- Modify the env file as noted in the GitHub README, with the following exception: use your Elastic Cloud OTLP endpoint and authorization header, which you can find in Elastic Cloud under the OpenTelemetry setup instructions for APM. For sending the OTel instrumentation you will need the following (a consolidated sketch of all the entries follows this list):
OTEL_EXPORTER_OTLP_ENDPOINT="https://123456789.apm.us-west-2.aws.cloud.es.io:443"
OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer%20xxxxx"
Notice the %20 in OTEL_EXPORTER_OTLP_HEADERS: it is the URL-encoded space between Bearer and your secret token.
- Set the following to false: OTEL_SDK_DISABLED=false
- Set the envs for the LLM. In this example we're using OpenAI, hence only three variables are needed:
LLM_TYPE=openai
OPENAI_API_KEY=XXXX
CHAT_MODEL=gpt-4o-mini
- Run the docker container as noted
docker compose up --build --force-recreate
- Play with the app at localhost:4000.
- Then log into Elastic Cloud and see the output as shown previously.
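Putting those steps together, here is a minimal sketch of the relevant env file entries; every endpoint, token, and key value is a placeholder you must replace with your own:
# OTel export to Elastic Cloud (values from your deployment's OpenTelemetry setup)
OTEL_EXPORTER_OTLP_ENDPOINT="https://123456789.apm.us-west-2.aws.cloud.es.io:443"
OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer%20xxxxx"  # %20 is the URL-encoded space
OTEL_SDK_DISABLED=false
# LLM configuration (OpenAI in this example)
LLM_TYPE=openai
OPENAI_API_KEY=XXXX
CHAT_MODEL=gpt-4o-mini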
Run chatbot-rag-app on Kubernetes
In order to set this up, you can follow the observability-examples repo, which has the Kubernetes YAML files being used. These also point to Elastic Cloud.
- Set up the Kubernetes cluster (we're using EKS).
- Create a docker image using the Dockerfile from the repo. However, use the following build command to ensure it will run on any K8s environment:
docker buildx build --platform linux/amd64 -t chatbot-rag-app .
- Push the image to your favorite container repo.
- Get the appropriate ENV variables:
  - The OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS variables, as noted previously for Docker.
  - Your OpenAI key.
  - Your Elasticsearch URL, username, and password.
- Follow the instructions in the observability-examples GitHub repo to run the two Kubernetes YAML files. Essentially you need only replace the placeholder values below (the image location and the secret values) with your own, and run:
kubectl create -f k8s-deployment.yaml
kubectl create -f init-index-job.yaml
The app needs to be running first; we then use the app to initialize Elasticsearch with the indices it needs (see the command sketch below).
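As a concrete end-to-end sketch of building, pushing, and deploying (the ECR registry URL is a made-up placeholder, and kubectl wait is just one way to confirm the deployment is available before creating the index job):
# Build for linux/amd64 so the image runs on any K8s node architecture
docker buildx build --platform linux/amd64 -t chatbot-rag-app .
# Tag and push to your container registry (placeholder URL shown)
docker tag chatbot-rag-app:latest 123456789012.dkr.ecr.us-west-2.amazonaws.com/chatbot-rag-app:latest
docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/chatbot-rag-app:latest
# Deploy the app first, wait for it to come up, then initialize the index
kubectl create -f k8s-deployment.yaml
kubectl wait deployment/chatbot-regular --for=condition=Available --timeout=180s
kubectl create -f init-index-job.yaml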
init-index-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: init-elasticsearch-index-test
spec:
  template:
    spec:
      containers:
        - name: init-index
          image: yourimagelocation:latest   # replace with the image you pushed
          workingDir: /app/api
          command: ["python3", "-m", "flask", "--app", "app", "create-index"]
          env:
            - name: FLASK_APP
              value: "app"
            - name: LLM_TYPE
              value: "openai"
            - name: CHAT_MODEL
              value: "gpt-4o-mini"
            - name: ES_INDEX
              value: "workplace-app-docs"
            - name: ES_INDEX_CHAT_HISTORY
              value: "workplace-app-docs-chat-history"
            - name: ELASTICSEARCH_URL
              valueFrom:
                secretKeyRef:
                  name: chatbot-regular-secrets
                  key: ELASTICSEARCH_URL
            - name: ELASTICSEARCH_USER
              valueFrom:
                secretKeyRef:
                  name: chatbot-regular-secrets
                  key: ELASTICSEARCH_USER
            - name: ELASTICSEARCH_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: chatbot-regular-secrets
                  key: ELASTICSEARCH_PASSWORD
          envFrom:
            - secretRef:
                name: chatbot-regular-secrets
      restartPolicy: Never
  backoffLimit: 4
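To confirm the index job ran to completion, you can check its status and logs with standard kubectl commands:
kubectl get job init-elasticsearch-index-test
kubectl logs job/init-elasticsearch-index-test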
k8s-deployment.yaml
apiVersion: v1
kind: Secret
metadata:
  name: chatbot-regular-secrets
type: Opaque
stringData:
  # Replace all of these placeholder values with your own
  ELASTICSEARCH_URL: "https://yourelasticcloud.es.us-west-2.aws.found.io"
  ELASTICSEARCH_USER: "elastic"
  ELASTICSEARCH_PASSWORD: "elastic"
  OTEL_EXPORTER_OTLP_HEADERS: "Authorization=Bearer%20xxxx"
  OTEL_EXPORTER_OTLP_ENDPOINT: "https://12345.apm.us-west-2.aws.cloud.es.io:443"
  OPENAI_API_KEY: "YYYYYYYY"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatbot-regular
spec:
  replicas: 2
  selector:
    matchLabels:
      app: chatbot-regular
  template:
    metadata:
      labels:
        app: chatbot-regular
    spec:
      containers:
        - name: chatbot-regular
          image: yourimagelocation:latest   # replace with the image you pushed
          ports:
            - containerPort: 4000
          env:
            - name: LLM_TYPE
              value: "openai"
            - name: CHAT_MODEL
              value: "gpt-4o-mini"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "service.name=chatbot-regular,service.version=0.0.1,deployment.environment=dev"
            - name: OTEL_SDK_DISABLED
              value: "false"
            - name: OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT
              value: "true"
            - name: OTEL_EXPERIMENTAL_RESOURCE_DETECTORS
              value: "process_runtime,os,otel,telemetry_distro"
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: "http/protobuf"
            - name: OTEL_METRIC_EXPORT_INTERVAL
              value: "3000"
            - name: OTEL_BSP_SCHEDULE_DELAY
              value: "3000"
          envFrom:
            - secretRef:
                name: chatbot-regular-secrets
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: chatbot-regular-service
spec:
  selector:
    app: chatbot-regular
  ports:
    - port: 80
      targetPort: 4000
  type: LoadBalancer
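If you prefer to keep credentials out of YAML committed to source control, a kubectl one-liner can create the same Secret in place of the stringData block above (same placeholder values):
kubectl create secret generic chatbot-regular-secrets \
  --from-literal=ELASTICSEARCH_URL='https://yourelasticcloud.es.us-west-2.aws.found.io' \
  --from-literal=ELASTICSEARCH_USER='elastic' \
  --from-literal=ELASTICSEARCH_PASSWORD='elastic' \
  --from-literal=OTEL_EXPORTER_OTLP_HEADERS='Authorization=Bearer%20xxxx' \
  --from-literal=OTEL_EXPORTER_OTLP_ENDPOINT='https://12345.apm.us-west-2.aws.cloud.es.io:443' \
  --from-literal=OPENAI_API_KEY='YYYYYYYY'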
Open App with LoadBalancer URL
Run the kubectl get services command and get the URL for the chatbot app:
% kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
chatbot-regular-service LoadBalancer 10.100.130.44 xxxxxxxxx-1515488226.us-west-2.elb.amazonaws.com 80:30748/TCP 6d23h
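To script that lookup, you can pull the LoadBalancer hostname directly with a jsonpath query (on AWS the ingress field is a hostname rather than an IP):
EXTERNAL=$(kubectl get service chatbot-regular-service \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "Chatbot is at: http://${EXTERNAL}"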
- Play with the app and review the telemetry in Elastic.
- Once you go to the URL, you should see all the screens described at the beginning of this blog.
Conclusion
With Elastic's chatbot-rag-app you have an example of how to build out an OpenAI-driven, RAG-based chat application. However, you still need to understand how well it performs, whether it's working properly, and so on. Using OTel and Elastic's EDOT gives you the ability to achieve this. Additionally, you will generally run this application on Kubernetes; hopefully this blog provides an outline of how to achieve that. Here are the other tracing blogs:
- App Observability with LLM (Tracing)
- LLM Observability