As discussed in the following post, Elastic added instrumentation for OpenAI-based applications to EDOT. The most common application using LLMs is the chatbot. These chatbots not only use large language models (LLMs), but also frameworks such as LangChain, plus search to improve contextual information during a conversation via RAG (Retrieval Augmented Generation). Elastic's sample RAG-based chatbot application showcases how to use Elasticsearch with local data that has embeddings, enabling search to pull out the most contextually relevant information during a query against a chatbot connected to an LLM of your choice. It's a great example of how to build out a RAG-based application with Elasticsearch.
This app is also now instrumented with EDOT, and you can visualize the chatbot's traces to OpenAI, as well as relevant logs and metrics from the application. By running the app with Docker as instructed in the GitHub repo, you can see these traces on a local stack. But how about running it against Serverless, Elastic Cloud, or even on Kubernetes?
In this blog we will walk through how to set up Elastic's RAG-based chatbot application with Elastic Cloud and Kubernetes.
Prerequisites:
In order to follow along, you will need the following:
- An Elastic Cloud account (sign up now) and familiarity with Elastic's OpenTelemetry configuration. Serverless requires no particular version; regular cloud requires 8.17 at a minimum.
- A local clone of the RAG-based chatbot application; go through its tutorial to become familiar with bringing the application up using Docker (clone commands follow this list).
- An account on OpenAI with API keys.
- A Kubernetes cluster to run the RAG-based chatbot app.
- The instructions in this blog are also found in observability-examples on GitHub.
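For reference, a quick sketch of getting both repos locally (assuming their standard GitHub locations; the chatbot app lives inside Elastic's elasticsearch-labs repo, and paths may shift as the repos evolve):
# Clone the sample chatbot app (part of the elasticsearch-labs repo)
git clone https://github.com/elastic/elasticsearch-labs.git
cd elasticsearch-labs/example-apps/chatbot-rag-app
# Clone the observability-examples repo that holds the Kubernetes files used below
git clone https://github.com/elastic/observability-examples.git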
Application OpenTelemetry output in Elastic
Chatbot-rag-app
The first thing you need to get up and running is the chatbot app itself, and once it's up you should see the following:
As you select some of the questions, you will see a response based on the index that was created in Elasticsearch when the app initializes. Additionally, there will be queries made to the LLM.
Traces, logs, and metrics from EDOT in Elastic
Once you have the application running on your K8s cluster or with Docker, and Elastic Cloud up and running, you should see the following:
Logs:
In Discover you will see logs from the chatbot app and be able to analyze the application logs and any specific log patterns, which saves you time in analysis.
Traces:
In Elastic Observability APM, you can also see the chatbot details, which include transactions, dependencies, logs, errors, and more.
When you look at traces, you will be able to see the chatbot interactions in the trace:
- The end-to-end HTTP call
- Individual calls to Elasticsearch
- Specific calls such as invoke actions and calls to the LLM
You can also open individual traces to get their details and look at the logs and metrics related to that trace.
Metrics:
In addition to logs and traces, any instrumented metrics will also be ingested into Elastic.
Setting it all up with Docker
In order to properly set up the chatbot app on Docker with telemetry sent to Elastic, a few things must be configured:
- Git clone the chatbot-rag-app.
- Modify the env file as noted in the GitHub README, with the following exception: use your Elastic Cloud OTLP endpoint and authorization header, which you can find in Elastic Cloud under the OpenTelemetry setup instructions for APM. For sending the OTel instrumentation you will need the following (a consolidated sketch of all the entries follows this list):
OTEL_EXPORTER_OTLP_ENDPOINT="https://123456789.apm.us-west-2.aws.cloud.es.io:443"
OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer%20xxxxx"
Notice the %20 in OTEL_EXPORTER_OTLP_HEADERS: it is the URL-encoded space between Bearer and your secret token.
- Set the following to false: OTEL_SDK_DISABLED=false
- Set the envs for the LLM. In this example we're using OpenAI, hence only three variables are needed:
LLM_TYPE=openai
OPENAI_API_KEY=XXXX
CHAT_MODEL=gpt-4o-mini
- Run the docker container as noted
docker compose up --build --force-recreate
- Play with the app at localhost:4000.
- Then log into Elastic Cloud and see the output as shown previously.
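Putting those steps together, here is a minimal sketch of the relevant env file entries; every endpoint, token, and key value is a placeholder you must replace with your own:
# OTel export to Elastic Cloud (values from your deployment's OpenTelemetry setup)
OTEL_EXPORTER_OTLP_ENDPOINT="https://123456789.apm.us-west-2.aws.cloud.es.io:443"
OTEL_EXPORTER_OTLP_HEADERS="Authorization=Bearer%20xxxxx"  # %20 is the URL-encoded space
OTEL_SDK_DISABLED=false
# LLM configuration (OpenAI in this example)
LLM_TYPE=openai
OPENAI_API_KEY=XXXX
CHAT_MODEL=gpt-4o-mini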
Run chatbot-rag-app on Kubernetes
In order to set this up, you can follow the observability-examples repo, which has the Kubernetes YAML files being used. These also point to Elastic Cloud.
- Set up the Kubernetes cluster (we're using EKS).
- Create a docker image using the Dockerfile from the repo. However, use the following build command to ensure it will run on any K8s environment:
docker buildx build --platform linux/amd64 -t chatbot-rag-app .
- Push the image to your favorite container repo.
- Get the appropriate ENV variables:
  - The OTEL_EXPORTER_OTLP_ENDPOINT and OTEL_EXPORTER_OTLP_HEADERS variables, as noted previously for Docker.
  - Your OpenAI key.
  - Your Elasticsearch URL, username, and password.
- Follow the instructions in the observability-examples GitHub repo to run the two Kubernetes YAML files. Essentially you need only replace the placeholder values below (the image location and the secret values) with your own, and run:
kubectl create -f k8s-deployment.yaml
kubectl create -f init-index-job.yaml
The app needs to be running first; we then use the app to initialize Elasticsearch with the indices it needs (see the command sketch below).
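As a concrete end-to-end sketch of building, pushing, and deploying (the ECR registry URL is a made-up placeholder, and kubectl wait is just one way to confirm the deployment is available before creating the index job):
# Build for linux/amd64 so the image runs on any K8s node architecture
docker buildx build --platform linux/amd64 -t chatbot-rag-app .
# Tag and push to your container registry (placeholder URL shown)
docker tag chatbot-rag-app:latest 123456789012.dkr.ecr.us-west-2.amazonaws.com/chatbot-rag-app:latest
docker push 123456789012.dkr.ecr.us-west-2.amazonaws.com/chatbot-rag-app:latest
# Deploy the app first, wait for it to come up, then initialize the index
kubectl create -f k8s-deployment.yaml
kubectl wait deployment/chatbot-regular --for=condition=Available --timeout=180s
kubectl create -f init-index-job.yaml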
init-index-job.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: init-elasticsearch-index-test
spec:
  template:
    spec:
      containers:
        - name: init-index
          image: yourimagelocation:latest   # replace with the image you pushed
          workingDir: /app/api
          command: ["python3", "-m", "flask", "--app", "app", "create-index"]
          env:
            - name: FLASK_APP
              value: "app"
            - name: LLM_TYPE
              value: "openai"
            - name: CHAT_MODEL
              value: "gpt-4o-mini"
            - name: ES_INDEX
              value: "workplace-app-docs"
            - name: ES_INDEX_CHAT_HISTORY
              value: "workplace-app-docs-chat-history"
            - name: ELASTICSEARCH_URL
              valueFrom:
                secretKeyRef:
                  name: chatbot-regular-secrets
                  key: ELASTICSEARCH_URL
            - name: ELASTICSEARCH_USER
              valueFrom:
                secretKeyRef:
                  name: chatbot-regular-secrets
                  key: ELASTICSEARCH_USER
            - name: ELASTICSEARCH_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: chatbot-regular-secrets
                  key: ELASTICSEARCH_PASSWORD
          envFrom:
            - secretRef:
                name: chatbot-regular-secrets
      restartPolicy: Never
  backoffLimit: 4
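To confirm the index job ran to completion, you can check its status and logs with standard kubectl commands:
kubectl get job init-elasticsearch-index-test
kubectl logs job/init-elasticsearch-index-test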
k8s-deployment.yaml
apiVersion: v1
kind: Secret
metadata:
  name: chatbot-regular-secrets
type: Opaque
stringData:
  # Replace all of these placeholder values with your own
  ELASTICSEARCH_URL: "https://yourelasticcloud.es.us-west-2.aws.found.io"
  ELASTICSEARCH_USER: "elastic"
  ELASTICSEARCH_PASSWORD: "elastic"
  OTEL_EXPORTER_OTLP_HEADERS: "Authorization=Bearer%20xxxx"
  OTEL_EXPORTER_OTLP_ENDPOINT: "https://12345.apm.us-west-2.aws.cloud.es.io:443"
  OPENAI_API_KEY: "YYYYYYYY"
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: chatbot-regular
spec:
  replicas: 2
  selector:
    matchLabels:
      app: chatbot-regular
  template:
    metadata:
      labels:
        app: chatbot-regular
    spec:
      containers:
        - name: chatbot-regular
          image: yourimagelocation:latest   # replace with the image you pushed
          ports:
            - containerPort: 4000
          env:
            - name: LLM_TYPE
              value: "openai"
            - name: CHAT_MODEL
              value: "gpt-4o-mini"
            - name: OTEL_RESOURCE_ATTRIBUTES
              value: "service.name=chatbot-regular,service.version=0.0.1,deployment.environment=dev"
            - name: OTEL_SDK_DISABLED
              value: "false"
            - name: OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT
              value: "true"
            - name: OTEL_EXPERIMENTAL_RESOURCE_DETECTORS
              value: "process_runtime,os,otel,telemetry_distro"
            - name: OTEL_EXPORTER_OTLP_PROTOCOL
              value: "http/protobuf"
            - name: OTEL_METRIC_EXPORT_INTERVAL
              value: "3000"
            - name: OTEL_BSP_SCHEDULE_DELAY
              value: "3000"
          envFrom:
            - secretRef:
                name: chatbot-regular-secrets
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "1Gi"
              cpu: "500m"
---
apiVersion: v1
kind: Service
metadata:
  name: chatbot-regular-service
spec:
  selector:
    app: chatbot-regular
  ports:
    - port: 80
      targetPort: 4000
  type: LoadBalancer
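If you prefer to keep credentials out of YAML committed to source control, a kubectl one-liner can create the same Secret in place of the stringData block above (same placeholder values):
kubectl create secret generic chatbot-regular-secrets \
  --from-literal=ELASTICSEARCH_URL='https://yourelasticcloud.es.us-west-2.aws.found.io' \
  --from-literal=ELASTICSEARCH_USER='elastic' \
  --from-literal=ELASTICSEARCH_PASSWORD='elastic' \
  --from-literal=OTEL_EXPORTER_OTLP_HEADERS='Authorization=Bearer%20xxxx' \
  --from-literal=OTEL_EXPORTER_OTLP_ENDPOINT='https://12345.apm.us-west-2.aws.cloud.es.io:443' \
  --from-literal=OPENAI_API_KEY='YYYYYYYY'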
Open App with LoadBalancer URL
Run the kubectl get services command and get the URL for the chatbot app:
% kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
chatbot-regular-service LoadBalancer 10.100.130.44 xxxxxxxxx-1515488226.us-west-2.elb.amazonaws.com 80:30748/TCP 6d23h
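To script that lookup, you can pull the LoadBalancer hostname directly with a jsonpath query (on AWS the ingress field is a hostname rather than an IP):
EXTERNAL=$(kubectl get service chatbot-regular-service \
  -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')
echo "Chatbot is at: http://${EXTERNAL}"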
- Play with the app and review the telemetry in Elastic.
- Once you go to the URL, you should see all the screens described at the beginning of this blog.
Conclusion
With Elastic's chatbot-rag-app you have an example of how to build out an OpenAI-driven, RAG-based chat application. However, you still need to understand how well it performs, whether it's working properly, and so on. Using OTel and Elastic's EDOT gives you the ability to achieve this. Additionally, you will generally run this application on Kubernetes; hopefully this blog provides an outline of how to achieve that. Here are the other tracing blogs:
- App Observability with LLM (Tracing)
- LLM Observability