Build RAG quickly with minimal code in Elastic 8.15

Learn how to build an end-to-end RAG pipeline with the S3 Connector, semantic_text datatype, and Elastic Playground.

Elastic 8.15 is out, and Semantic Search is easier than ever to pull off.

We're going to cover how to accomplish all of these tasks in 15 minutes:

  1. Store your documents in some data storage service like an AWS S3 Bucket
  2. Set up an Elastic S3 Connector
  3. Upload an embedding model using the eland library and set up an inference API in Elastic
  4. Connect that to an index that uses the semantic_text datatype
  5. Add your inference API to that index
  6. Configure and sync content with the S3 Connector
  7. Use the Elastic Playground immediately

You will need:

  1. An Elastic Cloud Deployment updated to Elastic 8.15
  2. An S3 bucket
  3. An LLM API service (Anthropic, Azure, OpenAI, Gemini)

And that's it! Let's get this done.

Collecting data

To follow along with this specific demo, I've uploaded a zip file containing the data used here. It's the first 60 or so pages of The Silmarillion, each as a separate PDF file. I'm going through a Lord of the Rings kick at the moment. Feel free to download it and upload it to your S3 bucket!

Splitting a large document into individual pages like this is sometimes necessary, as the native Elastic S3 Connector will not ingest content from files over 10MB in size.

I use a short Python script for splitting a PDF into individual pages; a minimal sketch of it, using the pypdf library, looks like this:
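    # A minimal sketch using the pypdf library (install with `pip install pypdf`).
    # The input filename and output directory are placeholders; point them at your own files.
    from pathlib import Path

    from pypdf import PdfReader, PdfWriter

    INPUT_PDF = "silmarillion.pdf"           # the large PDF to split (placeholder name)
    OUTPUT_DIR = Path("silmarillion_pages")  # where the single-page PDFs land
    OUTPUT_DIR.mkdir(exist_ok=True)

    reader = PdfReader(INPUT_PDF)
    for i, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        # write each page out as its own small PDF, ready for the S3 bucket
        with open(OUTPUT_DIR / f"page_{i:03d}.pdf", "wb") as f:
            writer.write(f)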

Setting up the S3 connector

The connector can ingest a huge variety of data types. Here, we're sticking to an S3 bucket loaded with PDF pages.

I'll just hop on my Elastic Cloud deployment, go to Search -> Content -> Connectors, and make a new connector called aws-connector with all the default settings. Then I'll open up the configuration and add the name of my bucket, along with the access key and secret key tied to my AWS user.

Run a quick sync to verify that everything is working. Synchronization ingests every not-yet-ingested file in your data source, extracts its content, and stores it as a unique document in your index, keeping the original filename. Files whose names match already-indexed documents won't be reingested, so have no fear! Synchronization can also be scheduled to run regularly; the method is described in the documentation. Assuming my AWS credentials and permissions are all in order, the data will land in an index called aws-connector.

Looks like it's all good. Let's grab our embedding model!

Uploading an embedding model

Eland is a Python Elasticsearch client that makes it easy to convert numpy, pandas, and scikit-learn functions to Elasticsearch-powered equivalents. For our purposes, it will be our method of uploading models from HuggingFace for deployment in our Elasticsearch cluster. You can install eland like so:
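    # the pytorch extra pulls in the dependencies eland needs for uploading NLP models
    python -m pip install 'eland[pytorch]'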

Now open up an editor and make this little .sh script, filling out each parameter appropriately (the cloud ID and API key below are placeholders; username/password flags work just as well):
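    #!/bin/bash
    # Upload the embedding model from HuggingFace into the Elasticsearch cluster via eland.
    MODEL_ID="sentence-transformers/all-MiniLM-L6-v2"
    CLOUD_ID="<your-elastic-cloud-id>"      # placeholder
    ES_API_KEY="<your-elastic-api-key>"     # placeholder

    eland_import_hub_model \
      --cloud-id "$CLOUD_ID" \
      --es-api-key "$ES_API_KEY" \
      --hub-model-id "$MODEL_ID" \
      --task-type text_embedding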

MODEL_ID refers to a model taken from HuggingFace. I'm choosing all-MiniLM-L6-v2 mainly because it is very good, but also very small and easily runnable on a CPU. Run the bash script, and once it's done, your model should appear in your Elastic deployment under Machine Learning -> Model Management -> Trained Models.

Just click the circled play button to deploy the model, and you're done.

Setting up your semantic_text index

Time to set up semantic search. Navigate to Management -> Dev Tools and delete your aws-connector index, since it was created without the semantic_text datatype:
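    DELETE aws-connector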

Check the model_id of your uploaded model with:
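    // lists every trained model; an eland upload typically gets an id like
    // "sentence-transformers__all-minilm-l6-v2"
    GET _ml/trained_models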

Now create an inference endpoint called minilm-l6, and pass it the correct model_id. Let's not worry about num_allocations and num_threads, because this isn't production and minilm-l6 is not a big-boy.
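A request along these lines will do it (the model_id is whatever the previous call reported for your upload):

    PUT _inference/text_embedding/minilm-l6
    {
      "service": "elasticsearch",
      "service_settings": {
        "model_id": "sentence-transformers__all-minilm-l6-v2",
        "num_allocations": 1,
        "num_threads": 1
      }
    }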

Now recreate the aws-connector index. Set the "body" property as type "semantic_text", and add the id of your new inference endpoint.
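In Dev Tools, that looks like this:

    PUT aws-connector
    {
      "mappings": {
        "properties": {
          "body": {
            "type": "semantic_text",
            "inference_id": "minilm-l6"
          }
        }
      }
    }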

Get back to your connector and run another full-content sync (for real this time!). The incoming documents are going to be automatically chunked into blocks of 250 words, with an overlap of 100 words. You don't have to do anything explicitly. Now that's convenient!

And it's done. Check out your aws-connector index; there'll be 140 documents in there, each of which has now been chunked and embedded:

Do RAG with the Elastic Playground

Scurry over to Search -> Build -> Playground and add an LLM connector of your choice. I'm using Azure OpenAI:

Now let's set up a chat experience. Click Add Data Sources and select aws-connector:

Check out the query tab of your new chat experience. Assuming everything was properly set up, it will automatically be set to this hybrid search query, with the model_id minilm-l6.
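If you want to poke at retrieval yourself in Dev Tools, a plain semantic query against the body field is the simplest stand-in (a simplified sketch, not the exact hybrid query Playground generates; the query text is a placeholder):

    GET aws-connector/_search
    {
      "query": {
        "semantic": {
          "field": "body",
          "query": "<your question here>"
        }
      }
    }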

Let's ask a question! We'll take three documents for the context, and add my special RAG prompt:

Query: Describe the fall from Grace of Melkor

We'll use a relatively open-ended RAG query. To be answered satisfactorily, it will need to draw information from multiple parts of the text. This will be a good indicator of whether RAG is working as expected.

Well I'm convinced. It even has citations! One more for good luck:

Query: Who were the greatest students of Aule the Smith?

This particular query is nothing too difficult; I'm simply looking for a reference to a very specific quote from the text. Let's see how it does!

Well, that's correct. Looks like RAG is working just fine.

Conclusion

That was incredibly convenient and painless — hot damn! We're truly living in the future. I can definitely work with this. I hope you're as excited to try it as I am to show it off.

Want to get Elastic certified? Find out when the next Elasticsearch Engineer training is running!

Elasticsearch is packed with new features to help you build the best search solutions for your use case. Dive into our sample notebooks to learn more, start a free cloud trial, or try Elastic on your local machine now.

Ready to build state of the art search experiences?

Sufficiently advanced search isn’t achieved with the efforts of one. Elasticsearch is powered by data scientists, ML ops, engineers, and many more who are just as passionate about search as you are. Let’s connect and work together to build the magical search experience that will get you the results you want.
