Hugging Face

Hugging Face is an open platform for AI/ML models and tools, hosting a vast collection of machine learning models. It makes it easy to incorporate specialized AI and ML functionality into your applications.

You can use Hugging Face models with Elasticsearch in two ways:

Inference API

In 8.12 we added Hugging Face support to the inference API, allowing you to use models deployed on Hugging Face Inference Endpoints directly from Elasticsearch. This lets you take advantage of Hugging Face's scalable infrastructure, including the ability to perform inference on GPUs.
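As a minimal sketch of what this looks like: creating such an inference endpoint is done with a request like `PUT _inference/text_embedding/<inference-id>`, whose body names the `hugging_face` service and points at your Hugging Face Inference Endpoint. The endpoint URL, token, and inference ID below are placeholders, not defaults:

```python
import json

# Placeholders: substitute your own Hugging Face Inference Endpoint URL and token.
HF_ENDPOINT_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HF_API_KEY = "<your-hugging-face-token>"

# Request body for: PUT _inference/text_embedding/hugging-face-embeddings
inference_config = {
    "service": "hugging_face",
    "service_settings": {
        "api_key": HF_API_KEY,   # authenticates Elasticsearch against your endpoint
        "url": HF_ENDPOINT_URL,  # where the deployed model is served
    },
}

print(json.dumps(inference_config, indent=2))
```

Once the endpoint exists, search-time and ingest-time inference requests that reference its inference ID are routed to Hugging Face's infrastructure rather than run inside the cluster.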

Transformers library & Hosted models in Elasticsearch

With eland, you can load Hugging Face transformers models into Elasticsearch, and perform inference on dedicated ML nodes inside your cluster. This way, your data never has to leave Elasticsearch to perform inference, giving you full control over your data.
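The upload is typically driven by eland's `eland_import_hub_model` command-line tool (installed with `pip install 'eland[pytorch]'`). The sketch below just assembles such an invocation; the cluster URL and hub model ID are illustrative assumptions, not defaults:

```python
# Illustrative sketch: assemble an eland_import_hub_model invocation.
# The --url and --hub-model-id values are example placeholders.
command = [
    "eland_import_hub_model",
    "--url", "https://elastic:<password>@localhost:9200",          # your cluster
    "--hub-model-id", "sentence-transformers/msmarco-MiniLM-L-12-v3",
    "--task-type", "text_embedding",  # how Elasticsearch should use the model
    "--start",                        # deploy on ML nodes right after upload
]
print(" ".join(command))
```

After the model is uploaded and started, it runs on your cluster's dedicated ML nodes, so inference happens entirely inside Elasticsearch.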

Get started with these blogs

Elasticsearch open inference API adds native chunking support for Hugging Face

Building RAG with Gemma, Hugging Face and Elasticsearch

Tutorials for building semantic search with Hugging Face and Elastic

Index millions of documents with GPU-accelerated inference using Hugging Face and Elasticsearch

Use Elasticsearch open inference API integration with Hugging Face to build semantic search

Load a transformers model into Elasticsearch using eland