CONTENT AND DATA INGESTION

Index for success

Elastic provides the tools you need, from out-of-the-box tooling to APIs, for building robust, flexible ingest mechanisms for all types of data and content. It’s quick to set up, with plenty of options for enriching, transforming, and manipulating data as you go, so you can focus on building powerful search applications.

The Open Web Crawler is in beta. Learn how to set up crawl and extraction rules and combine it with semantic text search.

Get started indexing data using Elasticsearch APIs.

See the ways you can connect with all types of tools and any kind of data.

DATA INGESTION ENGINE

Variety is the spice of ingest

Get complete control over your ingest pipeline with powerful prebuilt, yet fully configurable, data ingestion tools and exposed APIs that let you index and manage data your way.

  • Data extraction

    Discover, extract, index, and sync all of your website content, including PDFs. Use the Elastic Open Web Crawler to transform your web pages into searchable data.

  • Data connectors

    Make use of Elastic managed connectors and self-managed connectors to popular productivity tools, plus handy APIs to build connectors for your data sources, too.

  • Ingestion APIs

    Employ convenient indexing endpoints to build custom ingestion pipelines, using clients for popular languages such as JavaScript, Java, and Python.

  • Data pipelines

    Keep data ingestion pipelines and management in place with existing Elasticsearch indices or the Elasticsearch query syntax.
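
As a rough illustration of the ingestion APIs described above, the sketch below builds a request body for the Elasticsearch `_bulk` endpoint, which accepts newline-delimited JSON: an action line followed by the document source. The index name `site-content`, the sample documents, and the helper name `build_bulk_body` are assumptions made for this example; actually sending the request (for instance through a language client) is left out.

```python
import json


def build_bulk_body(index_name, docs):
    """Return an NDJSON body for POST /_bulk that indexes each document.

    `docs` is an iterable of (doc_id, source_dict) pairs. The helper name and
    argument shapes are hypothetical, chosen only for this sketch.
    """
    lines = []
    for doc_id, source in docs:
        # Action line: index the document on the next line into index_name.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        # Source line: the document itself.
        lines.append(json.dumps(source))
    # The bulk API expects the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Sample documents are made up for illustration.
    sample_docs = [
        ("1", {"title": "Getting started", "url": "https://example.com/start"}),
        ("2", {"title": "Pricing", "url": "https://example.com/pricing"}),
    ]
    print(build_bulk_body("site-content", sample_docs), end="")
```

Batching documents into one request like this generally cuts down on round trips compared with indexing them one at a time.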

ADD SEARCH TO YOUR WEBSITE

The fastest way to index web content

Configure crawls with flexible APIs the way you'd like. With Elastic's Open Web Crawler you are in control of your crawls.

Elasticsearch — the most widely deployed vector database

To try it locally in two minutes, copy and run:

curl -fsSL https://elastic.co/start-local | sh

Start crawling now!

Set up and deploy a crawler for your web content with a terminal and Elasticsearch.

  • Run Docker image

    Deploy the web crawler code on your own infrastructure by running it from source or with Docker.

  • Set URL for crawl

    Set one or more URLs you want to crawl.

  • Configure and connect

    Identify and correct any challenges impacting crawl stability, content discovery, and content extraction and indexing.
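
To make the idea of crawl rules more concrete, here is a minimal sketch of how allow and deny rules could decide which discovered URLs get crawled. It is not the Open Web Crawler's configuration format or API: the rule shape (`policy`, `prefix`), the first-match-wins ordering, and the function name `should_crawl` are all assumptions for illustration.

```python
from urllib.parse import urlparse

# Hypothetical rules, evaluated top to bottom; the first matching rule wins.
RULES = [
    {"policy": "deny", "prefix": "/admin"},
    {"policy": "allow", "prefix": "/docs"},
    {"policy": "deny", "prefix": "/"},  # catch-all: skip everything else
]


def should_crawl(url, seed_host, rules=RULES):
    """Return True if `url` is on the seed host and an allow rule matches first."""
    parsed = urlparse(url)
    # Stay on the seed domain; links to other hosts are never followed.
    if parsed.hostname != seed_host:
        return False
    path = parsed.path or "/"
    for rule in rules:
        if path.startswith(rule["prefix"]):
            return rule["policy"] == "allow"
    # No rule matched: default to not crawling.
    return False


if __name__ == "__main__":
    for candidate in (
        "https://example.com/docs/intro",
        "https://example.com/admin/login",
        "https://other.example.org/docs/intro",
    ):
        print(candidate, should_crawl(candidate, "example.com"))
```

Putting narrow rules ahead of a catch-all keeps the rule list easy to reason about when a crawl discovers more content than expected.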

UNIFIED SEARCH APPLICATIONS

Come one content source, come all

Flexibly and efficiently capture, index, and sync the docs, files, fields, metadata, and other key info in your database or content management system. Use API ingestion, prebuilt connectors, or configurable connector packages to ingest this data into Elastic quickly. Choose which objects to synchronize — and when — with an intuitive UI and simple rules during data ingestion.

  • Azure Blob Storage

    Elastic managed

  • Confluence Cloud & Server

    Elastic managed

  • Dropbox

    Elastic managed

  • GitHub & GitHub Enterprise Server

    Elastic managed

  • Google Cloud Storage

    Elastic managed

  • Google Drive

    Elastic managed

  • Jira Cloud & Server

    Elastic managed

  • Microsoft SQL

    Elastic managed

  • MongoDB

    Elastic managed

  • MySQL

    Elastic managed

  • Network drive

    Elastic managed

  • OneDrive

    Elastic managed

  • Oracle

    Elastic managed

  • PostgreSQL

    Elastic managed

  • S3

    Elastic managed

  • Salesforce

    Elastic managed

  • ServiceNow

    Elastic managed

  • SharePoint Online

    Elastic managed

  • Box

    Self-managed

  • Customized connector

    Connector clients and frameworks

  • Gmail

    Self-managed

  • Outlook

    Self-managed

  • SharePoint Server

    Self-managed

  • Slack

    Self-managed

  • Teams

    Self-managed

  • Zoom

    Self-managed

CONNECT WITH CONFIDENCE

The connective tissue for your search experience

With several secure paths to connecting and syncing content from your critical data sources, you can customize the ingest pipeline for all your tools that require indexing.