Elastic web crawler
Looking for the App Search web crawler? See the App Search documentation.
To compare the web crawler with the App Search web crawler, see the reference table on this page.
This feature is not available at all Elastic subscription levels. Refer to the Elastic subscriptions pages for Elastic Cloud and self-managed deployments.
Overview
Use the web crawler to programmatically discover, extract, and index searchable content from websites and knowledge bases. When you ingest data with the web crawler, a search-optimized Elasticsearch index is created to hold and sync webpage content.
The web crawler is a native Elasticsearch solution. It reads and writes directly to Elasticsearch indices in a format that enables developers to build intuitive, relevant search experiences using App Search engines and the Search UI library.
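Because crawled content lives in an ordinary Elasticsearch index, it can be queried with the standard Elasticsearch `_search` API. The following is a minimal sketch, not an official client: the index name `search-my-website` and the field names `title` and `body_content` are assumptions for illustration, so check your own index mapping before relying on them. The helper names `build_search_request` and `search` are hypothetical.

```python
import json
import urllib.request


def build_search_request(index, query_text, size=5):
    """Return (path, body) for an Elasticsearch _search call on a crawler index."""
    body = {
        "size": size,
        "query": {
            "multi_match": {
                "query": query_text,
                # Assumed field names; verify them against your index mapping.
                "fields": ["title", "body_content"],
            }
        },
    }
    return f"/{index}/_search", body


def search(base_url, api_key, index, query_text):
    """Send the search request and return the parsed JSON response."""
    path, body = build_search_request(index, query_text)
    request = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Elasticsearch API key authentication header.
            "Authorization": f"ApiKey {api_key}",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

For example, `search("https://localhost:9200", "<your-api-key>", "search-my-website", "pricing")` would return the raw search response, with matching documents under `hits.hits`. Building the request separately from sending it keeps the query easy to inspect and adjust.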
Web crawler documentation:
- Getting started with website search: Concrete guide to building a website search experience, using the crawler UI.
- Managing crawls: Detailed reference for managing crawls using the Kibana UI. Learn how to:
  - Manage duplicated documents
  - Extract binary content such as PDFs from webpages
  - Schedule automated crawls
- Optimizing web content: Optimize your web content source files for the web crawler, to manage webpage discovery and content extraction.
- Custom fields using proxy: How to extract custom fields from webpages using a proxy server.
- Troubleshooting crawls: Detailed troubleshooting reference.
- Web crawler events logs reference: Detailed reference for web crawler events logs.
- View web crawler events logs: How to view web crawler events logs in Kibana.
Appendix: Compare the web crawler and App Search web crawler
| Feature | App Search web crawler | Web crawler |
| --- | --- | --- |
| Interface | GUI / API | GUI-only |
| Binary content extraction | Yes | Yes |
| Search | App Search | Elasticsearch / App Search using Elasticsearch search API for App Search |
| Ingest pipelines | Yes | Yes |
| Monitoring | Yes | Yes |
| APM | Yes | Yes |
| Audit logging | Yes | No |
| Event logging | Yes | Yes |
| Public REST API | Yes | No |