Key features
The key features of Elasticsearch for Apache Hadoop include:
- Scalable Map/Reduce model
- elasticsearch-hadoop is built around Map/Reduce: every operation done in elasticsearch-hadoop results in multiple Hadoop tasks (based on the number of target shards) that interact with Elasticsearch in parallel.
- REST based
- elasticsearch-hadoop uses the Elasticsearch REST interface for communication, allowing for flexible deployments by minimizing the number of ports that need to be open within a network.
- Self contained
- The library has been designed to be small and efficient. At around 300KB and with no extra dependencies outside Hadoop itself, distributing elasticsearch-hadoop within your cluster is simple and fast.
- Universal jar
- Whether you are using vanilla Apache Hadoop or a particular distribution, the same elasticsearch-hadoop jar works transparently across all of them.
- Memory and I/O efficient
- elasticsearch-hadoop is focused on performance. From pull-based parsing to bulk updates and direct conversion to and from native types, elasticsearch-hadoop keeps its memory and network I/O usage finely tuned.
- Adaptive I/O
- elasticsearch-hadoop detects transport errors and retries automatically. If an Elasticsearch node dies, it re-routes the request to the available nodes (which are discovered automatically). Additionally, if Elasticsearch is overloaded, elasticsearch-hadoop detects the rejected data and resends it until it is either processed or the user-defined policy applies.
- Facilitates data co-location
- elasticsearch-hadoop fully integrates with Hadoop, exposing its network access information so that co-located Elasticsearch and Hadoop clusters are aware of each other and can reduce network I/O.
- Map/Reduce API support
- At its core, elasticsearch-hadoop uses the low-level Map/Reduce API to read and write data to Elasticsearch, allowing for maximum integration flexibility and performance.
- Old (`mapred`) and new (`mapreduce`) Map/Reduce APIs supported
- elasticsearch-hadoop automatically adjusts to your environment; one does not have to change between using the `mapred` or `mapreduce` APIs. Both are supported, by the same classes, at the same time.
- Apache Hive support
- Run Hive queries against Elasticsearch for advanced analytics and real-time responses. elasticsearch-hadoop exposes Elasticsearch as a Hive table so your scripts can crunch through data faster than ever.
- Apache Spark
- Run fast transformations directly against Elasticsearch, either by streaming data or indexing arbitrary `RDD`s. Available in both Java and Scala flavors.
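
The resend behavior described under Adaptive I/O can be pictured with a small sketch. This is only an illustration of the general idea (send a batch, find the rejected entries, resend just those until they are accepted or a retry limit is reached) and not the library's actual implementation. The names `write_with_retry`, `send_bulk`, `max_retries`, `wait_seconds` and `make_overloaded_sender` are hypothetical and do not correspond to elasticsearch-hadoop classes or settings.

```python
import time
from typing import Callable, List, Sequence, Tuple


def write_with_retry(
    docs: Sequence[dict],
    send_bulk: Callable[[Sequence[dict]], List[bool]],
    max_retries: int = 3,
    wait_seconds: float = 0.0,
) -> Tuple[int, List[dict]]:
    """Send docs as one batch and resend only the rejected ones.

    send_bulk is a hypothetical stand-in for a bulk request: it returns one
    boolean per document (True = accepted, False = rejected).
    Returns (number of accepted documents, documents still rejected).
    """
    pending = list(docs)
    accepted = 0
    retries_used = 0
    while pending:
        results = send_bulk(pending)
        rejected = [doc for doc, ok in zip(pending, results) if not ok]
        accepted += len(pending) - len(rejected)
        pending = rejected
        # Stop when everything was accepted or the retry budget is spent.
        if not pending or retries_used >= max_retries:
            break
        retries_used += 1
        time.sleep(wait_seconds)  # back off before resending
    return accepted, pending


def make_overloaded_sender(reject_first_n_calls: int):
    """Build a fake sender that rejects odd ids for the first N calls."""
    calls = {"count": 0}

    def send_bulk(batch: Sequence[dict]) -> List[bool]:
        calls["count"] += 1
        if calls["count"] <= reject_first_n_calls:
            # Simulate an overloaded cluster: only even ids are accepted.
            return [doc["id"] % 2 == 0 for doc in batch]
        return [True] * len(batch)

    return send_bulk, calls


if __name__ == "__main__":
    sender, calls = make_overloaded_sender(reject_first_n_calls=2)
    accepted, failed = write_with_retry([{"id": i} for i in range(6)], sender)
    print(accepted, failed, calls["count"])
```

In this sketch the retry limit plays the role of the user-defined policy: once it is exhausted, the remaining rejected documents are returned to the caller instead of being resent indefinitely.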