Create automated snapshots
Snapshots are essential for recovering Elasticsearch indices in case of accidental deletion or for migrating data between clusters.
To set up automated snapshots for Elasticsearch on Kubernetes you have to:

- Register the snapshot repository with the Elasticsearch API.
- Set up a snapshot lifecycle management policy through the API or the Kibana UI.
Support for S3, GCS and Azure repositories is bundled in Elasticsearch by default from version 8.0. On older versions of Elasticsearch, or if another snapshot repository plugin should be used, you have to install it yourself, as described in Install a snapshot repository plugin.
For more information on Elasticsearch snapshots, check Snapshot and Restore in the Elasticsearch documentation.
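The scheduling itself is handled by Elasticsearch's snapshot lifecycle management (SLM). As an illustrative sketch, a nightly policy against a repository named my_gcs_repository (a repository you would register as shown in the examples on this page) could look like the following; the policy name, snapshot name pattern, and retention values are placeholders to adapt:

```
PUT /_slm/policy/nightly-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_gcs_repository",
  "config": {
    "indices": ["*"]
  },
  "retention": {
    "expire_after": "30d",
    "min_count": 5,
    "max_count": 50
  }
}
```

The schedule is a cron expression (here, 01:30 each night) and the snapshot name uses date math so that each snapshot gets a unique, dated name.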
Configuration examples
What follows is a non-exhaustive list of configuration examples. The first example might be worth reading even if you are targeting a Cloud provider other than GCP, as it covers adding snapshot repository credentials to the Elasticsearch keystore and illustrates the basic workflow of setting up a snapshot repository:
The following examples cover approaches that use Cloud-provider specific means to leverage Kubernetes service accounts to avoid having to configure snapshot repository credentials in Elasticsearch:
The final example illustrates how to configure secure and trusted communication when you use S3-compatible services.
Basic snapshot repository setup using GCS as an example
Configure GCS credentials through the Elasticsearch keystore

The Elasticsearch GCS repository plugin requires a JSON file that contains service account credentials. These need to be added as secure settings to the Elasticsearch keystore. For more details, check Google Cloud Storage Repository.
Using ECK, you can automatically inject secure settings into a cluster node by providing them through a secret in the Elasticsearch Spec.
- Create a file containing the GCS credentials. For this example, name it gcs.client.default.credentials_file. The file name is important as it is reflected in the secure setting.

```json
{
  "type": "service_account",
  "project_id": "your-project-id",
  "private_key_id": "...",
  "private_key": "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----\n",
  "client_email": "service-account-for-your-repository@your-project-id.iam.gserviceaccount.com",
  "client_id": "...",
  "auth_uri": "https://accounts.google.com/o/oauth2/auth",
  "token_uri": "https://accounts.google.com/o/oauth2/token",
  "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
  "client_x509_cert_url": "https://www.googleapis.com/robot/v1/metadata/x509/service-account-for-your-repository%40your-project-id.iam.gserviceaccount.com"
}
```
- Create a Kubernetes secret from that file:

```sh
kubectl create secret generic gcs-credentials --from-file=gcs.client.default.credentials_file
```
- Edit the secureSettings section of the Elasticsearch resource:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 8.15.3
  # Inject secure settings into Elasticsearch nodes from a k8s secret reference
  secureSettings:
  - secretName: gcs-credentials
```

If you haven’t followed these instructions and named your GCS credentials file differently, you can still map it to the expected name now. Check Secure Settings for details.
- Apply the modifications:

```sh
kubectl apply -f elasticsearch.yml
```
GCS credentials are automatically propagated into each Elasticsearch node’s keystore. It can take up to a few minutes, depending on the number of secrets in the keystore. You don’t have to restart the nodes.
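To verify that the settings were propagated, you can list the keystore entries on one of the nodes. The Pod name below is an assumption based on ECK's naming convention (cluster name elasticsearch-sample, a nodeSet named default); adjust it to your cluster:

```sh
# List the keystore entries on the first node of the default nodeSet
kubectl exec elasticsearch-sample-es-default-0 -- bin/elasticsearch-keystore list
```

The output should contain an entry named gcs.client.default.credentials_file.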
Register the repository in Elasticsearch

- Create the GCS snapshot repository in Elasticsearch. You can either use the Snapshot and Restore UI in Kibana version 7.4.0 or higher, or follow the procedure described in Snapshot and Restore:

```
PUT /_snapshot/my_gcs_repository
{
  "type": "gcs",
  "settings": {
    "bucket": "my_bucket",
    "client": "default"
  }
}
```
- Take a snapshot with the following HTTP request:

```
PUT /_snapshot/my_gcs_repository/test-snapshot
```
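You can then inspect the state of the snapshot through the same API (repository and snapshot names as in the example above):

```
GET /_snapshot/my_gcs_repository/test-snapshot
```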
Use GKE Workload Identity

GKE Workload Identity allows a Kubernetes service account to impersonate a Google Cloud IAM service account, and therefore to configure a snapshot repository in Elasticsearch without storing Google Cloud credentials in Elasticsearch itself. This feature requires your Kubernetes cluster to run on GKE and your Elasticsearch cluster to run at least version 7.13 (version 8.1 when using searchable snapshots).
Follow the instructions in the GKE documentation to configure workload identity, specifically:
- Create or update your Kubernetes cluster with --workload-pool=PROJECT_ID.svc.id.goog enabled, where PROJECT_ID is your Google project ID.
- Create a namespace and a Kubernetes service account (test-gcs and gcs-sa in this example).
- Create the bucket, the Google service account (gcp-sa in this example; note that both Google and Kubernetes have the concept of a service account, and this refers to the former), and set the relevant permissions through the Google Cloud console or the gcloud CLI.
- Allow the Kubernetes service account to impersonate the Google service account:

```sh
gcloud iam service-accounts add-iam-policy-binding gcp-sa@PROJECT_ID.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:PROJECT_ID.svc.id.goog[test-gcs/gcs-sa]"
```
- Add the iam.gke.io/gcp-service-account annotation on the Kubernetes service account:

```sh
kubectl annotate serviceaccount gcs-sa \
  --namespace test-gcs \
  iam.gke.io/gcp-service-account=gcp-sa@PROJECT_ID.iam.gserviceaccount.com
```
- Create an Elasticsearch cluster, referencing the Kubernetes service account:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-gcs-sample
  namespace: test-gcs
spec:
  version: 8.15.3
  nodeSets:
  - name: default
    podTemplate:
      spec:
        automountServiceAccountToken: true
        serviceAccountName: gcs-sa
    count: 3
```
- Create the snapshot repository as described in Register the repository in Elasticsearch
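As an optional sanity check, adapted from the GKE Workload Identity documentation, you can run a throwaway Pod under the gcs-sa service account and confirm that it receives the Google identity; the image and command here are illustrative:

```sh
kubectl run -it --rm workload-identity-test \
  --image=google/cloud-sdk:slim \
  --namespace=test-gcs \
  --overrides='{"spec":{"serviceAccountName":"gcs-sa"}}' \
  -- gcloud auth list
```

The listed account should be the impersonated gcp-sa Google service account.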
Use AWS IAM roles for service accounts (IRSA)

The AWS IAM roles for service accounts feature allows you to give Elasticsearch restricted access to an S3 bucket without having to expose and store AWS credentials directly in Elasticsearch. This requires you to run the ECK operator on Amazon’s EKS offering and an Elasticsearch cluster running at least version 8.1.
Follow the AWS documentation to set this feature up. Specifically you need to:
- Define an IAM policy file, called iam-policy.json in this example, giving access to an S3 bucket called my_bucket:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions",
        "s3:ListBucket",
        "s3:GetBucketLocation"
      ],
      "Resource": "arn:aws:s3:::my_bucket"
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:AbortMultipartUpload",
        "s3:DeleteObject",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": "arn:aws:s3:::my_bucket/*"
    }
  ]
}
```
- Create the policy using AWS CLI tooling, using the name eck-snapshots in this example:

```sh
aws iam create-policy \
  --policy-name eck-snapshots \
  --policy-document file://iam-policy.json
```
- Use eksctl to create an IAM role and create and annotate a Kubernetes service account with it. The service account is called aws-sa in the default namespace in this example.
- Create an Elasticsearch cluster referencing the service account:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es
spec:
  version: 8.15.3
  nodeSets:
  - name: default
    count: 3
    podTemplate:
      spec:
        serviceAccountName: aws-sa
        containers:
        - name: elasticsearch
          env:
          - name: AWS_WEB_IDENTITY_TOKEN_FILE
            value: "/usr/share/elasticsearch/config/repository-s3/aws-web-identity-token-file"
          - name: AWS_ROLE_ARN
            value: "arn:aws:iam::YOUR_ROLE_ARN_HERE"
          volumeMounts:
          - name: aws-iam-token
            mountPath: /usr/share/elasticsearch/config/repository-s3
        volumes:
        - name: aws-iam-token
          projected:
            sources:
            - serviceAccountToken:
                audience: sts.amazonaws.com
                expirationSeconds: 86400
                path: aws-web-identity-token-file
```
- Create the snapshot repository as described in Register the repository in Elasticsearch, but of type s3:

```
PUT /_snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "my_bucket"
  }
}
```
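The eksctl step above can be sketched as follows; this is a hedged example, not a definitive invocation. The cluster name my-eks-cluster and the ACCOUNT_ID placeholder are assumptions to replace with your own values, while the policy and service account names match the eck-snapshots policy and aws-sa account from this example:

```sh
# Creates an IAM role attached to the eck-snapshots policy and a matching,
# pre-annotated Kubernetes service account named aws-sa in the default namespace
eksctl create iamserviceaccount \
  --name aws-sa \
  --namespace default \
  --cluster my-eks-cluster \
  --attach-policy-arn arn:aws:iam::ACCOUNT_ID:policy/eck-snapshots \
  --approve
```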
Use Azure Workload Identity

Starting with version 8.16, Elasticsearch supports Azure Workload Identity, which allows the use of Azure Blob Storage for Elasticsearch snapshots without exposing Azure credentials directly to Elasticsearch.
Follow the Azure documentation for setting up workload identity for the first five steps:
- Create a resource group, if it does not exist yet.
- Create or update your AKS cluster to enable workload identity.
- Retrieve the OIDC issuer URL.
- Create a managed identity and link it to a Kubernetes service account.
- Create the federated identity credential.
The following steps diverge from the tutorial in the Azure documentation. However, variables initialised as part of the Azure tutorial are still assumed to be present.
- Create an Azure storage account, if it does not exist yet:

```sh
az storage account create \
  --name esstorage \
  --resource-group "${RESOURCE_GROUP}" \
  --location "${LOCATION}" \
  --encryption-services blob \
  --sku Standard_ZRS
```

This can be any of the supported storage account types (Standard_LRS, Standard_ZRS, Standard_GRS, Standard_RAGRS) but not Premium_LRS; check the Elasticsearch documentation for details.
Create a container in the storage account, for this example
es-snapshots
.az storage container create \ --account-name "${STORAGE_ACCOUNT_NAME}" \ --name es-snapshots --auth-mode login
- Create a role assignment between the managed identity and the storage account:

```sh
IDENTITY_PRINCIPAL_ID=$(az identity show \
  --name "${USER_ASSIGNED_IDENTITY_NAME}" \
  --resource-group "${RESOURCE_GROUP}" \
  --query principalId -o tsv)

STORAGE_SCOPE=$(az storage account show \
  --resource-group "${RESOURCE_GROUP}" \
  --name "${STORAGE_ACCOUNT_NAME}" \
  --query id -o tsv | sed 's#/##')

az role assignment create \
  --assignee-object-id "${IDENTITY_PRINCIPAL_ID}" \
  --role "Storage Blob Data Contributor" \
  --scope "${STORAGE_SCOPE}"
```
- Create a Kubernetes secret, called keystore in this example, with the storage account name. This is necessary to be able to specify the account name as a secure setting in Elasticsearch in the next step:

```sh
kubectl create secret generic keystore \
  --from-literal=azure.client.default.account=${STORAGE_ACCOUNT_NAME}
```
- Create an Elasticsearch cluster that uses the Kubernetes service account created earlier:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: az-workload-identity-sample
spec:
  version: 8.16.0
  secureSettings:
  - secretName: keystore
  nodeSets:
  - name: default
    count: 1
    podTemplate:
      metadata:
        labels:
          azure.workload.identity/use: "true"
      spec:
        serviceAccountName: workload-identity-sa
        containers:
        - name: elasticsearch
          env:
          - name: AZURE_FEDERATED_TOKEN_FILE
            value: /usr/share/elasticsearch/config/azure/tokens/azure-identity-token
          volumeMounts:
          - name: azure-identity-token
            mountPath: /usr/share/elasticsearch/config/azure/tokens
```

The secureSettings entry references the Kubernetes secret created in the previous step to configure the Azure storage account name as a secure setting. The serviceAccountName is the service account created earlier in the steps from the Azure Workload Identity tutorial. The azure-identity-token volume is injected by the Azure Workload Identity Mutating Admission Webhook; for Elasticsearch to be able to access the token, the mount needs to be in a sub-directory of the Elasticsearch config directory, and the AZURE_FEDERATED_TOKEN_FILE environment variable needs to be adjusted accordingly.
- Create a snapshot repository of type azure through the Elasticsearch API, or through Elastic Stack configuration policies:

```
POST _snapshot/my_azure_repository
{
  "type": "azure",
  "settings": {
    "container": "es-snapshots"
  }
}
```
Use S3-compatible services

The following example assumes that you have deployed and configured an S3-compatible object store, such as MinIO, that can be reached from the Kubernetes cluster, and that you have created a bucket in that service, called es-repo in this example. The example also assumes an Elasticsearch cluster named es is deployed within the cluster.

Most importantly, the steps describing how to customize the JVM trust store are only necessary if your S3-compatible service uses TLS certificates that are not issued by a well-known certificate authority.
```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es
spec:
  version: 8.15.3
  nodeSets:
  - name: mixed
    count: 3
```
- Extract the cacerts JVM trust store from one of the running Elasticsearch nodes:

```sh
kubectl cp es-es-mixed-0:/usr/share/elasticsearch/jdk/lib/security/cacerts cacerts
```

You can skip this step if you want to create a new trust store that does not contain any well-known CAs that Elasticsearch trusts by default. Be aware that this limits Elasticsearch’s ability to communicate with TLS-secured endpoints to those for which you add CA certificates in the next steps.
- Obtain the CA certificate used to sign the certificate of your S3-compatible service. We assume it is called tls.crt.
- Add the certificate to the JVM trust store from step 1:

```sh
keytool -importcert -keystore cacerts -storepass changeit -file tls.crt -alias my-custom-s3-svc
```

You need to have a Java Runtime Environment with keytool installed locally for this step. changeit is the default password used by the JVM, but it can be changed with keytool as well.
Create a Kubernetes secret with the amended trust store
kubectl create secret generic custom-truststore --from-file=cacerts
- Create a Kubernetes secret with the credentials for your object store bucket:

```sh
kubectl create secret generic snapshot-settings \
  --from-literal=s3.client.default.access_key=$YOUR_ACCESS_KEY \
  --from-literal=s3.client.default.secret_key=$YOUR_SECRET_ACCESS_KEY
```
- Update your Elasticsearch cluster to use the trust store and credentials from the Kubernetes secrets. Note that the volume references the custom-truststore secret created in step 4:

```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: es
spec:
  version: 8.15.3
  secureSettings:
  - secretName: snapshot-settings
  nodeSets:
  - name: mixed
    count: 3
    podTemplate:
      spec:
        volumes:
        - name: custom-truststore
          secret:
            secretName: custom-truststore
        containers:
        - name: elasticsearch
          volumeMounts:
          - name: custom-truststore
            mountPath: /usr/share/elasticsearch/config/custom-truststore
          env:
          - name: ES_JAVA_OPTS
            value: "-Djavax.net.ssl.trustStore=/usr/share/elasticsearch/config/custom-truststore/cacerts -Djavax.net.ssl.trustStorePassword=changeit"
```
- Create the snapshot repository.
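The repository registration itself depends on your service's endpoint. The following is a sketch only: the hostname minio.example.com:9000 is a hypothetical placeholder, the bucket name matches the es-repo bucket from this example, and in recent Elasticsearch versions non-secure S3 client settings such as endpoint can be specified directly in the repository settings (alternatively, set s3.client.default.endpoint as a node setting):

```
PUT /_snapshot/my_s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "es-repo",
    "endpoint": "https://minio.example.com:9000",
    "path_style_access": true
  }
}
```

path_style_access is often required for S3-compatible stores that do not support virtual-host-style bucket addressing.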
Install a snapshot repository plugin

If you are running a version of Elasticsearch before 8.0, or you need a snapshot repository plugin that is not already pre-installed, you have to install the plugin yourself. To install the snapshot repository plugin, you can either use a custom image or add your own init container which installs the plugin when the Pod is created.
To use your own custom image with all necessary plugins pre-installed, use an Elasticsearch resource like the following:
```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 8.15.3
  image: your/custom/image:tag
  nodeSets:
  - name: default
    count: 1
```
Alternatively, install the plugin when the Pod is created by using an init container:
```yaml
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: elasticsearch-sample
spec:
  version: 8.15.3
  nodeSets:
  - name: default
    count: 1
    podTemplate:
      spec:
        initContainers:
        - name: install-plugins
          command:
          - sh
          - -c
          - |
            bin/elasticsearch-plugin remove --purge repository-gcs
            bin/elasticsearch-plugin install --batch repository-gcs
```
Assuming you stored this in a file called elasticsearch.yaml, you can in both cases create the Elasticsearch cluster with:
```sh
kubectl apply -f elasticsearch.yaml
```
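To confirm the plugin is available once the Pods are running, you can list the installed plugins on a node. The Pod name is an assumption based on ECK's naming convention for the elasticsearch-sample cluster and its default nodeSet; adjust it to your own cluster:

```sh
# List installed plugins; the output should include repository-gcs
kubectl exec elasticsearch-sample-es-default-0 -- bin/elasticsearch-plugin list
```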