Another important topic to discuss: backups. Irrespective of the system or environment, backups are mandatory. Kubernetes, including any managed Kubernetes, is no exception; even if you are using Infrastructure as Code and all deployments are automated, taking backups of your AKS clusters adds an extra layer of safety. You can also check the best practices published by Microsoft.
Why Take a Backup?
Most SREs, or anyone managing Kubernetes, have experienced the risk and time involved in taking dumps when running large databases on any Kubernetes platform. Important configuration data can be even more crucial. To reduce this risk in AKS, we have a couple of options:
- Back up the Azure managed disks with Azure Backup
- Create scheduled backups of PVCs and cluster resources with a backup tool like Velero.
In this post, let's see how to use Velero to back up an Azure Kubernetes Service (AKS) cluster.
Mean Time To Recover (MTTR)
It is always expected that we define the MTTR for bringing back any given application or system. Nowadays most operations, such as cluster creation and application deployment, are automated through pipelines with CI/CD tools like Jenkins, CircleCI, etc. But does that bring the cluster back quickly? There is a lot to discuss and verify there, and in rare situations we cannot be sure the cluster will end up in the same state as before. It also takes longer when there is a greater number of services, as we usually have a limited number of parallel executions.
Let’s assume each execution takes ~3-4 minutes and you have to deploy 10+ microservices or applications; that adds up to roughly 30-45 minutes. That number would have been acceptable 4-5 years ago, but considering how fast technology has moved, it is high today.
So during any disaster or failure, recreating the infrastructure and re-deploying all components can take time. Depending on the criticality of the incident and the importance of the app, it can feel like an eternity.
To address this kind of challenge, a tool like Velero can back up all Kubernetes resources so that a cluster can be quickly restored to a known state. In practice it can cut the recovery time from roughly ~45 minutes to ~15 minutes.
Another advantage of using a backup tool is that it also backs up the data in persistent volumes; as mentioned before, any stateful application running on the cluster uses persistent volumes to store its data.
Velero
Velero (formerly Heptio Ark) is an open source tool for safely backing up and restoring resources in a Kubernetes cluster, performing disaster recovery, and migrating resources and persistent volumes to another Kubernetes cluster.
Velero offers key data protection features, such as scheduled backups, retention schedules, and pre- or post-backup hooks for custom actions. Velero can help protect data stored in persistent volumes and makes your entire Kubernetes cluster more resilient.
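As a small illustration of the hook feature, a pre-backup hook can be declared through pod annotations. A minimal sketch, assuming a hypothetical namespace, pod, container, and quiesce script of your own:
# kubectl -n my-namespace annotate pod/my-db-0 \
    pre.hook.backup.velero.io/container=db \
    pre.hook.backup.velero.io/command='["/bin/sh", "-c", "/scripts/quiesce.sh"]'
Velero runs the command in the named container before it backs up the pod.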
Velero Use Cases
Here are some of the things Velero can do:
- Back up your cluster and restore it in case of loss.
- Recover from disaster.
- Copy cluster resources to other clusters.
- Replicate your production environment to create development and testing environments.
- Take a snapshot of your application’s state before upgrading a cluster.
How It Works
Each Velero operation (on-demand backup, scheduled backup, restoration) is a custom resource that is defined with a Kubernetes custom resource definition (CRD) and stored in etcd. Velero includes controllers that process the CRDs to back up and restore resources. You can back up or restore all objects in your cluster, or you can filter objects by type, namespace, or label.
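For example, that filtering is exposed directly as flags on the CLI; the namespace, resource kinds, and label below are placeholders for illustration:
# velero backup create app-backup \
    --include-namespaces my-namespace \
    --include-resources deployments,persistentvolumeclaims \
    --selector app=my-app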
Data protection is a chief concern for application owners who want to make sure that they can restore a cluster to a known good state, recover from a crashed cluster, or migrate to a new environment. Velero provides those capabilities.
Velero Components and Architecture
Velero contains two main components:
- A server that runs on your cluster
- A command-line utility that runs locally
Velero supports plug-ins to enable it to work with different storage systems and Kubernetes platforms. You can run Velero in clusters on a cloud provider or on premises.
Installation:
CLI:
You can download the CLI from the official releases page. Here we will install version 1.8.1; you can pick any version from the releases page based on your requirement.
# cd /tmp
# wget https://github.com/vmware-tanzu/velero/releases/download/v1.8.1/velero-v1.8.1-linux-amd64.tar.gz
# tar -xvf velero-v1.8.1-linux-amd64.tar.gz
# cd velero-v1.8.1-linux-amd64 && mv velero /usr/local/bin/
# velero help
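To confirm the client is installed on the expected version (the server side is not deployed yet, so restrict the check to the client):
# velero version --client-only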
The Server Side
For the server-side component, there are two main methods of installation:
- The Velero CLI
- A Helm Chart
Installing Velero
Out of the two options, we are going to pick Helm and install it with some modifications; we can also see how to use it with GitOps in a future post.
Velero uses an Azure plugin to interact with Azure. For authentication we will use a Service Principal for now. I usually prefer Azure Active Directory Pod Identities, a project that lets pods authenticate against Azure with Managed Identities, but there is currently an open issue with Managed Identities. With pod identities you would not need an API secret for Velero to authenticate against Azure; remember, though, that managed identities only work on Azure.
Prerequisites:
Before we start, the following tools should be installed (a quick verification is shown after the list).
- An AKS cluster up and running; if you don’t have one, follow the steps at https://foxutech.com/how-to-create-azure-kubernetes-service-using-terraform/ to create it in Azure with Terraform, or use your preferred environment.
- kubectl installed on the VM or machine from which you will manage AKS.
- Have a kubeconfig file (default location is ~/.kube/config).
- azure-cli
- Helm
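A quick verification that the tooling is in place and pointing at the right cluster:
# az version
# kubectl version --client
# helm version
# kubectl config current-context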
Dynamic Resource Group
Azure creates a node resource group (here “foxutech-velero”) to hold the resources created dynamically for the Kubernetes cluster, for example the agent pools and the dynamic disks behind persistent volumes.
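If you are not sure which node resource group your cluster uses, you can query it from the cluster itself; the resource group and cluster name below are placeholders:
# az aks show --resource-group myResourceGroup --name myAKSCluster --query nodeResourceGroup -o tsv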
Once that is done, the next step is to set up a storage account.
Setup Storage Account
Create a storage account and a blob container inside it:
# az storage account create --name mystoragevelero --resource-group myResourceGroup --sku Standard_GRS --encryption-services blob --https-only true --kind BlobStorage --access-tier Cold
# az storage container create -n velero --public-access off --account-name mystoragevelero
Get your subscription and tenant ID:
# az account list --query '[?isDefault].id' -o tsv
# az account list --query '[?isDefault].tenantId' -o tsv
Create a service principal with contributor access:
# export SUBSCRIPTION_ID=XXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXX
# export STORAGE_RESOURCE_GROUP=myResourceGroup
# export MC_RESOURCE_GROUP=foxutech-velero
# az ad sp create-for-rbac \
--name "velero" \
--role "Contributor" \
--query 'password' \
-o tsv \
--scopes /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$STORAGE_RESOURCE_GROUP /subscriptions/$SUBSCRIPTION_ID/resourceGroups/$MC_RESOURCE_GROUP
Save the password that you got while creating the service principal.
Get the app ID for the service principal:
# az ad sp list --display-name "velero" --query '[0].appId' -o tsv
Create a credentials file named velero-credentials for Velero; make sure to update the values for the subscription ID, tenant ID, client ID (the SP app ID), client secret (the SP password), and resource group name.
# cat velero-credentials
AZURE_SUBSCRIPTION_ID=XXXX-XXXX-XXX-XXX-XXXX-XXXXXXXX
AZURE_TENANT_ID=XXXX-XXXX-XXX-XXX-XXXX-XXXXXXXX
AZURE_CLIENT_ID=SERVICE_PRINCIPAL_APPID
AZURE_CLIENT_SECRET=SERVICE_PRINCIPAL_PASSWORD
AZURE_RESOURCE_GROUP=foxutech-velero
AZURE_CLOUD_NAME=AzurePublicCloud
Install Velero
Add the repo:
# helm repo add vmware-tanzu https://vmware-tanzu.github.io/helm-charts
Create a namespace for Velero:
# kubectl create ns velero
Install the chart:
# helm install velero vmware-tanzu/velero --namespace velero \
--set-file credentials.secretContents.cloud=./velero-credentials \
--set configuration.provider=azure \
--set configuration.backupStorageLocation.name=azure \
--set configuration.backupStorageLocation.bucket='velero' \
--set configuration.backupStorageLocation.config.resourceGroup=myResourceGroup \
--set configuration.backupStorageLocation.config.storageAccount=mystoragevelero \
--set snapshotsEnabled=true \
--set deployRestic=true \
--set configuration.volumeSnapshotLocation.name=azure \
--set image.repository=velero/velero \
--set image.pullPolicy=Always \
--set initContainers[0].name=velero-plugin-for-microsoft-azure \
--set initContainers[0].image=velero/velero-plugin-for-microsoft-azure:master \
--set initContainers[0].volumeMounts[0].mountPath=/target \
--set initContainers[0].volumeMounts[0].name=plugins
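Before taking any backup, it is worth confirming that the Velero deployment (and the restic daemonset, since deployRestic is enabled) came up cleanly:
# kubectl get pods -n velero
# kubectl logs deployment/velero -n velero | tail -n 20
# velero version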
Once you are done with the configuration, it is time to take backups and snapshots.
By default, Velero takes snapshots of all the persistent volumes mounted in the backed-up namespace.
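Since deployRestic was enabled in the chart above, file-level volume backups with restic are also possible; in that mode, volumes are opted in per pod with an annotation. A minimal sketch with placeholder pod and volume names:
# kubectl -n my-namespace annotate pod/my-pod backup.velero.io/backup-volumes=data-volume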
Backup and Snapshot
Check the backup location:
# velero backup-location get
NAME    PROVIDER   BUCKET/PREFIX   PHASE       LAST VALIDATED                  ACCESS MODE   DEFAULT
azure   azure      velero          Available   2022-05-27 17:11:01 +0000 UTC   ReadWrite
Let’s install a sample application to test backup and restore.
Install WordPress:
Create a new namespace:
# kubectl create ns wordpress
namespace/wordpress created
# helm repo add bitnami https://charts.bitnami.com/bitnami
"bitnami" has been added to your repositories
# helm install test-app bitnami/wordpress --namespace wordpress
NAME: test-app
LAST DEPLOYED: Fri May 27 16:22:08 2022
NAMESPACE: wordpress
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
CHART NAME: wordpress
CHART VERSION: 14.3.1
APP VERSION: 5.9.3
** Please be patient while the chart is being deployed **
Your WordPress site can be accessed through the following DNS name from within your cluster:
test-app-wordpress.wordpress.svc.cluster.local (port 80)
To access your WordPress site from outside the cluster follow the steps below:
1. Get the WordPress URL by running these commands:
NOTE: It may take a few minutes for the LoadBalancer IP to be available.
Watch the status with: 'kubectl get svc --namespace wordpress -w test-app-wordpress'
export SERVICE_IP=$(kubectl get svc --namespace wordpress test-app-wordpress --template "{{ range (index .status.loadBalancer.ingress 0) }}{{ . }}{{ end }}")
echo "WordPress URL: http://$SERVICE_IP/"
echo "WordPress Admin URL: http://$SERVICE_IP/admin"
2. Open a browser and access WordPress using the obtained URL.
3. Login with the following credentials below to see your blog:
echo Username: user
echo Password: $(kubectl get secret --namespace wordpress test-app-wordpress -o jsonpath="{.data.wordpress-password}" | base64 --decode)
Now WordPress is up and running; you can check it either through a port-forward or via the LoadBalancer.
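If the LoadBalancer IP is slow to appear, a simple port-forward is enough to confirm the site responds (the service name matches the release installed above):
# kubectl port-forward --namespace wordpress svc/test-app-wordpress 8080:80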
Now we are all set to take the backup; run the following command to create it.
# velero backup create wp-backup --include-namespaces wordpress --storage-location azure --wait
Backup request "wp-backup" submitted successfully.
Waiting for backup to complete. You may safely press ctrl-c to stop waiting - your backup will continue in the background.
....
Backup completed with status: Completed. You may check for more information using the commands `velero backup describe wp-backup` and `velero backup logs wp-backup`.
# velero backup describe wp-backup
Name: wp-backup
Namespace: velero
Labels: velero.io/storage-location=azure
Annotations: velero.io/source-cluster-k8s-gitversion=v1.22.6
velero.io/source-cluster-k8s-major-version=1
velero.io/source-cluster-k8s-minor-version=22
Phase: Completed
Errors: 0
Warnings: 0
Namespaces:
Included: wordpress
Excluded: <none>
Resources:
Included: *
Excluded: <none>
Cluster-scoped: auto
Label selector: <none>
Storage Location: azure
Velero-Native Snapshot PVs: auto
TTL: 720h0m0s
Hooks: <none>
Backup Format Version: 1.1.0
Started: 2022-05-27 16:23:29 +0000 UTC
Completed: 2022-05-27 16:23:33 +0000 UTC
Expiration: 2022-06-26 16:23:29 +0000 UTC
Total items to be backed up: 52
Items backed up: 52
Velero-Native Snapshots: <none included>
Once you see the backup completed, you can now go to your storage account and check the backup objects.
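The same check can also be done from the CLI, both against Velero and against the blob container created earlier (depending on your setup, az may additionally need an account key or --auth-mode login):
# velero backup get
# az storage blob list --container-name velero --account-name mystoragevelero --query '[].name' -o tsv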
As we have seen, the backup files are present; now let’s delete the resources and restore them.
# kubectl delete ns wordpress
Once it is deleted, confirm all the resources are gone by checking the pods and volumes.
# kubectl get po -n wordpress
No resources found.
# kubectl get pv -A
Once confirmed, let’s restore the backup:
# velero restore create --from-backup wp-backup
Restore request "wp-backup-20220527162735" submitted successfully.
Run `velero restore describe wp-backup-20220527162735` or `velero restore logs wp-backup-20220527162735` for more details.
# velero restore describe wp-backup-20220527162735
Name: wp-backup-20220527162735
Namespace: velero
Labels: <none>
Annotations: <none>
Phase: Completed
Total items to be restored: 26
Items restored: 26
Started: 2022-05-27 16:27:36 +0000 UTC
Completed: 2022-05-27 16:27:38 +0000 UTC
Backup: wp-backup
Namespaces:
Included: all namespaces found in the backup
Excluded: <none>
Resources:
Included: *
Excluded: nodes, events, events.events.k8s.io, backups.velero.io, restores.velero.io, resticrepositories.velero.io
Cluster-scoped: auto
Namespace mappings: <none>
Label selector: <none>
Restore PVs: auto
Preserve Service NodePorts: auto
# kubectl get po -n wordpress
NAME                                  READY   STATUS    RESTARTS   AGE
test-app-mariadb-0                    1/1     Running   0          74s
test-app-wordpress-76ccf4865b-8kfs6   1/1     Running   0          74s
Now that the restore works, we can go ahead and create a schedule.
Setting up the schedule: “Back up my cluster every day at 4 am”
# velero schedule create every-day-at-4 --schedule "0 4 * * *"
Note: again, you might run into this issue; if so, you’ll have to exclude the webhook admission configuration:
# velero schedule create every-day-at-7 --schedule "0 7 * * *" --exclude-resources MutatingWebhookConfiguration.admissionregistration.k8s.io
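To confirm the schedule is registered, and later to see the backups it produces (scheduled backups are named after the schedule with a timestamp suffix), you can list them:
# velero schedule get
# velero backup get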
That’s it for now; we will see more examples in upcoming posts, with GitOps integration and other use cases.