Export All Grafana dashboards in one minute

2020-12-15 / 2023-08-31

Introduction

We currently use Ceph as persistent storage for Grafana in our k8s cluster, so if I delete and rebuild Grafana’s Deployment, all previous data is lost: the rebuilt PV is mapped to a new location on the backend storage. As luck would have it, I did exactly that by hand, and I had not backed anything up in advance. Lucky me.

After 250 minutes of rebuilding the Dashboard, I was so upset that I was about to say F***.

Usual ways

If this keeps up I’m really going to lose my mind. How could I stand it? I immediately opened Google and researched the various ways to back up Grafana, and found that most of the solutions are shell scripts that call Grafana’s API to export the configuration. Most of these scripts are collected in this gist:

https://gist.github.com/crisidev/bd52bdcc7f029be2f295
I’ve picked out a few of the better ones, but you can pick others if you like.

Export script

#!/bin/bash

# Usage:
#
# export_grafana_dashboards.sh https://<user>:<password>@<grafana_host>

create_slug () {
  # Quote the sed expressions so the shell cannot glob-expand them
  echo "$1" | iconv -t ascii//TRANSLIT | sed -r 's/[^a-zA-Z0-9]+/-/g' | sed -r 's/^-+|-+$//g' | tr 'A-Z' 'a-z'
}

full_url=$1
username=$(echo "${full_url}" | cut -d/ -f 3 | cut -d: -f 1)
base_url=$(echo "${full_url}" | cut -d@ -f 2)
folder=$(create_slug "${username}-${base_url}")

mkdir -p "${folder}"
for db_uid in $(curl -s "${full_url}/api/search" | jq -r '.[].uid'); do
  db_json=$(curl -s "${full_url}/api/dashboards/uid/${db_uid}")
  db_slug=$(echo "${db_json}" | jq -r .meta.slug)
  db_title=$(echo "${db_json}" | jq -r .dashboard.title)
  filename="${folder}/${db_slug}.json"
  echo "Exporting \"${db_title}\" to \"${filename}\"..."
  echo "${db_json}" | jq -r . > "${filename}"
done
echo "Done"

This script is relatively simple: it exports the JSON of every Dashboard directly, without recording which folder each one belongs to. If you restore Grafana from this export, all Dashboards are imported into Grafana’s General folder, which is not very friendly.
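One way to keep the folder information is to write each Dashboard into a sub-directory named after its folder. The following is only a sketch under assumptions: it assumes the JSON returned by /api/dashboards/uid/<uid> carries the folder name in .meta.folderTitle, and the helper names slugify and dashboard_path are made up for illustration, not part of the gist.

```shell
#!/bin/bash
# Sketch: derive a per-folder output path for an exported dashboard.
# Assumption: the dashboard JSON exposes its folder name as .meta.folderTitle.

# Turn an arbitrary title into a filesystem-friendly slug.
slugify() {
  printf '%s' "$1" | tr 'A-Z' 'a-z' | sed -E 's/[^a-z0-9]+/-/g; s/^-+//; s/-+$//'
}

# Build "<root>/<folder-slug>/<dashboard-slug>.json".
# $1 = output root, $2 = folder title (may be empty), $3 = dashboard slug
dashboard_path() {
  local root=$1 folder_title=${2:-General} slug=$3
  printf '%s/%s/%s.json\n' "$root" "$(slugify "$folder_title")" "$slug"
}

# Possible wiring inside the export loop (not executed here):
#   folder_title=$(echo "${db_json}" | jq -r '.meta.folderTitle // empty')
#   filename=$(dashboard_path "${folder}" "${folder_title}" "${db_slug}")
#   mkdir -p "$(dirname "${filename}")"
```

This only fixes the layout of the export. The import side would still have to create the folders and pass the right folder to Grafana, which the scripts here do not do.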

Import script

#!/bin/bash
#
# add the "-x" option to the shebang line if you want a more verbose output
#
#
OPTSPEC=":hp:t:k:"

show_help() {
cat << EOF
Usage: $0 [-p PATH] [-t TARGET_HOST] [-k API_KEY]
Script to import dashboards into Grafana
    -p      Required. Root path containing JSON exports of the dashboards you want imported.
    -t      Required. The full URL of the target host
    -k      Required. The API key to use on the target host

    -h      Display this help and exit.
EOF
}

###### Check script invocation options ######
while getopts "$OPTSPEC" optchar; do
    case "$optchar" in
        h)
            show_help
            exit
            ;;
        p)
            DASH_DIR="$OPTARG";;
        t)
            HOST="$OPTARG";;
        k)
            KEY="$OPTARG";;
        \?)
          echo "Invalid option: -$OPTARG" >&2
          exit 1
          ;;
        :)
          echo "Option -$OPTARG requires an argument." >&2
          exit 1
          ;;
    esac
done

if [ -z "$DASH_DIR" ] || [ -z "$HOST" ] || [ -z "$KEY" ]; then
    show_help
    exit 1
fi

# set some colors for status OK, FAIL and titles
SETCOLOR_SUCCESS="echo -en \\033[0;32m"
SETCOLOR_FAILURE="echo -en \\033[1;31m"
SETCOLOR_NORMAL="echo -en \\033[0;39m"
SETCOLOR_TITLE_PURPLE="echo -en \\033[0;35m" # purple

# usage log "string to log" "color option"
function log_success() {
   if [ $# -lt 1 ]; then
       ${SETCOLOR_FAILURE}
       echo "Not enough arguments for log function! Expecting 1 argument got $#"
       exit 1
   fi

   timestamp=$(date "+%Y-%m-%d %H:%M:%S %Z")

   ${SETCOLOR_SUCCESS}
   printf "[%s] %s\n" "$timestamp" "$1"
   ${SETCOLOR_NORMAL}
}

function log_failure() {
   if [ $# -lt 1 ]; then
       ${SETCOLOR_FAILURE}
       echo "Not enough arguments for log function! Expecting 1 argument got $#"
       exit 1
   fi

   timestamp=$(date "+%Y-%m-%d %H:%M:%S %Z")

   ${SETCOLOR_FAILURE}
   printf "[%s] %s\n" "$timestamp" "$1"
   ${SETCOLOR_NORMAL}
}

function log_title() {
   if [ $# -lt 1 ]; then
       ${SETCOLOR_FAILURE}
       log_failure "Not enough arguments for log function! Expecting 1 argument got $#"
       exit 1
   fi

   ${SETCOLOR_TITLE_PURPLE}
   printf "|-------------------------------------------------------------------------|\n"
   printf "|%s|\n" "$1";
   printf "|-------------------------------------------------------------------------|\n"
   ${SETCOLOR_NORMAL}
}

if [ -d "$DASH_DIR" ]; then
    DASH_LIST=$(find "$DASH_DIR" -mindepth 1 -name \*.json)
    if [ -z "$DASH_LIST" ]; then
        log_title "----------------- $DASH_DIR contains no JSON files! -----------------"
        log_failure "Directory $DASH_DIR does not appear to contain any JSON files for import. Check your path and try again."
        exit 1
    else
        FILESTOTAL=$(echo "$DASH_LIST" | wc -l)
        log_title "----------------- Starting import of $FILESTOTAL dashboards -----------------"
    fi
else
    log_title "----------------- $DASH_DIR directory not found! -----------------"
    log_failure "Directory $DASH_DIR does not exist. Check your path and try again."
    exit 1
fi

NUMSUCCESS=0
NUMFAILURE=0
COUNTER=0

for DASH_FILE in $DASH_LIST; do
    COUNTER=$((COUNTER + 1))
    echo "Import $COUNTER/$FILESTOTAL: $DASH_FILE..."
    RESULT=$(jq '. * {overwrite: true, dashboard: {id: null}}' "$DASH_FILE" | curl -s -X POST -H "Content-Type: application/json" -H "Authorization: Bearer $KEY" "$HOST/api/dashboards/db" -d @-)
    if [[ "$RESULT" == *"success"* ]]; then
        log_success "$RESULT"
        NUMSUCCESS=$((NUMSUCCESS + 1))
    else
        log_failure "$RESULT"
        NUMFAILURE=$((NUMFAILURE + 1))
    fi
done

log_title "Import complete. $NUMSUCCESS dashboards were successfully imported. $NUMFAILURE dashboard imports failed.";
log_title "------------------------------ FINISHED ---------------------------------";

To run the import script, Grafana needs to be up and running on the target machine, and you need to provide an Admin API Key.

Create a new API Key, select Admin for the role and adjust the expiry time yourself.
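If you would rather not click through the web interface, the key can probably be created over HTTP as well. This is a rough sketch, assuming your Grafana version still offers the /api/auth/keys endpoint and that basic auth with an admin account is enabled. The function names and the key name are hypothetical.

```shell
#!/bin/bash
# Sketch: create an Admin API key over Grafana's HTTP API.
# Assumption: the instance still exposes POST /api/auth/keys.

# Build the JSON request body.
# $1 = key name (must not contain quotes or backslashes; no escaping is done)
# $2 = lifetime in seconds
api_key_payload() {
  printf '{"name":"%s","role":"Admin","secondsToLive":%d}\n' "$1" "$2"
}

# Send the request with basic auth and print Grafana's JSON response,
# which should contain the generated key in its "key" field.
# $1 = base URL, $2 = user:password, $3 = key name, $4 = lifetime in seconds
create_api_key() {
  curl -s -X POST -H "Content-Type: application/json" \
    -u "$2" -d "$(api_key_payload "$3" "$4")" "$1/api/auth/keys"
}

# Example call (not executed here):
#   create_api_key "http://<grafana_svc_ip>:<grafana_svc_port>" "admin:<password>" dashboard-import 86400
```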

Import:

$ ./grafana-dashboard-importer.sh -t http://<grafana_svc_ip>:<grafana_svc_port> -k <api_key> -p <backup folder>

The -p parameter specifies the directory where the previously exported json is located.

The pain point of this approach is that only Dashboards can be backed up, not other configuration (e.g. data sources, users, keys, etc.), and that the mapping between Dashboards and folders is lost, i.e. backing up Folders is not supported. Fortunately, there is a more complete backup and restore solution that covers all of this.

Advanced method

Someone has already written a more advanced solution. The project lives at:

https://github.com/ysde/grafana-backup-tool
The backup tool supports the following configuration types:

Folder
Dashboard
Data source
Alert channel
Organization
User

It’s easy to use: just run a container. I’m not very happy with the Dockerfile provided by the author, though, so I modified it as follows:

FROM alpine:latest

LABEL maintainer="grafana-backup-tool Docker Maintainers https://fuckcloudnative.io"

ENV ARCHIVE_FILE ""

RUN echo "@edge http://dl-cdn.alpinelinux.org/alpine/edge/community" >> /etc/apk/repositories && \
    apk --no-cache add python3 py3-pip py3-cffi py3-cryptography ca-certificates bash git && \
    git clone https://github.com/ysde/grafana-backup-tool /opt/grafana-backup-tool && \
    cd /opt/grafana-backup-tool && \
    pip3 --no-cache-dir install . && \
    chown -R 1337:1337 /opt/grafana-backup-tool

WORKDIR /opt/grafana-backup-tool

USER 1337

Don’t ask me what I build it with: free GitHub Actions, of course. The workflow is as follows.

#=================================================
# https://github.com/yangchuansheng/docker-image
# Description: Build and push grafana-backup-tool Docker image
# License: MIT
# Author: Ryan
# Blog: https://fuckcloudnative.io
#=================================================

name: Build and push grafana-backup-tool Docker image

# Controls when the action will run. Triggers the workflow on push or pull request
# events but only for the master branch
on:
  push:
    branches: [ master ]
    paths: 
      - 'grafana-backup-tool/Dockerfile'
      - '.github/workflows/grafana-backup-tool.yml'
  pull_request:
    branches: [ master ]
    paths: 
      - 'grafana-backup-tool/Dockerfile'
  #watch:
    #types: started

# A workflow run is made up of one or more jobs that can run sequentially or in parallel
jobs:
  # This workflow contains a single job called "build"
  build:
    # The type of runner that the job will run on
    runs-on: ubuntu-latest

    # Steps represent a sequence of tasks that will be executed as part of the job
    steps:
    # Checks-out your repository under $GITHUB_WORKSPACE, so your job can access it
    - uses: actions/checkout@v2

    - name: Set up QEMU
      uses: docker/setup-qemu-action@v1

    - name: Set up Docker Buildx
      uses: docker/setup-buildx-action@v1

    - name: Login to DockerHub
      uses: docker/login-action@v1 
      with:
        username: ${{ secrets.DOCKER_USERNAME }}
        password: ${{ secrets.DOCKER_PASSWORD }}
        
    - name: Login to GitHub Package Registry
      env:
        username: ${{ github.repository_owner }}
        password: ${{ secrets.GHCR_TOKEN }}
      run: echo ${{ env.password }} | docker login ghcr.io -u ${{ env.username }} --password-stdin  

    # Runs a single command using the runners shell
    - name: Build and push Docker images to docker.io and ghcr.io
      uses: docker/build-push-action@v2
      with:
        file: 'grafana-backup-tool/Dockerfile'
        platforms: linux/386,linux/amd64,linux/arm/v6,linux/arm/v7,linux/arm64,linux/ppc64le,linux/s390x
        context: grafana-backup-tool
        push: true
        tags: |
          yangchuansheng/grafana-backup-tool:latest
          ghcr.io/yangchuansheng/grafana-backup-tool:latest

    #- name: Update repo description
      #uses: peter-evans/dockerhub-description@v2
      #env:
        #DOCKERHUB_USERNAME: ${{ secrets.DOCKER_USERNAME }}
        #DOCKERHUB_PASSWORD: ${{ secrets.DOCKER_PASSWORD }}
        #DOCKERHUB_REPOSITORY: yangchuansheng/grafana-backup-tool
        #README_FILEPATH: grafana-backup-tool/readme.md

I’m not going to explain the workflow here. Anyone with some basic knowledge should be able to follow it; if not, I’ll write a separate article to explain it later (which conveniently gives me one more post to write). What this workflow does is automatically build images for each CPU architecture and push them to docker.io and ghcr.io.

Isn’t that good?

You can follow my repo directly at

https://github.com/yangchuansheng/docker-image
Docker Hub has recently started to limit the number of pulls, and pulling from ghcr.io is also very slow, but I paid for a reverse proxy server.

After building the image, you can run the container directly to perform backup and restore operations. If you want to do it inside the cluster, use a Deployment or Job; if you want to do it locally or outside the k8s cluster, docker run is fine, and so is docker-compose. But I’m going to show you an even flashier way, one so slick you won’t be able to resist.

First you need to install Podman locally or outside the cluster. On Windows 10, consider installing it via WSL; on Linux it’s a no-brainer; on macOS, please refer to my previous article: Using Podman in macOS.

Once you’ve installed Podman, it’s time to get down to business, so keep your eyes open.

Start by writing a Deployment configuration manifest (what? Deployment? Yes, you heard me right).

grafana-backup-deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-backup
  labels:
    app: grafana-backup
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana-backup
  template:
    metadata:
      labels:
        app: grafana-backup
    spec:
      containers:
      - name: grafana-backup
        image: yangchuansheng/grafana-backup-tool:latest
        imagePullPolicy: IfNotPresent
        command: ["/bin/bash"]
        tty: true
        stdin: true
        env:
        - name: GRAFANA_TOKEN
          value: "eyJr0NkFBeWV1QVpMNjNYWXA3UXNOM2JWMWdZOTB2ZFoiLCJuIjoiYWRtaW4iLCJpZCI6MX0="
        - name: GRAFANA_URL
          value: "http://<grafana_ip>:<grafana_port>"
        - name: GRAFANA_ADMIN_ACCOUNT
          value: "admin"
        - name: GRAFANA_ADMIN_PASSWORD
          value: "admin"
        - name: VERIFY_SSL
          value: "False"
        volumeMounts:
        - mountPath: /opt/grafana-backup-tool
          name: data
      volumes:
      - name: data
        hostPath:
          path: /mnt/manifest/grafana/backup

Modify the environment variables here to suit your own environment. Don’t just copy mine!

Don’t look so confused. Let me explain why you need this Deployment manifest: Podman can run containers directly from it with the following command.

$ podman play kube grafana-backup-deployment.yaml

The first time I saw this in action I couldn’t help but think, “Fuck, that’s possible?” It does work. Then again, Podman just translates the manifest and runs a container; it doesn’t really run a Deployment, because it has no controller. Still, what a treat!

Imagine you can take a manifest from a k8s cluster and run it locally or on a test machine, instead of keeping one yaml for the k8s cluster and another for docker-compose. One yaml is all you need.

It’s pretty sad that docker-compose has come to this stage.

If you are a careful reader, you will notice that the manifest above is a bit strange, and so is the Dockerfile: it has no CMD or ENTRYPOINT, and the Deployment sets the command directly to bash. During my earlier tests I found a problem with containers started from the original image: they got stuck in a loop, starting a new backup as soon as the previous one finished, which left a pile of archive files in the backup directory. I haven’t found a good solution yet, so I just set the container’s start command to bash, wait until the container is running, and then enter it to run the backup.

$ podman pod ls
POD ID        NAME                  STATUS   CREATED        # OF CONTAINERS  INFRA ID
728aec216d66  grafana-backup-pod-0  Running  3 minutes ago  2                92aa0824fe7d

$ podman ps
CONTAINER ID  IMAGE                                      COMMAND    CREATED        STATUS            PORTS   NAMES
b523fa8e4819  yangchuansheng/grafana-backup-tool:latest  /bin/bash  3 minutes ago  Up 3 minutes ago          grafana-backup-pod-0-grafana-backup
92aa0824fe7d  k8s.gcr.io/pause:3.2                                  3 minutes ago  Up 3 minutes ago          728aec216d66-infra

$ podman exec -it grafana-backup-pod-0-grafana-backup bash
bash-5.0$ grafana-backup save
...
...
########################################

backup folders at: _OUTPUT_/folders/202012111556
backup datasources at: _OUTPUT_/datasources/202012111556
backup dashboards at: _OUTPUT_/dashboards/202012111556
backup alert_channels at: _OUTPUT_/alert_channels/202012111556
backup organizations at: _OUTPUT_/organizations/202012111556
backup users at: _OUTPUT_/users/202012111556

created archive at: _OUTPUT_/202012111556.tar.gz
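The interactive shell is not strictly needed: the same backup can be triggered from the host in one step. A minimal sketch, assuming the container name shown in the podman ps output above; run_backup is a made-up helper name.

```shell
#!/bin/bash
# Sketch: run "grafana-backup save" inside the running container without
# opening an interactive shell.
# $1 = container name (defaults to the name from the podman ps output)
# any further arguments are passed on to "grafana-backup save"
run_backup() {
  local container=${1:-grafana-backup-pod-0-grafana-backup}
  [ $# -gt 0 ] && shift
  podman exec "$container" grafana-backup save "$@"
}

# Example call (not executed here):
#   run_backup grafana-backup-pod-0-grafana-backup --components=folders,dashboards
```

Because the container only runs bash, nothing is backed up until this is called, so it avoids the endless backup loop described earlier.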

By default all components will be backed up, you can also specify which components to back up:

$ grafana-backup save --components=<folders,dashboards,datasources,alert-channels,organizations,users>

For example, to back up only Dashboards and Folders:

$ grafana-backup save --components=folders,dashboards

Of course, you can also back everything up and choose which components to restore at restore time:

$ grafana-backup restore --components=folders,dashboards <archive_file>
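When several backups have piled up, you usually want the newest one. Since the archive names are timestamps, sorting by name is enough. The helper below is a hypothetical sketch, and it assumes the archives sit in _OUTPUT_ as in the output above.

```shell
#!/bin/bash
# Sketch: print the newest backup archive in a directory.
# Archive names are timestamps (e.g. 202012111556.tar.gz), so the
# lexicographically largest name is also the most recent one.
# $1 = backup directory (defaults to _OUTPUT_)
latest_archive() {
  local dir=${1:-_OUTPUT_} newest="" f
  for f in "$dir"/*.tar.gz; do
    [ -e "$f" ] || continue            # the glob matched nothing
    [[ "$f" > "$newest" ]] && newest=$f
  done
  # Print nothing and return non-zero when no archive exists
  [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example, assuming restore takes the archive path as its argument
# (not executed here):
#   grafana-backup restore "$(latest_archive)"
```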

Now you don’t have to worry about your Dashboard being changed or deleted.

As a final note, Grafana in the Prometheus Operator project has some default Dashboards pre-imported via Provisioning[4]. That is not a problem in itself, but grafana-backup-tool does not skip configuration that already exists: if it encounters an existing item during a restore, it exits with an error. This would be easy to fix by deleting all the Dashboards in the Grafana web interface, but Dashboards imported via Provisioning cannot be deleted, which is awkward.

Until the author fixes this bug, there are two ways to fix this problem.

The first is to remove all Provisioning-related configuration from the Grafana Deployment before restoring, namely the following:

        volumeMounts:
        - mountPath: /etc/grafana/provisioning/datasources
          name: grafana-datasources
          readOnly: false
        - mountPath: /etc/grafana/provisioning/dashboards
          name: grafana-dashboards
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/apiserver
          name: grafana-dashboard-apiserver
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/cluster-total
          name: grafana-dashboard-cluster-total
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/controller-manager
          name: grafana-dashboard-controller-manager
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-cluster
          name: grafana-dashboard-k8s-resources-cluster
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-namespace
          name: grafana-dashboard-k8s-resources-namespace
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-node
          name: grafana-dashboard-k8s-resources-node
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-pod
          name: grafana-dashboard-k8s-resources-pod
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-workload
          name: grafana-dashboard-k8s-resources-workload
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/k8s-resources-workloads-namespace
          name: grafana-dashboard-k8s-resources-workloads-namespace
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/kubelet
          name: grafana-dashboard-kubelet
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/namespace-by-pod
          name: grafana-dashboard-namespace-by-pod
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/namespace-by-workload
          name: grafana-dashboard-namespace-by-workload
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/node-cluster-rsrc-use
          name: grafana-dashboard-node-cluster-rsrc-use
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/node-rsrc-use
          name: grafana-dashboard-node-rsrc-use
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/nodes
          name: grafana-dashboard-nodes
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/persistentvolumesusage
          name: grafana-dashboard-persistentvolumesusage
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/pod-total
          name: grafana-dashboard-pod-total
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/prometheus-remote-write
          name: grafana-dashboard-prometheus-remote-write
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/prometheus
          name: grafana-dashboard-prometheus
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/proxy
          name: grafana-dashboard-proxy
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/scheduler
          name: grafana-dashboard-scheduler
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/statefulset
          name: grafana-dashboard-statefulset
          readOnly: false
        - mountPath: /grafana-dashboard-definitions/0/workload-total
          name: grafana-dashboard-workload-total
          readOnly: false
...
...
      volumes:
      - name: grafana-datasources
        secret:
          secretName: grafana-datasources
      - configMap:
          name: grafana-dashboards
        name: grafana-dashboards
      - configMap:
          name: grafana-dashboard-apiserver
        name: grafana-dashboard-apiserver
      - configMap:
          name: grafana-dashboard-cluster-total
        name: grafana-dashboard-cluster-total
      - configMap:
          name: grafana-dashboard-controller-manager
        name: grafana-dashboard-controller-manager
      - configMap:
          name: grafana-dashboard-k8s-resources-cluster
        name: grafana-dashboard-k8s-resources-cluster
      - configMap:
          name: grafana-dashboard-k8s-resources-namespace
        name: grafana-dashboard-k8s-resources-namespace
      - configMap:
          name: grafana-dashboard-k8s-resources-node
        name: grafana-dashboard-k8s-resources-node
      - configMap:
          name: grafana-dashboard-k8s-resources-pod
        name: grafana-dashboard-k8s-resources-pod
      - configMap:
          name: grafana-dashboard-k8s-resources-workload
        name: grafana-dashboard-k8s-resources-workload
      - configMap:
          name: grafana-dashboard-k8s-resources-workloads-namespace
        name: grafana-dashboard-k8s-resources-workloads-namespace
      - configMap:
          name: grafana-dashboard-kubelet
        name: grafana-dashboard-kubelet
      - configMap:
          name: grafana-dashboard-namespace-by-pod
        name: grafana-dashboard-namespace-by-pod
      - configMap:
          name: grafana-dashboard-namespace-by-workload
        name: grafana-dashboard-namespace-by-workload
      - configMap:
          name: grafana-dashboard-node-cluster-rsrc-use
        name: grafana-dashboard-node-cluster-rsrc-use
      - configMap:
          name: grafana-dashboard-node-rsrc-use
        name: grafana-dashboard-node-rsrc-use
      - configMap:
          name: grafana-dashboard-nodes
        name: grafana-dashboard-nodes
      - configMap:
          name: grafana-dashboard-persistentvolumesusage
        name: grafana-dashboard-persistentvolumesusage
      - configMap:
          name: grafana-dashboard-pod-total
        name: grafana-dashboard-pod-total
      - configMap:
          name: grafana-dashboard-prometheus-remote-write
        name: grafana-dashboard-prometheus-remote-write
      - configMap:
          name: grafana-dashboard-prometheus
        name: grafana-dashboard-prometheus
      - configMap:
          name: grafana-dashboard-proxy
        name: grafana-dashboard-proxy
      - configMap:
          name: grafana-dashboard-scheduler
        name: grafana-dashboard-scheduler
      - configMap:
          name: grafana-dashboard-statefulset
        name: grafana-dashboard-statefulset
      - configMap:
          name: grafana-dashboard-workload-total
        name: grafana-dashboard-workload-total

The second option is to remove Grafana from the Prometheus Operator setup and deploy Grafana yourself, without Provisioning, via Helm or plain manifests.

If you want neither to remove the Provisioning configuration nor to deploy Grafana yourself, you will have to fall back on the basic script-based approach described above.