Data loss can occur while using a database for various reasons, such as operator error, malicious hacker attacks, and server hardware failures. Backup and recovery technology is the final line of defense to ensure data can still be restored and used after such losses.
TiDB, as a native distributed database, fully supports various backup and recovery capabilities. However, due to its unique architecture, the principles of its backup and recovery processes differ from those of traditional databases.
This article gives a comprehensive overview of TiDB’s backup and recovery capabilities.
TiDB’s Backup Capabilities
TiDB offers two types of backups: physical backups and logical backups.
Physical backups involve directly backing up physical files (.SST) and can be divided into full and incremental backups. Logical backups export data to binary or text files. Physical backups are typically used for cluster-level or database-level backups involving large amounts of data to ensure the consistency of the backed-up data.
Logical backups are primarily used for full backups of smaller data sets or fewer tables and do not guarantee data consistency during ongoing operations.
Physical Backups
Physical backups are divided into full backups and incremental backups. Full backups, also known as “snapshot backups,” ensure data consistency through snapshots. Incremental backups, referred to as “log backups” in the current TiDB version, back up the KV change logs over a recent period.
Snapshot Backups
Full Process of Snapshot Backup
- BR Receives Backup Command: BR receives the br backup full command and obtains the backup snapshot point and the backup storage address.
- BR Schedules Backup Data: Specific steps include:
  - Pausing GC to prevent the backed-up data from being garbage collected.
  - Accessing PD to get the distribution of the Regions to be backed up and the TiKV node information.
  - Creating a backup request and sending it to the TiKV nodes, including the backup ts, the Regions to be backed up, and the backup storage address.
- TiKV Accepts Backup Request and Initializes Backup Worker: TiKV nodes receive the backup request and initialize a backup worker.
- TiKV Backs Up Data: Specific steps include:
  - Reading data: the backup worker reads the data corresponding to the backup ts from the Region Leader.
  - Saving to SST files: the data is stored in memory as SST files.
  - Uploading the SST files to the backup storage.
- BR Retrieves Backup Results from Each TiKV: BR collects the backup results from each TiKV node. If there are changes in Regions, the process retries; if retrying is not possible, the backup fails.
- BR Backs Up Metadata: BR backs up the table schemas, calculates the table data checksums, generates the backup metadata, and uploads it to the backup storage.
The recommended way to take a snapshot backup is the br command-line tool provided by TiDB, which can be installed with tiup install br. Snapshot backups currently support cluster-level, database-level, and table-level backups. Here is an example of using br to take a cluster-level snapshot backup.
[tidb@tidb53 ~]$ tiup br backup full --pd "172.20.12.52:2679" --storage "local:///data1/backups" --ratelimit 128 --log-file backupfull.log
tiup is checking updates for component br ...
Starting component `br`: /home/tidb/.tiup/components/br/v7.6.0/br backup full --pd 172.20.12.52:2679 --storage local:///data1/backups --ratelimit 128 --log-file backupfull.log
Detail BR log in backupfull.log
[2024/03/05 10:19:27.437 +08:00] [WARN] [backup.go:311] ["setting `--ratelimit` and `--concurrency` at the same time, ignoring `--concurrency`: `--ratelimit` forces sequential (i.e. concurrency = 1) backup"] [ratelimit=134.2MB/s] [concurrency-specified=4]
Full Backup <----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00%
Checksum <-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00%
[2024/03/05 10:20:29.456 +08:00] [INFO] [collector.go:77] ["Full Backup success summary"] [total-ranges=207] [ranges-succeed=207] [ranges-failed=0] [backup-checksum=1.422780807s] [backup-fast-checksum=17.004817ms] [backup-total-ranges=161] [total-take=1m2.023929601s] [BackupTS=448162737288380420] [total-kv=25879266] [total-kv-size=3.587GB] [average-speed=57.82MB/s] [backup-data-size(after-compressed)=1.868GB] [Size=1867508767]
[tidb@tidb53 ~]$ ll /data1/backups/
total 468
drwxr-xr-x. 2 nfsnobody nfsnobody 20480 Mar 5 10:20 1
drwxr-xr-x. 2 tidb tidb 12288 Mar 5 10:20 4
drwxr-xr-x. 2 nfsnobody nfsnobody 12288 Mar 5 10:20 5
-rw-r--r--. 1 nfsnobody nfsnobody 78 Mar 5 10:19 backup.lock
-rw-r--r--. 1 nfsnobody nfsnobody 395 Mar 5 10:20 backupmeta
-rw-r--r--. 1 nfsnobody nfsnobody 50848 Mar 5 10:20 backupmeta.datafile.000000001
-rw-r--r--. 1 nfsnobody nfsnobody 365393 Mar 5 10:20 backupmeta.schema.000000002
drwxrwxrwx. 3 nfsnobody nfsnobody 4096 Mar 5 10:19 checkpoints
The --ratelimit parameter sets the maximum speed at which each TiKV node executes backup tasks, here 128 MB/s. The --log-file parameter specifies the file to which the backup log is written, and the --pd parameter specifies the PD nodes. In addition, the br command supports the --backupts parameter, which indicates the physical time point corresponding to the backup snapshot; if it is not specified, the current time is used as the snapshot point.
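For example, to pin a backup to an explicit snapshot point rather than the current time, the command might look like the following sketch (the timestamp value is illustrative; a TSO can also be passed to --backupts):
tiup br backup full --pd "172.20.12.52:2679" --storage "local:///data1/backups" --backupts "2024-03-05 10:00:00"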
If we want to determine when a completed snapshot backup was taken, br also provides the br validate decode command. Its output is a TSO (Timestamp Oracle), which can be parsed into a physical time with tidb_parse_tso, as shown below.
[tidb@tidb53 ~]$ tiup br validate decode --field="end-version" --storage "local:///data1/backups" | tail -n1
tiup is checking updates for component br ...
Starting component `br`: /home/tidb/.tiup/components/br/v7.6.0/br validate decode --field=end-version --storage local:///data1/backups
Detail BR log in /tmp/br.log.2024-03-05T10.24.25+0800
448162737288380420
mysql> select tidb_parse_tso(448162737288380420);
+------------------------------------+
| tidb_parse_tso(448162737288380420) |
+------------------------------------+
| 2024-03-05 10:19:28.489000 |
+------------------------------------+
1 row in set (0.01 sec)
Log Backup
Full Process of Log Backup
- BR Receives Backup Command: BR receives the br log start command, parses it to obtain the checkpoint timestamp and the backup storage address for the log backup task, and registers the task in PD.
- TiKV Monitors Log Backup Task Creation and Updates: each TiKV node's log backup observer listens for the creation and updates of log backup tasks in PD and backs up the data within the backup range on that node.
- TiKV Log Backup Observer Continuously Backs Up KV Change Logs: Specific steps include:
  - Reading KV data changes and saving them to backup files in a custom format.
  - Periodically querying the global checkpoint timestamp from PD.
  - Periodically generating local metadata.
  - Periodically uploading the log backup data and local metadata to the backup storage.
  - Requesting PD to prevent data that has not yet been backed up from being garbage collected.
- TiDB Coordinator Monitors Log Backup Progress: it polls all TiKV nodes to get the backup progress of each Region, calculates the overall progress of the log backup task from the Region checkpoint timestamps, and uploads it to PD.
- PD Persists Log Backup Task Status: the status of the log backup task can be queried using br log status.
Log Backup Method:
Snapshot backup commands start with br backup ..., while log backup commands start with br log .... To start a log backup, use the br log start command; after the log backup task has been started, use br log status to check its status.
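A log backup task can be started with a command of the following form; the task name pitr and the storage path here mirror the status output shown below and should be replaced with values for your own environment:
tiup br log start --task-name=pitr --pd "172.20.12.52:2679" --storage "local:///data1/backups/pitr"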
In the above command, the --task-name parameter specifies the name of the log backup task, the --pd parameter specifies the PD nodes, and the --storage parameter specifies the log backup storage address. The br log command also supports the --start-ts parameter, which specifies the start time of the log backup; if it is not specified, the current time is used as the start-ts.
[tidb@tidb53 ~]$ tiup br log status --task-name=pitr --pd "172.20.12.52:2679"
tiup is checking updates for component br ...
Starting component `br`: /home/tidb/.tiup/components/br/v7.6.0/br log status --task-name=pitr --pd 172.20.12.52:2679
Detail BR log in /tmp/br.log.2024-03-05T10.56.28+0800
● Total 1 Tasks.
> #1 <
name: pitr
status: ● NORMAL
start: 2024-03-05 10:50:52.939 +0800
end: 2090-11-18 22:07:45.624 +0800
storage: local:///data1/backups/pitr
speed(est.): 0.00 ops/s
checkpoint[global]: 2024-03-05 10:55:42.69 +0800; gap=47s
[tidb@tidb53 ~]$ tiup br log status --task-name=pitr --pd "172.20.12.52:2679"
tiup is checking updates for component br ...
Starting component `br`: /home/tidb/.tiup/components/br/v7.6.0/br log status --task-name=pitr --pd 172.20.12.52:2679
Detail BR log in /tmp/br.log.2024-03-05T10.58.57+0800
● Total 1 Tasks.
> #1 <
name: pitr
status: ● NORMAL
start: 2024-03-05 10:50:52.939 +0800
end: 2090-11-18 22:07:45.624 +0800
storage: local:///data1/backups/pitr
speed(est.): 0.00 ops/s
checkpoint[global]: 2024-03-05 10:58:07.74 +0800; gap=51s
The above output shows that the log backup status is normal. Comparing the outputs at different times confirms that the log backup task is being executed periodically in the background. The checkpoint[global] value means that all cluster data written before that time has already been saved to the backup storage; it is therefore the most recent point in time to which the backup data can be restored.
Logical Backup
A logical backup exports data from TiDB using SQL statements or export tools. In addition to the commonly used export statements, TiDB provides a tool called Dumpling, which can export data stored in TiDB or MySQL to SQL or CSV files. For detailed documentation on Dumpling, please refer to Export Data Using Dumpling | PingCAP Documentation Center. A typical Dumpling invocation is shown below: it exports all non-system table data from the target database as SQL files, uses 8 concurrent threads, writes the output to /tmp/test, enables intra-table concurrency to speed up the export (-r 200000), and limits each output file to 256 MiB.
dumpling -u root -P 4000 -h 127.0.0.1 --filetype sql -t 8 -o /tmp/test -r 200000 -F256MiB
TiDB’s Recovery Capabilities
TiDB recovery can be divided into physical backup-based recovery and logical backup-based recovery. Physical backup-based recovery uses the br restore command line to restore data, typically for large-scale full restores. Logical backup-based recovery imports data, such as files exported by Dumpling, into the cluster, and is usually used for small data sets or a few tables.
Physical Recovery
Physical recovery can be categorized into direct snapshot backup recovery and Point-in-Time Recovery (PITR). Snapshot backup recovery only requires specifying the backup storage path of the snapshot backup. PITR requires specifying the backup storage path (including snapshot and log backup data) and the time you want to restore.
Snapshot Backup Recovery
The complete process of snapshot restore is as follows (the backup used here is the one created in the snapshot backup example above):
- BR Receives Restore Command: BR receives the br restore command, obtains the snapshot backup storage address and the objects to be restored, and checks whether the objects to be restored exist and meet the requirements.
- BR Schedules Data Restoration: Specific steps include:
  - Requesting PD to disable automatic Region scheduling.
  - Reading and restoring the schema of the backup data.
  - Requesting PD to allocate Regions based on the backup data information and distribute the Regions to TiKV.
  - Sending restoration requests to TiKV based on the Regions allocated by PD.
- TiKV Accepts Restore Request and Initializes Restore Worker: TiKV nodes receive the restore request and initialize a restore worker.
- TiKV Restores Data: Specific steps include:
  - Downloading data from the backup storage to the local machine.
  - Restore workers rewriting the backed-up KV data (replacing the table ID and index ID).
  - Injecting the processed SST files into RocksDB.
  - Returning the restoration result to BR.
- BR Retrieves Restoration Results from Each TiKV.
Method of Snapshot Backup Recovery
Snapshot backup recovery can be performed at the cluster, database, and table levels. It is recommended to restore to an empty cluster; if the objects to be restored already exist in the cluster, the restoration will fail with an error (except for system tables). Below is an example of a cluster restoration:
[tidb@tidb53 ~]$ tiup br restore full --pd "172.20.12.52:2679" --storage "local:///data1/backups"
tiup is checking updates for component br ...
Starting component `br`: /home/tidb/.tiup/components/br/v7.6.0/br restore full --pd 172.20.12.52:2679 --storage local:///data1/backups
Detail BR log in /tmp/br.log.2024-03-05T13.08.08+0800
Full Restore <---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00%
[2024/03/05 13:08:27.918 +08:00] [INFO] [collector.go:77] ["Full Restore success summary"] [total-ranges=197] [ranges-succeed=197] [ranges-failed=0] [split-region=786.659µs] [restore-ranges=160] [total-take=19.347776543s] [RestoreTS=448165390238351361] [total-kv=25811349] [total-kv-size=3.561GB] [average-speed=184MB/s] [restore-data-size(after-compressed)=1.847GB] [Size=1846609490] [BackupTS=448162737288380420]
If you want to restore a single database, you just need to add the --db parameter to the restore command. To restore a single table, you must add both the --db and --table parameters.
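For instance, a database-level and a table-level restore might look like the following sketch, where the database name test and the table name t1 are placeholders (br exposes these as the restore db and restore table subcommands):
tiup br restore db --db "test" --pd "172.20.12.52:2679" --storage "local:///data1/backups"
tiup br restore table --db "test" --table "t1" --pd "172.20.12.52:2679" --storage "local:///data1/backups"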
PITR (Point-in-Time Recovery)
The command for PITR is br restore point .... When initializing the restore cluster, you must use the --full-backup-storage parameter to specify the storage address of the snapshot backup. The --restored-ts parameter specifies the point in time you want to restore to; if it is not specified, the restoration runs to the latest recoverable time point. Additionally, if you only want to restore log backup data, use the --start-ts parameter to specify the starting time point of the log backup restoration.
Here is an example of a point-in-time recovery that includes snapshot recovery:
[tidb@tidb53 ~]$ tiup br restore point --pd "172.20.12.52:2679" --full-backup-storage "local:///data1/backups/fullbk" --storage "local:///data1/backups/pitr" --restored-ts "2024-03-05 13:38:28+0800"
tiup is checking updates for component br ...
Starting component `br`: /home/tidb/.tiup/components/br/v7.6.0/br restore point --pd 172.20.12.52:2679 --full-backup-storage local:///data1/backups/fullbk --storage local:///data1/backups/pitr --restored-ts 2024-03-05 13:38:28+0800
Detail BR log in /tmp/br.log.2024-03-05T13.45.02+0800
Full Restore <---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------> 100.00%
[2024/03/05 13:45:24.620 +08:00] [INFO] [collector.go:77] ["Full Restore success summary"] [total-ranges=111] [ranges-succeed=111] [ranges-failed=0] [split-region=644.837µs] [restore-ranges=75] [total-take=21.653726346s] [BackupTS=448165866711285765] [RestoreTS=448165971332694017] [total-kv=25811349] [total-kv-size=3.561GB] [average-speed=164.4MB/s] [restore-data-size(after-compressed)=1.846GB] [Size=1846489912]
Restore Meta Files <......................................................................................................................................................................................> 100%
Restore KV Files <........................................................................................................................................................................................> 100%
[2024/03/05 13:45:26.944 +08:00] [INFO] [collector.go:77] ["restore log success summary"] [total-take=2.323796546s] [restore-from=448165866711285765] [restore-to=448165867159552000] [restore-from="2024-03-05 13:38:26.29 +0800"] [restore-to="2024-03-05 13:38:28 +0800"] [total-kv-count=0] [skipped-kv-count-by-checkpoint=0] [total-size=0B] [skipped-size-by-checkpoint=0B] [average-speed=0B/s]
Logical Recovery
Logical recovery can also be understood as data import. Besides general SQL import, TiDB supports importing data using the Lightning tool. Lightning is a tool for importing data from static files into the TiDB cluster, commonly used for initial data import.
For more details, please refer to the official TiDB Lightning Overview | PingCAP Documentation Center.
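As a rough sketch, assuming the Dumpling output from the earlier example is in /tmp/test and a tidb-lightning.toml has been prepared whose data-source-dir points to that directory and whose [tidb] section points to the target cluster, the import can be started with:
tiup tidb-lightning -config tidb-lightning.toml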
Conclusion
This article provides a detailed summary of TiDB’s backup and restoration capabilities and basic usage methods. TiDB’s backup and restoration mechanisms ensure data security and consistency and meet data protection needs at various scales and scenarios through flexible strategies and tool support. For more in-depth technical details and operational guidance, please refer to the PingCAP official documentation center.