Add notes on testing backups to README
parent d87423c244
commit faa68b0585
47
README.md
@@ -48,8 +48,8 @@ ansible-playbook main.yml -i testing

### Testing virtual machines

-Scripts for starting, stopping, and reverting the testing virtual machines are located in
-`scripts/testing`.
+The script for starting, stopping, and reverting the testing virtual machines is located in
+`scripts/testing/vmgr.py`.

### Playbooks

@@ -101,3 +101,46 @@ Or from the main playbook:
``` sh
ansible-playbook main.yml --tags "system:base:sshd"
```

## Testing backups

Before testing the backups, you may want to shut `yggdrasil` down for extra confidence that it is
not being accessed or modified during this process. It is easy to access `yggdrasil` by accident if
`/etc/hosts` is not modified in the test VM, which is easy to forget.
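Because a stale `/etc/hosts` is the easiest way to end up on `yggdrasil`, it can help to check the overrides before opening any service. The sketch below is one possible check and is not part of this repository: `hosts_ip` and `check_override` are hypothetical helpers, and the domains and the address `203.0.113.10` used in the usage note are placeholders for the real service domains and the public IP of `baldur`. It only reads the hosts file and makes no network requests.

```sh
# Hypothetical helpers, not part of this repository.

# hosts_ip FILE NAME: print the address of the first entry in FILE that lists NAME.
hosts_ip() {
    awk -v name="$2" '
        { sub(/#.*/, "") }    # strip comments; awk re-splits the fields
        {
            for (i = 2; i <= NF; i++)
                if ($i == name) { print $1; exit }
        }
    ' "$1"
}

# check_override FILE EXPECTED_IP NAME...: fail if any NAME is not mapped
# to EXPECTED_IP in FILE, and report each offending name on stderr.
check_override() {
    file=$1
    expected=$2
    shift 2
    status=0
    for name in "$@"; do
        actual=$(hosts_ip "$file" "$name")
        if [ "$actual" != "$expected" ]; then
            echo "not overridden: $name -> ${actual:-no entry}" >&2
            status=1
        fi
    done
    return "$status"
}
```

Inside the test VM, `check_override /etc/hosts 203.0.113.10 cloud.example.org git.example.org` would exit non-zero and name every domain that still lacks an override. This complements, rather than replaces, checking the connection in the browser.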

1. Create `baldur` by running:
   ```sh
   python scripts/scaleway/baldur.py create --volume-size <size-in-GB>
   ```
   Pick a volume size that's larger than what `yggdrasil` estimates for
   `rpool/var/lib/yggdrasil/data`.
2. Provision `baldur` by running:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/baldur.yml
   ```
3. Restore all the backups by connecting to `baldur` over SSH and running (as root):
   ```sh
   /usr/local/sbin/restic-batch --config-dir /etc/restic-batch.d restore
   ```
4. Start all the pod services with:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_start.yml
   ```
   Give them some time to download all the images and start.
5. Once the CPU returns to idle, check the state of all the pod services and their `veth`
   interfaces, and restart any affected pod if necessary. Pods sometimes fail to start (presumably
   because of the limited CPU and RAM).
6. Boot into a test VM, ideally one installed onto a virtual disk, since a live system might not
   have enough space. A VM is used to make sure that none of the services on the host workstation
   connect to `baldur` by accident.
7. Modify `/etc/hosts` in the VM to point at `baldur` for all relevant domains.
8. Test each service manually one by one. Use the Flagfox add-on to verify that you are indeed
   connecting to `baldur`.
9. Stop all the pod services with:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_stop.yml
   ```
10. Destroy `baldur` by running:
    ```sh
    python scripts/scaleway/baldur.py delete
    ```
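For step 1, the volume size can be derived from a byte count rather than guessed. This is a minimal sketch under several assumptions: that `rpool/var/lib/yggdrasil/data` is a ZFS dataset, that a GB for `--volume-size` means 10^9 bytes, and that 20% headroom is enough. `volume_size_gb` is a hypothetical helper and not part of `baldur.py`.

```sh
# Hypothetical helper, not part of this repository.
# volume_size_gb BYTES [HEADROOM_PERCENT]: print BYTES plus headroom,
# rounded up to whole GB.
volume_size_gb() {
    bytes=$1
    headroom=${2:-20}    # assumed default of 20% extra space
    gb=1000000000        # assumes 1 GB = 10^9 bytes
    padded=$(( bytes * (100 + headroom) / 100 ))
    # Ceiling division so the result is never smaller than the padded size.
    echo $(( (padded + gb - 1) / gb ))
}
```

On `yggdrasil`, `zfs list -Hp -o used rpool/var/lib/yggdrasil/data` prints the used space of the dataset in bytes, which is one candidate input for `volume_size_gb`. The printed number can then be passed as `--volume-size`.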