Remove backup testing how-to (moved to notes)
parent c782f0f74e
commit 758128a436
66
README.md
@@ -98,69 +98,3 @@ The `scripts/restic/restic.py` requires the following entries in the keyring:
- `restic`: `password`.

The easiest way to set these values is with Python's `keyring.set_password`.

## Testing backups

### Setting up baldur on yggdrasil

1. Create the zvol `rpool/var/lib/libvirt/images/baldur` for the testing OS.
2. Create the zvol `hpool/baldur` for the backup data under test. Its capacity should be larger
   than what `yggdrasil` estimates for `rpool/var/lib/the-nine-worlds/data` (excluding
   datasets that are not backed up to the cloud).
3. Set `refreserv=0` on the zvols to make snapshots take less space.
   - `zfs set refreserv=0 rpool/var/lib/libvirt/images/baldur`
   - `zfs set refreserv=0 hpool/baldur`
4. Install the same OS that is running on `yggdrasil`, but with a desktop environment (DE), on
   `rpool/var/lib/libvirt/images/baldur` with `hpool/baldur` mounted within at
   `/var/lib/the-nine-worlds/data`.
5. Create a non-root user `wojtek` with `sudo` privileges.
6. Configure SSH from the workstation to use `yggdrasil` as a jump server.
7. Use ZFS for snapshots/rollback of the zvols.
   - `zfs snapshot rpool/var/lib/libvirt/images/baldur@start`
   - `zfs snapshot hpool/baldur@start`
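
The zvol steps above could be scripted roughly as follows. This is a minimal sketch, not part of the original how-to: the function names, the `DRY_RUN` switch, and the sizes passed in are all assumptions, and the parent datasets are assumed to exist already. By default the commands are only printed so they can be reviewed first.

```sh
# Sketch only: helper names and the DRY_RUN switch are hypothetical.
OS_ZVOL=rpool/var/lib/libvirt/images/baldur
DATA_ZVOL=hpool/baldur

# Print each command instead of running it unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    else
        echo "$*"
    fi
}

# Steps 1-3: create both zvols and drop their reservation.
# $1 = size of the OS zvol, $2 = size of the data zvol (caller-chosen).
create_zvols() {
    run zfs create -V "$1" "$OS_ZVOL"
    run zfs create -V "$2" "$DATA_ZVOL"
    for zvol in "$OS_ZVOL" "$DATA_ZVOL"; do
        run zfs set refreserv=0 "$zvol"
    done
}

# Step 7: snapshot both zvols once the OS is installed.
snapshot_zvols() {
    for zvol in "$OS_ZVOL" "$DATA_ZVOL"; do
        run zfs snapshot "$zvol@start"
    done
}

# Return both zvols to the @start snapshot before another test run.
# Plain rollback only works while @start is the most recent snapshot.
rollback_zvols() {
    for zvol in "$OS_ZVOL" "$DATA_ZVOL"; do
        run zfs rollback "$zvol@start"
    done
}
```

For example, `create_zvols 32G 2T` prints the four `zfs` commands for review (the sizes here are placeholders), and `DRY_RUN=0 snapshot_zvols` actually takes the snapshots when run as root.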
### Provision baldur

1. Provision `baldur` by running:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/baldur.yml
   ```
2. Update `/etc/the-nine-worlds/resolv.conf` to point at a public DNS resolver, e.g., `1.1.1.1`.
   Name resolution failures can cause containers to fail.
3. Restore all the backups by SSHing into `baldur` and running (as root):
   ```sh
   /usr/local/sbin/restic-batch --config-dir /etc/the-nine-worlds/restic-batch.d restore
   ```
4. Once the restore has completed, `chown -R <user>:<user>` all the restored directories in
   `/var/lib/the-nine-worlds/data`. Restic restores the UID information of the host from which the
   backup was performed, which may not match that of the new target machine. Note that permissions
   and ownership are restored as a second step once all the content is restored. Therefore, the
   files will list `root` as owner during the restoration.
5. Start all the pod services with:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_start.yml
   ```
   Give them some time to download all the images and start.
6. Once the CPU returns to idle, check the state of all the pod services and their `veth`
   interfaces. If necessary, restart the affected pod; some containers fail to start up if the
   database takes too long to come online.
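
The ownership fix in step 4 could be wrapped in a small helper along these lines. This is a sketch under assumptions: the function name `chown_restored` is hypothetical, and it treats every top-level directory under the data directory as restored content that should belong to the same owner.

```sh
# Sketch only: chown_restored is a hypothetical helper, not part of the repo.
# Re-own every top-level directory under the data directory.
# $1 = data directory, $2 = owner in user:group form.
chown_restored() {
    data_dir=$1
    owner=$2
    for dir in "$data_dir"/*/; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -d "$dir" ] || continue
        chown -R "$owner" "$dir"
    done
}
```

On `baldur` this would be invoked as root, for example `chown_restored /var/lib/the-nine-worlds/data <user>:<user>`, only after the restore has fully finished, since Restic applies ownership in its own second pass.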
### Testing the backups

1. Stop all services on `yggdrasil` to prevent accidental connections to the live services, which
   would defeat the point of testing backups.
2. Log into `baldur`. Testing from a VM (as opposed to a regular workstation) is important to
   prevent live applications from accidentally connecting to `baldur`.
3. Modify `/etc/hosts` in the VM to point at `rproxy` (e.g., `10.66.3.8`) for all relevant domains.
4. Test each service manually one by one. Use the Flagfox add-on to verify that you are indeed
   connecting to `baldur`.
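
The `/etc/hosts` change in step 3 can be generated rather than typed by hand. The helper below is a minimal sketch; `hosts_entries` is a hypothetical name, and the domain in the usage note is a placeholder rather than one of the real service domains.

```sh
# Sketch only: hosts_entries is a hypothetical helper.
# Print one /etc/hosts line per domain, all pointing at the same address.
# $1 = address of rproxy, remaining arguments = domains to override.
hosts_entries() {
    addr=$1
    shift
    for domain in "$@"; do
        printf '%s\t%s\n' "$addr" "$domain"
    done
}
```

Inside the VM, `hosts_entries 10.66.3.8 service.example.org | sudo tee -a /etc/hosts` appends the override, and `getent hosts service.example.org` should then print `10.66.3.8`, which gives a quick check before opening the browser.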
### Cleaning up

1. Stop all the pod services with:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_stop.yml
   ```
2. Delete the VM and the two zvols:
   - `rpool/var/lib/libvirt/images/baldur`,
   - `hpool/baldur`.
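
Step 2 might look roughly like this on `yggdrasil`. It is a sketch under assumptions: the how-to does not say how the VM is managed, so the use of `virsh` and a libvirt domain named `baldur` is a guess based on the zvol living under `/var/lib/libvirt/images`. The helper (a hypothetical name) only prints the destructive commands so they can be reviewed before being run.

```sh
# Sketch only: cleanup_commands is a hypothetical helper, and managing the VM
# through virsh is an assumption.
# Print the commands that remove the VM and destroy the given zvols.
# $1 = libvirt domain name, remaining arguments = zvols to destroy.
cleanup_commands() {
    vm=$1
    shift
    echo "virsh destroy $vm"
    echo "virsh undefine $vm"
    for zvol in "$@"; do
        # -r also removes the zvol's snapshots, such as @start.
        echo "zfs destroy -r $zvol"
    done
}
```

Running `cleanup_commands baldur rpool/var/lib/libvirt/images/baldur hpool/baldur` lists the four commands; once they look right they can be executed one by one as root. `virsh destroy` reports an error if the VM is already shut off, which is harmless here.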