diff --git a/README.md b/README.md
index c688474..623aeea 100644
--- a/README.md
+++ b/README.md
@@ -98,69 +98,3 @@ The `scripts/restic/restic.py` requires the following entries in the keyring:
 - `restic`: `password`.
 
 The easiest way to set these values is with Python's `keyring.set_password`.
-
-## Testing backups
-
-### Setting up baldur on yggdrasil
-
-1. Create the zvol `rpool/var/lib/libvirt/images/baldur` for the testing OS.
-2. Create the zvol `hpool/baldur` for the backup data under test. It should have a capacity that's
-   larger than what `yggdrasil` estimates for `rpool/var/lib/the-nine-worlds/data` (excluding
-   datasets that are not backed up to the cloud).
-3. Set `refreserv=0` on the zvols to make snapshots take less space.
-   - `zfs set refreserv=0 rpool/var/lib/libvirt/images/baldur`
-   - `zfs set refreserv=0 hpool/baldur`
-4. Install the same OS that is running on `yggdrasil`, but with a DE, on
-   `rpool/var/lib/libvirt/images/baldur` with `hpool/baldur` mounted within at
-   `/var/lib/the-nine-worlds/data`.
-5. Create non-root user `wojtek` with `sudo` privileges.
-6. Configure SSH from the workstation to use `yggdrasil` as a jump server.
-7. Use ZFS for snapshots/rollback of the zvols.
-   - `zfs snapshot rpool/var/lib/libvirt/images/baldur@start`
-   - `zfs snapshot hpool/baldur@start`
-
-### Provision baldur
-
-1. Provision `baldur` by running
-   ```sh
-   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/baldur.yml
-   ```
-2. Update `/etc/the-nine-worlds/resolv.conf` to point at a public DNS resolver, e.g., `1.1.1.1`.
-   Name resolution failures can cause containers to fail.
-3. Restore all the backups by ssh'ing into `baldur` and running (as root):
-   ```sh
-   /usr/local/sbin/restic-batch --config-dir /etc/the-nine-worlds/restic-batch.d restore
-   ```
-4. Once restore has completed, `chown -R :` all the restored directories in
-   `/var/lib/the-nine-worlds/data`. Restic restores the UID information of the host from which the
-   backup was performed which may not match that of the new target machine. Note that permissions
-   and ownership are restored as a second step once all the content is restored. Therefore, the
-   files will list `root` as owner during the restoration.
-5. Start all the pod services with:
-   ```sh
-   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_start.yml
-   ```
-   Give them some time to download all the images and start.
-6. Once the CPU returns to idling check the state of all the pod services and their `veth`
-   interfaces. If necessary restart the affected pod, some containers fail to start up if the
-   database takes too long to come online.
-
-### Testing the backups
-
-1. Stop all services on `yggdrasil` to prevent accidental connections to the live services which
-   defeats the point of testing backups.
-2. Log into the `baldur`. Testing from a VM (as opposed to a regular workstation) is important to
-   prevent live applications from accidentally connecting to `baldur`.
-3. Modify `/etc/hosts` in the VM to point at `rproxy` (e.g., `10.66.3.8`) for all relevant domains.
-4. Test each service manually one by one. Use the Flagfox add-on to verify that you are indeed
-   connecting to `baldur`.
-
-### Cleaning up
-
-1. Stop all the pod services with:
-   ```sh
-   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_stop.yml
-   ```
-2. Delete the VM and the two zvols:
-   - `rpool/var/lib/libvirt/images/baldur`,
-   - `hpool/baldur`.