Update README for creating Baldur on yggdrasil

Wojciech Kozlowski 2023-07-15 22:49:03 +02:00
parent 7b84ee2d21
commit 7ce81fb818


@@ -108,42 +108,56 @@ Before testing the backups, you may want to shut `yggdrasil` down for extra conf
not being accessed/modified during this process. It is easy to access `yggdrasil` by accident if
`/etc/hosts` is not modified in the test VM, something that is easy to forget.
### Baldur on Scaleway
1. Create `baldur` by running:
```sh
python scripts/scaleway/baldur.py create --volume-size <size-in-GB>
```
Pick a volume size that's larger than what `yggdrasil` estimates for
`rpool/var/lib/yggdrasil/data`.
2. Provision `baldur` by running:
2. When done, destroy `baldur` by running:
```sh
python scripts/scaleway/baldur.py delete
```
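A minimal sketch of how the volume size could be derived from the dataset's current usage. The 20% headroom and the use of decimal gigabytes are assumptions, and `size_with_headroom` is a hypothetical helper, not part of the repository's scripts:

```sh
#!/bin/sh
# Sketch only: the 20% headroom and decimal GB are assumptions, not values
# taken from this README.

# Print a volume size in GB: the given byte count plus 20%, rounded up.
size_with_headroom() {
    bytes=$1
    padded=$(( bytes * 12 / 10 ))
    echo $(( (padded + 999999999) / 1000000000 ))
}

# Example use on yggdrasil (requires ZFS; -H drops the header, -p prints exact bytes):
#   used=$(zfs list -Hp -o used rpool/var/lib/yggdrasil/data)
#   python scripts/scaleway/baldur.py create --volume-size "$(size_with_headroom "$used")"
```

Rounding up errs on the side of a slightly larger volume, which is the safer direction for a restore target.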
### Baldur on Yggdrasil
1. Create a VM on `yggdrasil`.
- Install the OS on a zvol on `rpool`.
- Prepare a zvol on `hpool` that is larger than what `yggdrasil` estimates for
`rpool/var/lib/yggdrasil/data`, and mount it at `/var/lib/baldur/data`.
- Create a non-root user `wojtek` with `sudo` privileges.
2. Configure SSH to use `yggdrasil` as a jump server.
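One possible way to set up the jump server is OpenSSH's `ProxyJump` option. The sketch below assumes that the VM is reachable from `yggdrasil` under the name `baldur` and that the user `wojtek` exists on both hosts. `write_jump_config` is a hypothetical helper:

```sh
#!/bin/sh
# Sketch only: the host aliases and the user on yggdrasil are assumptions.

# Append a Host stanza that routes connections to baldur through yggdrasil.
# $1: path of the SSH client config file to append to.
write_jump_config() {
    cat >> "$1" <<'EOF'
Host baldur
    User wojtek
    ProxyJump wojtek@yggdrasil
EOF
}

# Example use:
#   write_jump_config "$HOME/.ssh/config"
#   ssh baldur
```

For a one-off connection without touching the config file, `ssh -J wojtek@yggdrasil wojtek@baldur` should be equivalent under the same assumptions.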
### Test
1. Provision `baldur` by running:
```sh
ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/baldur.yml
```
3. Restore all the backups by ssh'ing into `baldur` and running (as root):
2. Restore all the backups by ssh'ing into `baldur` and running (as root):
```sh
/usr/local/sbin/restic-batch --config-dir /etc/restic-batch.d restore
```
4. Start all the pod services with:
3. Start all the pod services with:
```sh
ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_start.yml
```
Give them some time to download all the images and start.
5. Once the CPU returns to idle, check the state of all the pod services and their `veth`
4. Once the CPU returns to idle, check the state of all the pod services and their `veth`
interfaces. If necessary, restart the affected pod. Sometimes they fail to start (presumably due
to issues related to limited CPU and RAM).
6. Boot into a test VM, ideally one installed onto a virtual disk, since the live system might not
5. Boot into a test VM, ideally one installed onto a virtual disk, since the live system might not
have enough space. A VM is used to make sure that none of the services on the host workstation
connect to `baldur` by accident.
7. Modify `/etc/hosts` in the VM to point at `baldur` for all relevant domains.
8. Test each service manually one by one. Use the Flagfox add-on to verify that you are indeed
6. Modify `/etc/hosts` in the VM to point at `baldur` for all relevant domains.
7. Test each service manually one by one. Use the Flagfox add-on to verify that you are indeed
connecting to `baldur`.
9. Stop all the pod services with:
8. Stop all the pod services with:
```sh
ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_stop.yml
```
10. Destroy `baldur` by running:
```sh
python scripts/scaleway/baldur.py delete
```
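The `/etc/hosts` step can be sketched as a small helper that prints one entry per domain. The address and domain names in the example are placeholders, and `hosts_lines` is a hypothetical helper, not part of the repository:

```sh
#!/bin/sh
# Sketch only: the address and domains below are placeholders.

# Print one /etc/hosts line per domain, all pointing at the given address.
# $1: IP address of baldur; remaining arguments: domains to redirect.
hosts_lines() {
    addr=$1
    shift
    for domain in "$@"; do
        printf '%s %s\n' "$addr" "$domain"
    done
}

# Example use inside the test VM:
#   hosts_lines 192.0.2.10 cloud.example.com git.example.com | sudo tee -a /etc/hosts
#   getent hosts cloud.example.com   # should now print the baldur address
```

Checking the result with `getent hosts` before opening a browser gives an early signal that the VM is not resolving the domains to `yggdrasil`.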
## Music organisation