# The Ansible Edda

Ansible playbooks for provisioning **The Nine Worlds**.

## Running the playbooks

The main entry point for **The Nine Worlds** is [`main.yml`](main.yml).

### Keyring integration

Keyring integration requires the `python3-keyring` package to be installed.

To set the keyring password, run:

``` sh
./vault-keyring-client.py --set [--vault-id <vault-id>]
```

If `--vault-id` is not specified, the password is stored under the default name `ansible`.

To use the password from the keyring, invoke playbooks with:

``` sh
ansible-playbook --vault-id @vault-keyring-client.py ...
```
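
Each vault password in the keyring is addressed by a vault ID. Ansible's `--vault-id <label>@<script>` form passes the label on to client scripts as `--vault-id <label>`. A client script is one whose file name, ignoring the extension, ends in `-client`. Below is a minimal sketch of a shell helper built on that form. It assumes a hypothetical vault ID `prod`, and the `keyring_playbook` function is illustrative, not part of this repository. Setting `DRY_RUN=1` prints the command instead of running it.

```sh
#!/bin/sh
# Hypothetical helper: run ansible-playbook with the vault password that
# vault-keyring-client.py keeps in the keyring for vault ID $1.
# Further arguments go to ansible-playbook; DRY_RUN=1 only prints the command.
keyring_playbook() {
    if [ "$#" -lt 1 ]; then
        echo 'usage: keyring_playbook <vault-id> [ansible-playbook args...]' >&2
        return 2
    fi
    vault_id="$1"
    shift
    # "<label>@<script>": the script name ends in "-client", so Ansible
    # treats it as a client script and passes it "--vault-id <label>".
    set -- ansible-playbook --vault-id "${vault_id}@vault-keyring-client.py" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example with the hypothetical vault ID "prod":
#   ./vault-keyring-client.py --set --vault-id prod
#   keyring_playbook prod -i inventory/production main.yml
```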

### Production and testing

The inventory files are split into [`inventory/production`](inventory/production) and
[`inventory/testing`](inventory/testing).

To run the `main.yml` playbook on production hosts:

``` sh
ansible-playbook -i inventory/production main.yml
```

To run the `main.yml` playbook on testing hosts:

``` sh
ansible-playbook -i inventory/testing main.yml
```
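
The two invocations differ only in the `-i` argument, so a typo in the inventory path is easy to make. The sketch below is a hypothetical wrapper; `run_main` is not part of this repository. It accepts only the two inventory names above and forwards any extra arguments, such as `--tags`, to `ansible-playbook`. `DRY_RUN=1` prints the command instead of running it.

```sh
#!/bin/sh
# Hypothetical wrapper: run main.yml against the inventory named by $1,
# rejecting any name other than the two inventories in inventory/.
# Extra arguments are passed through; DRY_RUN=1 prints instead of running.
run_main() {
    if [ "$#" -lt 1 ]; then
        echo 'usage: run_main <production|testing> [args...]' >&2
        return 2
    fi
    target="$1"
    shift
    case "$target" in
        production | testing) ;;
        *)
            printf 'unknown inventory: %s\n' "$target" >&2
            return 2
            ;;
    esac
    set -- ansible-playbook -i "inventory/$target" main.yml "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example: run_main testing --tags "system"
```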

### Playbooks

The Ansible Edda playbook is composed of smaller [`playbooks`](playbooks). To run a single playbook,
invoke the corresponding file in the `playbooks` directory directly. For example, to run the
[`playbooks/system.yml`](playbooks/system.yml) playbook, run:

``` sh
ansible-playbook playbooks/system.yml
```

Alternatively, run `main.yml` restricted to that playbook's tag:

``` sh
ansible-playbook main.yml --tags "system"
```

### Roles

Playbooks are composed of roles defined in the
[`roles`](http://git.thenineworlds.net/the-nine-worlds/ansible-roles) submodule and
[`playbooks/roles`](playbooks/roles).

To play a specific role, e.g., `system/base/sshd` in the playbook `system`, run:

``` sh
ansible-playbook playbooks/system.yml --tags "system:base:sshd"
```

To play all roles from a specific group, e.g., `system/base` in the playbook `system`, run:

``` sh
ansible-playbook playbooks/system.yml --tags "system:base"
```

Some roles, e.g., `services/setup/user`, have sub-tasks that can also be invoked individually. To
find the relevant tag, see the role's `tasks/main.yml`.
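
`ansible-playbook` can list every tag in a playbook with its built-in `--list-tags` option, e.g., `ansible-playbook playbooks/system.yml --list-tags`. To take a quick look at a single tasks file without Ansible, a rough `grep`/`sed` sketch like the one below may be enough. It assumes each task's tags are written on one line. `role_tags` is a hypothetical helper, and the path in the example comment is a placeholder for wherever the role lives (the `roles` submodule or `playbooks/roles`).

```sh
#!/bin/sh
# Sketch: list the tags declared in one tasks file, such as a role's
# tasks/main.yml. Only handles tags written on a single line, e.g.
#   tags: foo        tags: foo, bar        tags: ["foo", "bar"]
# Multi-line YAML lists ("tags:" followed by "- foo" lines) are not parsed.
role_tags() {
    if [ ! -r "$1" ]; then
        printf 'cannot read: %s\n' "$1" >&2
        return 1
    fi
    grep -E '^[[:space:]]*tags:' "$1" |
        sed -E 's/^[[:space:]]*tags:[[:space:]]*//' | # drop the "tags:" key
        sed -E "s/[][\"']//g" |                      # drop brackets and quotes
        tr ',' '\n' |                                # one tag per line
        sed -E 's/^[[:space:]]+//; s/[[:space:]]+$//' |
        grep -v '^$' |
        sort -u
}

# Example (placeholder path):
#   role_tags path/to/services/setup/user/tasks/main.yml
```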

In all cases, the roles can also be invoked from the main playbook:

``` sh
ansible-playbook main.yml --tags "system:base:sshd"
ansible-playbook main.yml --tags "system:base"
```
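
The examples above follow one pattern: a role's path, such as `system/base/sshd`, becomes its tag when each `/` is replaced with `:`. The small sketch below derives the tag that way. It assumes the pattern holds for every role, which is inferred from these examples and not checked against the playbooks. `role_to_tag` and `play_role` are hypothetical helper names, and `DRY_RUN=1` prints the command instead of running it.

```sh
#!/bin/sh
# Sketch of the naming pattern used above: a role or group path with
# "/" replaced by ":" gives its tag (system/base/sshd -> system:base:sshd).
role_to_tag() {
    printf '%s\n' "${1%/}" | tr '/' ':' # drop a trailing slash first
}

# Hypothetical helper: play a role or group through main.yml by its path.
# Extra arguments go to ansible-playbook; DRY_RUN=1 only prints the command.
play_role() {
    if [ "$#" -lt 1 ]; then
        echo 'usage: play_role <role-path> [args...]' >&2
        return 2
    fi
    role_tag="$(role_to_tag "$1")"
    shift
    set -- ansible-playbook main.yml --tags "$role_tag" "$@"
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Example: play_role system/base/sshd
```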

## Testing virtual machines

The script for starting, stopping, and reverting the testing virtual machines is located in
[`scripts/testing/vmgr.py`](scripts/testing/vmgr.py).

## Managing backup buckets

The [`scripts/restic/restic.py`](scripts/restic/restic.py) script provides a wrapper around restic
to manage the backup buckets. The script collects the credentials from the OS keyring and constructs
the restic command with the correct endpoint, so the user can focus on the restic command itself
rather than on authentication and bucket URLs.

The `scripts/restic/restic.py` script requires the following entries in the keyring:

- `scaleway`: `access_key` (Scaleway project ID),
- `scaleway`: `secret_key` (Scaleway secret key),
- `restic`: `password`.

The easiest way to set these values is with Python's `keyring.set_password`.
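
One way to do that from the shell is sketched below. It assumes each entry above is a keyring service name (`scaleway`, `restic`) paired with a username (`access_key`, `secret_key`, `password`). That pairing is how `keyring.set_password(service_name, username, password)` addresses a secret, but check `restic.py` for the exact names it reads. `store_restic_secrets` is a hypothetical helper, and `DRY_RUN=1` lists the entries without prompting.

```sh
#!/bin/sh
# Sketch: store the keyring entries restic.py is assumed to read, prompting
# for each secret so that it never ends up in the shell history.
# Assumption: "service:username" pairs, as passed to keyring.set_password.
store_restic_secrets() {
    for entry in scaleway:access_key scaleway:secret_key restic:password; do
        service="${entry%%:*}"
        username="${entry#*:}"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf '%s %s\n' "$service" "$username"
            continue
        fi
        # getpass prompts on the terminal without echoing the input.
        python3 -c 'import getpass, sys, keyring; s, u = sys.argv[1:3]; keyring.set_password(s, u, getpass.getpass(f"{s} {u}: "))' \
            "$service" "$username" || return 1
    done
}
```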

## Testing backups

### Setting up baldur on yggdrasil

1. Create the zvol `rpool/var/lib/libvirt/images/baldur` for the testing OS.
2. Create the zvol `hpool/baldur` for the backup data under test. Its capacity should be
   larger than what `yggdrasil` estimates for `rpool/var/lib/the-nine-worlds/data` (excluding
   datasets that are not backed up to the cloud).
3. Set `refreserv=0` on the zvols so that snapshots take less space.
   - `zfs set refreserv=0 rpool/var/lib/libvirt/images/baldur`
   - `zfs set refreserv=0 hpool/baldur`
4. Install the same OS that is running on `yggdrasil`, but with a desktop environment (DE), on
   `rpool/var/lib/libvirt/images/baldur`, with `hpool/baldur` mounted within it at
   `/var/lib/the-nine-worlds/data`.
5. Create a non-root user `wojtek` with `sudo` privileges.
6. Configure SSH on the workstation to use `yggdrasil` as a jump server.
7. Snapshot the zvols with ZFS so that they can be rolled back later:
   - `zfs snapshot rpool/var/lib/libvirt/images/baldur@start`
   - `zfs snapshot hpool/baldur@start`
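
Step 7 only shows taking the `@start` snapshots. Returning `baldur` to that clean state later means running `zfs rollback` on both zvols, and the sketch below wraps both operations. The helper names are hypothetical. The sketch also assumes the VM is shut down before rolling back, for example with `virsh shutdown baldur` if the libvirt domain is named `baldur`. `DRY_RUN=1` prints the commands instead of running them.

```sh
#!/bin/sh
# The two zvols from the steps above (ZFS dataset names contain no spaces).
BALDUR_ZVOLS="rpool/var/lib/libvirt/images/baldur hpool/baldur"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# Snapshot both zvols under one name (default "start").
baldur_snapshot() {
    for zvol in $BALDUR_ZVOLS; do
        run zfs snapshot "$zvol@${1:-start}" || return 1
    done
}

# Roll both zvols back to a snapshot (default "start"). -r destroys any
# later snapshots, which zfs rollback otherwise refuses to discard.
# Shut the VM down first so that nothing is writing to the zvols.
baldur_rollback() {
    for zvol in $BALDUR_ZVOLS; do
        run zfs rollback -r "$zvol@${1:-start}" || return 1
    done
}
```

Because of `-r`, rolling back to `@start` also destroys any snapshot taken after it, such as one taken with `baldur_snapshot restored` after a successful restore.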

### Provision baldur

1. Provision `baldur` by running:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/baldur.yml
   ```
2. Update `/etc/the-nine-worlds/resolv.conf` to point at a public DNS resolver, e.g., `1.1.1.1`.
   Name resolution failures can cause containers to fail.
3. Restore all the backups by logging into `baldur` over SSH and running (as root):
   ```sh
   /usr/local/sbin/restic-batch --config-dir /etc/the-nine-worlds/restic-batch.d restore
   ```
4. Once the restore has completed, `chown -R <user>:<user>` all the restored directories in
   `/var/lib/the-nine-worlds/data`. Restic restores the UIDs from the host on which the backup was
   made, and these may not match the users on the new target machine. Note that restic restores
   permissions and ownership as a second step, after all the content has been restored, so the
   files list `root` as their owner while the restore is in progress.
5. Start all the pod services with:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_start.yml
   ```
   Give them some time to download all the images and start.
6. Once the CPU returns to idle, check the state of all the pod services and their `veth`
   interfaces. If necessary, restart the affected pod; some containers fail to start if the
database takes too long to come online.
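
Before and after step 4, it helps to see which owners the restored directories ended up with. The sketch below assumes it runs as root on `baldur` with GNU `find`. `show_owners` and `fix_ownership` are hypothetical helpers, and directory names are passed relative to `/var/lib/the-nine-worlds/data`. As in step 4, the group is assumed to share the user's name.

```sh
#!/bin/sh
# Sketch for step 4: run as root on baldur after the restore has finished.
DATA_DIR=/var/lib/the-nine-worlds/data

# Show the owner and group of each top-level restored directory (GNU find).
# Owners without an account on this machine are printed as numeric IDs.
show_owners() {
    find "$DATA_DIR" -mindepth 1 -maxdepth 1 -printf '%u:%g\t%p\n'
}

# Recursively give the named directories to <user>:<user>.
# DRY_RUN=1 prints the chown commands instead of running them.
fix_ownership() {
    if [ "$#" -lt 2 ]; then
        echo 'usage: fix_ownership <user> <dir>...' >&2
        return 2
    fi
    owner="$1"
    shift
    if ! id -u "$owner" >/dev/null 2>&1; then
        printf 'no such user: %s\n' "$owner" >&2
        return 1
    fi
    for dir in "$@"; do
        if [ "${DRY_RUN:-0}" = 1 ]; then
            printf 'chown -R %s:%s %s\n' "$owner" "$owner" "$DATA_DIR/$dir"
        else
            chown -R "$owner:$owner" "$DATA_DIR/$dir" || return 1
        fi
    done
}
```

Running `show_owners` again afterwards should show no remaining numeric IDs for the directories that were fixed.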

### Testing the backups

1. Stop all services on `yggdrasil` to prevent accidental connections to the live services, which
   would defeat the point of testing the backups.
2. Log into `baldur`. Testing from a VM (as opposed to a regular workstation) is important to
   prevent live applications from accidentally connecting to `baldur`.
3. Modify `/etc/hosts` in the VM so that all relevant domains point at `rproxy` (e.g., `10.66.3.8`).
4. Test each service manually, one by one. Use the Flagfox add-on to verify that you are indeed
connecting to `baldur`.
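
For step 3, the `/etc/hosts` lines can be generated instead of typed. Below is a minimal sketch, assuming `rproxy` is reachable at the example address `10.66.3.8`. `hosts_entries` is a hypothetical helper, and the domains in the usage note are placeholders.

```sh
#!/bin/sh
# Sketch for step 3: print /etc/hosts lines that send each domain to rproxy.
# 10.66.3.8 is the example address from step 3; override it with RPROXY_ADDR.
hosts_entries() {
    addr="${RPROXY_ADDR:-10.66.3.8}"
    # A marker comment makes the block easy to find and remove afterwards.
    printf '# baldur backup test: remove when done\n'
    for domain in "$@"; do
        printf '%s\t%s\n' "$addr" "$domain"
    done
}
```

For example, `hosts_entries cloud.example.org git.example.org | sudo tee -a /etc/hosts` appends the entries. Afterwards, `getent hosts cloud.example.org` should print the `rproxy` address.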

### Cleaning up

1. Stop all the pod services with:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_stop.yml
   ```
2. Delete the VM and the two zvols:
   - `rpool/var/lib/libvirt/images/baldur`,
   - `hpool/baldur`.
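
The deletion in step 2 can be scripted as shown below. This is a sketch that assumes the libvirt domain is named `baldur` (check with `virsh list --all`), and `delete_baldur` is a hypothetical helper. Because `zfs destroy -r` cannot be undone, it is sensible to run the helper once with `DRY_RUN=1` and review the printed commands first.

```sh
#!/bin/sh
# Sketch of step 2: remove the baldur VM definition and both zvols.
# Assumes the libvirt domain is named "baldur"; override with BALDUR_DOMAIN.
BALDUR_DOMAIN="${BALDUR_DOMAIN:-baldur}"
BALDUR_ZVOLS="rpool/var/lib/libvirt/images/baldur hpool/baldur"

# Run a command, or only print it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

delete_baldur() {
    # Force the VM off; this fails harmlessly if it is already shut down.
    run virsh destroy "$BALDUR_DOMAIN" || true
    # Remove the domain definition (UEFI guests may also need --nvram).
    run virsh undefine "$BALDUR_DOMAIN" || return 1
    # -r also destroys each zvol's snapshots, e.g. @start.
    for zvol in $BALDUR_ZVOLS; do
        run zfs destroy -r "$zvol" || return 1
    done
}
```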