# The Ansible Edda
Ansible playbooks for provisioning The Nine Worlds.
## Running the playbooks
The main entrypoint for The Nine Worlds is `main.yml`.
## Keyring integration
Keyring integration requires `python3-keyring` to be installed.
To set the keyring password, run:

```
./vault-keyring-client.py --set [--vault-id <vault-id>]
```
If `--vault-id` is not specified, the password will be stored under `ansible`.
To use the password from the keyring, invoke playbooks with:

```
ansible-playbook --vault-id @vault-keyring-client.py ...
```
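For reference, Ansible treats a script whose name ends in `-client` as a vault password client: it invokes the script with `--vault-id <label>` and reads the password from stdout. The snippet below is only a minimal sketch of that flow backed by the OS keyring; the `ansible-vault` service name and the argument handling are illustrative assumptions, not the actual contents of `vault-keyring-client.py`.

```python
#!/usr/bin/env python3
"""Minimal sketch of a keyring-backed vault password client (illustrative only)."""
import argparse
import getpass
import sys

import keyring  # provided by python3-keyring


def main() -> int:
    parser = argparse.ArgumentParser()
    parser.add_argument("--vault-id", default="ansible")
    parser.add_argument("--set", action="store_true", dest="set_password")
    args = parser.parse_args()

    if args.set_password:
        # Prompt for the password and store it in the OS keyring.
        # The service name "ansible-vault" is an assumption for this sketch.
        keyring.set_password("ansible-vault", args.vault_id,
                             getpass.getpass("New vault password: "))
        return 0

    # Ansible expects a vault password client to print the password on stdout.
    password = keyring.get_password("ansible-vault", args.vault_id)
    if password is None:
        print(f"no keyring entry for vault-id {args.vault_id}", file=sys.stderr)
        return 2
    print(password)
    return 0


if __name__ == "__main__":
    sys.exit(main())
```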
## Production and testing
The inventory files are split into `inventory/production` and `inventory/testing`.
To run the `main.yml` playbook on production hosts:

```
ansible-playbook -i inventory/production main.yml
```
To run the `main.yml` playbook on testing hosts:

```
ansible-playbook -i inventory/testing main.yml
```
## Playbooks
The Ansible Edda playbook is composed of smaller playbooks. To run a single playbook, invoke the relevant playbook directly from the `playbooks` directory. For example, to run the `playbooks/system.yml` playbook:

```
ansible-playbook playbooks/system.yml
```
Alternatively, you can use its tag:

```
ansible-playbook main.yml --tags "system"
```
## Roles
Playbooks are composed of roles defined in the `roles` submodule and in `playbooks/roles`.
To play a specific role, e.g., `system/base/sshd` in the `system` playbook, run:

```
ansible-playbook playbooks/system.yml --tags "system:base:sshd"
```
To play all roles from a specific group, e.g., `system/base` in the `system` playbook, run:

```
ansible-playbook playbooks/system.yml --tags "system:base"
```
Some roles, e.g., `services/setup/user`, have sub-tasks which can also be invoked individually. To find the relevant tag, see the role's `tasks/main.yml`.
In all cases, the roles can also be invoked from the main playbook:

```
ansible-playbook main.yml --tags "system:base:sshd"
ansible-playbook main.yml --tags "system:base"
```
## Testing virtual machines
The script for starting, stopping, and reverting the testing virtual machines is located in `scripts/testing/vmgr.py`.
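As a purely hypothetical illustration of the kind of interface such a manager exposes (the real `scripts/testing/vmgr.py` may work entirely differently), a libvirt-based variant could shell out to `virsh`; the domain and snapshot names below are placeholders.

```python
# Hypothetical VM manager sketch: start/stop a named libvirt domain and
# revert it to a previously created snapshot. Not the actual vmgr.py.
import subprocess
import sys


def virsh(*args: str) -> None:
    # Run a virsh subcommand and fail loudly if it errors.
    subprocess.run(["virsh", *args], check=True)


def main() -> None:
    action, domain = sys.argv[1], sys.argv[2]
    if action == "start":
        virsh("start", domain)
    elif action == "stop":
        virsh("shutdown", domain)
    elif action == "revert":
        # Roll the VM back to a named snapshot, e.g. "start".
        virsh("snapshot-revert", domain, "--snapshotname", sys.argv[3])
    else:
        raise SystemExit(f"unknown action: {action}")


if __name__ == "__main__":
    main()
```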
## Managing backup buckets
The `scripts/restic/restic.py` script provides a wrapper around `restic` to manage the backup buckets. The script collects the credentials from the OS keyring and constructs the restic command with the correct endpoint. It allows the user to focus on the actual command to be executed rather than authentication and bucket URLs.
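A minimal sketch of that idea (not the actual script): pull the credentials from the keyring entries listed below, export them the way restic expects for an S3-compatible backend, and forward the remaining arguments to `restic`. The repository endpoint and the service/username layout are placeholders assumed for illustration.

```python
# Sketch of a restic wrapper: credentials come from the OS keyring, the
# bucket endpoint is fixed, and the user's arguments are passed through.
import os
import subprocess
import sys

import keyring

REPOSITORY = "s3:https://s3.example.invalid/example-bucket"  # placeholder endpoint


def main() -> int:
    creds = {
        "AWS_ACCESS_KEY_ID": keyring.get_password("scaleway", "access_key"),
        "AWS_SECRET_ACCESS_KEY": keyring.get_password("scaleway", "secret_key"),
        "RESTIC_PASSWORD": keyring.get_password("restic", "password"),
    }
    missing = [name for name, value in creds.items() if value is None]
    if missing:
        print(f"missing keyring entries for: {', '.join(missing)}", file=sys.stderr)
        return 2

    # Forward whatever the user typed (e.g. "snapshots", "check") to restic.
    env = {**os.environ, **creds}
    cmd = ["restic", "-r", REPOSITORY, *sys.argv[1:]]
    return subprocess.run(cmd, env=env).returncode


if __name__ == "__main__":
    sys.exit(main())
```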
The `scripts/restic/restic.py` script requires the following entries in the keyring:

- `scaleway:access_key` (Scaleway project ID),
- `scaleway:secret_key` (Scaleway secret key),
- `restic:password`.
The easiest way to set these values is with Python's `keyring.set_password`.
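For example, assuming the service/username layout matches the entry names above, the keyring can be seeded from a short Python script:

```python
# Seed the keyring entries used by scripts/restic/restic.py.
# The "service"/"username" split mirrors the entry names listed above;
# adjust it if the script expects a different layout.
import getpass
import keyring

keyring.set_password("scaleway", "access_key", getpass.getpass("Scaleway project ID: "))
keyring.set_password("scaleway", "secret_key", getpass.getpass("Scaleway secret key: "))
keyring.set_password("restic", "password", getpass.getpass("restic repository password: "))
```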
## Testing backups
### Setting up baldur on yggdrasil
- Create the zvol `rpool/var/lib/libvirt/images/baldur` for the testing OS.
- Create the zvol `hpool/baldur` for the backup data under test. It should have a capacity that's larger than what `yggdrasil` estimates for `rpool/var/lib/the-nine-worlds/data` (excluding datasets that are not backed up to the cloud).
- Set `refreserv=0` on the zvols to make snapshots take less space:

  ```
  zfs set refreserv=0 rpool/var/lib/libvirt/images/baldur
  zfs set refreserv=0 hpool/baldur
  ```

- Install the same OS that is running on `yggdrasil`, but with a DE, on `rpool/var/lib/libvirt/images/baldur`, with `hpool/baldur` mounted within at `/var/lib/the-nine-worlds/data`.
- Create a non-root user `wojtek` with `sudo` privileges.
- Configure SSH from the workstation to use `yggdrasil` as a jump server.
- Use ZFS for snapshots/rollback of the zvols:

  ```
  zfs snapshot rpool/var/lib/libvirt/images/baldur@start
  zfs snapshot hpool/baldur@start
  ```
### Provision baldur
- Provision `baldur` by running:

  ```
  ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/baldur.yml
  ```

- Update `/etc/the-nine-worlds/resolv.conf` to point at a public DNS resolver, e.g., `1.1.1.1`. Name resolution failures can cause containers to fail.
- Restore all the backups by ssh'ing into `baldur` and running (as root):

  ```
  /usr/local/sbin/restic-batch --config-dir /etc/the-nine-worlds/restic-batch.d restore
  ```

- Once the restore has completed, `chown -R <user>:<user>` all the restored directories in `/var/lib/the-nine-worlds/data`. Restic restores the UID information of the host from which the backup was performed, which may not match that of the new target machine. Note that permissions and ownership are restored as a second step once all the content is restored; therefore, the files will list `root` as the owner during the restoration.
- Start all the pod services with:

  ```
  ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_start.yml
  ```

  Give them some time to download all the images and start.
- Once the CPU returns to idling, check the state of all the pod services and their `veth` interfaces. If necessary, restart the affected pod; some containers fail to start up if the database takes too long to come online.
### Testing the backups
- Stop all services on `yggdrasil` to prevent accidental connections to the live services, which would defeat the point of testing backups.
- Log into `baldur`. Testing from a VM (as opposed to a regular workstation) is important to prevent live applications from accidentally connecting to `baldur`.
- Modify `/etc/hosts` in the VM to point at `rproxy` (e.g., `10.66.3.8`) for all relevant domains.
- Test each service manually, one by one. Use the Flagfox add-on to verify that you are indeed connecting to `baldur`.
### Cleaning up
- Stop all the pod services with:

  ```
  ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_stop.yml
  ```

- Delete the VM and the two zvols: `rpool/var/lib/libvirt/images/baldur` and `hpool/baldur`.