2023-02-19 21:42:36 +01:00
inventory Add music user and enable samba 2023-02-19 21:33:36 +01:00
playbooks Rename backup batch config files 2023-02-19 21:42:36 +01:00
roles@5d8a2e3f43 Update roles submodule 2023-02-11 12:37:44 +01:00
scripts Replace testing shell scripts with python 2023-02-11 10:34:15 +01:00
.ansible-lint Introduce ansible-lint 2022-12-18 23:00:28 +01:00
.gitignore Add script to manage instance for backup testing 2023-01-02 23:39:04 +01:00
.gitmodules Move roles to shared repo 2022-12-20 19:56:45 +01:00
.yamllint Introduce yamllint 2022-12-18 23:43:40 +01:00
ansible.cfg Move roles to shared repo 2022-12-20 19:56:45 +01:00
main.yml Fix fact gathering when using tags 2022-12-19 14:45:10 +01:00
makefile Introduce yamllint 2022-12-18 23:43:40 +01:00
README.md Add notes on testing backups to README 2023-02-13 21:59:24 +01:00
requirements.txt Add music user and enable samba 2023-02-19 21:33:36 +01:00
vault-keyring-client.py Move to using virtualenv 2023-02-11 10:30:32 +01:00

The Ansible Edda

Ansible playbooks for provisioning The Nine Worlds.

Secrets vault

  • Encrypt with: ansible-vault encrypt vault.yml
  • Decrypt with: ansible-vault decrypt vault.yml
  • Encrypt all vault.yml in a directory with: ansible-vault encrypt directory/**/vault.yml
  • Decrypt all vault.yml in a directory with: ansible-vault decrypt directory/**/vault.yml
  • Run a playbook with: ansible-playbook --vault-id @prompt playbook.yml
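
The recursive `**` pattern in the two directory-wide commands depends on the shell: bash only recurses when `globstar` is enabled, while zsh supports it by default. As a shell-independent sketch, the Python helper below collects every vault.yml under a directory and builds the matching ansible-vault command. The names `find_vault_files` and `vault_command` are hypothetical and not part of this repository.

```python
from pathlib import Path


def find_vault_files(directory):
    """Return every file named vault.yml below `directory`, sorted by path."""
    return sorted(p for p in Path(directory).rglob("vault.yml") if p.is_file())


def vault_command(action, directory):
    """Build an ansible-vault argument list for 'encrypt' or 'decrypt'."""
    if action not in ("encrypt", "decrypt"):
        raise ValueError(f"unsupported action: {action}")
    files = find_vault_files(directory)
    if not files:
        # Refuse to build a command without file arguments.
        raise FileNotFoundError(f"no vault.yml found under {directory}")
    return ["ansible-vault", action, *map(str, files)]
```

The resulting list can be passed to `subprocess.run(..., check=True)`; printing it first is a cheap way to review which files would be touched.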

The Nine Worlds

The main entrypoint for The Nine Worlds is main.yml.

Keyring integration

Keyring integration requires python3-keyring to be installed.

To set the keyring password run:

./vault-keyring-client.py --set [--vault-id <vault-id>]

If --vault-id is not specified, the password will be stored under ansible.

To use the password from the keyring invoke playbooks with:

ansible-playbook --vault-id @vault-keyring-client.py ...
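
For orientation, here is a minimal sketch of how such a client script can be structured. It is not the code in vault-keyring-client.py: the keyring is replaced by a plain dict so the example runs without extra packages, and the function names are hypothetical. It relies on the documented Ansible convention that a script whose name ends in `-client` is called with a `--vault-id` argument and is expected to print the password to standard output.

```python
import argparse
import getpass

# The README states that passwords are stored under "ansible" by default.
DEFAULT_VAULT_ID = "ansible"


def parse_args(argv):
    """Parse the two options described above: --set and --vault-id."""
    parser = argparse.ArgumentParser(description="Sketch of a vault password client")
    parser.add_argument("--vault-id", default=DEFAULT_VAULT_ID)
    parser.add_argument("--set", action="store_true", dest="set_password")
    return parser.parse_args(argv)


def run(argv, store, read_password=getpass.getpass):
    """Store a password (--set) or return the stored one.

    `store` is any dict-like object; it stands in for the system keyring.
    """
    args = parse_args(argv)
    if args.set_password:
        store[args.vault_id] = read_password()
        return None
    return store[args.vault_id]  # raises KeyError if nothing was stored
```

A real client would swap the dict for `keyring.set_password(service, username, password)` and `keyring.get_password(service, username)`, and print the value returned in lookup mode. How the vault ID maps onto the keyring's service and username fields is an assumption left open here.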

Production and testing

The inventory files are split into production and testing.

To run the main.yml playbook on production hosts:

ansible-playbook main.yml -i production

To run the main.yml playbook on testing hosts:

ansible-playbook main.yml -i testing

Testing virtual machines

The script for starting, stopping, and reverting the testing virtual machines is located at scripts/testing/vmgr.py.

Playbooks

The Ansible Edda playbook is composed of smaller playbooks. To run a single playbook, invoke it directly from the playbooks directory. For example, to run the system playbook:

ansible-playbook playbooks/system.yml

Alternatively, select it by its tag from the main playbook:

ansible-playbook main.yml --tags "system"

Roles

Playbooks are composed of roles defined in the roles directory, playbooks/roles.

To play only a specific role, e.g. system/base in the playbook system, run:

ansible-playbook playbooks/system.yml --tags "system:base"

Or from the main playbook:

ansible-playbook main.yml --tags "system:base"

Role sub-tasks

Some roles are split into smaller groups of tasks. This can be checked by looking at the tasks/main.yml file of a role, e.g. playbooks/roles/system/base/tasks/main.yml.

To play only a particular group within a role, e.g. sshd in base of system, run:

ansible-playbook playbooks/system.yml --tags "system:base:sshd"

Or from the main playbook:

ansible-playbook main.yml --tags "system:base:sshd"
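
The tags above follow a `playbook:role:group` pattern. As an illustration, the sketch below composes such a tag and the matching command line. Both function names are hypothetical, and the layout `playbooks/<name>.yml` is inferred from the examples in this README rather than checked against every playbook.

```python
def build_tag(playbook, role=None, group=None):
    """Join the given parts into playbook[:role[:group]]."""
    if group is not None and role is None:
        raise ValueError("a task group requires a role")
    parts = [playbook, role, group]
    return ":".join(part for part in parts if part is not None)


def playbook_command(playbook, role=None, group=None, from_main=False):
    """Build an ansible-playbook argument list limited to one tag."""
    # Assumption: each playbook lives at playbooks/<playbook>.yml.
    target = "main.yml" if from_main else f"playbooks/{playbook}.yml"
    return ["ansible-playbook", target, "--tags", build_tag(playbook, role, group)]
```

For example, `playbook_command("system", "base", "sshd", from_main=True)` yields the argument list for the last command shown above.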

Testing backups

Before testing the backups, you may want to shut yggdrasil down for extra confidence that it is not accessed or modified during the process. It is easy to reach yggdrasil by accident if /etc/hosts has not been modified in the test VM, a step that is easy to forget.

  1. Create baldur by running:
    python scripts/scaleway/baldur.py create --volume-size <size-in-GB>
    
    Pick a volume size that's larger than what yggdrasil estimates for rpool/var/lib/yggdrasil/data.
  2. Provision baldur by running:
    ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/baldur.yml
    
  3. Restore all the backups by ssh'ing into baldur and running (as root):
    /usr/local/sbin/restic-batch --config-dir /etc/restic-batch.d restore
    
  4. Start all the pod services with:
    ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_start.yml
    
    Give them some time to download all the images and start.
  5. Once the CPU returns to idle, check the state of all the pod services and their veth interfaces. Pods sometimes fail to start (presumably due to limited CPU and RAM); restart any affected pod if necessary.
  6. Boot into a test VM, ideally one installed onto a virtual disk, since a live system might not have enough space. A VM is used to make sure that none of the services on the host workstation connect to baldur by accident.
  7. Modify /etc/hosts in the VM to point at baldur for all relevant domains.
  8. Test each service manually one by one. Use the Flagfox add-on to verify that you are indeed connecting to baldur.
  9. Stop all the pod services with:
    ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_stop.yml
    
  10. Destroy baldur by running:
    python scripts/scaleway/baldur.py delete
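
Step 7 is the one most easily gotten wrong, so here is a small sketch for generating the /etc/hosts lines instead of typing them by hand. The function name is hypothetical, and the address and domains in the usage note are placeholders, not values from this repository.

```python
import ipaddress


def hosts_entries(address, domains):
    """Render /etc/hosts lines that map each domain to `address`."""
    # ip_address() raises ValueError on a malformed address.
    ip = ipaddress.ip_address(address)
    return "".join(f"{ip}\t{domain}\n" for domain in domains)
```

Calling `hosts_entries("203.0.113.10", ["cloud.example.org", "git.example.org"])` returns two tab-separated lines that can be appended to /etc/hosts in the test VM; substitute baldur's real address and the domains actually served.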