diff --git a/README.md b/README.md
index 10fd4f3..a14fc8e 100644
--- a/README.md
+++ b/README.md
@@ -1,18 +1,10 @@
 # The Ansible Edda
 
-Ansible playbooks for provisioning The Nine Worlds.
+Ansible playbooks for provisioning **The Nine Worlds**.
 
-## Secrets vault
+## Running the playbooks
 
-- Encrypt with: ```ansible-vault encrypt vault.yml```
-- Decrypt with: ```ansible-vault decrypt secrets.yml```
-- Encrypt all `vault.yml` in a directory with: ```ansible-vault encrypt directory/**/vault.yml```
-- Decrypt all `vault.yml` in a directory with: ```ansible-vault decrypt directory/**/vault.yml```
-- Run a playbook with ```ansible-playbook --vault-id @prompt playbook.yml```
-
-## The Nine Worlds
-
-The main entrypoint for The Nine Worlds is [`main.yml`](main.yml).
+The main entrypoint for **The Nine Worlds** is [`main.yml`](main.yml).
 
 ### Keyring integration
 
@@ -38,19 +30,14 @@ The inventory files are split into [`production`](production) and [`testing`](te
 To run the `main.yml` playbook on production hosts:
 
 ``` sh
-ansible-playbook main.yml -i production
+ansible-playbook main.yml -i inventory/production
 ```
 
-To run the `main.yml` playbook on production hosts:
+To run the `main.yml` playbook on testing hosts:
 
 ``` sh
-ansible-playbook main.yml -i testing
+ansible-playbook main.yml -i inventory/testing
 ```
 
-### Testing virtual machines
-
-The scripts for starting, stopping, and reverting the testing virtual machines is located in
-`scripts/testing/vmgr.py`.
-
 ### Playbooks
 
 The Ansible Edda playbook is composed of smaller [`playbooks`](playbooks). To run a single playbook,
@@ -69,156 +56,107 @@ ansible-playbook main.yml --tags "system"
 
 ### Roles
 
-Playbooks are composed of roles defined in the `roles` directory,
-[`playbooks/roles`](playbooks/roles).
-
-To play only a specific role, e.g. `system/base` in the playbook `system`, run:
-
-``` sh
-ansible-playbook playbooks/system.yml --tags "system:base"
-```
-
-Or from the main playbook:
-
-``` sh
-ansible-playbook main.yml --tags "system:base"
-```
-
-### Role sub-tasks
-
-Some roles are split into smaller groups of tasks. This can be checked by looking at the
-`tasks/main.yml` file of a role, e.g.
-[`playbooks/roles/system/base/tasks/main.yml`](playbooks/roles/system/base/tasks/main.yml).
-
-To play only a particular group within a role, e.g. `sshd` in `base` of `system`, run:
+Playbooks are composed of roles defined in the `roles` submodule, [`roles`](roles), and the
+`playbooks/roles` directory, [`playbooks/roles`](playbooks/roles).
+
+To play a specific role, e.g., `system/base/sshd` in the playbook `system`, run:
 
 ``` sh
 ansible-playbook playbooks/system.yml --tags "system:base:sshd"
 ```
 
-Or from the main playbook:
+To play all roles from a specific group, e.g., `system/base` in the playbook `system`, run:
+
+``` sh
+ansible-playbook playbooks/system.yml --tags "system:base"
+```
+
+Some roles, e.g., `services/setup/user`, have sub-tasks which can also be invoked individually. To
+find the relevant tag, see the role's `main.yml`.
+
+In all cases, the roles can also be invoked from the main playbook:
 
 ``` sh
 ansible-playbook main.yml --tags "system:base:sshd"
+ansible-playbook main.yml --tags "system:base"
 ```
 
+## Testing virtual machines
+
+The script for starting, stopping, and reverting the testing virtual machines is located in
+`scripts/testing/vmgr.py`.
+
+## Managing backup buckets
+
+The `scripts/restic/restic.py` script provides a wrapper around restic to manage the backup buckets.
+The script collects the credentials from the OS keyring and constructs the restic command with the
+correct endpoint. It allows the user to focus on the actual command to be executed rather than
+authentication and bucket URLs.
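As a sketch of that idea (a hypothetical outline, not the actual `restic.py` implementation; the
S3 endpoint, the `<bucket>` placeholder, and the function name are assumptions), the wrapper boils
down to collecting credentials and prefixing the restic command:

```python
def restic_command(args, get_password):
    """Build a restic invocation from keyring-held credentials.

    get_password(service, name) stands in for keyring.get_password so the
    credential source can be swapped out in tests; the S3 endpoint and
    <bucket> below are placeholders, not the real configuration.
    """
    env = {
        "AWS_ACCESS_KEY_ID": get_password("scaleway", "access_key"),
        "AWS_SECRET_ACCESS_KEY": get_password("scaleway", "secret_key"),
        "RESTIC_PASSWORD": get_password("restic", "password"),
    }
    # The user supplies only the sub-command, e.g. ["snapshots"] or
    # ["restore", "latest"]; repository and auth details are filled in here.
    cmd = ["restic", "-r", "s3:https://s3.fr-par.scw.cloud/<bucket>", *args]
    return cmd, env
```

In real use, `get_password` would be `keyring.get_password` and the returned command would be run
with something like `subprocess.run(cmd, env={**os.environ, **env})`.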
+
+The `scripts/restic/restic.py` requires the following entries in the keyring:
+
+- `scaleway`: `access_key` (Scaleway project ID),
+- `scaleway`: `secret_key` (Scaleway secret key),
+- `restic`: `password`.
+
+The easiest way to set these values is with Python's `keyring.set_password`.
+
 ## Testing backups
 
-Before testing the backups, you may want to shut `yggdrasil` down for extra confidence that it is
-not being accessed/modified during this process. It is easy to access `yggdrasil` by accident if
-`/etc/hosts` is not modified in the test VM, something that is easy to forget.
+### Setting up baldur on yggdrasil
 
-### Baldur on Scaleway
-
-1. Create `baldur` by running:
-   ```sh
-   python scripts/scaleway/baldur.py create --volume-size
-   ```
-   Pick a volume size that's larger than what `yggdrasil` estimates for
-   `rpool/var/lib/yggdrasil/data`.
-2. When done destroy `baldur` by running:
-   ```sh
-   python scripts/scaleway/baldur.py delete
-   ```
-
-### Baldur on Yggdrasil
-
-1. Create a VM on `yggdrasil` and install the same OS that is running on `yggdrasil`.
-   - Install the OS on a zvol on `rpool`.
-   - If the same VM is to be used for testing, a GUI is helpful.
-   - Prepare a zvol on `hpool` of size that's larger than what `yggdrasil` estimates for
-     `rpool/var/lib/the-nine-worlds/data` and mount at `/var/lib/the-nine-worlds/data`.
-   - Create non-root user `wojtek` with `sudo` privileges.
-2. Configure SSH to use `yggdrasil` as a jump server.
+1. Create the zvol `rpool/var/lib/libvirt/images/baldur` for the testing OS.
+2. Create the zvol `hpool/baldur` for the backup data under test. It should have a capacity that's
+   larger than what `yggdrasil` estimates for `rpool/var/lib/the-nine-worlds/data` (excluding
+   datasets that are not backed up to the cloud).
 3. Set `refreserv=0` on the zvols to make snapshots take less space.
-   - `zfs set refreserv=0 tank/home/ahrens`
-4. Use ZFS for snapshots/roolback of the zvols.
-   - `zfs snapshot tank/home/ahrens@friday`
-   - `zfs rollback tank/home/ahrens@friday`
-5. Service testing can then be done directly from the VM. To achieve that `/etc/hosts` needs to be
-   set to directly point at the right proxy server, e.g., `10.66.3.8`, not `localhost`.
+   - `zfs set refreserv=0 rpool/var/lib/libvirt/images/baldur`
+   - `zfs set refreserv=0 hpool/baldur`
+4. Install the same OS that is running on `yggdrasil`, but with a DE, on
+   `rpool/var/lib/libvirt/images/baldur` with `hpool/baldur` mounted within at
+   `/var/lib/the-nine-worlds/data`.
+5. Create non-root user `wojtek` with `sudo` privileges.
+6. Configure SSH from the workstation to use `yggdrasil` as a jump server.
+7. Use ZFS for snapshots/rollback of the zvols.
+   - `zfs snapshot rpool/var/lib/libvirt/images/baldur@start`
+   - `zfs snapshot hpool/baldur@start`
 
-### Test
+### Provision baldur
 
 1. Provision `baldur` by running
    ```sh
    ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/baldur.yml
    ```
-2. Restore all the backups by ssh'ing into `baldur` and running (as root):
+2. Update `/etc/the-nine-worlds/resolv.conf` to point at a public DNS resolver, e.g., `1.1.1.1`.
+   Name resolution failures can cause containers to fail.
+3. Restore all the backups by ssh'ing into `baldur` and running (as root):
    ```sh
    /usr/local/sbin/restic-batch --config-dir /etc/the-nine-worlds/restic-batch.d restore
    ```
-3. Once restore has completed, `chown -R :` all the restored directories in
+4. Once restore has completed, `chown -R <uid>:<gid>` all the restored directories in
    `/var/lib/the-nine-worlds/data`. Restic restores the UID information of the host from which the
    backup was performed which may not match that of the new target machine. Note that permissions
    and ownership are restored as a second step once all the content is restored. Therefore, the
    files will list `root` as owner during the restoration.
-4. Start all the pod services with:
+5. Start all the pod services with:
    ```sh
    ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_start.yml
    ```
    Give them some time to download all the images and start.
-5. Once the CPU returns to idling check the state of all the pod services and their `veth`
-   interfaces. If necessary restart the affected pod. Sometimes they fail to start (presumably due
-   to issues related to limited CPU and RAM).
-6. Boot into a test VM. Ideally, one installed onto a virtual disk since the live system might not
-   have enough space. A VM is used to make sure that none of the services on the host workstation
-   connect to `baldur` by accident.
-7. Modify `/etc/hosts` in the VM to point at `baldur` for all relevant domains.
-8. Test each service manually one by one. Use the Flagfox add-on to verify that you are indeed
+6. Once the CPU returns to idling, check the state of all the pod services and their `veth`
+   interfaces. If necessary, restart the affected pod; some containers fail to start up if the
+   database takes too long to come online.
+
+### Testing the backups
+
+1. Log into `baldur`. Testing from a VM (as opposed to a regular workstation) is important to
+   prevent live applications from accidentally connecting to `baldur`.
+2. Modify `/etc/hosts` in the VM to point at `rproxy` (e.g., `10.66.3.8`) for all relevant domains.
+3. Test each service manually one by one. Use the Flagfox add-on to verify that you are indeed
    connecting to `baldur`.
-   - Some containers fail to start up if the database takes too long to come online. In that case
-     restart the container.
-   - Some containers fail to start up if they cannot make DNS queries. Note that `192.168.0.0/16` is
-     blocked by firewall rules. If `/etc/the-nine-worlds/resolv.conf` points at a DNS resolved at
-     such an address all DNS queries will fail. Simply update `resolv.conf` to e.g. `1.1.1.1`.
-9. Stop all the pod services with:
+
+### Cleaning up
+
+1. Stop all the pod services with:
    ```sh
    ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_stop.yml
    ```
-
-## Music organisation
-
-The `playbooks/music.yml` playbook sets up tools and configuration for organising music. The process
-is manual though. The steps for adding a new CD.
-
-All steps below are to be executed as the `music` user.
-
-### Note on tagging
-
-* For live albums add "YYYY-MM-DD at Venue, City, Country" in the "Subtitle" tag.
-* For remasters use original release tags and add "YYYY Remaster" in the "Subtitle" tag.
-
-### Ripping a CD
-
-1. Use a CD ripper and rip the CD to `/var/lib/yggdrasil/home/music/rip` using flac encoding.
-2. Samba has been set up to give Windows access to the above directory. Therefore, CD rippers
-   available only for Windows can also be used, e.g. dBpoweramp.
-
-### Import new music
-
-1. Run `beet import /var/lib/yggdrasil/home/music/rip`. This will move the music files to
-   `/var/lib/yggdrasil/data/music/collection`.
-2. Run `beet convert -a `, where `` is used to narrow down to new music only. This
-   will convert the flac files into mp3 files for sharing via Nextcloud.
-3. Run `nextcloud-upload /var/tmp/music/mp3/` for every artist to upload to Nextcloud.
-4. Remove the `/var/tmp/music/mp3/` directory.
-
-#### Collections
-
-Every track has a `compilation` tag at track-level as well as at album-level (at least in Beets). To
-label the album as a compilation for sorting purposes, run `beet modify -a comp=True`.
-
-### Archive music
-
-#### From rip
-
-1. Run `beet --config .config/beets/archive.yaml import --move /var/lib/yggdrasil/home/music/rip`.
-   This will move the music files to `/var/lib/yggdrasil/data/music/archive`.
-
-#### From collection
-
-1. Run `beet --config .config/beets/archive.yaml import
-   /var/lib/yggdrasil/data/music/collection//`. This will copy the music files to
-   `/var/lib/yggdrasil/data/music/archive`.
-2. Run `beet remove -d -a "album:"`. This will remove the music files from the collection.
+2. Delete the VM and the two zvols:
+   - `rpool/var/lib/libvirt/images/baldur`,
+   - `hpool/baldur`.
diff --git a/scripts/scaleway/baldur.py b/scripts/scaleway/baldur.py
deleted file mode 100644
index 77426ec..0000000
--- a/scripts/scaleway/baldur.py
+++ /dev/null
@@ -1,146 +0,0 @@
-import argparse
-import keyring
-import requests
-
-
-class Scaleway:
-    API_ENDPOINT_BASE = "https://api.scaleway.com/instance/v1/zones"
-    ZONES = [
-        "fr-par-1", "fr-par-2", "fr-par-3",
-        "nl-ams-1", "nl-ams-2",
-        "pl-waw-1", "pl-waw-2",
-    ]
-
-    def __init__(self, project_id, secret_key):
-        self.__zone = None
-        self.__project_id = project_id
-        self.__headers = {"X-Auth-Token": secret_key}
-
-    @property
-    def zone(self):
-        return self.__zone
-
-    @zone.setter
-    def zone(self, zone):
-        if zone not in Scaleway.ZONES:
-            raise KeyError(f"{zone} is not a valid zone - must be one of {Scaleway.ZONES}")
-        self.__zone = zone
-
-    @property
-    def project_id(self):
-        return self.__project_id
-
-    def __url(self, item, id):
-        if self.__zone is None:
-            raise RuntimeError("zone must be set before making any API requests")
-
-        url = f"{Scaleway.API_ENDPOINT_BASE}/{self.__zone}"
-
-        if id == "products":
-            return f"{url}/products/{item}"
-
-        url = f"{url}/{item}"
-        if id is not None:
-            url = f"{url}/{id}"
-
-        return url
-
-    @staticmethod
-    def __check_status(type, url, rsp):
-        if (rsp.status_code // 100) != 2:
-            raise RuntimeError(
-                f"{type} {url} returned with status code {rsp.status_code}: {rsp.json()}")
-
-    def get(self, item, id=None):
-        url = self.__url(item, id)
-        r = requests.get(url, headers=self.__headers)
-        self.__check_status("GET", url, r)
-        return r.json()[item]
-
-    def get_by_name(self, item, name):
-        items = self.get(item)
-        return next((it for it in items if it["name"] == name), None)
-
-    def __post(self, url, data):
-        r = requests.post(url, headers=self.__headers, json=data)
-        self.__check_status("POST", url, r)
-        return r.json()
-
-    def post(self, item, data):
-        return self.__post(self.__url(item, None), data)
-
-    def post_action(self, item, id, action, data):
-        return self.__post(f"{self.__url(item, id)}/{action}", data)
-
-    def delete(self, item, id):
-        url = self.__url(item, id)
-        r = requests.delete(url, headers=self.__headers)
-        self.__check_status("DELETE", url, r)
-
-
-def create_baldur(scaleway, args):
-    volume_size = args.volume_size
-
-    security_group = scaleway.get_by_name("security_groups", "baldur-security-group")
-    image = scaleway.get_by_name("images", "Debian Bullseye")
-    server_type = "PLAY2-PICO"
-    if server_type not in scaleway.get("servers", id="products"):
-        raise RuntimeError(f"{server_type} is not available in {scaleway.zone}")
-
-    response = scaleway.post("ips", data={"project": scaleway.project_id})
-    public_ip = response["ip"]
-
-    baldur = {
-        "name": "baldur",
-        "dynamic_ip_required": False,
-        "commercial_type": server_type,
-        "image": image["id"],
-        "volumes": {"0": {"size": int(volume_size * 1_000_000_000)}},
-        "enable_ipv6": False,
-        "public_ip": public_ip["id"],
-        "project": scaleway.project_id,
-        "security_group": security_group["id"],
-    }
-
-    response = scaleway.post("servers", data=baldur)
-    server = response["server"]
-
-    scaleway.post_action("servers", server["id"], "action", data={"action": "poweron"})
-
-    print("Baldur instance created:")
-    print(f"  block volume size: {server['volumes']['0']['size']//1_000_000_000} GiB")
-    print(f"  public ip address: {server['public_ip']['address']}")
-
-
-def delete_baldur(scaleway, _):
-    server = scaleway.get_by_name("servers", "baldur")
-    if server is None:
-        raise RuntimeError(f"Baldur instance was not found in {scaleway.zone}")
-    ip = server["public_ip"]
-
-    scaleway.post_action("servers", server["id"], "action", data={"action": "terminate"})
-    scaleway.delete("ips", ip["id"])
-
-
-if __name__ == "__main__":
-    parser = argparse.ArgumentParser(description="Create or delete the Baldur instance")
-
-    subparsers = parser.add_subparsers()
-
-    create_parser = subparsers.add_parser("create")
-    create_parser.add_argument("--volume-size", type=int, required=True,
-                               help="Block volume size (in GiB) to create")
-    create_parser.set_defaults(func=create_baldur)
-
-    delete_parser = subparsers.add_parser("delete")
-    delete_parser.set_defaults(func=delete_baldur)
-
-    args = parser.parse_args()
-
-    scw_project_id = keyring.get_password("scaleway", "project_id")
-    scw_secret_key = keyring.get_password("scaleway", "secret_key")
-
-    scaleway = Scaleway(scw_project_id, scw_secret_key)
-    scaleway.zone = "fr-par-2"
-
-    args.func(scaleway, args)