Update README
parent 6954490bf4
commit 97ea02c904

README.md
# The Ansible Edda

Ansible playbooks for provisioning **The Nine Worlds**.
## Running the playbooks

The main entrypoint for **The Nine Worlds** is [`main.yml`](main.yml).

### Keyring integration
To run the `main.yml` playbook on production hosts:
``` sh
ansible-playbook main.yml -i inventory/production
```

To run the `main.yml` playbook on testing hosts:
``` sh
ansible-playbook main.yml -i inventory/testing
```
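The inventory layout itself is not shown here; for orientation, an Ansible INI inventory in such a layout generally looks like the following (the group and host names below are illustrative guesses, not taken from this repository):

```ini
; inventory/production (hypothetical sketch)
[nine_worlds]
yggdrasil.example.org

[nine_worlds:vars]
ansible_user=wojtek
```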
### Playbooks

The Ansible Edda playbook is composed of smaller [`playbooks`](playbooks). To run a single playbook,
### Roles

Playbooks are composed of roles defined in the `roles` submodule, [`roles`](roles), and the
`playbooks/roles` directory, [`playbooks/roles`](playbooks/roles).
To play a specific role, e.g., `system/base/sshd` in the playbook `system`, run:
``` sh
ansible-playbook playbooks/system.yml --tags "system:base:sshd"
```

To play all roles from a specific group, e.g., `system/base` in the playbook `system`, run:
``` sh
ansible-playbook playbooks/system.yml --tags "system:base"
```

Some roles, e.g., `services/setup/user`, have sub-tasks which can also be invoked individually. To
find the relevant tag, see the role's `main.yml`.

In all cases, the roles can also be invoked from the main playbook:
``` sh
ansible-playbook main.yml --tags "system:base:sshd"
ansible-playbook main.yml --tags "system:base"
```
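Note that Ansible tags are plain strings; the colons in `system:base:sshd` are this repository's naming convention, not Ansible syntax. Presumably each task is tagged with every level of its hierarchy, which is what makes prefix selection such as `--tags "system:base"` work. A small illustrative sketch of that idea (not Ansible's own matching logic, which is exact string comparison):

```python
def expand(tag: str) -> set[str]:
    """Return a hierarchical tag together with all of its colon-separated prefixes."""
    parts = tag.split(":")
    return {":".join(parts[:i + 1]) for i in range(len(parts))}

def runs(task_tags: set[str], requested: set[str]) -> bool:
    """A task runs when any of its tags was requested on the command line."""
    return bool(task_tags & requested)

# Tagging a task with expand("system:base:sshd") means both
# --tags "system:base:sshd" and --tags "system:base" select it.
task_tags = expand("system:base:sshd")
```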
## Testing virtual machines

The script for starting, stopping, and reverting the testing virtual machines is located in
`scripts/testing/vmgr.py`.
## Managing backup buckets

The `scripts/restic/restic.py` script provides a wrapper around restic to manage the backup buckets.
The script collects the credentials from the OS keyring and constructs the restic command with the
correct endpoint. It allows the user to focus on the actual command to be executed rather than
authentication and bucket URLs.
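Conceptually, such a wrapper only has to export restic's standard S3 credential environment variables and prepend the repository URL. A minimal sketch (the function name and bucket URL are illustrative, not the script's actual interface):

```python
import os

def restic_command(args, access_key, secret_key, password,
                   repo="s3:https://s3.example.com/bucket"):
    """Build the argv and environment for a restic invocation against an S3 bucket."""
    env = dict(os.environ,
               AWS_ACCESS_KEY_ID=access_key,        # restic's standard S3 credential variables
               AWS_SECRET_ACCESS_KEY=secret_key,
               RESTIC_PASSWORD=password)
    return ["restic", "-r", repo, *args], env

# Typical use:
#   cmd, env = restic_command(["snapshots"], ak, sk, pw)
#   subprocess.run(cmd, env=env, check=True)
```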
The `scripts/restic/restic.py` script requires the following entries in the keyring:

- `scaleway`: `access_key` (Scaleway project ID),
- `scaleway`: `secret_key` (Scaleway secret key),
- `restic`: `password`.

The easiest way to set these values is with Python's `keyring.set_password`.
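For example, the three entries can be stored with a short script (a sketch; `keyring` is the third-party package providing `set_password`, and the prompt loop is illustrative):

```python
# The (service, username) entries required by scripts/restic/restic.py, per the README.
ENTRIES = [
    ("scaleway", "access_key"),   # Scaleway project ID
    ("scaleway", "secret_key"),   # Scaleway secret key
    ("restic", "password"),       # restic repository password
]

def store_all(prompt):
    """Store each (service, username) secret in the OS keyring."""
    import keyring  # third-party: pip install keyring
    for service, username in ENTRIES:
        keyring.set_password(service, username, prompt(f"{service}/{username}: "))

# Interactive usage:
#   import getpass
#   store_all(getpass.getpass)
```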
## Testing backups

### Setting up baldur on yggdrasil

1. Create the zvol `rpool/var/lib/libvirt/images/baldur` for the testing OS.
2. Create the zvol `hpool/baldur` for the backup data under test. It should have a capacity that's
   larger than what `yggdrasil` estimates for `rpool/var/lib/the-nine-worlds/data` (excluding
   datasets that are not backed up to the cloud).
3. Set `refreservation=0` on the zvols to make snapshots take less space.
   - `zfs set refreservation=0 rpool/var/lib/libvirt/images/baldur`
   - `zfs set refreservation=0 hpool/baldur`
4. Install the same OS that is running on `yggdrasil`, but with a DE, on
   `rpool/var/lib/libvirt/images/baldur` with `hpool/baldur` mounted within at
   `/var/lib/the-nine-worlds/data`.
5. Create non-root user `wojtek` with `sudo` privileges.
6. Configure SSH from the workstation to use `yggdrasil` as a jump server.
7. Use ZFS for snapshots/rollback of the zvols.
   - `zfs snapshot rpool/var/lib/libvirt/images/baldur@start`
   - `zfs snapshot hpool/baldur@start`
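Between test runs, the zvols can then be reverted to those snapshots. A sketch of the corresponding rollback helper (note that `zfs rollback` discards everything written after the snapshot):

```python
import subprocess

# The two zvols under test, as created in the steps above.
ZVOLS = ["rpool/var/lib/libvirt/images/baldur", "hpool/baldur"]

def rollback_cmds(snapshot="start"):
    """Build one `zfs rollback` command per zvol under test."""
    return [["zfs", "rollback", f"{zvol}@{snapshot}"] for zvol in ZVOLS]

def rollback(run=lambda cmd: subprocess.run(cmd, check=True)):
    """Execute the rollbacks; `run` is injectable so the logic can be tested without zfs."""
    cmds = rollback_cmds()
    for cmd in cmds:
        run(cmd)
    return cmds
```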
### Provision baldur

1. Provision `baldur` by running:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/baldur.yml
   ```
2. Update `/etc/the-nine-worlds/resolv.conf` to point at a public DNS resolver, e.g., `1.1.1.1`.
   Name resolution failures can cause containers to fail.
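   The change is a one-line `resolv.conf(5)` entry, e.g.:
   ```
   # /etc/the-nine-worlds/resolv.conf
   nameserver 1.1.1.1
   ```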
3. Restore all the backups by ssh'ing into `baldur` and running (as root):
   ```sh
   /usr/local/sbin/restic-batch --config-dir /etc/the-nine-worlds/restic-batch.d restore
   ```
4. Once restore has completed, `chown -R <user>:<user>` all the restored directories in
   `/var/lib/the-nine-worlds/data`. Restic restores the UID information of the host from which the
   backup was performed, which may not match that of the new target machine. Note that permissions
   and ownership are restored as a second step once all the content is restored. Therefore, the
   files will list `root` as owner during the restoration.
5. Start all the pod services with:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_start.yml
   ```
   Give them some time to download all the images and start.
6. Once the CPU returns to idling, check the state of all the pod services and their `veth`
   interfaces. If necessary, restart the affected pod; some containers fail to start up if the
   database takes too long to come online.
### Testing the backups

1. Log into `baldur`. Testing from a VM (as opposed to a regular workstation) is important to
   prevent live applications from accidentally connecting to `baldur`.
2. Modify `/etc/hosts` in the VM to point at `rproxy` (e.g., `10.66.3.8`) for all relevant domains.
3. Test each service manually one by one. Use the Flagfox add-on to verify that you are indeed
   connecting to `baldur`.

### Cleaning up

1. Stop all the pod services with:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_stop.yml
   ```
2. Delete the VM and the two zvols:
   - `rpool/var/lib/libvirt/images/baldur`,
   - `hpool/baldur`.
## Music organisation

The `playbooks/music.yml` playbook sets up tools and configuration for organising music. The process
is manual, though. The steps below describe how to add a new CD.

All steps below are to be executed as the `music` user.

### Note on tagging

* For live albums add "YYYY-MM-DD at Venue, City, Country" in the "Subtitle" tag.
* For remasters use the original release tags and add "YYYY Remaster" in the "Subtitle" tag.

### Ripping a CD

1. Use a CD ripper and rip the CD to `/var/lib/yggdrasil/home/music/rip` using flac encoding.
2. Samba has been set up to give Windows access to the above directory. Therefore, CD rippers
   available only for Windows can also be used, e.g. dBpoweramp.

### Import new music

1. Run `beet import /var/lib/yggdrasil/home/music/rip`. This will move the music files to
   `/var/lib/yggdrasil/data/music/collection`.
2. Run `beet convert -a <match>`, where `<match>` is used to narrow down to new music only. This
   will convert the flac files into mp3 files for sharing via Nextcloud.
3. Run `nextcloud-upload /var/tmp/music/mp3/<artist>` for every artist to upload to Nextcloud.
4. Remove the `/var/tmp/music/mp3/<artist>` directory.

#### Collections

Every track has a `compilation` tag at track-level as well as at album-level (at least in Beets). To
label the album as a compilation for sorting purposes, run `beet modify -a <album> comp=True`.

### Archive music

#### From rip

1. Run `beet --config .config/beets/archive.yaml import --move /var/lib/yggdrasil/home/music/rip`.
   This will move the music files to `/var/lib/yggdrasil/data/music/archive`.

#### From collection

1. Run `beet --config .config/beets/archive.yaml import
   /var/lib/yggdrasil/data/music/collection/<artist>/<album>`. This will copy the music files to
   `/var/lib/yggdrasil/data/music/archive`.
2. Run `beet remove -d -a "album:<album>"`. This will remove the music files from the collection.
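The `beet convert` step relies on beets' convert plugin; a sketch of the relevant configuration (values are illustrative, inferred from the paths above rather than copied from the repository):

```yaml
directory: /var/lib/yggdrasil/data/music/collection
plugins: convert

convert:
  dest: /var/tmp/music/mp3   # where the mp3 copies for Nextcloud end up
  format: mp3
```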
@@ -1,146 +0,0 @@
```python
import argparse

import keyring
import requests


class Scaleway:
    API_ENDPOINT_BASE = "https://api.scaleway.com/instance/v1/zones"
    ZONES = [
        "fr-par-1", "fr-par-2", "fr-par-3",
        "nl-ams-1", "nl-ams-2",
        "pl-waw-1", "pl-waw-2",
    ]

    def __init__(self, project_id, secret_key):
        self.__zone = None
        self.__project_id = project_id
        self.__headers = {"X-Auth-Token": secret_key}

    @property
    def zone(self):
        return self.__zone

    @zone.setter
    def zone(self, zone):
        if zone not in Scaleway.ZONES:
            raise KeyError(f"{zone} is not a valid zone - must be one of {Scaleway.ZONES}")
        self.__zone = zone

    @property
    def project_id(self):
        return self.__project_id

    def __url(self, item, id):
        if self.__zone is None:
            raise RuntimeError("zone must be set before making any API requests")

        url = f"{Scaleway.API_ENDPOINT_BASE}/{self.__zone}"

        if id == "products":
            return f"{url}/products/{item}"

        url = f"{url}/{item}"
        if id is not None:
            url = f"{url}/{id}"

        return url

    @staticmethod
    def __check_status(type, url, rsp):
        if (rsp.status_code // 100) != 2:
            raise RuntimeError(
                f"{type} {url} returned with status code {rsp.status_code}: {rsp.json()}")

    def get(self, item, id=None):
        url = self.__url(item, id)
        r = requests.get(url, headers=self.__headers)
        self.__check_status("GET", url, r)
        return r.json()[item]

    def get_by_name(self, item, name):
        items = self.get(item)
        return next((it for it in items if it["name"] == name), None)

    def __post(self, url, data):
        r = requests.post(url, headers=self.__headers, json=data)
        self.__check_status("POST", url, r)
        return r.json()

    def post(self, item, data):
        return self.__post(self.__url(item, None), data)

    def post_action(self, item, id, action, data):
        return self.__post(f"{self.__url(item, id)}/{action}", data)

    def delete(self, item, id):
        url = self.__url(item, id)
        r = requests.delete(url, headers=self.__headers)
        self.__check_status("DELETE", url, r)


def create_baldur(scaleway, args):
    volume_size = args.volume_size

    security_group = scaleway.get_by_name("security_groups", "baldur-security-group")
    image = scaleway.get_by_name("images", "Debian Bullseye")
    server_type = "PLAY2-PICO"
    if server_type not in scaleway.get("servers", id="products"):
        raise RuntimeError(f"{server_type} is not available in {scaleway.zone}")

    response = scaleway.post("ips", data={"project": scaleway.project_id})
    public_ip = response["ip"]

    baldur = {
        "name": "baldur",
        "dynamic_ip_required": False,
        "commercial_type": server_type,
        "image": image["id"],
        "volumes": {"0": {"size": int(volume_size * 1_000_000_000)}},
        "enable_ipv6": False,
        "public_ip": public_ip["id"],
        "project": scaleway.project_id,
        "security_group": security_group["id"],
    }

    response = scaleway.post("servers", data=baldur)
    server = response["server"]

    scaleway.post_action("servers", server["id"], "action", data={"action": "poweron"})

    print("Baldur instance created:")
    # Sizes are decimal gigabytes (10^9 bytes), matching the multiplier above.
    print(f"  block volume size: {server['volumes']['0']['size'] // 1_000_000_000} GB")
    print(f"  public ip address: {server['public_ip']['address']}")


def delete_baldur(scaleway, _):
    server = scaleway.get_by_name("servers", "baldur")
    if server is None:
        raise RuntimeError(f"Baldur instance was not found in {scaleway.zone}")
    ip = server["public_ip"]

    scaleway.post_action("servers", server["id"], "action", data={"action": "terminate"})
    scaleway.delete("ips", ip["id"])


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Create or delete the Baldur instance")

    subparsers = parser.add_subparsers(dest="command", required=True)

    create_parser = subparsers.add_parser("create")
    create_parser.add_argument("--volume-size", type=int, required=True,
                               help="Block volume size (in GB) to create")
    create_parser.set_defaults(func=create_baldur)

    delete_parser = subparsers.add_parser("delete")
    delete_parser.set_defaults(func=delete_baldur)

    args = parser.parse_args()

    scw_project_id = keyring.get_password("scaleway", "project_id")
    scw_secret_key = keyring.get_password("scaleway", "secret_key")

    scaleway = Scaleway(scw_project_id, scw_secret_key)
    scaleway.zone = "fr-par-2"

    args.func(scaleway, args)
```