Update README

parent 6954490bf4
commit 97ea02c904

README.md | 212
@@ -1,18 +1,10 @@
# The Ansible Edda

Ansible playbooks for provisioning The Nine Worlds.
Ansible playbooks for provisioning **The Nine Worlds**.

## Secrets vault
## Running the playbooks

- Encrypt with: ```ansible-vault encrypt vault.yml```
- Decrypt with: ```ansible-vault decrypt secrets.yml```
- Encrypt all `vault.yml` in a directory with: ```ansible-vault encrypt directory/**/vault.yml```
- Decrypt all `vault.yml` in a directory with: ```ansible-vault decrypt directory/**/vault.yml```
- Run a playbook with ```ansible-playbook --vault-id @prompt playbook.yml```
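A typical end-to-end flow, sketched with placeholder paths (and assuming the shell expands `**`):

``` sh
# Encrypt every vault file under a directory (placeholder path).
ansible-vault encrypt directory/**/vault.yml

# Run a playbook, prompting for the vault password.
ansible-playbook --vault-id @prompt playbook.yml
```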

## The Nine Worlds

The main entrypoint for The Nine Worlds is [`main.yml`](main.yml).
The main entrypoint for **The Nine Worlds** is [`main.yml`](main.yml).

### Keyring integration

@@ -38,19 +30,14 @@ The inventory files are split into [`production`](production) and [`testing`](te

To run the `main.yml` playbook on production hosts:

``` sh
ansible-playbook main.yml -i production
ansible-playbook main.yml -i inventory/production
```

To run the `main.yml` playbook on production hosts:
To run the `main.yml` playbook on testing hosts:

``` sh
ansible-playbook main.yml -i testing
ansible-playbook main.yml -i inventory/testing
```
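To limit a run to a subset of hosts, either inventory can be combined with `--limit`; the host name
below is a placeholder:

``` sh
# Run only against a single host from the production inventory (hypothetical host name).
ansible-playbook main.yml -i inventory/production --limit asgard
```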

### Testing virtual machines

The script for starting, stopping, and reverting the testing virtual machines is located in
`scripts/testing/vmgr.py`.

### Playbooks

The Ansible Edda playbook is composed of smaller [`playbooks`](playbooks). To run a single playbook,

@@ -69,156 +56,107 @@ ansible-playbook main.yml --tags "system"

### Roles

Playbooks are composed of roles defined in the `roles` directory,
[`playbooks/roles`](playbooks/roles).

To play only a specific role, e.g. `system/base` in the playbook `system`, run:

``` sh
ansible-playbook playbooks/system.yml --tags "system:base"
```

Or from the main playbook:

``` sh
ansible-playbook main.yml --tags "system:base"
```

### Role sub-tasks

Some roles are split into smaller groups of tasks. This can be checked by looking at the
`tasks/main.yml` file of a role, e.g.
[`playbooks/roles/system/base/tasks/main.yml`](playbooks/roles/system/base/tasks/main.yml).

To play only a particular group within a role, e.g. `sshd` in `base` of `system`, run:
Playbooks are composed of roles defined in the `roles` submodule, [`roles`](roles), and the
`playbooks/roles` directory, [`playbooks/roles`](playbooks/roles).

To play a specific role, e.g., `system/base/sshd` in the playbook `system`, run:

``` sh
ansible-playbook playbooks/system.yml --tags "system:base:sshd"
```

Or from the main playbook:
To play all roles from a specific group, e.g., `system/base` in the playbook `system`, run:

``` sh
ansible-playbook playbooks/system.yml --tags "system:base"
```

Some roles, e.g., `services/setup/user`, have sub-tasks which can also be invoked individually. To
find the relevant tag, see the role's `main.yml`.
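Ansible can also list the available tags directly, which may be quicker than reading each
`main.yml`:

``` sh
# Print the plays, tasks, and tags without executing anything.
ansible-playbook playbooks/system.yml --list-tags
```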

In all cases, the roles can also be invoked from the main playbook:

``` sh
ansible-playbook main.yml --tags "system:base:sshd"
ansible-playbook main.yml --tags "system:base"
```

## Testing virtual machines

The script for starting, stopping, and reverting the testing virtual machines is located in
`scripts/testing/vmgr.py`.

## Managing backup buckets

The `scripts/restic/restic.py` script provides a wrapper around restic to manage the backup buckets.
The script collects the credentials from the OS keyring and constructs the restic command with the
correct endpoint. It allows the user to focus on the actual command to be executed rather than
authentication and bucket URLs.
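For comparison, a raw invocation without the wrapper looks roughly like the sketch below; the bucket
name is a placeholder, and the wrapper fills in the equivalent values from the keyring:

``` sh
# Credentials that the wrapper normally pulls from the OS keyring.
export AWS_ACCESS_KEY_ID="<scaleway access key>"
export AWS_SECRET_ACCESS_KEY="<scaleway secret key>"
export RESTIC_PASSWORD="<restic password>"

# Hypothetical bucket on the Scaleway S3 endpoint.
restic -r s3:https://s3.fr-par.scw.cloud/<bucket> snapshots
```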

The `scripts/restic/restic.py` script requires the following entries in the keyring:
- `scaleway`: `access_key` (Scaleway project ID),
- `scaleway`: `secret_key` (Scaleway secret key),
- `restic`: `password`.

The easiest way to set these values is with Python's `keyring.set_password`.
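For example (values are placeholders):

``` sh
python3 -c 'import keyring; keyring.set_password("scaleway", "access_key", "<project-id>")'
python3 -c 'import keyring; keyring.set_password("scaleway", "secret_key", "<secret-key>")'
python3 -c 'import keyring; keyring.set_password("restic", "password", "<restic-password>")'
```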

## Testing backups

Before testing the backups, you may want to shut `yggdrasil` down for extra confidence that it is
not being accessed/modified during this process. It is easy to access `yggdrasil` by accident if
`/etc/hosts` is not modified in the test VM, something that is easy to forget.

### Setting up baldur on yggdrasil
### Baldur on Scaleway

1. Create `baldur` by running:
   ```sh
   python scripts/scaleway/baldur.py create --volume-size <size-in-GB>
   ```
   Pick a volume size that's larger than what `yggdrasil` estimates for
   `rpool/var/lib/yggdrasil/data` (a sizing sketch follows this list).
2. When done, destroy `baldur` by running:
   ```sh
   python scripts/scaleway/baldur.py delete
   ```
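To check that estimate on `yggdrasil` before picking the size (assuming the estimate refers to the
dataset's used space):

``` sh
# Show how much space the dataset currently uses.
zfs list -o name,used,refer rpool/var/lib/yggdrasil/data
```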

### Baldur on Yggdrasil

1. Create a VM on `yggdrasil` and install the same OS that is running on `yggdrasil`.
   - Install the OS on a zvol on `rpool`.
   - If the same VM is to be used for testing, a GUI is helpful.
   - Prepare a zvol on `hpool` of a size that's larger than what `yggdrasil` estimates for
     `rpool/var/lib/the-nine-worlds/data` and mount at `/var/lib/the-nine-worlds/data`.
   - Create non-root user `wojtek` with `sudo` privileges.
2. Configure SSH to use `yggdrasil` as a jump server.
1. Create the zvol `rpool/var/lib/libvirt/images/baldur` for the testing OS.
2. Create the zvol `hpool/baldur` for the backup data under test. It should have a capacity that's
   larger than what `yggdrasil` estimates for `rpool/var/lib/the-nine-worlds/data` (excluding
   datasets that are not backed up to the cloud). A command sketch for creating these zvols follows
   the list.
3. Set `refreserv=0` on the zvols to make snapshots take less space.
   - `zfs set refreserv=0 tank/home/ahrens`
4. Use ZFS for snapshots/rollback of the zvols.
   - `zfs snapshot tank/home/ahrens@friday`
   - `zfs rollback tank/home/ahrens@friday`
5. Service testing can then be done directly from the VM. To achieve that, `/etc/hosts` needs to
   point directly at the right proxy server, e.g., `10.66.3.8`, not `localhost`.
   - `zfs set refreserv=0 rpool/var/lib/libvirt/images/baldur`
   - `zfs set refreserv=0 hpool/baldur`
4. Install the same OS that is running on `yggdrasil`, but with a DE, on
   `rpool/var/lib/libvirt/images/baldur` with `hpool/baldur` mounted within at
   `/var/lib/the-nine-worlds/data`.
5. Create non-root user `wojtek` with `sudo` privileges.
6. Configure SSH from the workstation to use `yggdrasil` as a jump server.
7. Use ZFS for snapshots/rollback of the zvols.
   - `zfs snapshot rpool/var/lib/libvirt/images/baldur@start`
   - `zfs snapshot hpool/baldur@start`
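A command sketch for creating and tuning the zvols (sizes are placeholders):

``` sh
# Zvol for the testing OS (placeholder size).
zfs create -V 64G rpool/var/lib/libvirt/images/baldur
# Zvol for the backup data under test (placeholder size).
zfs create -V 500G hpool/baldur
# Make snapshots take less space.
zfs set refreserv=0 rpool/var/lib/libvirt/images/baldur
zfs set refreserv=0 hpool/baldur
```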

### Test
### Provision baldur

1. Provision `baldur` by running:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/baldur.yml
   ```
2. Restore all the backups by ssh'ing into `baldur` and running (as root):
2. Update `/etc/the-nine-worlds/resolv.conf` to point at a public DNS resolver, e.g., `1.1.1.1`.
   Name resolution failures can cause containers to fail.
3. Restore all the backups by ssh'ing into `baldur` and running (as root):
   ```sh
   /usr/local/sbin/restic-batch --config-dir /etc/the-nine-worlds/restic-batch.d restore
   ```
3. Once restore has completed, `chown -R <user>:<user>` all the restored directories in
4. Once restore has completed, `chown -R <user>:<user>` all the restored directories in
   `/var/lib/the-nine-worlds/data`. Restic restores the UID information of the host from which the
   backup was performed, which may not match that of the new target machine. Note that permissions
   and ownership are restored as a second step once all the content is restored. Therefore, the
   files will list `root` as owner during the restoration.
4. Start all the pod services with:
5. Start all the pod services with:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_start.yml
   ```
   Give them some time to download all the images and start.
5. Once the CPU returns to idling, check the state of all the pod services and their `veth`
   interfaces. If necessary, restart the affected pod. Sometimes they fail to start (presumably due
   to issues related to limited CPU and RAM).
6. Boot into a test VM. Ideally, one installed onto a virtual disk since the live system might not
   have enough space. A VM is used to make sure that none of the services on the host workstation
   connect to `baldur` by accident.
7. Modify `/etc/hosts` in the VM to point at `baldur` for all relevant domains.
8. Test each service manually one by one. Use the Flagfox add-on to verify that you are indeed
6. Once the CPU returns to idling, check the state of all the pod services and their `veth`
   interfaces (a check sketch follows this list). If necessary, restart the affected pod; some
   containers fail to start up if the database takes too long to come online.
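A sketch for that check, assuming the pod services are Podman pods:

``` sh
# List pods and their status (run as the user that owns the pod services).
podman pod ps
# Show link state of the veth interfaces.
ip -brief link show type veth
```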

### Testing the backups

1. Log into `baldur`. Testing from a VM (as opposed to a regular workstation) is important to
   prevent live applications from accidentally connecting to `baldur`.
2. Modify `/etc/hosts` in the VM to point at `rproxy` (e.g., `10.66.3.8`) for all relevant domains
   (an example follows this list).
3. Test each service manually one by one. Use the Flagfox add-on to verify that you are indeed
   connecting to `baldur`.
   - Some containers fail to start up if the database takes too long to come online. In that case
     restart the container.
   - Some containers fail to start up if they cannot make DNS queries. Note that `192.168.0.0/16` is
     blocked by firewall rules. If `/etc/the-nine-worlds/resolv.conf` points at a DNS resolver at
     such an address, all DNS queries will fail. Simply update `resolv.conf` to point at, e.g.,
     `1.1.1.1`.
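An example `/etc/hosts` addition for step 2 (the domain names are placeholders; use the domains
actually served via `rproxy`):

``` sh
# In the test VM, as root: point the served domains at rproxy (hypothetical domain names).
cat >> /etc/hosts <<'EOF'
10.66.3.8  cloud.example.org
10.66.3.8  git.example.org
EOF
```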

9. Stop all the pod services with:

### Cleaning up

1. Stop all the pod services with:
   ```sh
   ansible-playbook --vault-id @vault-keyring-client.py -i inventory/baldur_production playbooks/services_stop.yml
   ```

## Music organisation

The `playbooks/music.yml` playbook sets up tools and configuration for organising music. The process
is manual, though; the steps below describe adding a new CD.

All steps below are to be executed as the `music` user.

### Note on tagging

* For live albums, add "YYYY-MM-DD at Venue, City, Country" in the "Subtitle" tag.
* For remasters, use the original release tags and add "YYYY Remaster" in the "Subtitle" tag.

### Ripping a CD

1. Use a CD ripper and rip the CD to `/var/lib/yggdrasil/home/music/rip` using FLAC encoding.
2. Samba has been set up to give Windows access to the above directory. Therefore, CD rippers
   available only for Windows can also be used, e.g. dBpoweramp.

### Import new music

1. Run `beet import /var/lib/yggdrasil/home/music/rip`. This will move the music files to
   `/var/lib/yggdrasil/data/music/collection`.
2. Run `beet convert -a <match>`, where `<match>` is used to narrow down to new music only. This
   will convert the FLAC files into MP3 files for sharing via Nextcloud.
3. Run `nextcloud-upload /var/tmp/music/mp3/<artist>` for every artist to upload to Nextcloud.
4. Remove the `/var/tmp/music/mp3/<artist>` directory (a consolidated sketch follows this list).
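Put together, a full import for one hypothetical artist might look like this (the query and artist
name are placeholders):

``` sh
beet import /var/lib/yggdrasil/home/music/rip
# Narrow the conversion down to the newly imported albums (placeholder query).
beet convert -a 'albumartist:Wardruna'
nextcloud-upload /var/tmp/music/mp3/Wardruna
rm -r /var/tmp/music/mp3/Wardruna
```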

#### Collections

Every track has a `compilation` tag at track-level as well as at album-level (at least in Beets). To
label the album as a compilation for sorting purposes, run `beet modify -a <album> comp=True`.

### Archive music

#### From rip

1. Run `beet --config .config/beets/archive.yaml import --move /var/lib/yggdrasil/home/music/rip`.
   This will move the music files to `/var/lib/yggdrasil/data/music/archive`.

#### From collection

1. Run `beet --config .config/beets/archive.yaml import
   /var/lib/yggdrasil/data/music/collection/<artist>/<album>`. This will copy the music files to
   `/var/lib/yggdrasil/data/music/archive`.
2. Run `beet remove -d -a "album:<album>"`. This will remove the music files from the collection.
2. Delete the VM and the two zvols (a cleanup sketch follows):
   - `rpool/var/lib/libvirt/images/baldur`,
   - `hpool/baldur`.
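A sketch for that cleanup (the libvirt domain is assumed to be named `baldur`):

``` sh
# Remove the libvirt domain, then destroy the zvols and their snapshots.
virsh undefine baldur
zfs destroy -r rpool/var/lib/libvirt/images/baldur
zfs destroy -r hpool/baldur
```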

scripts/scaleway/baldur.py
@@ -1,146 +0,0 @@
# Create or delete the baldur test instance on Scaleway, using credentials stored in the OS keyring.
import argparse

import keyring
import requests


class Scaleway:
    API_ENDPOINT_BASE = "https://api.scaleway.com/instance/v1/zones"
    ZONES = [
        "fr-par-1", "fr-par-2", "fr-par-3",
        "nl-ams-1", "nl-ams-2",
        "pl-waw-1", "pl-waw-2",
    ]

    def __init__(self, project_id, secret_key):
        self.__zone = None
        self.__project_id = project_id
        self.__headers = {"X-Auth-Token": secret_key}

    @property
    def zone(self):
        return self.__zone

    @zone.setter
    def zone(self, zone):
        if zone not in Scaleway.ZONES:
            raise KeyError(f"{zone} is not a valid zone - must be one of {Scaleway.ZONES}")
        self.__zone = zone

    @property
    def project_id(self):
        return self.__project_id

    def __url(self, item, id):
        if self.__zone is None:
            raise RuntimeError("zone must be set before making any API requests")

        url = f"{Scaleway.API_ENDPOINT_BASE}/{self.__zone}"

        if id == "products":
            return f"{url}/products/{item}"

        url = f"{url}/{item}"
        if id is not None:
            url = f"{url}/{id}"

        return url

    @staticmethod
    def __check_status(type, url, rsp):
        if (rsp.status_code // 100) != 2:
            raise RuntimeError(
                f"{type} {url} returned with status code {rsp.status_code}: {rsp.json()}")

    def get(self, item, id=None):
        url = self.__url(item, id)
        r = requests.get(url, headers=self.__headers)
        self.__check_status("GET", url, r)
        return r.json()[item]

    def get_by_name(self, item, name):
        items = self.get(item)
        return next((it for it in items if it["name"] == name), None)

    def __post(self, url, data):
        r = requests.post(url, headers=self.__headers, json=data)
        self.__check_status("POST", url, r)
        return r.json()

    def post(self, item, data):
        return self.__post(self.__url(item, None), data)

    def post_action(self, item, id, action, data):
        return self.__post(f"{self.__url(item, id)}/{action}", data)

    def delete(self, item, id):
        url = self.__url(item, id)
        r = requests.delete(url, headers=self.__headers)
        self.__check_status("DELETE", url, r)


def create_baldur(scaleway, args):
    volume_size = args.volume_size

    security_group = scaleway.get_by_name("security_groups", "baldur-security-group")
    image = scaleway.get_by_name("images", "Debian Bullseye")
    server_type = "PLAY2-PICO"
    if server_type not in scaleway.get("servers", id="products"):
        raise RuntimeError(f"{server_type} is not available in {scaleway.zone}")

    # Reserve a public IP first so that it can be attached to the new server.
    response = scaleway.post("ips", data={"project": scaleway.project_id})
    public_ip = response["ip"]

    baldur = {
        "name": "baldur",
        "dynamic_ip_required": False,
        "commercial_type": server_type,
        "image": image["id"],
        "volumes": {"0": {"size": int(volume_size * 1_000_000_000)}},
        "enable_ipv6": False,
        "public_ip": public_ip["id"],
        "project": scaleway.project_id,
        "security_group": security_group["id"],
    }

    response = scaleway.post("servers", data=baldur)
    server = response["server"]

    scaleway.post_action("servers", server["id"], "action", data={"action": "poweron"})

    print("Baldur instance created:")
    print(f" block volume size: {server['volumes']['0']['size']//1_000_000_000} GB")
    print(f" public ip address: {server['public_ip']['address']}")


def delete_baldur(scaleway, _):
    server = scaleway.get_by_name("servers", "baldur")
    if server is None:
        raise RuntimeError(f"Baldur instance was not found in {scaleway.zone}")
    ip = server["public_ip"]

    # Terminate the server, then release its public IP.
    scaleway.post_action("servers", server["id"], "action", data={"action": "terminate"})
    scaleway.delete("ips", ip["id"])


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Create or delete the Baldur instance")

    subparsers = parser.add_subparsers()

    create_parser = subparsers.add_parser("create")
    create_parser.add_argument("--volume-size", type=int, required=True,
                               help="Block volume size (in GB) to create")
    create_parser.set_defaults(func=create_baldur)

    delete_parser = subparsers.add_parser("delete")
    delete_parser.set_defaults(func=delete_baldur)

    args = parser.parse_args()

    scw_project_id = keyring.get_password("scaleway", "project_id")
    scw_secret_key = keyring.get_password("scaleway", "secret_key")

    scaleway = Scaleway(scw_project_id, scw_secret_key)
    scaleway.zone = "fr-par-2"

    args.func(scaleway, args)