Deployment

[Figure: architecture diagram]

Service Deployment

The services on the virtual machine can be deployed using a single docker-compose file:

docker compose --env-file .env --env-file .development.env up -d

There are configuration parameters for all the services, for both production and development:

Production

### Global
# Host where this instance is hosted
HOST=scopem-openem.ethz.ch
ENVIRONMENT=prod
# Certificate
CERTIFICATE_FILE=.certs/cert_bundle.pem
# Private Key
CERTIFICATE_KEY_FILE=.certs/cert.key

### Identity Provider / Broker (Keycloak)
IDP_URL=https://kc.psi.ch
IDP_USERNAME=scopem-archiver-service
IDP_REALM=awi
IDP_AUDIENCE=account
IDP_CLIENT_ID=scopem-archiver-service-api
IDP_CLIENT_SECRET_FILE=./.secrets/idpclientsecret_prod.txt
IDP_PASSWORD_FILE=./.secrets/idppassword_prod.txt

### Archiver Service API
# Image used for backend service
OPENEM_BACKEND_IMAGE_NAME=ghcr.io/swissopenem/scopemarchiver-archiver-service-api
OPENEM_IMAGE_TAG=latest
# Archiver Root Folder
ARCHIVER_SCRATCH_FOLDER=/storage
# Backend server api root path
API_ROOT_PATH=/archiver/api/v1


#### Minio
S3_REGION="eu-west-1"
S3_ENDPOINT="sp109.ethz.ch:18000"
S3_EXTERNAL_ENDPOINT=scopem-openem.ethz.ch
S3_TOTAL_LANDING_SPACE_TB=100

#### PREFECT
# Prefect version used in all images
PREFECT_VERSION=3.7.2-python3.13
# Logging level
PREFECT_LOGGING_LEVEL=INFO
# Image name for containers used to execute flows
PREFECT_RUNTIME_IMAGE_NAME=ghcr.io/swissopenem/scopemarchiver-archiver-service-workflow
# Image name for configuration container
PREFECT_CONFIG_IMAGE_NAME=ghcr.io/swissopenem/scopemarchiver-archiver-service-config
# Working directory of archiver
PREFECT_ARCHIVER_HOST_SCRATCH=/mnt/openemdata/scratch
# Production Prefect job template
PREFECT_JOB_TEMPLATE=prefect-jobtemplate-prod.json
# Workpool name for archiver jobs
PREFECT_ARCHIVAL_WORKPOOL_NAME=archival-docker-workpool
# Workpool name for retrieval jobs
PREFECT_RETRIEVAL_WORKPOOL_NAME=retrieval-docker-workpool

# Prefect variables file
PREFECT_VARS_FILE=../backend/prefect/vars_prod.toml

### Authentik
# Use `AUTH_MIDDLEWARE=authentik` to protect access to dashboards
AUTH_MIDDLEWARE=authentik

AUTHENTIK_HOST=https://authentik.ethz.ch
# Set to true if the Authentik infrastructure uses a self-signed certificate, false otherwise
AUTHENTIK_INSECURE=true

### Scicat
SCICAT_ENDPOINT=https://dacat.psi.ch
SCICAT_API_PREFIX=/api/v3
SCICAT_USER_FILE=.secrets/scicatuser_prod.txt
SCICAT_PASSWORD_FILE=.secrets/scicatpass_prod.txt
SCICAT_INGESTOR_GROUPS=ethz-scopem;ethz-scopem-ops
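
Several of the variables above point at files (certificates, secrets, the Prefect variables file). The following is a minimal sketch of a pre-flight check that lists the entries whose file is missing before the stack is started. The function name `missing_files` is hypothetical, and resolving relative paths against the directory of the env file is an assumption about the project layout.

```shell
#!/bin/sh
# Hypothetical helper: print every KEY=path entry whose key ends in _FILE
# and whose target does not exist. Relative paths are resolved against the
# directory containing the env file (an assumption, adjust to your layout).
missing_files() {
  env_file="$1"
  base_dir=$(dirname "$env_file")
  grep -E '^[A-Za-z0-9_]+_FILE=' "$env_file" | while IFS='=' read -r key path; do
    case "$path" in
      /*) target="$path" ;;            # absolute path, use as is
      *)  target="$base_dir/$path" ;;  # relative path
    esac
    [ -e "$target" ] || printf '%s=%s\n' "$key" "$path"
  done
}
```

Running `missing_files .env` prints nothing when every referenced file is present.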

Development

For development, it is useful to override some configuration:


Note: The lts-mock-volume is a local volume here and not the LTS share.
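
Docker Compose reads multiple `--env-file` options in order, and values from later files override those from earlier ones, which is why `.development.env` is passed after `.env`. The sketch below imitates that precedence for a single key, so an override can be checked without starting any container. `resolve_env` is a hypothetical helper, and it handles only plain `KEY=value` lines (no quoting or interpolation).

```shell
#!/bin/sh
# Hypothetical helper: print the value of KEY as it would result from
# reading the given env files in order, with later files taking precedence.
# Only plain KEY=value lines are handled; quotes are not stripped.
resolve_env() {
  key="$1"
  shift
  value=""
  for file in "$@"; do
    # Take the last assignment of KEY in this file, if any.
    line=$(grep -E "^${key}=" "$file" | tail -n 1) || true
    if [ -n "$line" ]; then
      value="${line#*=}"
    fi
  done
  printf '%s\n' "$value"
}
```

For example, `resolve_env ENVIRONMENT .env .development.env` prints the value that wins. The full resolved configuration can also be inspected with `docker compose --env-file .env --env-file .development.env config`.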

Prefect Deployment

Prefect is set up in a slightly non-standard way compared to the use cases described in its documentation. Two workers are deployed (archival and retrieval); each mounts the host's Docker socket so that it can create, at runtime, the containers in which the flows run. The flow code is baked into the container images and is not pulled from any repository (Prefect would also allow storing the code in, for example, an S3 bucket). The ETHZ LTS share is mounted as a Docker volume so that the runtime containers can mount it during startup.

[Figure: Prefect deployment diagram]

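| Name              | Technology | Description                                                         | Endpoint                              |
| ----------------- | ---------- | ------------------------------------------------------------------- | ------------------------------------- |
| Prefect Server    |            | Workflow orchestration, https://www.prefect.io                      | http://localhost/prefect-ui/dashboard |
| Postgres Database |            | Database for Prefect                                                | n/a                                   |
| Prefect Worker    |            | https://docs.prefect.io/3.0/deploy/infrastructure-concepts/workers | n/a                                   |
| Runtime Container |            | runtime.Dockerfile                                                  | n/a                                   |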

Prefect Server

In order to run Prefect server, variables, secrets and concurrency limits need to be configured.

Configuration

All of the configuration can be done by running

  docker compose --env-file .env --env-file .development.env run --rm prefect-config

with the appropriate PREFECT_API_URL set.
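
A small guard can catch a missing or malformed `PREFECT_API_URL` before the configuration container is started. This is only a sketch: `check_prefect_api_url` is a hypothetical helper, it does not contact the server, and it assumes the variable is exported in the calling shell.

```shell
#!/bin/sh
# Hypothetical guard: succeed only if PREFECT_API_URL is set and looks
# like an http(s) URL. It does not check that the server is reachable.
check_prefect_api_url() {
  case "${PREFECT_API_URL:-}" in
    http://?*|https://?*)
      return 0
      ;;
    *)
      echo "PREFECT_API_URL is unset or not an http(s) URL" >&2
      return 1
      ;;
  esac
}
```

For example, `check_prefect_api_url && docker compose --env-file .env --env-file .development.env run --rm prefect-config` runs the configuration only when the guard passes.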

Variables

Variables are used at runtime and are fetched from the server by the flow. External endpoints and other parameters of the flow belong here:


Concurrency Limits

Certain sections of the code (tasks) may only run with limited concurrency (e.g. writing to the LTS); see https://docs.prefect.io/3.0/develop/task-run-limits#limit-concurrent-task-runs-with-tags.

LTS_WRITE_LIMIT = 4
LTS_READ_LIMIT = 4
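
In Prefect, tag-based limits of this kind can also be created by hand with `prefect concurrency-limit create <tag> <limit>`. The sketch below turns `NAME = N` lines into those commands and only prints them. Treating the names above as the tag names is an assumption (the tags used by the flows may differ), and `limits_to_commands` is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: read "NAME = N" lines on stdin and print (not run)
# one Prefect CLI command per line. Assumes NAME is the task tag.
limits_to_commands() {
  while IFS='=' read -r name value; do
    # Strip the whitespace around the "=" sign.
    name=$(printf '%s' "$name" | tr -d '[:space:]')
    value=$(printf '%s' "$value" | tr -d '[:space:]')
    # Skip blank lines and anything without a purely numeric limit.
    case "$value" in
      ''|*[!0-9]*) continue ;;
    esac
    [ -n "$name" ] || continue
    printf 'prefect concurrency-limit create %s %s\n' "$name" "$value"
  done
}
```

Printing the commands instead of running them keeps the sketch safe to try; the output can be reviewed and then executed against the intended Prefect server.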

Internal Secrets

Internal secrets can be created at deployment time.

# Postgres
echo "postgres_user" > .secrets/postgresuser.txt
openssl rand -base64 12 > .secrets/postgrespass.txt # creates random string
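
Since these files hold credentials, it can be worth creating them so that only the owner can read them. The following is a minimal sketch, assuming a POSIX shell; `write_secret` is a hypothetical helper that reads the secret from standard input.

```shell
#!/bin/sh
# Hypothetical helper: write stdin to the given file with owner-only
# permissions, creating the parent directory if needed.
write_secret() {
  target="$1"
  mkdir -p "$(dirname "$target")"
  # The subshell keeps the restrictive umask from leaking to the caller.
  (umask 077 && cat > "$target")
  # Also tighten the mode of a file that already existed.
  chmod 600 "$target"
}
```

For example, `openssl rand -base64 12 | write_secret .secrets/postgrespass.txt` creates the password file with mode 600.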

External Secrets

The Minio deployment might already provide its own secrets; they can also be added manually in the UI.

# Minio
echo "minioadminuser" > .secrets/miniouser.txt
openssl rand -base64 12 > .secrets/miniopass.txt # creates random string

# TODO: Needed?
# Github
echo "<github_user>" > .secrets/githubuser.txt
echo "<github_access_token>" > .secrets/githubpass.txt
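
| Name                       | Description                                          |
| -------------------------- | ---------------------------------------------------- |
| github-openem-username     | Username for GitHub container registry               |
| github-openem-access-token | Personal access token for GitHub container registry  |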

Prefect Worker

Workers can only be deployed on a machine that has access to

  • the Prefect server (Prefect currently implements no authentication for on-premise deployments)
  • the S3 storage
  • the ETHZ LTS share (IP whitelisting within the ETHZ network)

They can be started with the following commands:

docker compose --env-file .env --env-file .development.env up -d prefect-archival-worker
docker compose --env-file .env --env-file .development.env up -d prefect-retrieval-worker

Note: due to a bug in Prefect, the workers' concurrency limit needs to be set manually in the UI.

Flows

The flows can be deployed using a container:

docker compose --env-file .env --env-file .development.env run --rm prefect-flows-deployment

This deploys the flows as defined in the prefect.yaml and requires the secrets set up in the previous step.