
Deployment

Service Deployment

The services on the virtual machine can be deployed using a single docker-compose file:

docker compose --env-file .env --env-file .development.env up -d
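When several --env-file flags are given, Docker Compose reads the files in order and values from later files override earlier ones, so .development.env takes precedence over .env. The Python sketch below mimics that merge for simple KEY=VALUE lines. The parser is deliberately simplified (no quoting or interpolation), and the variable names and values are hypothetical.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def merge_env(*texts: str) -> dict[str, str]:
    """Merge env file contents in order; later files win."""
    merged = {}
    for text in texts:
        merged.update(parse_env(text))
    return merged


# Hypothetical file contents, for illustration only.
base_env = "HOST=example.org\nLOG_LEVEL=info\n"
development_env = "# local overrides\nHOST=localhost\n"

settings = merge_env(base_env, development_env)
print(settings)
```

Keeping shared defaults in .env and only the differing keys in .development.env keeps the override file short.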

Configuration parameters for all services are provided for both production and development:

Production


Development

For development, it is useful to override some configuration:


Note: The lts-mock-volume is a local volume here and not the LTS share.

Prefect Deployment

Prefect is set up in a slightly non-standard way compared to the use cases described in its documentation. Two workers are deployed (archival and retrieval); each mounts the host's Docker socket in order to create, at runtime, the containers in which the flows run. The flows are baked into the container image, so the code is not pulled from a repository (Prefect would also allow, for example, storing the code in an S3 bucket). The ETHZ LTS share is mounted in a Docker volume so that the runtime containers can mount it during startup.

Name Technology Description Endpoint
Prefect Server Workflow orchestration https://www.prefect.io http://localhost/prefect-ui/dashboard
Postgres Database Database for Prefect n/a
Prefect Worker https://docs.prefect.io/3.0/deploy/infrastructure-concepts/workers n/a
Runtime Container prefect-runtime.Dockerfile n/a
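As a rough illustration of the runtime-container mechanism described above, the following Python sketch builds a `docker run` invocation that mounts a named volume into a container. The worker itself talks to the Docker daemon through the mounted socket, so the CLI form is used here only because it is easy to read. The volume name and mount point appear elsewhere in this document; the image tag is a hypothetical name.

```python
import shlex


def runtime_container_command(image: str, volume: str, mount_point: str) -> list[str]:
    """Build a `docker run` argument list that mounts a named volume."""
    return [
        "docker", "run", "--rm",
        "--volume", f"{volume}:{mount_point}",
        image,
    ]


# "lts-mock-volume" and "/tmp/LTS" are taken from this document;
# the image tag "prefect-runtime:latest" is a hypothetical name.
command = runtime_container_command(
    "prefect-runtime:latest", "lts-mock-volume", "/tmp/LTS"
)
print(shlex.join(command))
```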

Prefect Server

Before the Prefect server can be used, variables, secrets and concurrency limits need to be configured.

Configuration

All of the configuration can be done by running

docker compose --env-file .env --env-file .development.env run --rm prefect-config

with the appropriate PREFECT_API_URL set.

Variables

Variables are used at runtime and are fetched from the server by the flow. External endpoints and other parameters of the flow belong here:

[archiver]
ARCHIVER_SCRATCH_FOLDER = "/tmp/scratch" #string
ARCHIVER_TARGET_SIZE_MB = 200            # target size of datablocks that are stored in the LTS

[lts]
LTS_STORAGE_ROOT = "/tmp/LTS"  # Root path where LTS share is mounted
LTS_FREE_SPACE_PERCENTAGE = 20 # Minimum free space percentage of the LTS before archiving task starts

[minio]
MINIO_REGION = "eu-west-1"                        # S3 region
MINIO_RETRIEVAL_BUCKET = "retrieval"              # S3 bucket where datasets are retrieved to
MINIO_LANDINGZONE_BUCKET = "landingzone"          # S3 bucket where datasets are uploaded to  
MINIO_STAGING_BUCKET = "staging"                  # S3 internally used bucket where datasets are staged before copying to LTS
MINIO_ENDPOINT = "scopem-openemdata.ethz.ch:9090" # S3 endpoint of storage server

[scicat]
SCICAT_API_PREFIX = "/api/v3/" # Route prefix of Scicat instance

Concurrency Limits

Certain tasks, such as writing to the LTS, may only run with limited concurrency. These limits are enforced with Prefect's tag-based task run concurrency limits, see https://docs.prefect.io/3.0/develop/task-run-limits#limit-concurrent-task-runs-with-tags.

LTS_FREE_LIMIT = 1
MOVE_TO_LTS_LIMIT = 2
VERIFY_LTS_LIMIT = 1
LTS_TO_RETRIEVAL_LIMIT = 1
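Prefect enforces these limits on the server side. As a purely local analogy (not Prefect's implementation), the sketch below uses a semaphore to show what a limit of 2 means in practice: six workers compete for the section, but no more than two are inside it at any time. Treating a limit name as one shared section is an assumption of this sketch.

```python
import threading
import time

# Limit copied from the listing above; treating the name as one shared
# section is an assumption of this sketch.
LIMITS = {"MOVE_TO_LTS_LIMIT": 2}


class LimitedSection:
    """Let at most `limit` threads run a section at once and record the peak."""

    def __init__(self, limit: int) -> None:
        self._slots = threading.BoundedSemaphore(limit)
        self._lock = threading.Lock()
        self._active = 0
        self.peak = 0

    def run(self, work) -> None:
        with self._slots:  # blocks while `limit` threads are already inside
            with self._lock:
                self._active += 1
                self.peak = max(self.peak, self._active)
            try:
                work()
            finally:
                with self._lock:
                    self._active -= 1


move_to_lts = LimitedSection(LIMITS["MOVE_TO_LTS_LIMIT"])
threads = [
    threading.Thread(target=move_to_lts.run, args=(lambda: time.sleep(0.05),))
    for _ in range(6)
]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()
print(move_to_lts.peak)  # never exceeds the limit of 2
```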

Internal Secrets

Internal secrets can be created at deployment time.

# Postgres
echo "postgres_user" > .secrets/postgresuser.txt
openssl rand -base64 12 > .secrets/postgrespass.txt # creates random string
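For setups where the secrets are generated from a script rather than by hand, the following Python sketch produces the same kind of files as the shell commands above. The helper names are hypothetical, and restricting the file permissions is an added assumption that the shell commands do not make.

```python
import base64
import secrets
from pathlib import Path


def random_password(num_bytes: int = 12) -> str:
    """Base64-encode `num_bytes` random bytes, like `openssl rand -base64`."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


def write_secret(directory: Path, filename: str, value: str) -> Path:
    """Write one secret to `directory/filename` with owner-only permissions."""
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / filename
    path.write_text(value + "\n")
    path.chmod(0o600)  # assumption: secrets should not be world-readable
    return path


secrets_dir = Path(".secrets")
write_secret(secrets_dir, "postgresuser.txt", "postgres_user")
write_secret(secrets_dir, "postgrespass.txt", random_password())
```

Note that running this again overwrites the existing files, just as the shell commands do.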

External Secrets

The Minio deployment might already provide its own secrets; these can also be added manually in the UI.

# Minio
echo "minioadminuser" > .secrets/miniouser.txt
openssl rand -base64 12 > .secrets/miniopass.txt # creates random string

# TODO: Needed?
# Github
echo "<github_user>" > .secrets/githubuser.txt
echo "<github_access_token>" > .secrets/githubpass.txt

Name                        Description
github-openem-username      Username for Github container registry
github-openem-access-token  Personal access token to Github container registry

Prefect Worker

Workers can only be deployed on a machine that has access to

  • the Prefect server (Prefect currently implements no authentication for on-premise deployments)
  • the S3 storage
  • the ETHZ LTS share (IP whitelisting within the ETHZ network)
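Before starting a worker, it can help to confirm that the network requirements listed above are met from the target machine. The sketch below is a minimal TCP reachability check in Python. The S3 endpoint is the MINIO_ENDPOINT value from the variables; the Prefect server address (localhost, port 80) is an assumption derived from the dashboard URL and may differ in a real deployment. Access to the LTS share is not covered by this check.

```python
import socket


def split_host_port(endpoint: str, default_port: int = 443) -> tuple[str, int]:
    """Split "host:port" into its parts, falling back to `default_port`."""
    host, separator, port = endpoint.rpartition(":")
    if not separator:
        return endpoint, default_port
    return host, int(port)


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # The S3 endpoint is taken from the variables; the Prefect server
    # address is an assumed placeholder.
    endpoints = {
        "S3 storage": "scopem-openemdata.ethz.ch:9090",
        "Prefect server": "localhost:80",
    }
    for name, endpoint in endpoints.items():
        host, port = split_host_port(endpoint)
        status = "reachable" if can_connect(host, port) else "unreachable"
        print(f"{name}: {status}")
```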

The workers can be started with the following commands:

docker compose --env-file .env --env-file .development.env up -d prefect-archival-worker
docker compose --env-file .env --env-file .development.env up -d prefect-retrieval-worker

Note: Due to a bug in Prefect, the workers' concurrency limit needs to be set manually in the UI.

Flows

The flows can be deployed using a container:

docker compose --env-file .env --env-file .development.env run --rm prefect-flows-deployment

This deploys the flows as defined in the prefect.yaml and requires the secrets set up in the previous step.