Skip to content
Snippets Groups Projects

DataCatalog

This is Data Catalog for eFlows4HPC project

Find architecture in arch folder.

API-Server for the Data Catalog

This part is the the API-server for the Data Catalog, which will provide the backend functionality.

It is implemented via fastAPI and provides an api documentation via openAPI.

For deployment via docker, a docker image is included.

Security

Certain operations will only be possible, if the request is authenticated. The API has an endpoint at /token where a username/password login is possible. The endpoint will return a token, which is valid for 1 hour. This token ahs to be provided with every api call that requires authentication. Currently, these calls are GET /me - PUT /dataset - PUT /dataset/dataset-id - DELETE /dataset/dataset-id. The passwords are stored as bcrypt hashes and are not visible to anyone.

A CLI is provided for server admins to add new users. It will soon be extended to allow direct hash entry, so that the user does not have to provide their password in clear text.

For testing, a default userdb.json is provided with a single user "testuser" with the password "test".

API Documentation

If the api-server is running, you can see the documentation at <server-url>/docs or <server-url>/redoc.

These pages can also be used as a clunky frontend, allowing the authentication and execution of all api functions.

Running without docker

First ensure that your python version is 3.6 or newer.

Then, if they are not yet installed on your machine, install the requirements via pip:

pip install -r requirements.txt

To start the server, run

uvicorn apiserver:app --reload --reload-dir apiserver

while in the project root directory.

Without any other options, this starts your server on <localhost:8000>. The --reload --reload-dir apiserver options ensure, that any changes to files in the apiserver-directory will cause an immediate reload of the server, which is especially useful during development. If this is not required, just don't include the options.

More information about uvicorn settings (including information about how to bind to other network interfaces or ports) can be found here.

Testing

First ensure that the pytest package is installed (It is included in the requirements.txt).

Tests are located in the apiserver_tests directory. They can be executed by simply running pytest while in the project folder.

If more test-files should be added, they should be named with a test_ prefix and put into a similarily named folder, so that they can be auto-detected.

The context.py file helps with importing the apiserver-packages, so that the tests function independent of the local python path setup.

Using the docker image

Building the docker image

To build the docker image of the current version, simply run

docker build -t datacatalog-apiserver ./apiserver

while in the project root directory.

datacatalog-apiserver is a local tag to identify the built docker image. You can change it if you want.

Running the docker image

To run the docker image in a local container, run

docker run -d --name <container name> -p 127.0.0.1:<local_port>:80 datacalog-apiserver

<container name> is the name of your container, that can be used to refer to it with other docker commands.

<local_port> is the port of your local machine, which will be forwarded to the docker container. For example, if it is set to 8080, you will be able to reach the api-server at http://localhost:8080.

Stopping the docker image

To stop the docker image, run

docker stop <container name>

Note, that this will only stop the container, and not delete it fully. To do that, run

docker rm <container name>

For more information about docker, please see the docker docs