Data Logistics Service issues
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues

#88 Check out dependency issues with airflow 2.7.1
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/88
(Christian Boettcher, updated 2023-10-04)

There seems to be a dependency issue with airflow 2.7.1 and mlflow 1.30.0:

```plaintext
mlflow 1.30.0 depends on importlib-metadata!=4.7.0, <6 and >=3.7.0
opentelemetry-api 1.20.0 depends on importlib-metadata<7.0 and >=6.0
```
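The clash can be double-checked offline with the `packaging` library (the specifier logic pip builds on); this is a small sketch separate from the DLS code, using the version constraints from the resolver output above:

```python
# Sketch: verify that the two importlib-metadata requirements cannot be
# satisfied simultaneously (specifiers taken from the resolver output above).
from packaging.specifiers import SpecifierSet

mlflow_spec = SpecifierSet("!=4.7.0,<6,>=3.7.0")   # from mlflow 1.30.0
otel_spec = SpecifierSet("<7.0,>=6.0")             # from opentelemetry-api 1.20.0
combined = mlflow_spec & otel_spec

# Every candidate version is rejected by one side or the other
# (nothing can be < 6 and >= 6.0 at the same time):
for version in ["3.7.0", "5.2.0", "6.0.1", "6.8.0"]:
    print(version, version in combined)   # every line prints False
```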
This is not present in airflow 2.7.0 and therefore not that urgent for now, but should probably be checked out some time (possibly after the current deliverables are finished).

#87 ssh issues with b2share(-testing?) are resolved
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/87
(Jedrzej Rybicki, updated 2023-04-28)

There are some issues with connecting to b2share-testing, certificate related. This ruins the upload_ and download flow pipelines. It is apparently not airflow specific; reproduce with
```
requests.get('https://b2share-testing.fz-juelich.de')
```
to get
```
SSLError: HTTPSConnectionPool(host='b2share-testing.fz-juelich.de', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: unable to get local issuer certificate (_ssl.c:1129)')))
```

#86 ssh2ssh pipeline
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/86
(Jedrzej Rybicki, updated 2023-04-28)

Required for the ESM workflow: moving the data between Zeus.

#85 License management
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/85
(Jedrzej Rybicki, updated 2023-04-27)

To touch/transfer some of the era3 data, a license is needed. It could be that this is a higher-level problem; the attribute might be stored in Unity.

#82 Integration with Object Store is evaluated
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/82
(Jedrzej Rybicki, updated 2023-02-01)

A way to access storage resources at JSC is to use the object store interface. This might be a better option than UFTP (#76):
* easier to integrate (potentially just http for reading at least)
* more flexible

#81 Checksum in DataCat
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/81
(Jedrzej Rybicki, updated 2022-10-20)

Stage-out could calculate a checksum and add it to the DataCat record.

#80 Checksum management
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/80
(Jedrzej Rybicki, updated 2022-10-20)

For the transfers a checksum is calculated and could be compared with something, but with what?

#78 Generic directory 2 directory mapping is evaluated
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/78
(Jedrzej Rybicki, updated 2022-10-11)

Apparently there is a need to map/transform from one directory structure into another (earthquakes + tsunamis).
Could be an interesting idea to come up with a generic solution for that, probably something like a template with regexes?
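As a starting point, such a mapping could be expressed as regex (pattern, replacement) rules; the rule format and the example paths below are purely hypothetical:

```python
# Sketch of a generic source->target directory mapping driven by regexes.
# The rule format and example paths are hypothetical, for illustration only.
import re

def map_path(path: str, rules: list[tuple[str, str]]) -> str:
    """Apply the first matching (pattern, replacement) rule to a path."""
    for pattern, replacement in rules:
        if re.fullmatch(pattern, path):
            return re.sub(pattern, replacement, path)
    return path  # no rule matched: keep the original layout

# Example: regroup simulation output by event/year instead of year/event.
rules = [(r"output/(\d{4})/(\w+)/(.+)", r"archive/\2/\1/\3")]
print(map_path("output/2022/tsunami/run01.nc", rules))
# -> archive/tsunami/2022/run01.nc
```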
#73 OID management is refactored
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/73
(Jedrzej Rybicki, updated 2022-10-20)

Current oid management has grown from just b2share (the oid pointed to a b2share object) to datacat (the oid points to datacat entries). This should now be cleaned up to use datacat and, if that is not possible, fall back to http-based transfer.
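The cleanup could center on one shared resolution helper; everything below (the lookup callable, the return shape) is an assumption for illustration, not the existing DLS API:

```python
# Sketch of a shared oid-resolution helper: try the data catalog first,
# fall back to a plain http(s) location. All names are hypothetical.
from typing import Callable, Optional

def resolve_oid(oid: str, datacat_lookup: Callable[[str], Optional[str]]) -> Optional[str]:
    """Return a transfer URL for an oid, preferring the data catalog."""
    entry = datacat_lookup(oid)          # e.g. a thin datacat client call
    if entry is not None:
        return entry                     # the catalog knows the object
    if oid.startswith(("http://", "https://")):
        return oid                       # fall back to http-based transfer
    return None                          # unresolvable: fail the pipeline early

# Usage with a stubbed catalog:
catalog = {"b2share/123": "https://b2share.example/api/records/123"}
print(resolve_oid("b2share/123", catalog.get))
print(resolve_oid("https://example.org/data.nc", catalog.get))
```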
An example is already in docker_in_worker. A method should be created to include this in other pipelines.

#69 Visibility for non-admin users of the DLS to be able to see the links to the datacat and the eFlows website
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/69
(Maria Petrova-El Sayed, updated 2022-09-13)

#67 b2share community specific id changes are handled
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/67
(Jedrzej Rybicki, updated 2022-08-23)

The b2share metadata schema tends to change the community-specific id, which is bad for the current way of creating (draft) records. Needs to be fixed.

#66 Airflow for data replication use case is understood
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/66
(Jedrzej Rybicki, updated 2022-09-29)

There is a possibility to replicate data with airflow. Airflow can:
* move the data around
* have some notion of replication policy (e.g. at least 3 copies are available at any time)
* perform periodic checks if the policy is fulfilled or not (and react accordingly)
* inform the users about the state (new replicas)
* keep track of replicas in the data catalog (list of urls)

#64 DataCat integration v2 tested with b2share
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/64
(Jedrzej Rybicki, updated 2022-07-28)

#63 webdav support implemented
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/63
(Jedrzej Rybicki, updated 2022-09-29)

For b2drop integration webdav support is required.

#54 Reasons for unhealthy status
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/54
(Jedrzej Rybicki, updated 2023-02-01)

#51 Fix old docker volumes breaking the full redeployment
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/51
(Christian Boettcher, updated 2022-07-12)

As described in #47, the persistent docker volumes also cause docker to prematurely recreate the containers, which then expect certain files that do not yet exist.
Docker then creates empty volumes instead, which causes the airflow container to fail, prevents the copying of the proper files, and therefore breaks the deployment.
My current solution is to call "rm -rf ~/eflows-airflow" before restarting docker.
This is no permanent solution, as it depends on the timing of docker checking for the files. Something that disables docker for a while would probably be a better solution.

#37 Deploy DLS on Kubernetes
https://gitlab.jsc.fz-juelich.de/eflows4hpc-wp2/data-logistics-service/-/issues/37
(Maria Petrova-El Sayed, updated 2022-07-22)