### Transferable Deep Learning for fast COVID X-Ray detection and explainable diagnostics

Author: Jenia Jitsev (JJ)
(Helmholtz AI Local "Information", Juelich Supercomputing Center (JSC)) \
Initiated: April 2020

Further Contributors: Mehdi Cherti (MC) (Helmholtz AI HLST, JSC)

#### Initiative Activities Overview

* [Overview - Description of available models, codes and datasets](https://gitlab.version.fz-juelich.de/MLDL_FZJ/juhaicu/jsc_public/sharedspace/playground/covid_xray_deeplearning/wiki/-/blob/master/Description.md)
* Spin-off project as part of the [Israel Exchange Program](https://idsi.net.technion.ac.il/call-for-student-helmholtz-israel-virtual-exchange-program/) of the *Helmholtz Information & Data Science Academy* ([HIDA](https://www.helmholtz-hida.de/)) and the *Israel Data Science Initiative* ([IDSI](https://idsi.net.technion.ac.il/)) (application deadline: 07.05.2021)
  - [Large-Scale Transfer Learning applied to X-Ray based COVID-19 diagnostics](https://idsi.net.technion.ac.il/12-large-scale-supervised-and-unsupervised-deep-learning-for-fast-and-robust-transfer-on-medical-x-ray-imaging-datasets-applied-to-covid-19-diagnostics/)
* COVIDNetX provides a dedicated challenge on [Juelich Data Challenges](https://data-challenges.fz-juelich.de): [COVIDNetX Challenge](https://data-challenges.fz-juelich.de/web/challenges/challenge-page/83/overview)
  - Hackathon carried out by the JULAIN Network and Helmholtz AI: [Juelich Challenges Hackathon](https://gitlab.version.fz-juelich.de/MLDL_FZJ/MLDL_FZJ_Wiki/-/wikis/Juelich-Challenges-Hackathon)
  - COVIDNetX Team Hackathon [Results Report](https://gitlab.version.fz-juelich.de/MLDL_FZJ/MLDL_FZJ_Wiki/-/blob/master/files/21-03-16-JuelichChallenges_Hackathon/21-03-19-results_team_covidnetx.pdf), [Code Repository](https://gitlab.version.fz-juelich.de/MLDL_FZJ/juhaicu/jsc_public/sharedspace/jucha/covidnetx-challenge/team_covidnetx/)

#### Resources Overview

* For model training and further experiments, **computing budget is available**:
  - **UPDATE**: 30.10.2020 - COVIDNetX computational time application granted for JUWELS Booster (ca. **3600 GPUs**!)
    - Grant title: *"Large-Scale Advanced Deep Transfer Learning for Fast, Robust and Affordable COVID-19 X-Ray Diagnostics"*
    - [COVIDNetX Compute Grant Project Wiki](https://gitlab.version.fz-juelich.de/MLDL_FZJ/juhaicu/jsc_internal/superhaicu/projects/covidnetx/gsc_grant_21/wiki/-/wikis/home) (internal)
    - Abstract: "X-Ray imaging based diagnostics offers an affordable, widely available and easily deployable alternative for screening of COVID-19 disease caused by the new coronavirus SARS-CoV-2. However, large-scale screening of substantial numbers of images under time pressure may cause human errors, and especially in remote locations without enough skilled personnel and specialized physicians, both the reliability of diagnostics and its rapid execution can be severely affected, compromising the results. In this project, we aim to establish advanced deep transfer learning approaches to enable robust and fast X-Ray based diagnostic tools independent of location and local resources. **Large-scale generic models pre-trained in HPC**, **quickly adaptable to local demands via transfer learning**, can be deployed as **compact, robust, fast and low cost diagnostic tools**, assisting available or even replacing missing medical personnel, thus allowing screening at yet unprecedented speed and quality."
    - Grant period: 01.11.2020 - 31.10.2021
    - In the granting period, we aim to lay the groundwork for different forms of advanced distributed deep transfer learning that is both high-performing and efficient, offering low-cost transfer to different domains and tasks, with a **focus on transfer to small COVID-19 lung X-Ray image datasets after pre-training on publicly available large-scale natural or medical image datasets**. We will use supervised, unsupervised and NAS-based (neural architecture search) transfer learning techniques to evaluate transfer quality.
    - [JUWELS Booster Hardware Specs in detail](https://apps.fz-juelich.de/jsc/hps/juwels/booster-overview.html)
  - on JSC's JUSUF machine (up to 61 nodes, each with 1x V100 GPU)
    - granted on 01.05.2020 until 31.01.2021 for the initial project phase
    - [JUSUF Hardware Specs in detail](https://apps.fz-juelich.de/jsc/hps/jusuf/cluster/configuration.html)
  - For collaboration and access to computing resources, please contact Jenia Jitsev (j.jitsev@fz-juelich.de), Mehdi Cherti (m.cherti@fz-juelich.de), or Alex Strube (a.strube@fz-juelich.de)
  - Collaborating partners will also gain access to the common code and dataset repository
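
The grant plan above mentions evaluating transfer quality. As a minimal, stdlib-only sketch of one cheap way to do that (an illustration, not the project's evaluation protocol), the snippet below runs a few-shot nearest-class-mean probe: the pre-trained backbone stays frozen, the features of each class in a small labeled set are averaged into a prototype, and held-out images are assigned to the nearest prototype. The 2-D feature vectors and the class names `normal` / `covid` are hypothetical toy values standing in for embeddings a real backbone would produce.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def class_centroids(feats: List[Vector], labels: List[str]) -> Dict[str, List[float]]:
    """Average the frozen-backbone features of each class into one prototype."""
    sums: Dict[str, List[float]] = {}
    counts: Dict[str, int] = {}
    for x, y in zip(feats, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + xi for s, xi in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def predict(centroids: Dict[str, List[float]], x: Vector) -> str:
    """Assign x to the class whose prototype is nearest in Euclidean distance."""
    return min(centroids, key=lambda y: math.dist(centroids[y], x))


def probe_accuracy(train_x: List[Vector], train_y: List[str],
                   test_x: List[Vector], test_y: List[str]) -> float:
    """Fraction of held-out samples classified correctly by the prototype probe."""
    centroids = class_centroids(train_x, train_y)
    hits = sum(predict(centroids, x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Hypothetical 2-D features for a 3-shot labeled set; a real (frozen) backbone
# would produce much higher-dimensional embeddings.
train_x = [[0.1, 0.2], [0.3, 0.1], [0.2, 0.4], [1.1, 0.9], [0.9, 1.2], [1.3, 1.0]]
train_y = ["normal", "normal", "normal", "covid", "covid", "covid"]
test_x = [[0.25, 0.3], [1.0, 1.1]]
test_y = ["normal", "covid"]

print(probe_accuracy(train_x, train_y, test_x, test_y))
```

Because no weights are trained, a probe like this is cheap enough to compare many pre-trained backbones; training a linear head or fine-tuning the full network are common, more expensive alternatives.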

#### Initiative Aims

* The initiative's aims are:
  - Short-term: provide a **strong baseline** for pre-training and **transfer learning** for COVID X-Ray diagnostics using large-scale image datasets from different domains (both generic datasets like ImageNet and medical imaging datasets like CheXpert, COVIDx, etc.; see [Description of available models, codes and datasets](https://gitlab.version.fz-juelich.de/MLDL_FZJ/juhaicu/jsc_public/sharedspace/playground/covid_xray_deeplearning/wiki/-/blob/master/Description.md))
  - **indicate** to users (medical doctors, etc.) how **certain / uncertain** the performed classification is, given the images they provide
  - **indicate** to users how strongly the provided images are **out-of-distribution**, signaling whether the pre-trained model is likely unable to produce useful diagnostics on the fly for these images, and whether further re-calibration / fine-tuning on the new images may be needed before attempting diagnostics
  - **indicate** to users on which basis the classification was made, e.g., by highlighting regions of the input X-Ray image with a heat map, or by visualizing receptive fields of responsible activations across layers, showing which **image regions** or **intermediate features** are **essential for the diagnostic decision**
  - Mid-term: establish a large-scale pre-training and transfer procedure that generates large models (tens to hundreds of millions of parameters, e.g., EfficientNet-B7, NFNet-F4 and larger) pre-trained on large generic datasets (e.g., ImageNet-21k and larger) that can be efficiently transferred (**few- or zero-shot**) to various downstream medical imaging datasets and tasks **beyond** the special case of COVID detection.
  - **Long-term vision**: a generic system that digests **different types of image modalities** (not only X-Ray, but also e.g. CT and 3D CT scans, and eventually entirely different modalities like ultrasonography), a **continually improving generic model** of image understanding (with a strong focus on medical diagnostics and the analysis of pathological signatures), allowing **fast transfer** to a specified domain of interest. If a new domain X comes up, triggered by an unknown novel pathogen Y causing a disease Z that can be diagnosed via medical imaging, the generic model, **pre-trained on millions of different images from distinct domains**, can be used to quickly derive an expert model for domain X. This should enable a quick reaction in the face of novel, yet unknown pathologies, where the availability of diagnostics is initially impaired.
* For collaborators: [Helmholtz AI COVIDNetX Initiative internal](https://gitlab.version.fz-juelich.de/MLDL_FZJ/juhaicu/jsc_internal/superhaicu/shared_space/playground/covid19/-/wikis/home)
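
The three **indicate** aims above can be sketched on top of a classifier's outputs. The block below is a minimal, stdlib-only illustration under stated assumptions, not the initiative's implementation: the predictive entropy of the softmax serves as the certainty signal, the maximum-softmax-probability (MSP) baseline as a simple out-of-distribution flag, and occlusion sensitivity as one way to produce a heat map. The `0.7` threshold, the 4x4 toy image and `toy_score` (a hypothetical stand-in for a model's COVID-class score) are placeholders.

```python
import math
from typing import Callable, List, Sequence

Image = List[List[float]]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predictive_entropy(probs: Sequence[float]) -> float:
    """Certainty signal: 0 for a one-hot prediction, log(K) for a uniform one over K classes."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def flag_out_of_distribution(probs: Sequence[float], threshold: float = 0.7) -> bool:
    """MSP baseline: low top-class confidence flags a possible out-of-distribution input.

    The threshold is a placeholder; it would have to be calibrated on held-out
    in-distribution images.
    """
    return max(probs) < threshold


def occlusion_heatmap(image: Image, score: Callable[[Image], float],
                      patch: int = 2, fill: float = 0.0) -> Image:
    """Heat map: drop in the class score when each patch is masked with `fill`.

    Larger values mark regions the prediction relies on more strongly.
    """
    h, w = len(image), len(image[0])
    base = score(image)
    heat = [[0.0] * w for _ in range(h)]
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            rows = range(top, min(top + patch, h))
            cols = range(left, min(left + patch, w))
            occluded = [row[:] for row in image]
            for i in rows:
                for j in cols:
                    occluded[i][j] = fill
            drop = base - score(occluded)
            for i in rows:
                for j in cols:
                    heat[i][j] = drop
    return heat


def toy_score(img: Image) -> float:
    # Hypothetical stand-in for a classifier's COVID-class score:
    # it only looks at the bottom-left 2x2 block of the image.
    return sum(img[i][j] for i in range(2, 4) for j in range(0, 2)) / 4


probs = softmax([2.0, 0.5])
print(round(predictive_entropy(probs), 3), flag_out_of_distribution(probs))
heat = occlusion_heatmap([[1.0] * 4 for _ in range(4)], toy_score)
for row in heat:
    print(row)
```

In the toy run, only the bottom-left 2x2 patch receives a non-zero heat value, because it is the only region `toy_score` looks at. Occlusion needs one forward pass per patch, so for large images gradient-based maps such as Grad-CAM are a common, cheaper alternative.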

#### Directions and topics for collaborations

The following directions are currently envisaged; please feel free to add more:

* Large-Scale Pre-Training and Cross-Domain Transfer
  - Auxiliary tasks and losses
  - Unsupervised and self-supervised pre-training for transfer, e.g., via contrastive losses
  - Generative models for unsupervised pre-training and active data augmentation during pre-training and transfer
* Uncertainty estimation and signaling
* Methods for validation of diagnostics and explainable output
* Learning from high-resolution images (> 512x512 pixels), multi-scale architectures
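
For the contrastive pre-training direction above, here is a minimal, stdlib-only sketch of an InfoNCE loss, the kind of objective used by SimCLR-style self-supervised methods. It is a simplified, one-directional variant: it assumes `anchors[i]` and `positives[i]` are embeddings of two augmented views of the same X-Ray image and uses the other images' positives in the batch as negatives. The temperature of `0.1` and the toy 2-D embeddings are illustrative assumptions, not values from this initiative.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def info_nce(anchors: List[Vector], positives: List[Vector], temperature: float = 0.1) -> float:
    """Mean InfoNCE loss over a batch of N positive pairs.

    anchors[i] and positives[i] embed two augmented views of the same image;
    positives[j] for j != i act as negatives for anchors[i].
    """
    total = 0.0
    for i, a in enumerate(anchors):
        logits = [cosine(a, p) / temperature for p in positives]
        m = max(logits)
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        # Cross-entropy with the matching view as the target "class".
        total += log_sum_exp - logits[i]
    return total / len(anchors)


anchors = [[1.0, 0.0], [0.0, 1.0]]
matched = [[0.9, 0.1], [0.1, 0.9]]  # views of the same images
swapped = [[0.1, 0.9], [0.9, 0.1]]  # positives paired with the wrong image
print(info_nce(anchors, matched), info_nce(anchors, swapped))
```

Matching views give a loss close to zero, swapping the positives drives it up, and when all similarities are equal the loss is log N for a batch of N pairs. In practice the embeddings would come from the network being pre-trained, and the loss would be computed in an autodiff framework so it can be back-propagated.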