Commit 204e8eaf authored by Jenia Jitsev

JJ: unsupervised pretraining and transfer paper update

* Source code:
##### Unsupervised pretraining: COVID-19 Deterioration Prediction via Self-Supervised Representation Learning and Multi-Image Prediction
- paper:
  - Authors: Anuroop Sriram, Matthew Muckley, Koustuv Sinha, Farah Shamout, Joelle Pineau, Krzysztof J. Geras, Lea Azour, Yindalon Aphinyanaphongs, Nafissa Yakubova, William Moore
  - Abstract: The rapid spread of COVID-19 cases in recent months has strained hospital resources, making rapid and accurate triage of patients presenting to emergency departments a necessity. Machine learning techniques using clinical data such as chest X-rays have been used to predict which patients are most at risk of deterioration. We consider the task of predicting two types of patient deterioration based on chest X-rays: adverse event deterioration (i.e., transfer to the intensive care unit, intubation, or mortality) and increased oxygen requirements beyond 6 L per day. Due to the relative scarcity of COVID-19 patient data, existing solutions leverage supervised pretraining on related non-COVID images, but this is limited by the differences between the pretraining data and the target COVID-19 patient data. In this paper, we use self-supervised learning based on the momentum contrast (MoCo) method in the pretraining phase to learn more general image representations to use for downstream tasks. We present three results. The first is deterioration prediction from a single image, where our model achieves an area under receiver operating characteristic curve (AUC) of 0.742 for predicting an adverse event within 96 hours (compared to 0.703 with supervised pretraining) and an AUC of 0.765 for predicting oxygen requirements greater than 6 L a day at 24 hours (compared to 0.749 with supervised pretraining). We then propose a new transformer-based architecture that can process sequences of multiple images for prediction and show that this model can achieve an improved AUC of 0.786 for predicting an adverse event at 96 hours and an AUC of 0.848 for predicting mortalities at 96 hours. A small pilot clinical study suggested that the prediction accuracy of our model is comparable to that of experienced radiologists analyzing the same information.
* Code:
- Unsupervised pre-training via contrastive loss and transfer
- NYU School of Medicine / Facebook AI Research (FAIR)
- Demonstration of a use scenario in a clinical setting with conventional X-ray images, predicting near-term patient status (status prediction 24 h, 48 h, 72 h ahead from a conventional X-ray)
- "In this paper, we study the applicability of self-supervised learning to the task of COVID-19 deterioration prediction. We pretrain a model using momentum contrast (MoCo) [22, 23] on two large, public chest X-ray datasets, MIMIC-CXR-JPG [18, 24, 25] and CheXpert [16], and then use the pretrained model as a feature extractor for the downstream task of predicting COVID-19 patient outcomes."
- "One common approach to dealing with small amounts of labeled data is to apply transfer learning [19], where a model is pretrained in a supervised fashion on a large, labeled dataset (e.g., ImageNet), then finetuned on the task of interest."
"However, transfer learning can lead to poor performance if the tasks are too different, and the model is not able to learn the features necessary for the transfer task during the pretraining step."
"Recently, new self-supervised methods which rely on contrastive losses have been shown to generate representations that are as good for classification as those generated using purely supervised methods [20, 21, 22, 23]. The advantage of contrastive loss functions is that they are able to achieve feature extraction independent of labels or tasks associated with the pretraining dataset. These features are then used for training a classifier in the fine-tuning stage with the target data."
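The two-stage recipe quoted above (label-free pretraining of a feature extractor, then training a classifier on its features with the target data) can be sketched as follows. This is a minimal PyTorch illustration, not the paper's code: the tiny `encoder` stands in for a pretrained DenseNet-121 backbone, and the inputs are random dummy X-rays.

```python
import torch
import torch.nn as nn

# Stand-in for a pretrained backbone (e.g. DenseNet-121); illustrative only.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(224 * 224, 128))

# Freeze the pretrained encoder: it now acts purely as a feature extractor.
for p in encoder.parameters():
    p.requires_grad = False

# Only the task head is trained (here: adverse event within 96 h, yes/no).
classifier = nn.Linear(128, 2)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)

x = torch.randn(4, 1, 224, 224)       # a batch of chest X-rays (dummy data)
labels = torch.randint(0, 2, (4,))    # dummy deterioration labels

with torch.no_grad():                 # no gradients through the frozen encoder
    features = encoder(x)
logits = classifier(features)
loss = nn.functional.cross_entropy(logits, labels)
loss.backward()                       # updates reach only the classifier
optimizer.step()
```

In the paper the pretrained model can also be fine-tuned end to end; freezing the encoder, as here, is the simplest variant of reusing contrastive features.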
- "We applied both supervised and self-supervised pretraining procedures. Our supervised pretraining was similar to that for CheXpert [16]. For supervised pretraining, we trained the DenseNet-121 model with the Adam optimizer [29] with a learning rate of 10⁻³, weight decay of 10⁻⁵, and batch size of 64 on the MIMIC-JPG dataset [24]. Data augmentation included interpolation to a 224×224 grid and random vertical/horizontal flipping. We pretrained for 10 epochs, decaying the learning rate by a factor of 10 each epoch."
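The quoted supervised pretraining schedule (Adam with learning rate 10⁻³ and weight decay 10⁻⁵, decayed by a factor of 10 each of the 10 epochs) maps directly onto a standard PyTorch optimizer plus `StepLR` scheduler. A hedged sketch, in which the one-layer model is merely a placeholder for DenseNet-121 and the 14 output labels are an assumption in the style of CheXpert:

```python
import torch
import torch.nn as nn

# Placeholder for DenseNet-121; 14 outputs assumed (CheXpert-style labels).
model = nn.Linear(224 * 224, 14)

# Adam with the quoted hyperparameters: lr 1e-3, weight decay 1e-5.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5)

# Decay the learning rate by a factor of 10 after every epoch.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=0.1)

lrs = []
for epoch in range(10):
    # ... one pretraining epoch over MIMIC-CXR-JPG would run here ...
    lrs.append(optimizer.param_groups[0]["lr"])
    scheduler.step()
```

After this loop, `lrs` holds the per-epoch learning rates 10⁻³, 10⁻⁴, ..., matching the decay schedule the quote describes.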
"We pretrained our self-supervised model using the momentum contrast training described in Section 3.1. For data augmentation, we used random cropping and interpolation to a 224×224 grid, random horizontal/vertical flipping, and random Gaussian noise addition. We investigated a number of augmentation strategies for pretraining the MoCo models, including noise additions, affine transformations, color transformations, and X-ray acquisition simulation, but in the end we found that these provided little benefit beyond those in the original MoCo paper [22]. After the augmentations, we applied histogram normalization. The histogram normalization procedure preserves the relative values of the pixels by interpolating along a shifted version of the image pixel value histogram while constraining the resulting pixel values to be in the specified target range. The augmentations are described in further detail in Appendix A."
"We tuned the following hyperparameters during the pretraining phase: learning rate, MoCo latent feature dimension size, and the queue size. We searched over a logarithmic scale of values, varying the learning rate within {10⁻², 10⁻¹, 10⁰} and MoCo feature dimensions within {64, 128, 256}. The queue size was fixed at 65,536. We used a batch size of 128 for each of 8 GPUs, the largest we could achieve in initial testing, accumulating gradients using PyTorch's DistributedDataParallel framework [32, 22]. We selected hyperparameters based on a cross-validation analysis on the downstream tasks. We optimized models using stochastic gradient descent with momentum [33], using 0.9 as the momentum term and a weight decay parameter of 10⁻⁴. Pretraining took approximately four days using eight 16 GB Nvidia V100 GPUs."
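The MoCo pretraining quoted above optimizes a contrastive (InfoNCE) loss between a query encoding, its positive key from the momentum encoder, and a queue of negative keys. A minimal sketch of that loss computation, using random stand-in encodings, a much smaller queue than the paper's 65,536, and the 0.07 softmax temperature from the original MoCo paper:

```python
import torch
import torch.nn.functional as F

dim, queue_size, batch = 128, 1024, 8   # paper fixes the queue at 65,536

q = F.normalize(torch.randn(batch, dim), dim=1)           # query encodings
k = F.normalize(torch.randn(batch, dim), dim=1)           # positive keys (momentum encoder)
queue = F.normalize(torch.randn(dim, queue_size), dim=0)  # queued negative keys

# Similarity of each query to its positive key (batch, 1) ...
l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(-1)
# ... and to every negative in the queue (batch, queue_size).
l_neg = torch.einsum("nc,ck->nk", q, queue)

# InfoNCE: a (queue_size + 1)-way classification where the positive is class 0.
logits = torch.cat([l_pos, l_neg], dim=1) / 0.07
labels = torch.zeros(batch, dtype=torch.long)
loss = F.cross_entropy(logits, labels)
```

In actual training the keys come from the momentum-updated encoder and the queue is refreshed with each batch's keys; here both are random tensors so the snippet stays self-contained.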
- pre-trained models (unsupervised, contrastive losses):
### Other modalities (ultrasound, sound recordings, etc)
#### Ultrasound (ultrasonic imaging, ultrasonography)