# Template Challenge
This is a template challenge. If you would like to create a new challenge
at Jülich Challenges, please follow the steps below. All the steps are necessary.
## 1) Clone the repository
- `git clone https://gitlab.version.fz-juelich.de/MLDL_FZJ/juhaicu/jsc_internal/superhaicu/vouchers/j-lich-challenges/challenges/template_challenge my_challenge`
- `cd my_challenge`
## 2) Download Raw data
You first need to download the raw data if needed. It does not matter where you put it as long as it is accessible; for instance,
you can put it in the folder `raw/` in `my_challenge`.
## 3) Build annotation files
You need to create annotation splits.
Please check `annotations/generate.py` for more details.
Basically, the script `annotations/generate.py` should be edited
so that it creates annotation files (train.csv, valid.csv, test.csv, submission_valid.csv, submission_test.csv)
based on raw data. The splits can be generated randomly or follow the splits you already
have in your raw data.
Note that the CSVs should not be heavy: they should not contain the images themselves
if your challenge has any. If your challenge has images, you can instead have a column
pointing to the filenames of the images.
Once you finish editing `annotations/generate.py`, please run it:
`cd annotations;python generate.py`
You will now have the following files in the folder `annotations/`:
- `annotations/train.csv`
- `annotations/valid.csv`
- `annotations/test.csv`
- `annotations/submission_valid.csv`
- `annotations/submission_test.csv`
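If your raw data already comes as a single table, the split could look roughly like the sketch below. This is only an illustration, assuming a hypothetical file `raw/data.csv` with `name` and `label` columns; adapt the path and columns to your own raw data.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical example (not part of the template): raw/data.csv with "name" and "label" columns,
# read from the repository root while running inside annotations/.
raw = pd.read_csv("../raw/data.csv")

# 80/10/10 split, stratified on the label so class proportions are preserved
train, rest = train_test_split(raw, test_size=0.2, stratify=raw["label"], random_state=0)
valid, test = train_test_split(rest, test_size=0.5, stratify=rest["label"], random_state=0)

train.to_csv("train.csv", index=False)
valid.to_csv("valid.csv", index=False)
test.to_csv("test.csv", index=False)

# Submission templates are the same rows with the label column emptied
valid.assign(label="").to_csv("submission_valid.csv", index=False)
test.assign(label="").to_csv("submission_test.csv", index=False)
```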
## 4) Edit the evaluation script
This is one of the most important steps because in this step
you define the metrics that will be used to rank the participants
in the leaderboard.
Please edit `evaluation_script/main.py` for your challenge.
It comes with an example where the accuracy metric is used
for a classification problem. But you can use any set of metrics.
It is a regular Python file, so any metric that can be written
in Python can be used. You can use `scikit-learn`, which already
offers most of the usual metrics.
Basically, the script `evaluation_script/main.py` returns a Python dictionary
containing the metric names as keys and the metric values as values.
You can use as many metrics as you want; the metric that will be used
to rank the participants in the leaderboard is chosen later in `challenge_config.yaml`.
One note about how `evaluation_script/main.py` works:
in order to compute the metrics, it opens the participant submission and the annotation file that you generated
previously (e.g., `valid.csv` or `test.csv`, depending on the challenge phase).
The two files (participant submission and annotation file) are then compared using the column
that corresponds to the label (e.g., the column `label` in this template challenge).
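As a rough illustration of that comparison, the metric computation could look like the following sketch. This is not the exact code in `evaluation_script/main.py` (the template defines the actual entry point and return structure); it only shows the idea, assuming the template's `name` and `label` columns.

```python
import pandas as pd
from sklearn.metrics import accuracy_score

def compute_metrics(annotation_file, submission_file):
    # Hypothetical helper: load the ground truth (e.g. valid.csv or test.csv) and the submission
    truth = pd.read_csv(annotation_file)
    pred = pd.read_csv(submission_file)
    # Align both files on the "name" column so rows are compared pairwise
    merged = truth.merge(pred, on="name", suffixes=("_true", "_pred"))
    # One entry per metric; the keys are the names later referenced in challenge_config.yaml
    return {"accuracy": accuracy_score(merged["label_true"], merged["label_pred"])}
```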
## 5) Edit challenge_config.yaml
Please edit `challenge_config.yaml` and change everything that is specific to your challenge.
More documentation about the fields is given at <https://evalai.readthedocs.io/en/latest/configuration.html>.
The most important parts to modify are the fields at the top of the file; they contain basic information about the challenge.
Please also check the leaderboard section. In the leaderboard section, please put all the metric names you want to display in the leaderboard.
The metric names correspond to the metric names you defined in `evaluation_script/main.py`.
`default_order_by` defines the metric you use to rank the participants.
Please carefully check all the fields so that they correspond to your needs for the challenge,
such as the start and end dates of the different phases and the number of submissions.
## 6) Edit the template files and the logo
Please edit the HTML files in the folder `templates/`. These HTML files
are displayed in the frontend in your challenge page.
Please also modify the logo file `logo.jpg` to have a custom logo for your challenge;
this will be displayed in the frontend as well.
## 7) Edit notebook.ipynb
Please edit `notebook.ipynb`. This part is important for the participants: the notebook is given to them so that
the challenge is introduced to them and they can easily understand the problem and start submitting.
Steps such as downloading the data, an explanation of the problem, exploratory data analysis, baseline solutions, and example submissions
should be included.
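For the example-submission part, a single notebook cell along the lines of the sketch below is usually enough. It is only a hypothetical baseline, assuming the template's `label` column and that `submission_valid.csv` was downloaded with the data archive; replace the constant prediction with your model's output.

```python
import pandas as pd

# Hypothetical baseline: predict the same class for every row of the submission template.
# Adjust the path to wherever the data archive was extracted.
submission = pd.read_csv("submission_valid.csv")
submission["label"] = "label1"  # replace with real predictions from your model
submission.to_csv("my_submission.csv", index=False)
# my_submission.csv is the file participants upload for the public leaderboard (Dev) phase.
```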
## 8) Edit run.sh
Please edit `run.sh` if needed. This script will make all the archives needed for the challenge. More details are given in `run.sh`.
Here is a short description of the files that `run.sh` needs to create:
- `challenge_config.zip`. This archive contains the challenge configuration; you will upload this file yourself at the frontend
to create the challenge.
- `data_public_leaderboard_phase.zip`. This archive is given to the users for the public leaderboard phase, so you need to upload
it somewhere with a public link.
- `data_private_leaderboard_phase.zip`. This archive is given to the users for the private leaderboard phase, so you need to upload
it somewhere with a public link.
`run.sh` should work as is, but if you have any additional files to include in the data archives (such as images), you need to change it.
When `run.sh` is ready, please run it:
`./run.sh`
After running the script, you should have these files:
- `challenge_config.zip`
- `data_public_leaderboard_phase.zip`
- `data_private_leaderboard_phase.zip`
Please then upload the data archives to a public location, and reference the links
in the HTML files in `templates/` as well as in `notebook.ipynb`.
## 9) Upload `challenge_config.zip` to the frontend
- Go to the frontend
- Create a new challenge, then upload `challenge_config.zip`.
This is the last step. One of the administrators should then accept your challenge so that
it becomes public.
Thanks for following this guide, your challenge should be ready now!
import uuid
import random
import pandas as pd
# Please modify this script
# This script should take raw data you have
# and generate the data splits in a CSV file:
# train.csv, valid.csv, test.csv
# - train.csv would be given to the participants
# - valid.csv is the public leaderboard data with the **labels**, so it will not be given to the participants.
# - submission_valid.csv is the same as valid.csv but the labels column contains some dummy values,
#   this will be given to the participants as a dummy submission for the public leaderboard phase
# - test.csv is the private leaderboard data with the **labels**, so it will not be given to the participants.
# - submission_test.csv is the same as test.csv, the difference is that the labels column contains dummy values,
#   this will be given to the participants as a dummy submission for the private leaderboard phase
# WARNING: in the following script we just generate the CSVs randomly, please edit this script
def generate_dummy(nb):
    names = [uuid.uuid4() for _ in range(nb)]
    labels = [random.choice(("label1", "label2", "label3")) for _ in range(nb)]
    df = pd.DataFrame({"name": names, "label": labels})
    return df
nb_train = 10000
nb_valid = 1000
nb_test = 1000
train = generate_dummy(nb_train)
valid = generate_dummy(nb_valid)
test = generate_dummy(nb_test)
train.to_csv("train.csv", index=False)
valid.to_csv("valid.csv", index=False)
test.to_csv("test.csv", index=False)
valid["label"] = ""
valid.to_csv("submission_valid.csv", index=False)
test["label"] = ""
test.to_csv("submission_test.csv", index=False)
# If you are not sure what all these fields mean, please refer our documentation here:
# https://evalai.readthedocs.io/en/latest/configuration.html
title: Template Challenge
short_description: Predict XXX from XXX
description: templates/description.html
evaluation_details: templates/evaluation_details.html
terms_and_conditions: templates/terms_and_conditions.html
image: logo.jpg
submission_guidelines: templates/submission_guidelines.html
leaderboard_description: We use the XXX metric for evaluation
evaluation_script: evaluation_script.zip
remote_evaluation: False
is_docker_based: False
start_date: 2019-01-01 00:00:00
end_date: 2099-05-31 23:59:59
published: True
leaderboard:
  - id: 1
    schema:
      {
        "labels": ["accuracy"],
        "default_order_by": "accuracy",
      }
challenge_phases:
  - id: 1
    name: Dev Phase
    description: templates/challenge_phase_1_description.html
    leaderboard_public: True
    is_public: True
    is_submission_public: True
    start_date: 2019-01-19 00:00:00
    end_date: 2099-04-25 23:59:59
    test_annotation_file: annotations/valid.csv
    codename: dev
    max_submissions_per_day: 100
    max_submissions_per_month: 50
    max_submissions: 50
    submission_meta_attributes: null
    is_restricted_to_select_one_submission: False
    is_partial_submission_evaluation_enabled: False
  - id: 2
    name: Test Phase
    description: templates/challenge_phase_2_description.html
    leaderboard_public: False
    is_public: False
    is_submission_public: False
    start_date: 2019-01-19 00:00:00
    end_date: 2099-04-25 23:59:59
    test_annotation_file: annotations/test.csv
    codename: test
    max_submissions_per_day: 100
    max_submissions_per_month: 50
    max_submissions: 50
    submission_meta_attributes: null
    is_restricted_to_select_one_submission: True
    is_partial_submission_evaluation_enabled: False
dataset_splits:
  - id: 1
    name: Valid Split
    codename: dev
  - id: 2
    name: Test Split
    codename: test
challenge_phase_splits:
  - challenge_phase_id: 1
    leaderboard_id: 1
    dataset_split_id: 1
    visibility: 1
    leaderboard_decimal_precision: 3
  - challenge_phase_id: 2
    leaderboard_id: 1
    dataset_split_id: 2
    visibility: 1
    leaderboard_decimal_precision: 3
    is_leaderboard_order_descending: true
%% Cell type:markdown id: tags:
# The XXX challenge
%% Cell type:markdown id: tags:
Please provide a logo
%% Cell type:markdown id: tags:
<img src="" alt="logo" width="400"/>
%% Cell type:markdown id: tags:
Please give a short description of your challenge
%% Cell type:markdown id: tags:
# Requirements
%% Cell type:code id: tags:
``` python
!pip install pandas torch torchvision scikit-learn # please put requirements here
```
%% Cell type:markdown id: tags:
## Downloading the data
Please describe the steps to download the data:
%% Cell type:markdown id: tags:
The full data is around 5GB. In order to start quickly we also provide a subset of the training data here:
%% Cell type:code id: tags:
``` python
!wget XXXX
```
%% Output
--2020-10-02 08:42:27-- http://xxxx/
Resolving xxxx (xxxx)... failed: Temporary failure in name resolution.
wget: unable to resolve host address ‘xxxx’
%% Cell type:code id: tags:
``` python
!ls
```
%% Output
README.md challenge_config.yaml logo.jpg run.sh templates
annotations evaluation_script notebook.ipynb tags
%% Cell type:markdown id: tags:
# Exploratory Data Analysis
Please provide a few cells that explore the data a little,
with some nice visualizations.
Possibilities:
- Class Frequencies
- Plots
- Show different examples
- Summary Statistics
- etc.
%% Cell type:code id: tags:
``` python
```
%% Cell type:code id: tags:
``` python
```