Initial Commit

Commit df1b70c9 authored Feb 19, 2024 by Christian Faber
--- a/.gitignore
+++ b/.gitignore
+data/*
+!data/readme.txt
\ No newline at end of file
--- a/README.md
+++ b/README.md
+# Talk Data Visualisation with Plotly
+Scientists have always been the experts for data. Analysing and drawing conclusions from them is our daily business, and the amount of data that scientists are confronted with is growing rapidly as time passes and computing resources increase. The challenge is to quickly deal with individual data structures for which there is usually no off-the-shelf solution.
+In this talk, I will tell you how you can create visualisations tailored to your data. I will show how you can access your data interactively and thus gain maximum insight from it. The graphics are created using Python and the plotly and dash libraries to achieve maximum customisability. The entire process is demonstrated using a sequence mutation example from biophysics, but the methods can be applied to any field involving large amounts of data.
+## Getting Started
+There are in principle two ways to try all the examples on your own machine:
+1. Use a Docker Container (**Recommended**) or 
+2. Clone this Git Repo and install all requirements.
+### Docker
+You need a running Docker instance with docker compose. Download the two files [`Dockerfile`](https://fz-juelich.sciebo.de/s/8pJMu1c6Rdov12r/download) and [`compose.yaml`](https://fz-juelich.sciebo.de/s/ChrtUEpvIOGFOBA/download) to your local machine. Browse to their destination and run
+```bash
+$ docker compose up
+```
+Now open http://0.0.0.0:8080 in your browser. Everything should be set up; use the run button to execute the examples and open the output in your browser at http://0.0.0.0:8051.
+### Manually
+Requirements: Python (>=3.10), pip, Code Editor, Browser
+> **Note:** It is recommended to use a virtual environment, but you can also decide to go without.
+1. Create a new environment
+```bash
+$ # Create a new project folder
+$ mkdir talk_vis_python
+$ cd talk_vis_python
+$ # Create virtual environment 
+$ python3 -m venv "vis_python"
+$ # Activate the virtual environment
+$ source vis_python/bin/activate 
+```
+2. Clone the Git repo
+```bash
+$ mkdir git-talk-vis
+$ git clone https://jugit.fz-juelich.de/c.faber/talk-data-visualisation-with-plotly.git git-talk-vis
+```
+3. Install all libraries
+```bash
+$ pip install -r git-talk-vis/requirements.txt
+```
+4. Run the examples
+Use the IDE of your choice and run the examples in `git-talk-vis` or use 
+```bash
+$ python git-talk-vis/<file>.py
+```
+and open http://0.0.0.0:8051 in your browser.
+> **Note:** In some browsers you have to replace `0.0.0.0` with `localhost`.
+## Examples
+There are in total three examples to explore.
+1) First visualisation, using pandas & plotly
+2) Interactive Plots, introduction to callbacks
+3) Plot Inference, creation of graphs including the interaction with other graphs
+A minimal working example for `Dash` can be found under `Ex0`.
+## Data
+Monthly mean air temperature for Germany from 1881 to 2023, subdivided into the individual federal states.
+**Source:** Deutscher Wetterdienst ([download](https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/))
+## Contact
+If you have further questions or ideas, do not hesitate to send me an [email](mailto:c.faber@fz-juelich.de).
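Review note on the `0.0.0.0` hint in the README: `0.0.0.0` is the "unspecified" address that a server binds to in order to listen on every interface, which is why some browsers refuse to open it and need `localhost` instead. A minimal sketch of that mapping, assuming IPv4 addresses or plain host names; `browser_url` is a hypothetical helper and not part of this commit.

```python
import ipaddress


def browser_url(host: str, port: int) -> str:
    """Return a URL a browser can open for a server bound to host:port.

    Hypothetical helper for illustration; not part of the repository.
    """
    try:
        unspecified = ipaddress.ip_address(host).is_unspecified
    except ValueError:
        # Not an IP literal (for example "localhost"); use it unchanged.
        unspecified = False
    # 0.0.0.0 is a bind address, not a destination, so point the
    # browser at the loopback name instead.
    return f"http://{'localhost' if unspecified else host}:{port}"


print(browser_url("0.0.0.0", 8051))  # http://localhost:8051
```

The examples themselves can keep `host="0.0.0.0"`, which is what makes them reachable from outside the Docker container.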
--- a/data/readme.txt
+++ b/data/readme.txt
+-------------------------------------------------------
+Climate Data - Ex 1 & Ex 2
+-------------------------------------------------------
+Source: Deutscher Wetterdienst
+https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/
--- a/docker_coder_config.yaml
+++ b/docker_coder_config.yaml
+bind-addr: 0.0.0.0
+port: 8080
+auth: none
+cert: false
+trusted-origins: false
+app-name: "Data Visualisation with Plotly"
\ No newline at end of file
--- a/docker_start.sh
+++ b/docker_start.sh
+alias python=python3
+mkdir -p /root/.local/share/code-server/User
+jq -n '{"remote.autoForwardPorts": false, "git.openRepositoryInParentFolders": "never", "python.defaultInterpreterPath":"/usr/local/bin"}' > /root/.local/share/code-server/User/settings.json
+code-server --config="/app/git-talk-vis/docker_coder_config.yaml" --disable-getting-started-override --disable-proxy /app/git-talk-vis
\ No newline at end of file
--- a/examples/Ex0_Test_Page/main.py
+++ b/examples/Ex0_Test_Page/main.py
+from dash import Dash, html
+import dash, plotly
+app = Dash(__name__)
+server = app.server
+app.layout = html.Div(
+    children=[
+        html.H1("Example 0: Minimal Working Example"),
+        html.Span(
+            f"You are using Dash {dash.__version__} and Plotly {plotly.__version__}."
+        ),
+    ]
+)
+if __name__ == "__main__":
+    app.run(debug=True, host="0.0.0.0", port="8051")
--- a/examples/Ex1_First_Visualisation/main.py
+++ b/examples/Ex1_First_Visualisation/main.py
+# Dash
+from dash import Dash, html, dcc
+# Plotting
+import plotly.express as px
+import plotly.graph_objects as go
+# Importing Data
+import pandas as pd
+from urllib.request import urlretrieve
+import os
+# Download Data if necessary
+for i in range(1, 13):
+    if not os.path.isfile(f"data/regional_averages_tm_{i:02}.txt"):
+        urlretrieve(
+            f"https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_{i:02}.txt",
+            f"data/regional_averages_tm_{i:02}.txt",
+        )
+# Save data in a single pandas Dataframe
+data = [
+    pd.read_csv(
+        f"data/regional_averages_tm_{i:02}.txt",
+        sep=";",
+        header=1,
+        usecols=[j for j in range(0, 19)],
+    )
+    for i in range(1, 13)
+]
+df = pd.concat(data)
+df["date"] = pd.to_datetime(dict(year=df.Jahr, month=df.Monat, day=1))
+# Creating Figures
+fig1 = px.line(
+    df.sort_values(by="date"),
+    x="date",
+    y="Deutschland",
+    hover_data=["Nordrhein-Westfalen", "Brandenburg/Berlin"],
+)
+fig2 = go.Figure(
+    data=px.scatter(
+        df.groupby(["Jahr"], as_index=False).mean(),
+        x="Jahr",
+        y="Deutschland",
+        hover_data={"Nordrhein-Westfalen": ":.2f", "Brandenburg/Berlin": True},
+        trendline="ols",
+    ).data
+    + px.line(
+        df.groupby(["Jahr"], as_index=False).mean(),
+        x="Jahr",
+        y="Deutschland",
+    ).data
+)
+# Creating Webpage/Dashboard
+app = Dash(__name__)
+app.layout = html.Div(
+    children=[
+        html.H1("Example 1: First Visualisation"),
+        dcc.Graph(figure=fig1),
+        dcc.Graph(figure=fig2),
+    ]
+)
+# Start Server
+if __name__ == "__main__":
+    app.run(debug=True, host="0.0.0.0", port="8051")
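Review note on the download step: the loop at the top of each example writes to a relative `data/` path, so it depends on the current working directory and on `data/` already existing. A possible sketch of the same "download only if missing" idea with `pathlib`; the function name `ensure_monthly_files` and the injectable `fetch` parameter are assumptions made for illustration and do not appear in the commit.

```python
from pathlib import Path
from typing import Callable
from urllib.request import urlretrieve

BASE_URL = (
    "https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/"
    "monthly/air_temperature_mean/"
)


def ensure_monthly_files(
    data_dir: Path, fetch: Callable[[str, str], object] = urlretrieve
) -> list:
    """Download the twelve monthly files into data_dir unless already present."""
    # Unlike the loop in the examples, create the folder if it is missing.
    data_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for month in range(1, 13):
        name = f"regional_averages_tm_{month:02}.txt"
        target = data_dir / name
        if not target.is_file():
            # Only files that are not on disk yet are fetched.
            fetch(BASE_URL + name, str(target))
        paths.append(target)
    return paths
```

Inside `examples/Ex1_First_Visualisation/main.py`, a call such as `ensure_monthly_files(Path(__file__).resolve().parents[2] / "data")` would resolve the folder relative to the repository root rather than the working directory. Passing `fetch` explicitly also allows the logic to be tested without network access.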
--- a/examples/Ex2_Interactive_Plots/main.py
+++ b/examples/Ex2_Interactive_Plots/main.py
+# Dash
+from dash import Dash, html, dcc, Input, Output, callback
+# Plotting
+import plotly.express as px
+import plotly.graph_objects as go
+# Importing Data
+import pandas as pd
+from urllib.request import urlretrieve
+import os
+# Download Data if necessary
+for i in range(1, 13):
+    if not os.path.isfile(f"data/regional_averages_tm_{i:02}.txt"):
+        urlretrieve(
+            f"https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_{i:02}.txt",
+            f"data/regional_averages_tm_{i:02}.txt",
+        )
+# Save data in a single pandas Dataframe
+data = [
+    pd.read_csv(
+        f"data/regional_averages_tm_{i:02}.txt",
+        sep=";",
+        header=1,
+        usecols=[j for j in range(0, 19)],
+    )
+    for i in range(1, 13)
+]
+df = pd.concat(data)
+df["date"] = pd.to_datetime(dict(year=df.Jahr, month=df.Monat, day=1))
+# Creating Webpage/Dashboard
+app = Dash(__name__)
+app.layout = html.Div(
+    children=[
+        html.H1("Example 2: Interactive Plots"),
+        dcc.Dropdown(
+            id="dropdown_state", options=df.columns[2:18], value="Brandenburg/Berlin"
+        ),
+        dcc.Graph(id="fig_mean_T"),
+    ]
+)
+# Redraw the figure whenever a different column is picked in the dropdown
+@callback(Output("fig_mean_T", "figure"), Input("dropdown_state", "value"))
+def update_figure(selected_column):
+    fig1 = px.line(
+        df.groupby(["Jahr"], as_index=False).mean(),
+        x="Jahr",
+        y=selected_column,
+    )
+    fig2 = px.line(
+        df.groupby(["Jahr"], as_index=False).mean(),
+        x="Jahr",
+        y="Deutschland",
+    )
+    fig2.update_traces(line_color="red")
+    return go.Figure(data=fig1.data + fig2.data)
+if __name__ == "__main__":
+    app.run(debug=True, host="0.0.0.0", port="8051")
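Review note on the callback: `update_figure` recomputes `df.groupby(["Jahr"], as_index=False).mean()` on every dropdown change. To make explicit what that expression computes for a single column, here is the same aggregation written in plain Python. The function name `annual_mean` and the sample numbers are made up for illustration.

```python
from collections import defaultdict
from statistics import fmean


def annual_mean(rows: list, column: str) -> dict:
    """Average `column` over all rows that share the same "Jahr" value."""
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["Jahr"]].append(row[column])
    # Every month counts equally, regardless of its number of days.
    return {year: fmean(values) for year, values in sorted(by_year.items())}


# Made-up numbers, only to show the shape of the computation.
rows = [
    {"Jahr": 2000, "Monat": 1, "Deutschland": 1.0},
    {"Jahr": 2000, "Monat": 7, "Deutschland": 17.0},
    {"Jahr": 2001, "Monat": 1, "Deutschland": 0.0},
    {"Jahr": 2001, "Monat": 7, "Deutschland": 18.0},
]
print(annual_mean(rows, "Deutschland"))  # {2000: 9.0, 2001: 9.0}
```

Since the result does not depend on the dropdown value, the pandas equivalent could be computed once at module level and reused inside the callback.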
--- a/examples/Ex3_Plot_Inference/main.py
+++ b/examples/Ex3_Plot_Inference/main.py
+# Dash
+from dash import Dash, html, dcc, Input, Output, callback
+# Plotting
+import plotly.express as px
+import plotly.graph_objects as go
+# Importing Data
+import pandas as pd
+from urllib.request import urlretrieve
+import os
+# Download Data if necessary
+for i in range(1, 13):
+    if not os.path.isfile(f"data/regional_averages_tm_{i:02}.txt"):
+        urlretrieve(
+            f"https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_{i:02}.txt",
+            f"data/regional_averages_tm_{i:02}.txt",
+        )
+# Save data in a single pandas Dataframe
+data = [
+    pd.read_csv(
+        f"data/regional_averages_tm_{i:02}.txt",
+        sep=";",
+        header=1,
+        usecols=[j for j in range(0, 19)],
+    )
+    for i in range(1, 13)
+]
+df = pd.concat(data)
+# Calculation of the reference temperature
+period: list[int] = [1991, 2020]
+df2 = (
+    df[(df["Jahr"] >= period[0]) & (df["Jahr"] <= period[1])]
+    .groupby(["Monat"], as_index=False)
+    .mean()
+)
+# Melting the dataframe and adding reference temperature
+df = df.melt(
+    id_vars=["Jahr", "Monat", "Deutschland"], var_name="Bundesland", value_name="T"
+)
+# Look up the reference temperature for each (month, state) pair
+ref = df2.set_index("Monat")
+df["reference_T"] = [
+    ref.at[month, state] for month, state in zip(df["Monat"], df["Bundesland"])
+]
+# Scatter Plot
+fig1_sub1 = px.scatter(
+    df, x="reference_T", y="T", hover_data=["Bundesland", "Monat", "Jahr"]
+)
+fig1_sub2 = px.line(x=[-2, 20], y=[-2, 20])
+fig1 = go.Figure(data=fig1_sub1.data + fig1_sub2.data)
+fig1.update_layout(
+    yaxis_title="Temperature [°C]", xaxis_title="Reference Temperature [°C]"
+)
+app = Dash(__name__)
+app.layout = html.Div(
+    children=[
+        html.H1("Example 3: Plot Inference"),
+        dcc.Graph(id="scatter", figure=fig1),
+        dcc.Graph(id="histo"),
+    ]
+)
+# Histogram of the years of the points selected in the scatter plot
+@callback(Output("histo", "figure"), Input("scatter", "selectedData"))
+def update_histo(selection: dict) -> go.Figure:
+    if selection is not None:
+        sel_df = df.iloc[[p["pointIndex"] for p in selection["points"]]]
+        fig2 = px.histogram(sel_df, "Jahr")
+        return fig2
+    else:
+        return go.Figure()
+if __name__ == "__main__":
+    app.run(debug=True, host="0.0.0.0", port="8051")
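Review note on `update_histo`: the `selectedData` payload is `None` before any selection and otherwise carries a `points` list whose entries include `pointIndex` and `curveNumber`. A small sketch of extracting the row positions defensively; `selected_indices` is a hypothetical helper, and restricting to `curveNumber == 0` assumes that the scatter trace is the first trace of the figure, as it is in `fig1`.

```python
from typing import Optional


def selected_indices(selection: Optional[dict], curve: int = 0) -> list:
    """Extract row positions from a Plotly `selectedData` payload.

    Returns an empty list when nothing is selected. Points that belong
    to other traces (for example the diagonal line) are skipped.
    """
    if not selection:
        return []
    return [
        point["pointIndex"]
        for point in selection.get("points", [])
        if point.get("curveNumber", curve) == curve
    ]


# Hand-written payload in the shape Dash passes to the callback.
payload = {
    "points": [
        {"curveNumber": 0, "pointIndex": 4, "x": 1.2, "y": 2.5},
        {"curveNumber": 1, "pointIndex": 0, "x": -2, "y": -2},
        {"curveNumber": 0, "pointIndex": 9, "x": 8.1, "y": 7.9},
    ]
}
print(selected_indices(payload))  # [4, 9]
print(selected_indices(None))     # []
```

The returned list could be passed to `df.iloc[...]` in place of the inline list comprehension.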
--- a/requirements.txt
+++ b/requirements.txt
+Dash
+pandas
+statsmodels
\ No newline at end of file
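Review note on `statsmodels`: none of the examples imports it directly, but Plotly Express needs it for `trendline="ols"` in Ex1. What that option fits is an ordinary least-squares line. The sketch below computes the slope and intercept by hand on made-up numbers; `ols_line` is a hypothetical name and this is not how Plotly implements the fit internally.

```python
def ols_line(xs: list, ys: list) -> tuple:
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the factor 1/n cancels.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up yearly means rising by 0.5 degrees per year.
years = [2000, 2001, 2002, 2003]
temps = [9.0, 9.5, 10.0, 10.5]
print(ols_line(years, temps))  # (0.5, -991.0)
```

The slope is the quantity of interest for the yearly means in `fig2`: it is the average temperature change per year over the plotted period.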