Commit df1b70c9 authored by Christian Faber

Initial Commit

Branches main
data/*
!data/readme.txt
# Talk Data Visualisation with Plotly
Scientists have always been the experts on data. Analysing data and drawing conclusions from it is our daily business, and the amount of data that scientists are confronted with grows rapidly as computing resources increase. The challenge is to deal quickly with individual data structures for which there is usually no off-the-shelf solution.
In this talk, I will show you how to create visualisations tailored to your data and how to access your data interactively to gain maximum insight from it. The graphics are created using Python with the Plotly and Dash libraries to achieve maximum customisability. The entire process is demonstrated using a sequence mutation example from biophysics, but the methods can be applied to any field involving large amounts of data.
## Getting Started
There are in principle two ways to try all the examples on your own machine:
1. Use a Docker Container (**Recommended**) or
2. Clone this Git Repo and install all requirements.
### Docker
You need a running Docker instance with Docker Compose. Download the two files [`Dockerfile`](https://fz-juelich.sciebo.de/s/8pJMu1c6Rdov12r/download) and [`compose.yaml`](https://fz-juelich.sciebo.de/s/ChrtUEpvIOGFOBA/download) to your local machine. Navigate to the folder containing them and run
```bash
$ docker compose up
```
Now open http://0.0.0.0:8080 in your browser. Once everything has been set up, use the run button to execute the examples and view their output at http://0.0.0.0:8051.
### Manually
Requirements: Python (>=3.10), pip, Code Editor, Browser
> **Note:** It is recommended to use a virtual environment, but you can also decide to go without.
1. Create a new environment
```bash
$ # Create a new project folder
$ mkdir talk_vis_python
$ cd talk_vis_python
$ # Create virtual environment
$ python3.11 -m venv "vis_python"
$ # Activate the virtual environment
$ source vis_python/bin/activate
```
2. Clone the Git repository
```bash
$ mkdir git-talk-vis
$ git clone https://jugit.fz-juelich.de/c.faber/talk-data-visualisation-with-plotly.git git-talk-vis
```
3. Install all libraries
```bash
$ pip install -r git-talk-vis/requirements.txt
```
4. Run the examples
Use the IDE of your choice and run the examples in `git-talk-vis` or use
```bash
$ python git-talk-vis/<file>.py
```
and open http://0.0.0.0:8051 in your browser.
> **Note:** In some browsers you have to replace `0.0.0.0` with `localhost`.
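Before running the examples, it can help to confirm that the packages listed in `requirements.txt` are visible in the active environment. The following is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration and is not part of this repository.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            # The package is not installed in the active environment.
            found[name] = None
    return found


if __name__ == "__main__":
    # Package names taken from requirements.txt.
    for name, ver in installed_versions(["dash", "pandas", "statsmodels"]).items():
        print(f"{name}: {ver if ver is not None else 'not installed'}")
```

If one of the packages is reported as not installed, the virtual environment is probably not activated or step 3 was skipped.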
## Examples
There are in total three examples to explore.
1) First visualisation, using pandas & plotly
2) Interactive Plots, introduction to callbacks
3) Plot Inference, creation of graphs including the interaction with other graphs
A minimal working example for `Dash` can be found under `Ex0`.
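Example 3 links two graphs through the `selectedData` property of a `dcc.Graph`: the points selected in one plot decide which rows feed the other. The sketch below shows only the pure-Python part of that idea. The payload is a hand-written illustration of the shape Dash passes to a callback, and the helper `selected_row_indices` is a hypothetical name, not a function from this repository or from Dash.

```python
def selected_row_indices(selection, curve_number=0):
    """Return the point indices of one trace from a `selectedData` payload."""
    if not selection:
        # Dash passes None until the user has selected something.
        return []
    return [
        point["pointIndex"]
        for point in selection.get("points", [])
        if point.get("curveNumber", 0) == curve_number
    ]


# Hand-written payload for illustration; the coordinates are made up.
example_selection = {
    "points": [
        {"curveNumber": 0, "pointIndex": 3, "x": 1.2, "y": 0.8},
        {"curveNumber": 0, "pointIndex": 7, "x": 4.5, "y": 5.1},
        {"curveNumber": 1, "pointIndex": 0, "x": -2, "y": -2},
    ]
}

print(selected_row_indices(example_selection))  # → [3, 7]
```

The returned indices can then be used with `DataFrame.iloc` to pick the matching rows, provided the row order of the DataFrame matches the order of the plotted points.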
## Data
Monthly mean air temperature for Germany from 1881 to 2023, subdivided into the individual federal states.
**Source:** Deutscher Wetterdienst ([download](https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/))
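The examples read these files with `sep=";"` and `header=1`, which implies a description line followed by a semicolon-separated header row. The sketch below parses that assumed layout with the standard library only. The sample text is made up (the values are not real measurements, and only two of the region columns are shown), and `read_regional_averages` is a hypothetical helper name.

```python
import csv
import io

# Made-up sample in the assumed layout; the values are NOT real measurements.
SAMPLE = """Description line (skipped)
Jahr;Monat;Brandenburg/Berlin;Deutschland;
1881;01;-5.0;-5.5;
1882;01;1.0;0.5;
"""


def read_regional_averages(text):
    """Parse the text into a list of dicts with numeric values."""
    lines = io.StringIO(text)
    next(lines)  # skip the description line above the header
    reader = csv.DictReader(lines, delimiter=";")
    rows = []
    for raw in reader:
        row = {}
        for key, value in raw.items():
            if not key or not key.strip():
                # A trailing ';' produces an empty column name; ignore it.
                continue
            name = key.strip()
            if name in ("Jahr", "Monat"):
                row[name] = int(value)
            else:
                row[name] = float(value)
        rows.append(row)
    return rows


rows = read_regional_averages(SAMPLE)
print(rows[0]["Jahr"], rows[0]["Deutschland"])  # → 1881 -5.5
```

The examples themselves use `pandas.read_csv` for the same purpose; this version only makes the file layout explicit.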
## Contact
If you have further questions or ideas, do not hesitate to send me an [email](mailto:c.faber@fz-juelich.de).
-------------------------------------------------------
Climate Data - Ex 1 & Ex 2
-------------------------------------------------------
Source: Deutscher Wetterdienst
https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/
bind-addr: 0.0.0.0
port: 8080
auth: none
cert: false
trusted-origins: false
app-name: "Data Visualisation with Plotly"
alias python=python3
mkdir -p /root/.local/share/code-server/User
jq -n '{"remote.autoForwardPorts": false, "git.openRepositoryInParentFolders": "never", "python.defaultInterpreterPath":"/usr/local/bin"}' > /root/.local/share/code-server/User/settings.json
code-server --config="/app/git-talk-vis/docker_coder_config.yaml" --disable-getting-started-override --disable-proxy /app/git-talk-vis
from dash import Dash, html
import dash, plotly
app = Dash(__name__)
server = app.server
app.layout = html.Div(
    children=[
        html.H1("Example 0: Minimal Working Example"),
        html.Span(
            f"You are using Dash {dash.__version__} and Plotly {plotly.__version__}."
        ),
    ]
)
if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port="8051")
# Dash
from dash import Dash, html, dcc
# Plotting
import plotly.express as px
import plotly.graph_objects as go
# Importing Data
import pandas as pd
from urllib.request import urlretrieve
import os
# Download Data if necessary
for i in range(1, 13):
    if not os.path.isfile(f"data/regional_averages_tm_{i:02}.txt"):
        urlretrieve(
            f"https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_{i:02}.txt",
            f"data/regional_averages_tm_{i:02}.txt",
        )
# Save data in a single pandas Dataframe
data = [
    pd.read_csv(
        f"data/regional_averages_tm_{i:02}.txt",
        sep=";",
        header=1,
        usecols=[j for j in range(0, 19)],
    )
    for i in range(1, 13)
]
df = pd.concat(data)
df["date"] = pd.to_datetime(dict(year=df.Jahr, month=df.Monat, day=1))
# Creating Figures
fig1 = px.line(
    df.sort_values(by="date"),
    x="date",
    y="Deutschland",
    hover_data=["Nordrhein-Westfalen", "Brandenburg/Berlin"],
)
fig2 = go.Figure(
    data=px.scatter(
        df.groupby(["Jahr"], as_index=False).mean(),
        x="Jahr",
        y="Deutschland",
        hover_data={"Nordrhein-Westfalen": ":.2f", "Brandenburg/Berlin": True},
        trendline="ols",
    ).data
    + px.line(
        df.groupby(["Jahr"], as_index=False).mean(),
        x="Jahr",
        y="Deutschland",
    ).data
)
# Creating Webpage/Dashboard
app = Dash(__name__)
app.layout = html.Div(
    children=[
        html.H1("Example 1: First Visualisation"),
        dcc.Graph(figure=fig1),
        dcc.Graph(figure=fig2),
    ]
)
# Start Server
if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port="8051")
# Dash
from dash import Dash, html, dcc, Input, Output, callback
# Plotting
import plotly.express as px
import plotly.graph_objects as go
# Importing Data
import pandas as pd
from urllib.request import urlretrieve
import os
# Download Data if necessary
for i in range(1, 13):
    if not os.path.isfile(f"data/regional_averages_tm_{i:02}.txt"):
        urlretrieve(
            f"https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_{i:02}.txt",
            f"data/regional_averages_tm_{i:02}.txt",
        )
# Save data in a single pandas Dataframe
data = [
    pd.read_csv(
        f"data/regional_averages_tm_{i:02}.txt",
        sep=";",
        header=1,
        usecols=[j for j in range(0, 19)],
    )
    for i in range(1, 13)
]
df = pd.concat(data)
df["date"] = pd.to_datetime(dict(year=df.Jahr, month=df.Monat, day=1))
# Creating Webpage/Dashboard
app = Dash(__name__)
app.layout = html.Div(
    children=[
        html.H1("Example 2: Interactive Plots"),
        dcc.Dropdown(
            id="dropdown_state", options=df.columns[2:18], value="Brandenburg/Berlin"
        ),
        dcc.Graph(id="fig_mean_T"),
    ]
)
@callback(Output("fig_mean_T", "figure"), Input("dropdown_state", "value"))
def update_figure(selected_column):
    fig1 = px.line(
        df.groupby(["Jahr"], as_index=False).mean(),
        x="Jahr",
        y=selected_column,
    )
    fig2 = px.line(
        df.groupby(["Jahr"], as_index=False).mean(),
        x="Jahr",
        y="Deutschland",
    )
    fig2.update_traces(line_color="red")
    return go.Figure(data=fig1.data + fig2.data)
if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port="8051")
# Dash
from dash import Dash, html, dcc, Input, Output, callback
# Plotting
import plotly.express as px
import plotly.graph_objects as go
# Importing Data
import pandas as pd
from urllib.request import urlretrieve
import os
# Download Data if necessary
for i in range(1, 13):
    if not os.path.isfile(f"data/regional_averages_tm_{i:02}.txt"):
        urlretrieve(
            f"https://opendata.dwd.de/climate_environment/CDC/regional_averages_DE/monthly/air_temperature_mean/regional_averages_tm_{i:02}.txt",
            f"data/regional_averages_tm_{i:02}.txt",
        )
# Save data in a single pandas Dataframe
data = [
    pd.read_csv(
        f"data/regional_averages_tm_{i:02}.txt",
        sep=";",
        header=1,
        usecols=[j for j in range(0, 19)],
    )
    for i in range(1, 13)
]
df = pd.concat(data)
# Calculation of the reference temperature
period: list[int] = [1991, 2020]
df2 = (
    df[(df["Jahr"] >= period[0]) & (df["Jahr"] <= period[1])]
    .groupby(["Monat"], as_index=False)
    .mean()
)
# Melting the dataframe and adding reference temperature
df = df.melt(
    id_vars=["Jahr", "Monat", "Deutschland"], var_name="Bundesland", value_name="T"
)
# Row-wise lookup: reference temperature of the same month and federal state
df["reference_T"] = [
    df2[df2["Monat"] == x["Monat"]][x["Bundesland"]].iloc[0]
    for _, x in df.iterrows()
]
# Scatter Plot
fig1_sub1 = px.scatter(
    df, x="reference_T", y="T", hover_data=["Bundesland", "Monat", "Jahr"]
)
fig1_sub2 = px.line(x=[-2, 20], y=[-2, 20])
fig1 = go.Figure(data=fig1_sub1.data + fig1_sub2.data)
fig1.update_layout(
    yaxis_title="Temperature [°C]", xaxis_title="Reference Temperature [°C]"
)
app = Dash(__name__)
app.layout = html.Div(
    children=[
        html.H1("Example 3: Plot Inference"),
        dcc.Graph(id="scatter", figure=fig1),
        dcc.Graph(id="histo"),
    ]
)
@callback(Output("histo", "figure"), Input("scatter", "selectedData"))
def update_histo(selection: dict) -> go.Figure:
    if selection is not None:
        sel_df = df.iloc[[p["pointIndex"] for p in selection["points"]]]
        fig2 = px.histogram(sel_df, "Jahr")
        return fig2
    else:
        return go.Figure()
if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port="8051")
Dash
pandas
statsmodels