era5 data

era5 data loading

Either create a data handler that can load era5 data, or implement an entirely new loading method for this data.

collection of tasks

  • use BallTree from sklearn to find the nearest neighbor: https://stackoverflow.com/questions/61952561/how-do-i-find-the-neighbors-of-points-containing-coordinates-in-python
  • use the haversine distance metric: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.pairwise.haversine_distances.html?highlight=haversine#sklearn.metrics.pairwise.haversine_distances
  • for now: use the "nearest" option of xarray's sel method (a more elaborate neighbor search or interpolation method can be implemented later)
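
To make the nearest-neighbor task concrete, here is a minimal sketch of the haversine lookup in plain Python. It is a brute-force stand-in, not the BallTree approach itself: it evaluates the same great-circle metric for every grid cell, which a BallTree with the haversine metric would do more efficiently. The function names, the grid, and the station coordinates are all hypothetical. Note that sklearn's haversine functions expect [latitude, longitude] in radians, while this sketch takes degrees and converts internally.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, turns the central angle into km


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlam = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def nearest_grid_cell(station_lat, station_lon, grid_lats, grid_lons):
    """Return (lat, lon, distance_km) of the grid cell closest to the station.

    Brute force over all cells; a tree-based search answers the same query
    faster when there are many stations or a large grid.
    """
    best = None
    for lat in grid_lats:
        for lon in grid_lons:
            dist = haversine_km(station_lat, station_lon, lat, lon)
            if best is None or dist < best[2]:
                best = (lat, lon, dist)
    return best


# Hypothetical regular 0.25 degree grid and a made-up station location.
grid_lats = [50.0 + 0.25 * i for i in range(9)]
grid_lons = [6.0 + 0.25 * i for i in range(9)]
cell = nearest_grid_cell(50.9, 6.4, grid_lats, grid_lons)
```

On a regular latitude/longitude grid, picking the nearest latitude and the nearest longitude independently (as a per-axis "nearest" selection does) usually lands on the same cell, which is why the simpler option is a reasonable starting point.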

todos

  • check the data origin inside DataHandlerSingleStation: if the origin is not era5, use the join downloader; otherwise use the new era5 data loading
  • implement era5 data loading method
    • use station coordinate to find nearest grid cell
    • load all data for given variables (maybe in given time range)
    • store locally as .nc (as for join data)
    • return data and meta
  • be able to handle mixed data origins, e.g. chem with None origin and meteo with era5 origin
    • either use two separate files or combine the files; decide which option is best
  • inspect whether the meta data check, which decides between triggering a new download and just loading from disk, still works properly
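
The origin check and the mixed-origin case could be sketched as follows. This is only an illustration under assumptions: the helper name, the loader keys, and the shape of data_origin (a dict mapping variable name to origin) are hypothetical and not taken from the MLAir code base.

```python
ERA5_ORIGIN = "era5"  # assumed marker value for variables taken from era5


def group_variables_by_loader(variables, data_origin):
    """Map each loader name to the list of variables it should fetch.

    Variables whose origin equals ERA5_ORIGIN go to the era5 loader; all
    others (including a missing or None origin) stay with the join downloader.
    """
    groups = {"join": [], "era5": []}
    for var in variables:
        origin = data_origin.get(var)
        key = "era5" if origin == ERA5_ORIGIN else "join"
        groups[key].append(var)
    return groups


# Mixed origins: chemistry without an origin, meteorology from era5.
variables = ["o3", "no2", "temp", "u"]
data_origin = {"o3": None, "no2": None, "temp": "era5", "u": "era5"}
groups = group_variables_by_loader(variables, data_origin)
```

Splitting the variables first keeps both options from the todo list open: each group can be written to its own file, or the two results can be merged before storing.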
Edited Jul 12, 2022 by lukas leufen