
Project - Evaluation:

Experiments

Operators:

  • SGC (learnable + non-learnable), GC (learnable + non-learnable), AGC, ASGC, ASGCP, GlobalSLC

Hyperparameters (see the config sketch below):

  • batch-size: 16
  • bottleneck-channels: 128
  • spatial-channels: 96
  • dropout: 0.1
  • dropout-att: 0.5 (TBD)
  • forecast-horizon: 3, 6 separately

Datasets:

  • METR-LA
  • PEMS-BAY
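
The hyperparameters and datasets above might be collected in a single experiment config; a minimal sketch (key names are our assumption, not the repo's actual schema):

```python
# Hypothetical experiment config; key names are assumptions,
# values come from the hyperparameter and dataset lists above.
CONFIG = {
    "batch_size": 16,
    "bottleneck_channels": 128,
    "spatial_channels": 96,
    "dropout": 0.1,
    "dropout_att": 0.5,            # TBD per the notes above
    "forecast_horizons": [3, 6],   # evaluated separately
    "datasets": ["METR-LA", "PEMS-BAY"],
}
```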

Report Outline

Introduction:

  • Context
    • Spatio-temporal learning on traffic networks
    • Inter- & intra-series correlation; link this to spatial and temporal correlations/dependencies
    • ARIMA --> GNNs (to capture highly non-linear dynamics & inter-series correlations) --> GNNs that learn the graph structure
    • Inter-series correlation:
      • prior graph based on the road network; nodes connected by roads influence each other (adjacency construction sketched after this outline)
      • learned from patterns in the data (not expressed in the road network)
  • Issue
    • Misrepresentation of capabilities of certain mechanisms
  • Problem
    • Zhang et al. and Chao et al. both learn a Laplacian, but with different mechanisms, and the impact these mechanisms have is not clear
  • Others
    • Usually they propose ablation studies to show effectiveness
    • They compare their architecture to competing architectures
    • Only whole architectures are compared, not components
    • The effectiveness of components is only shown within their own temporal model
      • TODO: find out how temporal modeling is done in Chao et al. (maybe with DFT)
  • Novelty
    • Contrast to others
    • Experiment with convolutions in a fixed temporal framework
    • Compare components rather than models
  • Challenges
    • Temporal modeling --> main cause for lack of performance. Convolution operators have a fixed mathematical definition and therefore require no engineering or tuning; if implemented correctly, there is nothing to tune about them.
    • Structuring 8 convolution kernels
  • Approach
    • Similar to the abstract
  • Contributions
    • list all novel implementations: GC-l, ASGCP, ASGC, AGC
    • one sentence referring to the repo --> gives one place for all implementations
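
To make the prior-graph idea concrete: a common way to build a prior adjacency from the road network (used e.g. for METR-LA in the DCRNN line of work) is a thresholded Gaussian kernel over pairwise road distances. A sketch, with the function name and threshold value being our assumptions:

```python
import numpy as np

def prior_adjacency(dist: np.ndarray, threshold: float = 0.1) -> np.ndarray:
    """Thresholded Gaussian kernel over road-network distances.

    dist[i, j] is the road distance from sensor i to sensor j;
    weights below `threshold` are zeroed out to sparsify the graph.
    """
    sigma = dist[dist > 0].std()        # kernel width estimated from the data
    w = np.exp(-((dist / sigma) ** 2))  # closer sensors get larger weights
    w[w < threshold] = 0.0              # drop weak links
    return w
```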

Scenario:

  • Benefits from our approach:
    • Insight into which mechanisms for learning a graph structure might be well suited for spatial modelling, without discussing temporal modelling
    • For example: researchers know how to build their convolution kernels for spatio-temporal problems
    • Central GitHub repo for lots of kernels and temporal models
  • Dataset:
    • Descriptive metrics: #sensors (nodes), sampling interval (e.g. aggregated over 5 min intervals), time frame (Mar 1st 2012 - Jun 30th 2012), source: Caltrans PeMS (Performance Measurement System)
    • Features: signal (mph), timestamp (cyclical over 1 day; encoding sketched below)
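
One plausible way to realize the cyclical timestamp feature is a sin/cos encoding of the time of day, so that times just before and after midnight end up close in feature space; a sketch (function name is ours):

```python
import numpy as np

def encode_time_of_day(minute_of_day: np.ndarray) -> np.ndarray:
    """Map minute-of-day (0..1439) onto the unit circle.

    Returns an (..., 2) array of [sin, cos] features, so 23:55 and
    00:05 are neighbours in feature space.
    """
    angle = 2 * np.pi * minute_of_day / 1440.0
    return np.stack([np.sin(angle), np.cos(angle)], axis=-1)
```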

Preliminaries:

  • Graph Convolution: spatial vs. spectral at a high level, plus some history (Kipf & Welling)
  • Spatial Convolution (GC)
  • Spectral Convolution (SGC)
  • Structure Learning with Parameters: Zhang et al. combine SLCs, which learn global and local graph representations. State that we only cover the global view, not the local view.
  • Define spectral convolution with a learnable Laplacian (GC-l, SGC-l); a sketch follows this list
  • Latent Correlation Layer: explain the attention mechanism at a high level; a sketch follows this list
  • Define spectral and spatial convolution kernels (AGC, ASGC)
  • Combinations:
    • Global SLC
    • ASGCP
  • Temporal Model (image of P3D with substitutable Graph Convolution)
  • Table with all Kernels listed.
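
To illustrate the fixed vs. learnable Laplacian distinction (SGC vs. SGC-l) referenced above, a minimal first-order spectral layer where the propagation matrix is either precomputed from the prior graph or a free parameter; class and argument names are assumptions, not the repo's API:

```python
import torch
import torch.nn as nn

class SpectralGraphConv(nn.Module):
    """First-order spectral graph convolution: X' = L_hat X W.

    With learn_laplacian=False (SGC) the propagation matrix is the
    normalized Laplacian precomputed from the prior road graph; with
    learn_laplacian=True (the -l variant) it is a free parameter
    learned from the data.
    """

    def __init__(self, num_nodes, in_ch, out_ch,
                 laplacian=None, learn_laplacian=False):
        super().__init__()
        self.weight = nn.Linear(in_ch, out_ch, bias=False)
        if learn_laplacian:
            self.laplacian = nn.Parameter(0.01 * torch.randn(num_nodes, num_nodes))
        else:
            self.register_buffer("laplacian", laplacian)  # fixed prior graph

    def forward(self, x):
        # x: (batch, num_nodes, in_ch) -> (batch, num_nodes, out_ch)
        return self.weight(self.laplacian @ x)
```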
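
Similarly, a sketch of the latent correlation idea behind the attention-based kernels (AGC, ASGC): self-attention over per-node representations yields a data-dependent adjacency that can replace the fixed Laplacian. Again, all names are assumed:

```python
import torch
import torch.nn as nn

class LatentCorrelationLayer(nn.Module):
    """Self-attention over per-node features -> soft (N x N) adjacency.

    The returned matrix can stand in for the fixed Laplacian of the
    convolution sketched above, giving the attention-based kernels.
    """

    def __init__(self, in_ch, key_ch, dropout_att=0.5):
        super().__init__()
        self.query = nn.Linear(in_ch, key_ch)
        self.key = nn.Linear(in_ch, key_ch)
        self.dropout = nn.Dropout(dropout_att)

    def forward(self, x):
        # x: (batch, num_nodes, in_ch), one feature vector per sensor
        q, k = self.query(x), self.key(x)
        scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
        return self.dropout(torch.softmax(scores, dim=-1))  # (batch, N, N)
```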

Threats to validity: (Ask Chris for Feedback)

  • External Validity
    • Only tested on traffic prediction (based on speed readings)
    • Model might not generalize to different types of road (inner city vs. highway, big city vs. small city)
    • Long forecast horizons not tested
    • Same country
  • Internal Validity
    • Hyperparameter tuning on the validation dataset
    • Structure learning vs. #parameters
  • Construct Validity
    • RMSE vs. MAE vs. MAPE; these are all valid in traffic forecasting
  • Conclusion Validity
    • We confirm, not prove
    • Results concern performance within P3D, not in general
    • Avoid by having at least 2 samples for each concept
      • 2 datasets

Approach:

  • High-level description of the work:
    • Measure the effect of a learnable Laplacian vs. one pre-defined based on human knowledge
    • We do the above with spectral convolution and spatial convolution
    • Table with model names and their explanations
  • summary of the next subsections.
  • Model:
    • start by introducing the architecture (a rough block sketch follows at the end of this section)
      • Linear layer
      • Dropout
      • P3D:
        • Blocks A, B, C; downsample, spatial and temporal convolution, upsample, BatchNorm, ReLU
        • Upsample
      • Graph convolution(s)
  • Combining convolution operators:
    • Recapping Global SLC
    • The idea of substituting the dynamic part of SLC with attention
    • recap ASGCP
  • Experimental Setup:
    • Basically talk about the config files
    • Hyperparameters (batch size, bottleneck channels, learning rate, ...)
    • Forecast horizon
    • Datasets
    • Loss function
    • Train/val/test split (70/10/20), split in time (explain why; sketched below)
    • Explain that we pick the model parameters by the best validation results
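
A rough sketch of how a P3D-style block with a substitutable graph convolution could be wired up (downsample, spatial + temporal convolution, upsample, BatchNorm, ReLU, as outlined in the Model bullet above); this is our reading of the outline, not the actual implementation:

```python
import torch.nn as nn

class P3DBlock(nn.Module):
    """Bottleneck block: 1x1 downsample -> graph (spatial) convolution
    -> temporal convolution -> 1x1 upsample, with BatchNorm, ReLU and
    a residual connection. `graph_conv` is any interchangeable kernel
    (SGC, SGC-l, GC, GC-l, AGC, ASGC, ASGCP, GlobalSLC) operating on
    the node dimension without changing the tensor shape.
    """

    def __init__(self, in_ch, bottleneck_ch, graph_conv):
        super().__init__()
        # input tensors are assumed to be (batch, channels, nodes, time)
        self.down = nn.Conv2d(in_ch, bottleneck_ch, kernel_size=1)
        self.graph_conv = graph_conv                      # substitutable
        self.temporal = nn.Conv2d(bottleneck_ch, bottleneck_ch,
                                  kernel_size=(1, 3), padding=(0, 1))
        self.up = nn.Conv2d(bottleneck_ch, in_ch, kernel_size=1)
        self.bn = nn.BatchNorm2d(in_ch)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.down(x))
        out = self.relu(self.graph_conv(out))   # spatial modelling
        out = self.relu(self.temporal(out))     # temporal modelling
        return self.relu(self.bn(self.up(out)) + x)
```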
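
For the split-in-time bullet: splitting chronologically (first 70% train, next 10% validation, last 20% test) keeps future windows out of training, which a random split would leak. A minimal sketch:

```python
def temporal_split(series, train=0.7, val=0.1):
    """Chronological 70/10/20 split along the time axis.

    A random split would let training windows overlap 'future' data;
    splitting in time keeps validation and test strictly out-of-sample.
    """
    n = len(series)
    i, j = int(n * train), int(n * (train + val))
    return series[:i], series[i:j], series[j:]
```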