experiment_diary.md

Runs

This file documents the individual experiment runs.

BEWARE: Runs prior to 2023-01-23 used the defaults of the small network.

Named saves

  • varying learning rate
    • 2023-01-13 03:19 & 12:32 & 21:44:50 German128 (unfixed)
      • default values from startup
      • lr 1e-2 to 1e-06
    • 2023-01-13 21:43 SimpleGerman128 (unfixed) crashed
      • same as above, but two layers
      • lr 1e-4
  • varying lr, beta, gamma (see the sweep sketch after this list)
    • 2023-01-15 19:56 German128 (unfixed)
      • learning rates 1e-2, 1e-4, 1e-6, 1e-8
      • beta 1,4,10
      • gamma 1,10,20
    • 2023-01-18 02:05 Wikipedia128 (unfixed)
      • Comparison with the optimum from 2023-01-15 19:58
      • learning rate 1e-4
      • beta 1, 4
      • gamma 10, 20
  • pin down lr
    • 2023-01-18 02:10 German128 (unfixed)
      • Narrow down the learning rate, puzzle out the losses in TensorBoard
      • learning rate 1e-3,1e-4,1e-5
      • beta 1
      • gamma 10, 20
      • capacity 0, 0.1, 1
      • Result: lr 1e-4 appears to be the winner
    • 2023-01-20 20:44:29 German128 (unfixed)
      • Test whether the good result was actually good or just chance
      • lr 1e-3, 1e-4, 1e-5
      • beta 1, 4, 10
      • capacity 0
      • gamma 1, 10, 20
      • epochs 25
  • winner run
    • 2023-01-19 16:13 German128 (unfixed) lab36
      • Test the winning learning rate on a long run
      • learning rate 1e-4
      • beta 1
      • gamma 0, 10
      • capacity 0, 10
      • epochs 100
    • 2023-01-20 17:38:26 German128 (unfixed)
      • Winning learning rate, very long run
      • lr 1e-4
      • beta 1
      • capacity 0
      • gamma 10
      • epochs 300
      • Result: no strong overfitting observable
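
The sweeps above run the full cross-product of the listed values. A minimal sketch of such a sweep, assuming a hypothetical `train_model` entry point (the diary does not show the actual launcher):

```python
from itertools import product

# Grid taken from the 2023-01-15 19:56 German128 sweep above.
learning_rates = [1e-2, 1e-4, 1e-6, 1e-8]
betas = [1, 4, 10]
gammas = [1, 10, 20]

for lr, beta, gamma in product(learning_rates, betas, gammas):
    # train_model is a hypothetical stand-in for the real launcher,
    # which this diary does not record.
    train_model(dataset="German128", lr=lr, beta=beta, gamma=gamma)
```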

Fixed embedding

  • fixed winner
    • 2023-01-23 23:23:47 German128 lab33
      • Winning learning rate. This time with the correct dataset of length 128
      • learning rate 1e-4
      • beta 1
      • gamma 0, 10
      • capacity 0, 10
      • epochs 25
    • 2023-01-24 00:40:07 German128 lab36
      • very, very large epoch number
      • learning rate 1e-4
      • beta 1
      • gamma 10
      • capacity 0
      • epochs 1000
    • 2023-01-24 17:24:18 German128 lab33
      • Winning learning rate. Increase the layers to 2
      • learning rate 1e-4
      • beta 1
      • gamma 0, 10
      • capacity 0, 10
      • nlayers 2
      • epochs 25
      • result: no real improvement
    • 2023-01-25 18:24:21 German128 lab35
      • Winning values. Increase the layers to 2. Values from Controllable
      • d-model 256
      • z-dim 256
      • d-hid 1024
      • learning rate 1e-4
      • beta 1
      • gamma 10
      • capacity 0
      • nlayers 1
      • epochs 25
    • 2023-01-25 18:51:55 German128 lab32
      • Winning values. Loss includes [PAD] (see the sketch after this list)
      • lr 1e-4
      • beta 1
      • capacity 0
      • gamma 10
      • epochs 25
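
The last run above toggles whether [PAD] positions contribute to the reconstruction loss. In PyTorch this is typically controlled via `ignore_index` on the cross-entropy loss; a minimal sketch, assuming `pad_id` is the tokenizer's padding id (the actual id is not recorded here):

```python
import torch.nn as nn

pad_id = 0  # assumption: the real [PAD] id depends on the tokenizer

# Default variant: padding positions are excluded from the loss.
loss_without_pad = nn.CrossEntropyLoss(ignore_index=pad_id)

# Variant of the 2023-01-25 18:51:55 run: [PAD] positions count as
# ordinary targets and contribute to the reconstruction loss.
loss_with_pad = nn.CrossEntropyLoss()
```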

Word level sampling

  • Word level sampling
    • 2023-02-01 22:00:53 German128 lab34
      • Winning values. Larger batch size; metrics on a subset of 10 batches
      • Word-level sampling
      • lr 1e-4
      • beta 1
      • capacity 0
      • gamma 10
      • epochs 25
      • batch-size 64
    • 2023-02-02 00:00:10 German128 lab34
      • Winning values. Larger batch size; metrics on a subset of 1/4 of the batches. Default linear KL annealing (see the schedule sketch after this list)
      • Word-level sampling
      • lr 1e-4
      • capacity 0
      • gamma 10
      • epochs 25
      • batch-size 64
    • 2023-02-02 00:05:43 German128 lab35
      • Winning values. Larger batch size; metrics on the whole set. Default linear KL annealing
      • Word-level sampling
      • lr 1e-4
      • capacity 0
      • gamma 10
      • epochs 25
      • batch-size 64
    • 2023-02-02 15:24 German128 lab34 Default linear KL
      • Winning values. Long run
      • epochs 100
      • batch-size 64
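
Several runs above use "default linear KL annealing": the weight on the KL term ramps linearly from 0 to its full value at the start of training. A minimal sketch of such a schedule; the ramp length is an assumed placeholder, since the diary does not record it:

```python
def linear_kl_weight(step: int, warmup_steps: int = 10_000) -> float:
    """Linearly anneal the KL weight from 0 to 1 over warmup_steps."""
    return min(1.0, step / warmup_steps)

# Applied per training step, e.g.:
# loss = recon_loss + beta * linear_kl_weight(step) * kl_loss
```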

With KL annealing on batches (Error in KL)

  • Parameter hunt (delta not fixed!!!)
    • 2023-02-03 16:26:21 German128 lab36
      • Parameter search
      • epochs 50
      • lr 1e-3, 1e-4, 1e-5, 1e-6
      • gamma 0.1, 1, 10
      • capacity 0, 1, 10
      • delta 0.1, 1, 10 (1)
      • batch-size 64
  • Counterfactual parameters (delta not fixed!!!)
    • 2023-02-06 18:13:25 One layer (unregularized)
    • 2023-02-06 18:15:01 Two layers (unregularized)
    • 2023-02-08 02:01:47 One layer (regularized)
    • 2023-02-08 02:01:50 Two layers (regularized)
    • 2023-02-11 00:12:50 One/Two layers (regularized)
    • 2023-02-11 00:12:59 One/Two layers (regularized)
    • 2023-02-11 00:13:17 One/Two layers (unregularized)
    • 2023-02-11 00:13:18 One/Two layers (unregularized)
      • kl_Ms 4 (see the annealing sketch after this list)
      • No regularization used
      • d_model, z_dim 256
      • d_hid 1024
      • nlayers 1
      • lr 1e-4
      • capacity 0
      • gamma 10
      • delta 0 (1)
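
The `kl_Ms` knob in these runs reads like the number of cycles M in a cyclical KL annealing schedule computed over batches (the heading notes this version still contained an error, fixed later under "Fixed KL annealing"). A hedged sketch under that assumption; the rise fraction R is purely illustrative:

```python
def cyclical_kl_weight(batch_idx: int, total_batches: int,
                       M: int = 4, R: float = 0.5) -> float:
    """Cyclical KL annealing over batches, assuming kl_Ms = M cycles.

    Within each of the M cycles the weight rises linearly during the
    first fraction R of the cycle, then stays at 1.
    """
    cycle_len = total_batches / M
    pos = (batch_idx % cycle_len) / cycle_len
    return min(1.0, pos / R)
```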

Sentence level sampling (Error in KL)

  • Comparison of counterfactual parameters (delta not fixed!!!)
    • 2023-02-11 13:10:37 One/Two 2x Reg
    • 2023-02-11 13:10:47 One/Two 2x NoReg
    • 2023-02-11 13:10:58 One/Two Reg
    • 2023-02-11 22:34:18 One/Two NoReg
      • kl_Ms 2
      • Rest as in the counterfactual parameter runs
  • Various Hyperparameters / Variants (delta not fixed!!!)
    • 2023-02-14 16:50:27 Alpha variation
    • 2023-02-14 17:12:18 Alpha / Activation func on latent (see the sketch after this list)
    • 2023-02-14 17:17:07 Alpha Large Values / Activation func on latent
    • 2023-02-14 23:21:04 Beta Small
  • Hyperparameter Wikipedia (delta not fixed!!!)
    • 2023-02-15 00:06:05
    • 2023-02-15 00:06:21
    • 2023-02-15 00:07:02
    • 2023-02-15 12:13:39
    • 2023-02-15 12:13:47
    • 2023-02-15 12:14:19
    • 2023-02-15 21:59:06
    • 2023-02-15 21:59:25
    • 2023-02-15 21:59:42
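
"Activation func on latent" presumably means squashing the sampled latent code with a nonlinearity before decoding. A minimal sketch of that variant, assuming a standard reparameterized Gaussian latent; the choice of tanh is purely illustrative:

```python
import torch

def sample_latent(mu: torch.Tensor, logvar: torch.Tensor,
                  activate: bool = False) -> torch.Tensor:
    """Reparameterization trick, optionally with an activation on z."""
    z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
    return torch.tanh(z) if activate else z
```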

Fixed Delta

  • Hyperparameter German
    • 2023-02-17 17:12:51 lr 1e-4
    • 2023-02-17 17:13:16 lr 1e-5
    • 2023-02-21 14:45:53 lr 1e-4
    • 2023-02-21 14:46:01 lr 1e-5

Fixed KL annealing (moved all previous runs to archive)

  • Hyperparameter German
    • 2023-02-25 16:23:09 lr 1e-3
    • 2023-02-25 16:23:17 lr 1e-4
    • 2023-02-25 16:23:29 lr 1e-5
    • 2023-02-27 00:09:51 lr 1e-3 NoReg
    • 2023-02-27 00:10:20 lr 1e-4 NoReg
    • 2023-02-27 00:11:22 lr 1e-5 NoReg
      • beta 1, 1e-2, 1e-4
      • capacity 0, 1, 10
      • gamma 0.1, 1, 10
      • delta 1
      • klM 2
      • epochs 25
    • Result:
      • 02-26 12:01 - 16:53: lr 1e-3 shows good accuracy on train/val
  • Hyperparameter German Batchsize
    • 2023-02-28 21:04:56 Beta 1e-3
    • 2023-03-09 18:10:42 Beta 1e-3 2x
    • 2023-03-14 23:24:55 Beta 1e-3 NoReg 3x ~100h
    • 2023-02-28 21:05:04 Beta 1e-4
    • 2023-03-09 18:10:56 Beta 1e-4 2x
    • 2023-03-14 23:25:24 Beta 1e-4 NoReg 3x ~100h
      • lr 1e-4
      • batchsize 8, 64
      • capacity 0, 1, 10
      • gamma 1
      • delta 1
      • klM 4
      • epochs 50
  • Hyperparameter Wiki, filtered by previous German run (WRONG DATASET!)
    • 2023-02-28 21:05:17 Beta 1e-3
    • 2023-03-09 18:11:08 Beta 1e-3 2x
    • 2023-02-28 21:05:27 Beta 1e-4
    • 2023-03-09 18:11:28 Beta 1e-4 2x
    • 2023-03-08 01:07:57 Beta 1e-3 NoReg
    • 2023-03-14 23:25:29 Beta 1e-3 NoReg 3x ~50h
    • 2023-03-08 01:08:26 Beta 1e-4 NoReg
    • 2023-03-14 23:25:03 Beta 1e-4 NoReg 3x ~50h
      • lr 1e-4
      • batchsize 64
      • capacity 0, 1, 10
      • gamma 1
      • delta 1
      • klM 4
      • epochs 50
  • Explicit initialization German
    • 2023-03-02 22:52:17
      • used Xavier initialization explicitly (see the sketch at the end of this section)
      • lr 1e-4
      • batchsize 8, 64
      • capacity 0, 1, 10
      • gamma 1
      • delta 1
      • klM 4
      • epochs 50
  • Sampling Level comparison
    • 2023-03-19 00:01:39 Wordlevel 3x ~7h
    • 2023-03-19 00:19:19 Sentencelevel 3x ~5.5h
    • 2023-03-19 19:02:52 Wiki Wordlevel 3x ~70h aborted
    • 2023-03-22 15:45:55 Wiki Wordlevel 3x ~70h aborted
    • 2023-03-28 20:26:05 Wiki Wordlevel 3x ~41.5h
    • 2023-03-19 19:02:52 Wiki Sentence 3x ~70h aborted
    • 2023-03-30 ??:??:?? Wiki Sentence 3x ~70h aborted
    • 2023-03-30 22:04:57 Wiki Sentence 3x ~31h
      • lr 1e-4
      • batchsize 64
      • beta 1e-3
      • capacity 0
      • gamma 1
      • delta 1
      • klM 4
      • epochs 50
  • Expand Attributes One Layer (Coding mistake)
    • Wiki 3x ~37h
    • German 3x ~6.5h
  • OneAttr Vary layers
    • German One layer 3x ~6.6h
    • Wiki One layer 3x ~31h
    • German Two layers 3x ~6.5h
    • Wiki Two layers 3x ~43h
    • German Three layers 3x ~7.5h
    • Wikipedia Three layers 3x ~41.5h
  • Hyper Wiki
    • 2023-03-31 19:58:49 lr 1e-3 ~54h with errors!
    • 2023-03-31 20:04:15 lr 1e-4 ~62h
    • 2023-04-01 13:12:17 lr 1e-5 (last two cancelled)
      • beta 1e-3, 1e-4
    • 2023-03-31 20:06:21 lr 1e-4 ~62h
      • beta 1e-1, 1e-2
      • lr 1e-4
      • batchsize 64
      • capacity 0
      • gamma 0.1, 1, 10
      • delta 1
      • klM 4
      • epochs 50
  • Large Gamma
    • 2023-04-03 14:49:28 German ~
      • gamma 100, 1000
    • 2023-04-03 14:50:43 Wikipedia ~
      • gamma 100
    • 2023-04-03 14:50:55 Wikipedia ~
      • gamma 1000
      • beta 1e-3
      • lr 1e-4
      • batchsize 64
      • capacity 0
      • delta 1
      • klM 4
      • epochs 50
  • Wiki No Reg
    • 2023-04-03 15:27:28
  • Expand Attributes
    • 2023-04-04 15:05:06 Wiki
    • 2023-04-04 15:05:37 Wiki
    • 2023-04-04 15:06:01 Wiki
    • 2023-04-04 15:06:56 German x3
      • beta 1e-3
      • gamma 1
      • lr 1e-4
      • delta 1
      • klM 4
      • batchsize 64
      • capacity 0
      • epochs 50
  • Saving model
    • wikipedia 3 layers ~14h
    • wikipedia 2 layers ~12h
    • wikipedia 1 layer ~10.5h
    • German 1-3 layers ~8h
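
The 2023-03-02 run replaces the layers' default initialization with explicit Xavier initialization. A minimal PyTorch sketch of such explicit initialization (the actual helper used in the experiments is not shown in this diary):

```python
import torch.nn as nn

def init_xavier(model: nn.Module) -> None:
    """Xavier (Glorot) uniform init for all weight matrices with at
    least two dimensions; bias vectors are zeroed."""
    for name, param in model.named_parameters():
        if param.dim() >= 2:
            nn.init.xavier_uniform_(param)
        elif "bias" in name:
            nn.init.zeros_(param)
```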

Diary

DELETED ALL PREVIOUS RUNS

  • 2023-01-13 21:45 Wikipedia128 (unfixed) deleted

    • learning rates 1e-2 to 1e-7
    • Result: memory errors, etc.
  • 2023-01-13 19:13 German128 (unfixed) deleted

    • learning rate 1e-4
    • beta
    • gamma
  • 2023-01-23 23:37:40 Wikipedia128 lab35

    • Winning learning rate. This time with the correct dataset of length 128
    • Test whether Wikipedia gives the same result
    • learning rate 1e-4
    • beta 1
    • gamma 0, 10
    • capacity 0, 10
    • epochs 25
  • 2023-01-28 19:51:21 German128 lab36

    • Older run. Test whether NaN appears; see error
    • lr 1e-2
    • beta 1, 4, 10
    • gamma 1, 10, 20
  • 2023-02-05 15:10:03 German128 lab33

    • Winning values, unregularized
    • lr 1e-4
    • capacity 0
    • gamma 10
    • epochs 100
    • batch-size 64
  • 2023-02-05 18:18:03 Wikipedia lab32

    • Winning values
    • epochs 50
    • batch-size 64

KL Annealing on epochs

  • 2023-02-17 hyperparameter search
    • KL annealing was adjusted to

To be started

  • bigger network on good parameters
  • without memory_mask
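
"Without memory_mask" refers to dropping the mask on the encoder memory in the Transformer decoder. In PyTorch's `nn.TransformerDecoder` that is the `memory_mask` argument; a minimal sketch of the planned ablation, where `d_model` 256 is taken from the runs above and `nhead`, the sequence length, and the causal-style mask are illustrative assumptions:

```python
import torch
import torch.nn as nn

decoder_layer = nn.TransformerDecoderLayer(d_model=256, nhead=8)
decoder = nn.TransformerDecoder(decoder_layer, num_layers=1)

tgt = torch.rand(128, 64, 256)     # (seq_len, batch, d_model)
memory = torch.rand(128, 64, 256)  # encoder output of the same shape

# Current setup (assumed): an additive mask restricts which memory
# positions each target position may attend to.
mask = torch.triu(torch.full((128, 128), float("-inf")), diagonal=1)
out_masked = decoder(tgt, memory, memory_mask=mask)

# Planned ablation: no memory_mask, full attention over the memory.
out_unmasked = decoder(tgt, memory)
```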