This file documents the specific experiment runs.
BEWARE: Runs prior to 2023-01-23 used the default values of the small network
- varying learning rate
- 2023-01-13 03:19 & 12:32 & 21:44:50 German128 (unfixed)
- default values from startup
- lr 1e-2 to 1e-6
- 2023-01-13 21:43 SimpleGerman128 (unfixed) crashed
- same as above, but two layers
- lr 1e-4
- 2023-01-13 03:19 & 12:32 & 21:44:50 German128 (unfixed)
- varying lr, beta, gamma (grid sketch below)
- 2023-01-15 19:56 German128 (unfixed)
- learning rates 1e-2, 1e-4, 1e-6, 1e-8
- beta 1,4,10
- gamma 1,10,20
- 2023-01-18 02:05 Wikipedia128 (unfixed)
- Comparison with the optimum from 2023-01-15 19:58
- learning rate 1e-4
- beta 1, 4
- gamma 10, 20
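A minimal sketch of the kind of grid sweep these runs describe, using the values listed above; the launch call is hypothetical, not this repo's API:

    from itertools import product

    lrs    = [1e-2, 1e-4, 1e-6, 1e-8]
    betas  = [1, 4, 10]
    gammas = [1, 10, 20]

    for lr, beta, gamma in product(lrs, betas, gammas):
        # launch one training run per combination (hypothetical entry point)
        print(f"run: lr={lr} beta={beta} gamma={gamma}")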
- 2023-01-15 19:56 German128 (unfixed)
- pin down lr
- 2023-01-18 02:10 German128 (unfixed)
- Narrow down the learning rate, tease apart the losses in Tensorboard (loss sketch below)
- learning rate 1e-3,1e-4,1e-5
- beta 1
- gamma 10, 20
- capacity 0, 0.1, 1
- Result: lr 1e-4 appears to be the winner
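How beta, gamma, and capacity enter the objective is not spelled out in this log; a minimal sketch, assuming a Burgess-style capacity-constrained beta-VAE loss (the composition of the terms is an assumption):

    import torch

    def vae_loss(recon: torch.Tensor, kl: torch.Tensor,
                 beta: float, gamma: float, capacity: float) -> torch.Tensor:
        # Assumed composition: beta weights the raw KL term, gamma penalizes
        # deviation of the KL from the target capacity C (Burgess et al.).
        # With capacity 0 the second term reduces to gamma * KL.
        return recon + beta * kl + gamma * (kl - capacity).abs()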
- 2023-01-20 20:44:29 German128 (unfixed)
- test whether the good result was really good or just chance
- lr 1e-3, 1e-4, 1e-5
- beta 1, 4, 10
- capacity 0
- gamma 1, 10, 20
- epochs 25
- 2023-01-18 02:10 German128 (unfixed)
- winner run
- 2023-01-19 16:13 German128 (unfixed) lab36
- Test the winning learning rate on a long run
- learning rate 1e-4
- beta 1
- gamma 0, 10
- capacity 0, 10
- epochs 100
- 2023-01-20 17:38:26 German128 (unfixed)
- Winning learning rate, very long run
- lr 1e-4
- beta 1
- capacity 0
- gamma 10
- epochs 300
- Result: no strong overfitting discernible
- 2023-01-19 16:13 German128 (unfixed) lab36
- fixed winner
- 2023-01-23 23:23:47 German128 lab33
- Winning learning rate. This time with the correct dataset of length 128
- learning rate 1e-4
- beta 1
- gamma 0, 10
- capacity 0, 10
- epochs 25
- 2023-01-24 00:40:07 German128 lab36
- very, very large epoch number
- learning rate 1e-4
- beta 1
- gamma 10
- capacity 0
- epochs 1000
- 2023-01-24 17:24:18 German128 lab33
- Winning learning rate. Increase the layers to 2
- learning rate 1e-4
- beta 1
- gamma 0, 10
- capacity 0, 10
- nlayers 2
- epochs 25
- Result: no real improvement
- 2023-01-25 18:24:21 German128 lab35
- Winning values. Increase the layers to 2. Values from Controllable (architecture sketch below)
- d-model 256
- z-dim 256
- d-hid 1024
- learning rate 1e-4
- beta 1
- gamma 10
- capacity 0
- nlayers 1
- epochs 25
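A sketch of what d-model, d-hid, nlayers, and z-dim plausibly configure: a torch TransformerEncoder feeding a Gaussian latent; nhead and the latent projections are assumptions, not values from this log:

    import torch.nn as nn

    d_model, d_hid, nlayers, z_dim = 256, 1024, 1, 256  # values from this run
    layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=8,  # nhead assumed
                                       dim_feedforward=d_hid)
    encoder = nn.TransformerEncoder(layer, num_layers=nlayers)
    to_mu, to_logvar = nn.Linear(d_model, z_dim), nn.Linear(d_model, z_dim)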
- 2023-01-25 18:51:55 German128 lab32
- Winning values. Loss includes [PAD] (sketch below)
- lr 1e-4
- beta 1
- capacity 0
- gamma 10
- epochs 25
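For reference, the difference between excluding and including [PAD] in the reconstruction loss, in PyTorch terms; the token id is illustrative:

    import torch.nn as nn

    PAD_ID = 0  # assumed id of the [PAD] token
    loss_without_pad = nn.CrossEntropyLoss(ignore_index=PAD_ID)  # presumed baseline
    loss_with_pad = nn.CrossEntropyLoss()  # this run: [PAD] positions contribute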
- 2023-01-23 23:23:47 German128 lab33
- Word-level sampling
- 2023-02-01 22:00:53 German128 lab34
- Winning values. Larger batch size, metrics on a subset of 10 batches
- Word-level sampling
- lr 1e-4
- beta 1
- capacity 0
- gamma 10
- epochs 25
- batch-size 64
- 2023-02-02 00:00:10 German128 lab34
- Winning values. Larger batch size, metrics on a subset of 1/4 of the batches. Default linear KL annealing (annealing sketch below)
- Word-level sampling
- lr 1e-4
- capacity 0
- gamma 10
- epochs 25
- batch-size 64
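A minimal sketch of linear KL annealing as commonly implemented (ramp the KL weight from 0 to 1 over a warmup, then hold); the warmup length is illustrative:

    def linear_kl_weight(step: int, warmup_steps: int) -> float:
        # Weight ramps linearly from 0 to 1 over warmup_steps, then stays at 1.
        return min(1.0, step / warmup_steps)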
- 2023-02-02 00:05:43 German128 lab35
- Winning values. Larger batch size, metrics on the whole set. Default linear KL annealing
- Word-level sampling
- lr 1e-4
- capacity 0
- gamma 10
- epochs 25
- batch-size 64
- 2023-02-02 15:24 German128 lab34 Default linear KL
- Winning values. Long run
- epochs 100
- batch-size 64
- 2023-02-01 22:00:53 German128 lab34
- Parameter hunt (delta not fixed!!!)
- 2023-02-03 16:26:21 German128 lab36
- Parameter search
- epochs 50
- lr 1e-3, 1e-4, 1e-5, 1e-6
- gamma 0.1, 1, 10
- capacity 0, 1, 10
- delta 0.1, 1, 10 (1)
- batch-size 64
- 2023-02-03 16:26:21 German128 lab36
- Counterfactual parameters (delta not fixed!!!)
- 2023-02-06 18:13:25 One layer (unregularized)
- 2023-02-06 18:15:01 Two layers (unregularized)
- 2023-02-08 02:01:47 One layer (regularized)
- 2023-02-08 02:01:50 Two layers (regularized)
- 2023-02-11 00:12:50 One/Two layers (regularized)
- 2023-02-11 00:12:59 One/Two layers (regularized)
- 2023-02-11 00:13:17 One/Two layers (unregularized)
- 2023-02-11 00:13:18 One/Two layers (unregularized)
- kl_Ms 4 (see the KL-annealing sketch below)
- No regularization used
- d_model, z_dim 256
- d_hid 1024
- nlayers 1
- lr 1e-4
- capacity 0
- gamma 10
- delta 0 (1)
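A minimal sketch, assuming kl_Ms / klM is the number of cycles M of a cyclical KL annealing schedule (Fu et al., 2019); the ramp ratio r = 0.5 is an assumption:

    def cyclical_kl_weight(step: int, total_steps: int,
                           m: int = 4, r: float = 0.5) -> float:
        # m cycles over training; within each cycle the KL weight ramps
        # linearly from 0 to 1 during the first fraction r, then holds at 1.
        period = total_steps / m
        tau = (step % period) / period
        return min(1.0, tau / r)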
- Comparison for counterfactual parameters (delta not fixed!!!)
- 2023-02-11 13:10:37 One/Two 2x Reg
- 2023-02-11 13:10:47 One/Two 2x NoReg
- 2023-02-11 13:10:58 One/Two Reg
- 2023-02-11 22:34:18 One/Two NoReg
- kl_Ms 2
- Rest akin to the counterfactual-parameter runs above
- Various Hyperparameters / Variants (delta not fixed!!!)
- 2023-02-14 16:50:27 Alpha variation
- 2023-02-14 17:12:18 Alpha / Activation func on latent
- 2023-02-14 17:17:07 Alpha Large Values / Activation func on latent
- 2023-02-14 23:21:04 Beta Small
- Hyperparameter Wikipedia (delta not fixed!!!)
- 2023-02-15 00:06:05
- 2023-02-15 00:06:21
- 2023-02-15 00:07:02
- 2023-02-15 12:13:39
- 2023-02-15 12:13:47
- 2023-02-15 12:14:19
- 2023-02-15 21:59:06
- 2023-02-15 21:59:25
- 2023-02-15 21:59:42
- Hyperparameter German
- 2023-02-17 17:12:51 lr 1e-4
- 2023-02-17 17:13:16 lr 1e-5
- 2023-02-21 14:45:53 lr 1e-4
- 2023-02-21 14:46:01 lr 1e-5
- Hyperparameter German
- 2023-02-25 16:23:09 lr 1e-3
- 2023-02-25 16:23:17 lr 1e-4
- 2023-02-25 16:23:29 lr 1e-5
- 2023-02-27 00:09:51 lr 1e-3 NoReg
- 2023-02-27 00:10:20 lr 1e-4 NoReg
- 2023-02-27 00:11:22 lr 1e-5 NoReg
- beta 1, 1e-2, 1e-4
- capacity 0, 1, 10
- gamma 0.1, 1, 10
- delta 1
- klM 2
- epochs 25
- Result: lr 1e-3 gives good accuracy on train/val (runs 02-26 12:01 - 16:53)
- Hyperparameter German Batchsize
- 2023-02-28 21:04:56 Beta 1e-3
- 2023-03-09 18:10:42 Beta 1e-3 2x
- 2023-03-14 23:24:55 Beta 1e-3 NoReg 3x ~100h
- 2023-02-28 21:05:04 Beta 1e-4
- 2023-03-09 18:10:56 Beta 1e-4 2x
- 2023-03-14 23:25:24 Beta 1e-4 NoReg 3x ~100h
- lr 1e-4
- batchsize 8, 64
- capacity 0, 1, 10
- gamma 1
- delta 1
- klM 4
- epochs 50
- Hyperparameter Wiki, filtered by previous German run (WRONG DATASET!)
- 2023-02-28 21:05:17 Beta 1e-3
- 2023-03-09 18:11:08 Beta 1e-3 2x
- 2023-02-28 21:05:27 Beta 1e-4
- 2023-03-09 18:11:28 Beta 1e-4 2x
- 2023-03-08 01:07:57 Beta 1e-3 NoReg
- 2023-03-14 23:25:29 Beta 1e-3 NoReg 3x ~50h
- 2023-03-08 01:08:26 Beta 1e-4 NoReg
- 2023-03-14 23:25:03 Beta 1e-4 NoReg 3x ~50h
- lr 1e-4
- batchsize 64
- capacity 0, 1, 10
- gamma 1
- delta 1
- klM 4
- epochs 50
- Explicit initialization German
- 2023-03-02 22:52:17
- used Xavier initialization explicitly (sketch below)
- lr 1e-4
- batchsize 8, 64
- capacity 0, 1, 10
- gamma 1
- delta 1
- klM 4
- epochs 50
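A minimal sketch of explicit Xavier (Glorot) initialization for the model's linear layers; that it covers exactly the nn.Linear modules is an assumption:

    import torch.nn as nn

    def init_xavier(model: nn.Module) -> None:
        for module in model.modules():
            if isinstance(module, nn.Linear):
                nn.init.xavier_uniform_(module.weight)
                if module.bias is not None:
                    nn.init.zeros_(module.bias)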
- 2023-03-02 22:52:17
- Sampling level comparison (sampling sketch below)
- 2023-03-19 00:01:39 Wordlevel 3x ~7h
- 2023-03-19 00:19:19 Sentencelevel 3x ~5.5h
- 2023-03-19 19:02:52 Wiki Wordlevel 3x ~70h aborted
- 2023-03-22 15:45:55 Wiki Wordlevel 3x ~70h aborted
- 2023-03-28 20:26:05 Wiki Wordlevel 3x ~41.5h
- 2023-03-19 19:02:52 Wiki Sentence 3x ~70h aborted
- 2023-03-30 ??:??:?? Wiki Sentence 3x ~70h aborted
- 2023-03-30 22:04:57 Wiki Sentence 3x ~31h
- lr 1e-4
- batchsize 64
- beta 1e-3
- capacity 0
- gamma 1
- delta 1
- klM 4
- epochs 50
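A minimal sketch, assuming "word-level sampling" means drawing each next token from the decoder's softmax one position at a time (how the sentence-level variant differs is not specified in this log):

    import torch

    def sample_next_word(logits: torch.Tensor) -> torch.Tensor:
        # logits: (vocab_size,) scores for the next position; draw one token id.
        probs = torch.softmax(logits, dim=-1)
        return torch.multinomial(probs, num_samples=1)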
- Expand Attributes One Layer (Coding mistake)
- Wiki 3x ~37h
- German 3x ~6.5h
- OneAttr: vary layers
- German One layer 3x ~6.6h
- Wiki One layer 3x ~31h
- German Two layers 3x ~6.5h
- Wiki Two layers 3x ~43h
- German Three layers 3x ~7.5h
- Wikipedia Three layers 3x ~41.5h
- Hyper Wiki
- 2023-03-31 19:58:49 lr 1e-3 ~54h with errors!
- 2023-03-31 20:04:15 lr 1e-4 ~62h
- 2023-04-01 13:12:17 lr 1e-5 cancelled last two
- beta 1e-3, 1e-4
- 2023-03-31 20:06:21 lr 1e-4 ~62h
- beta 1e-1, 1e-2
- lr 1e-4
- batchsize 64
- capacity 0
- gamma 0.1, 1, 10
- delta 1
- klM 4
- epochs 50
- Large Gamma
- 2023-04-03 14:49:28 German ~
- gamma 100, 1000
- 2023-04-03 14:50:43 Wikipedia ~
- gamma 100
- 2023-04-03 14:50:55 Wikipedia ~
- gamma 1000
- beta 1e-3
- lr 1e-4
- batchsize 64
- capacity 0
- delta 1
- klM 4
- epochs 50
- 2023-04-03 14:49:28 German ~
- Wiki No Reg
- 2023-04-03 15:27:28
- Expand Attributes
- 2023-04-04 15:05:06 Wiki
- 2023-04-04 15:05:37 Wiki
- 2023-04-04 15:06:01 Wiki
- 2023-04-04 15:06:56 German x3
- beta 1e-3
- gamma 1
- lr 1e-4
- delta 1
- klM 4
- batchsize 64
- capacity 0
- epochs 50
- Saving model (sketch below)
- wikipedia 3 layers ~14h
- wikipedia 2 layers ~12h
- wikipedia 1 layer ~10.5h
- German 1-3 layers ~8h
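A minimal sketch of the checkpointing these runs added; the path and stand-in model are illustrative:

    import torch
    import torch.nn as nn

    model = nn.Linear(4, 4)  # stand-in for the trained model
    torch.save(model.state_dict(), "checkpoint.pt")     # save after training
    model.load_state_dict(torch.load("checkpoint.pt"))  # restore for sampling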
DELETED ALL PREVIOUS RUNS
- 2023-01-13 21:45 Wikipedia128 (unfixed) deleted
- learning rates 1e-2 to 1e-7
- Result: memory errors, etc.
- 2023-01-13 19:13 German128 (unfixed) deleted
- learning rate 1e-4
- beta
- gamma
- 2023-01-23 23:37:40 Wikipedia128 lab35
- Winning learning rate. This time with the correct dataset of length 128
- test whether Wikipedia also gives the same result
- learning rate 1e-4
- beta 1
- gamma 0, 10
- capacity 0, 10
- epochs 25
- 2023-01-28 19:51:21 German128 lab36
- Older run. Test whether a NaN shows up; see error (NaN check sketch below)
- lr 1e-2
- beta 1, 4, 10
- gamma 1, 10, 20
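A minimal sketch of the kind of NaN check such a test run suggests; the loss tensor is illustrative:

    import torch

    def assert_finite(loss: torch.Tensor) -> None:
        # Abort early if the loss went NaN/Inf during training.
        if not torch.isfinite(loss).all():
            raise RuntimeError(f"non-finite loss: {loss}")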
- 2023-02-05 15:10:03 German128 lab33
- Winning values, unregularized
- lr 1e-4
- capacity 0
- gamma 10
- epochs 100
- batch-size 64
- 2023-02-05 18:18:03 Wikipedia lab32
- Winning values
- epochs 50
- batch-size 64
- 2023-02-17 hyperparameter search
- KL annealing was adjusted to
- bigger network on good parameters
- without memory_mask