Merging dataframes for GloVe embeddings #160

baubrey · 2023-04-20T22:11:27Z

df = pd.merge(
    base_df, emb_df, left_index=True, right_index=True
)

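This doesn't seem to work correctly for GloVe embeddings, because base_df and emb_df do not share the same index: base_df keeps its original row labels (1 through 79651, with gaps), while emb_df has a fresh 0 through 69151 index.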

(Pdb) base_df
           word       onset      offset  accuracy  ... in_bert-base-cased  in_bert-large-cased  in_roberta-base in_roberta-large
1          okay  13132.0000  13388.0000         1  ...               True                 True             True             True
4         feels  13862.0000  14076.0000         1  ...               True                 True             True             True
5         great  14033.0000  14217.0000         1  ...               True                 True             True             True
6          yeah  14217.0000  14345.0000         1  ...               True                 True             True             True
7          Good  13877.0000  14066.0000         1  ...               True                 True             True             True
...         ...         ...         ...       ...  ...                ...                  ...              ...              ...
79647      that  73547.6624  73609.1024         1  ...               True                 True             True             True
79648        do  73609.1024  73660.3024         1  ...               True                 True             True             True
79649  anything  73660.3024  73798.5424         1  ...               True                 True             True             True
79650       Not  73967.6048  74033.4798         1  ...               True                 True             True             True
79651    really  74084.6798  74248.4385         1  ...               True                 True             True             True

[69152 rows x 44 columns]

(Pdb) emb_df
                                              embeddings
0      [0.19901, -0.77517, -0.11574, -0.35179, 0.4122...
1      [-0.086751, -0.10439, -0.48462, -0.27358, 1.01...
2      [-0.026567, 1.3357, -1.028, -0.3729, 0.52012, ...
3      [-0.80924, -0.030977, 0.5102, -0.75298, 0.4904...
4      [-0.35586, 0.5213, -0.6107, -0.30131, 0.94862,...
...                                                  ...
69147  [0.88387, -0.14199, 0.13566, 0.098682, 0.51218...
69148  [0.29605, -0.13841, 0.043774, -0.38744, 0.1226...
69149  [0.12032, -0.14806, 0.0059001, -0.1513, 0.7347...
69150  [0.55025, -0.24942, -0.0009386, -0.264, 0.5932...
69151  [0.0016675, -0.16376, -0.092648, -0.33466, 0.7...

[69152 rows x 1 columns]
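
A minimal sketch of the symptom and one possible workaround, using toy stand-ins for the two frames (the data below is made up). It assumes row i of emb_df corresponds to row i of base_df, which the matching row counts suggest but which should be verified before relying on it:

```python
import pandas as pd

# Toy stand-ins for base_df and emb_df (hypothetical data, not the real frames).
base_df = pd.DataFrame({"word": ["okay", "feels", "great"]}, index=[1, 4, 5])
emb_df = pd.DataFrame({"embeddings": [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]})

# Index-based inner merge: only labels present in BOTH indexes survive,
# and label 1 in emb_df is actually the second embedding, not the first.
naive = pd.merge(base_df, emb_df, left_index=True, right_index=True)

# Positional alignment (assumption: both frames are in the same row order).
assert len(base_df) == len(emb_df)
aligned = pd.merge(
    base_df,
    emb_df.set_index(base_df.index),  # give emb_df the labels of base_df
    left_index=True,
    right_index=True,
)
```

In the toy case the naive merge keeps a single row and pairs "okay" with the wrong vector, while the aligned merge keeps all three rows. If the two frames are not guaranteed to be in the same order, this workaround would hide the problem rather than fix it.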


zkokaja · 2023-05-04T17:56:18Z

to_dict() might be dropping the index. There are multiple save_pkl functions. This may also be an issue for whisper and base, where we remove rows from base_df before generating emb_df, which causes a mismatch in indexes.
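
A small illustration of how to_dict() can lose the index. Which orient the save_pkl functions actually use is not stated here, so "records" below is an assumption, and the frame is toy data:

```python
import pandas as pd

# Toy frame with non-contiguous row labels, like base_df after filtering.
df = pd.DataFrame({"word": ["okay", "feels", "great"]}, index=[1, 4, 5])

# orient="records" yields a list of row dicts; the index is not included.
records = df.to_dict("records")
round_trip = pd.DataFrame(records)  # comes back with a fresh 0..n-1 index

# orient="index" keys each row by its label, so the index survives.
kept = pd.DataFrame.from_dict(df.to_dict("index"), orient="index")
```

Here round_trip is indexed 0, 1, 2 while kept is still indexed 1, 4, 5, which matches the kind of mismatch seen between emb_df and base_df.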

zkokaja · 2023-05-04T17:57:06Z

See #153

zkokaja assigned hvgazula May 4, 2023

zkokaja transferred this issue from hassonlab/247-encoding May 4, 2023
