Evaluation MEN whole dataset #7

atnikos · 2018-05-03T20:33:06Z

@georgepar Use the script below to extract the dataset, and I will write a function that, given a word, returns its embedding. In the weights folder you will find the trained weights for each of the three experiments. The embeddings we will use are glove.42B.300d.
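A minimal sketch of the word-to-embedding lookup mentioned above. It assumes the standard GloVe plain-text layout (each line is a word followed by its space-separated float components). `parse_glove` and `get_embedding` are hypothetical names, not the function used in this repository.

```python
import numpy as np


def parse_glove(lines, vocab=None):
    """Build a {word: vector} dict from GloVe text lines ("word v1 v2 ... vD").

    `lines` can be an open file or any iterable of strings. If `vocab` is
    given, only those words are kept, which avoids holding the full
    glove.42B.300d vocabulary in memory.
    """
    embeddings = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        parts = line.rstrip().split(" ")
        word = parts[0]
        if vocab is not None and word not in vocab:
            continue
        # NumPy parses the string components directly into float32
        embeddings[word] = np.asarray(parts[1:], dtype=np.float32)
    return embeddings


def get_embedding(embeddings, word):
    """Return the vector for `word`, or None if it is out of vocabulary.

    The query is lowercased because glove.42B.300d is uncased.
    """
    return embeddings.get(word.lower())
```

To use it, open the unzipped vectors file (assumed here to be `glove.42B.300d.txt`) with `encoding="utf-8"`, pass the file object as `lines`, and set `vocab` to the set of words that occur in the MEN pairs.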

import numpy as np

from sklearn.utils import Bunch  # was sklearn.datasets.base.Bunch in older scikit-learn releases
from .utils import _get_as_pd  # package-local helper: downloads the file and reads it into a pandas DataFrame


def fetch_MEN(which="all", form="natural"):
    """
    Fetch MEN dataset for testing similarity and relatedness

    Parameters
    ----------
    which : "all", "test" or "dev"
    form : "lem" or "natural"

    Returns
    -------
    data : sklearn.utils.Bunch
        dictionary-like object. Keys of interest:
        'X': matrix with one word pair per row (2 columns),
        'y': vector with scores, rescaled from 0-50 to 0-10

    References
    ----------
    Published at http://clic.cimec.unitn.it/~elia.bruni/MEN.html.
    """
    if which == "dev":
        data = _get_as_pd('https://www.dropbox.com/s/c0hm5dd95xapenf/EN-MEN-LEM-DEV.txt?dl=1',
                          'similarity', header=None, sep=" ")
    elif which == "test":
        data = _get_as_pd('https://www.dropbox.com/s/vdmqgvn65smm2ah/EN-MEN-LEM-TEST.txt?dl=1',
                          'similarity/EN-MEN-LEM-TEST', header=None, sep=" ")
    elif which == "all":
        data = _get_as_pd('https://www.dropbox.com/s/b9rv8s7l32ni274/EN-MEN-LEM.txt?dl=1',
                          'similarity', header=None, sep=" ")
    else:
        raise RuntimeError("Not recognized which parameter")

    if form == "natural":
        # Strip the 2-char POS suffix from both word columns (e.g. "sun-n" -> "sun")
        for col in (0, 1):
            data[col] = data[col].str[:-2]
    elif form != "lem":
        raise RuntimeError("Not recognized form argument")

    # Columns 0-1 hold the word pair, column 2 the score; np.float was removed in NumPy 1.24, so use float
    return Bunch(X=data.values[:, 0:2].astype("object"), y=data.values[:, 2:].astype(float) / 5.0)

atnikos assigned atnikos and georgepar May 3, 2018
