added data/sts/semeval-sts/2017 #44

Open
wants to merge 1 commit into
base: master
17 changes: 17 additions & 0 deletions data/sts/semeval-sts/2017/LICENSE.txt
@@ -0,0 +1,17 @@
The evaluation sets for STS 2017 are distributed under a Creative Commons Attribution-ShareAlike 4.0 International License:

https://creativecommons.org/licenses/by-sa/4.0/

-----------------------------------------------------
The Stanford Natural Language Inference (SNLI) Corpus
-----------------------------------------------------
STS Tracks 1, 2, 3, 4a, 5 and 6 are based on data drawn from the SNLI corpus.

The Stanford Natural Language Inference Corpus by The Stanford NLP Group is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Based on a work at http://shannon.cs.illinois.edu/DenotationGraph/.

https://creativecommons.org/licenses/by-sa/4.0/

-------------------
WMT News Commentary
-------------------
STS Track 4b is based on data drawn from the WMT News Commentary corpus from the annual Conference on Machine Translation (WMT), formerly the Workshop on Statistical Machine Translation.
54 changes: 54 additions & 0 deletions data/sts/semeval-sts/2017/README.txt
@@ -0,0 +1,54 @@
SemEval 2017 Shared Task 1
Semantic Textual Similarity (STS)

This package contains the test sets for the 2017 Semantic Textual Similarity
(STS) shared task. Each evaluation set has the following tab-separated format:

* One STS pair per line.
* Each line contains the following fields: STS Sent1, STS Sent2

The input files are provided in UTF-8.

Example:

Un perro salta sobre los obstáculos para un show canino. A dog jumps over something.
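
The tab-separated format described above can be read with a few lines of
Python. This is a minimal sketch, not part of the official task tooling: the
function names read_sts_pairs and load_sts_pairs are hypothetical, and the
sketch assumes each line holds exactly the two fields listed above (STS Sent1,
STS Sent2) with no quoting conventions.

```python
import csv
import io


def read_sts_pairs(stream):
    """Yield (sent1, sent2) tuples from a tab-separated STS input stream."""
    # QUOTE_NONE: treat quote characters in the sentences as literal text.
    reader = csv.reader(stream, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 2:
            # Skip blank or malformed lines rather than guessing at a repair.
            continue
        yield row[0], row[1]


def load_sts_pairs(path):
    """Read all pairs from a UTF-8 encoded file at the given path."""
    with open(path, encoding="utf-8", newline="") as handle:
        return list(read_sts_pairs(handle))


# In-memory demonstration using the example pair from this README.
sample = io.StringIO(
    "Un perro salta sobre los obstáculos para un show canino.\t"
    "A dog jumps over something.\n"
)
pairs = list(read_sts_pairs(sample))
```

Reading from a stream rather than a path keeps the parser easy to try out
without the data files at hand.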

One file is provided for each track of STS 2017, with the exception of track 4.
Track 4 is subdivided into two subtracks: one for data drawn from SNLI (4a) and
one for data sourced from WMT news data (4b). All other datasets were sourced
from SNLI.

Input Files (UTF-8):
--------------------
STS.input.track1.ar-ar.txt
STS.input.track2.ar-en.txt
STS.input.track3.es-es.txt
STS.input.track4a.es-en.txt
STS.input.track4b.es-en.txt
STS.input.track5.en-en.txt
STS.input.track6.tr-en.txt

Output Files:
-------------
For each evaluation set, please generate a plain text output file with one line
for each STS pair that provides the score assigned by your system as a floating
point number:

0.1
4.9
3.5
2.0
5.1
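
The output format above can be produced with a short Python sketch along the
following lines. The function name write_scores and the expected_count
parameter are hypothetical, introduced only for illustration; the sketch
assumes the system scores are already available as a sequence of numbers in
the same order as the input pairs.

```python
import io


def write_scores(scores, stream, expected_count=None):
    """Write one floating point score per line to the given text stream."""
    scores = list(scores)
    # Optional sanity check: one score is needed for each STS pair.
    if expected_count is not None and len(scores) != expected_count:
        raise ValueError(
            f"expected {expected_count} scores, got {len(scores)}"
        )
    for score in scores:
        # float() ensures integers such as 2 are written as 2.0.
        stream.write(f"{float(score)}\n")


# In-memory demonstration; for a real run, pass a file opened for writing.
buffer = io.StringIO()
write_scores([0.1, 4.9, 3.5, 2], buffer, expected_count=4)
output_text = buffer.getvalue()
```

Passing the number of input pairs as expected_count catches a missing or
extra line before the file is uploaded.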

Release History
---------------
v1.0 Jan 16, 2017 - Initial release of the test set
v1.1 Jan 18, 2017 - Correction to track 1, Arabic-Arabic STS

Output files should be uploaded to the STS 2017 CodaLab site:

https://competitions.codalab.org/competitions/16051

Each team can submit up to three runs.

Good Luck!
250 changes: 250 additions & 0 deletions data/sts/semeval-sts/2017/track1.ar-ar.tsv

250 changes: 250 additions & 0 deletions data/sts/semeval-sts/2017/track2.ar-en.tsv

250 changes: 250 additions & 0 deletions data/sts/semeval-sts/2017/track3.es-es.tsv

250 changes: 250 additions & 0 deletions data/sts/semeval-sts/2017/track4a.es-en.tsv

250 changes: 250 additions & 0 deletions data/sts/semeval-sts/2017/track4b.es-en.tsv

250 changes: 250 additions & 0 deletions data/sts/semeval-sts/2017/track5.en-en.tsv

500 changes: 500 additions & 0 deletions data/sts/semeval-sts/2017/track6.tr-en.tsv
