Google Research Datasets
- 858 followers
- Mountain View, CA
- http://research.google
Repositories
- GeniL Public
The GeniL dataset is an effort to detect various types of generalization in language. This multilingual dataset covers sentences in EN, FR, ES, PT, AR, HI, BN, MS, and ID and is annotated by native speakers of each language. Each sentence is collected from public language corpora and contains at least one identity group name and an attribute.
- uicrit Public
UICrit is a dataset containing human-generated natural language design critiques, corresponding bounding boxes for each critique, and design quality ratings for 1,000 mobile UIs from RICO. This dataset was collected for our UIST '24 paper: https://arxiv.org/abs/2407.08850.
- tap-typing-with-touch-sensing-images Public
The Tap Typing with Touch Sensing Images (TSI) dataset contains data of user taps on a mobile touchscreen keyboard, including elliptical features and capacitive sensing images of the taps. The dataset aligns each tap with a key the user intended to type during data collection so it can be used for keyboard decoder training and/or evaluation.
- adversarial-nibbler Public
This dataset contains results from all rounds of Adversarial Nibbler. This data includes adversarial prompts fed into public generative text2image models and validations for unsafe images. There will be two sets of data: all prompts submitted and all prompts attempted (sent to t2i models but not submitted as unsafe).
- C4_200M-synthetic-dataset-for-grammatical-error-correction Public
This dataset contains synthetic training data for grammatical error correction. The corpus is generated by corrupting clean sentences from C4 using a tagged corruption model. The approach and the dataset are described in more detail by Stahlberg and Kumar (2021) (https://www.aclweb.org/anthology/2021.bea-1.4/)
- sanpo_dataset Public
- SeeGULL-Multilingual Public
SeeGULL Multilingual is a multilingual and multicultural dataset of stereotypes. It consists of stereotypes in 20 languages with human annotations across 23 languages, including annotations on their degree of offensiveness.
- ToTTo Public
ToTTo is an open-domain English table-to-text dataset with over 120,000 training examples that proposes a controlled generation task: given a Wikipedia table and a set of highlighted table cells, produce a one-sentence description. We hope it can serve as a useful research benchmark for high-precision conditional text generation.