
Papers Voting #1

Open
jaderabbit opened this issue May 2, 2020 · 22 comments

Comments

@jaderabbit
Member

jaderabbit commented May 2, 2020

In this issue you can either:

  • Add papers that you think are interesting to read and discuss (please stick to the format).
  • Vote for papers by reacting with 👍 on the corresponding comments.

Example: hadyelsahar#1

@elyesmanai

Generalization Through Memorization: Nearest Neighbor Language Models

https://openreview.net/pdf?id=HklBjCEKvH

Short Description:

The authors introduce kNN-LMs, which can significantly outperform standard language models by directly querying training examples at test time. The approach can be applied to any neural language model. The success of this method suggests that learning similarity functions between contexts may be an easier problem than predicting the next word from a given context.
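
The mechanism is concrete enough to sketch: the base LM's next-token distribution is interpolated with a distribution induced by the k nearest neighbours of the current context vector in a datastore of (context, next token) pairs. A minimal NumPy sketch, assuming a pre-built datastore and a generic `lm_probs` vector; the distance kernel and λ here are illustrative choices, not the paper's tuned settings:

```python
import numpy as np

def knn_lm_probs(context_vec, lm_probs, keys, values, vocab_size, k=8, lam=0.25):
    """Interpolate base LM probabilities with a kNN distribution.

    keys:   (N, d) array of stored context vectors from the training set
    values: (N,)   array of the token id that followed each stored context
    """
    # Squared L2 distances from the query context to every stored context.
    dists = np.sum((keys - context_vec) ** 2, axis=1)
    nn = np.argsort(dists)[:k]                       # indices of the k nearest neighbours

    # Turn negative distances into a distribution over the retrieved tokens.
    weights = np.exp(-dists[nn])
    weights /= weights.sum()

    knn_probs = np.zeros(vocab_size)
    for idx, w in zip(nn, weights):
        knn_probs[values[idx]] += w                  # aggregate mass per token id

    # Final distribution: lambda * p_kNN + (1 - lambda) * p_LM.
    return lam * knn_probs + (1.0 - lam) * lm_probs

# Toy usage with random stand-ins for the datastore and the base LM output.
rng = np.random.default_rng(0)
keys, values = rng.normal(size=(100, 16)), rng.integers(0, 50, size=100)
lm_probs = rng.dirichlet(np.ones(50))
mixed = knn_lm_probs(rng.normal(size=16), lm_probs, keys, values, vocab_size=50)
print(mixed.sum())  # ~1.0
```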

@dadelani

dadelani commented May 5, 2020

Massive vs. Curated Word Embeddings for Low-Resourced Languages. The Case of Yorùbá and Twi

https://arxiv.org/abs/1912.02481

Short description:
In this paper, we focus on two African languages, Yorùbá and Twi, and compare the word embeddings obtained from crawled data on the web with word embeddings obtained from curated corpora and language-dependent processing. We analyse the noise in the publicly available corpora, collect high-quality and noisy data for the two languages and quantify the improvements that depend not only on the amount of data but on the quality too. We also use different architectures that learn word representations both from surface forms and characters to further exploit all the available information, which proved to be important for these languages. For the evaluation, we manually translate the wordsim-353 word pairs dataset from English into Yorùbá and Twi.
We extend the analysis to contextual word embeddings and evaluate multilingual BERT on a named entity recognition task. For this, we annotate with named entities the Global Voices corpus for Yorùbá.

@dadelani

dadelani commented May 5, 2020

XTREME: A Massively Multilingual Multi-task Benchmark for Evaluating Cross-lingual Generalization

https://arxiv.org/pdf/2003.11080.pdf

Short description
Much recent progress in applications of machine learning models to NLP has been driven by benchmarks that evaluate models across a wide variety of tasks. However, these broad-coverage benchmarks have been mostly limited to English, and despite an increasing interest in multilingual models, a benchmark that enables the comprehensive evaluation of such methods on a diverse range of languages and tasks is still missing. To this end, we introduce the Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark, a multi-task benchmark for evaluating the cross-lingual generalization capabilities of multilingual representations across 40 languages (including Swahili & Yoruba) and 9 tasks.

@dadelani

dadelani commented May 5, 2020

On the Cross-lingual Transferability of Monolingual Representations

https://arxiv.org/abs/1910.11856

Short description
State-of-the-art unsupervised multilingual models (e.g., multilingual BERT) have been shown to generalize in a zero-shot cross-lingual setting. This generalization ability has been attributed to the use of a shared subword vocabulary and joint training across multiple languages giving rise to deep multilingual abstractions. We evaluate this hypothesis by designing an alternative approach that transfers a monolingual model to new languages at the lexical level. More concretely, we first train a transformer-based masked language model on one language, and transfer it to a new language by learning a new embedding matrix with the same masked language modeling objective, freezing parameters of all other layers. This approach does not rely on a shared vocabulary or joint training. However, we show that it is competitive with multilingual BERT on standard cross-lingual classification benchmarks and on a new Cross-lingual Question Answering Dataset (XQuAD). Our results contradict common beliefs of the basis of the generalization ability of multilingual models and suggest that deep monolingual models learn some abstractions that generalize across languages. We also release XQuAD as a more comprehensive cross-lingual benchmark, which comprises 240 paragraphs and 1190 question-answer pairs from SQuAD v1.1 translated into ten languages by professional translators.
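
The transfer recipe boils down to re-learning only the embedding matrix for the new language while every other parameter stays frozen. A minimal PyTorch sketch of that freezing step on a stand-in model (the tiny model, the vocabulary sizes, and the separate output head are illustrative; the paper ties input and output embeddings):

```python
import torch.nn as nn

class TinyMaskedLM(nn.Module):
    """Stand-in for a transformer masked LM: embeddings + body + output head."""
    def __init__(self, vocab_size, d_model=128):
        super().__init__()
        self.embeddings = nn.Embedding(vocab_size, d_model)
        self.body = nn.TransformerEncoder(
            nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True), num_layers=2)
        self.head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):
        return self.head(self.body(self.embeddings(ids)))

model = TinyMaskedLM(vocab_size=8000)          # conceptually: pretrained on language A

# Transfer to language B: swap in a fresh embedding matrix (and matching output layer)
# for B's vocabulary and freeze every other parameter, as in the lexical-transfer setup.
model.embeddings = nn.Embedding(12000, 128)    # new vocabulary size for language B
model.head = nn.Linear(128, 12000)
for name, p in model.named_parameters():
    p.requires_grad = name.startswith(("embeddings", "head"))

trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(trainable)   # only the new embedding / output parameters will be updated by MLM training
```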

@hadyelsahar

A Controllable Model of Grounded Response Generation

https://arxiv.org/pdf/2005.00613.pdf

Summary
Attempts to boost informativeness alone come at the expense of factual accuracy, as attested by GPT-2’s propensity to “hallucinate” facts. While this may be mitigated by access to background knowledge, there is scant guarantee of relevance and informativeness in generated responses.
We propose a framework that we call controllable grounded response generation (CGRG), in which lexical control phrases are either provided by a user or automatically extracted by a content planner from dialogue context and grounding knowledge.

@jaderabbit
Member Author

MAD-X: An Adapter-based Framework for Multi-task Cross-lingual Transfer

https://arxiv.org/abs/2005.00052

Abstract
The main goal behind state-of-the-art pretrained multilingual models such as multilingual BERT and XLM-R is enabling and bootstrapping NLP applications in low-resource languages through zero-shot or few-shot cross-lingual transfer. However, due to limited model capacity, their transfer performance is the weakest exactly on such low-resource languages and languages unseen during pretraining. We propose MAD-X, an adapter-based framework that enables high portability and parameter-efficient transfer to arbitrary tasks and languages by learning modular language and task representations. In addition, we introduce a novel invertible adapter architecture and a strong baseline method for adapting a pretrained multilingual model to a new language. MAD-X outperforms the state of the art in cross-lingual transfer across a representative set of typologically diverse languages on named entity recognition and achieves competitive results on question answering.
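
The building block behind MAD-X is the bottleneck adapter: a small down-project/up-project module inserted into each transformer layer, with one adapter trained per language and one per task while the pretrained weights stay frozen. A minimal sketch of such a block, independent of any adapter library; dimensions are illustrative:

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Down-project -> nonlinearity -> up-project, with a residual connection.

    In MAD-X-style setups one such adapter is trained per language and per task
    while the pretrained transformer weights stay frozen.
    """
    def __init__(self, d_model=768, bottleneck=48):
        super().__init__()
        self.down = nn.Linear(d_model, bottleneck)
        self.up = nn.Linear(bottleneck, d_model)
        self.act = nn.GELU()

    def forward(self, hidden):
        # Residual connection keeps the frozen model's representation intact.
        return hidden + self.up(self.act(self.down(hidden)))

# A language adapter and a task adapter stacked on one layer's hidden states.
lang_adapter, task_adapter = BottleneckAdapter(), BottleneckAdapter()
hidden = torch.randn(2, 16, 768)                 # (batch, sequence, d_model)
out = task_adapter(lang_adapter(hidden))
print(out.shape)                                 # torch.Size([2, 16, 768])
```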

@keleog
Member

keleog commented May 14, 2020

mBART - Multilingual Denoising Pre-training for Neural Machine Translation
https://arxiv.org/abs/2001.08210

Abstract:
This paper demonstrates that multilingual denoising pre-training produces significant performance gains across a wide variety of machine translation (MT) tasks. We present mBART -- a sequence-to-sequence denoising auto-encoder pre-trained on large-scale monolingual corpora in many languages using the BART objective. mBART is one of the first methods for pre-training a complete sequence-to-sequence model by denoising full texts in multiple languages, while previous approaches have focused only on the encoder, decoder, or reconstructing parts of the text. Pre-training a complete model allows it to be directly fine-tuned for supervised (both sentence-level and document-level) and unsupervised machine translation, with no task-specific modifications. We demonstrate that adding mBART initialization produces performance gains in all but the highest-resource settings, including up to 12 BLEU points for low-resource MT and over 5 BLEU points for many document-level and unsupervised models. We also show that it enables new types of transfer to language pairs with no bi-text or that were not in the pre-training corpus, and present extensive analysis of which factors contribute the most to effective pre-training.
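
The pre-training signal is a denoising objective over full monolingual documents: spans of text are corrupted and the sequence-to-sequence model is trained to reconstruct the original. A rough sketch of a text-infilling-style noise function, assuming whitespace tokenisation and a `<mask>` token; the masking ratio and span lengths are illustrative, not mBART's exact settings (which also include sentence permutation):

```python
import random

def add_noise(tokens, mask_ratio=0.35, mask_token="<mask>", avg_span=3, seed=0):
    """Replace random spans of tokens with a single mask token (text infilling)."""
    rng = random.Random(seed)
    tokens = list(tokens)
    n_to_mask = int(len(tokens) * mask_ratio)
    masked = 0
    while masked < n_to_mask and len(tokens) > 1:
        span = max(1, min(rng.randint(1, 2 * avg_span), len(tokens) - 1))
        start = rng.randrange(0, len(tokens) - span + 1)
        tokens[start:start + span] = [mask_token]   # the whole span collapses to one mask
        masked += span
    return tokens

source = "the model is trained to reconstruct the original document from its corrupted version".split()
print(add_noise(source))
# the seq2seq model is then trained to map the noised sequence back to `source`
```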

@keleog
Member

keleog commented May 14, 2020

Cross-lingual Alignment vs Joint Training: A Comparative Study and A Simple Unified Framework

https://openreview.net/pdf?id=S1l-C0NtwS

Abstract
Learning multilingual representations of text has proven a successful method for many cross-lingual transfer learning tasks. There are two main paradigms for learning such representations: (1) alignment, which maps different independently trained monolingual representations into a shared space, and (2) joint training, which directly learns unified multilingual representations using monolingual and cross-lingual objectives jointly. In this paper, we first conduct direct comparisons of representations learned using both of these methods across diverse cross-lingual tasks. Our empirical results reveal a set of pros and cons for both methods, and show that the relative performance of alignment versus joint training is task-dependent. Stemming from this analysis, we propose a simple and novel framework that combines these two previously mutually-exclusive approaches. Extensive experiments demonstrate that our proposed framework alleviates limitations of both approaches, and outperforms existing methods on the MUSE bilingual lexicon induction (BLI) benchmark. We further show that this framework can generalize to contextualized representations such as Multilingual BERT, and produces state-of-the-art results on the CoNLL cross-lingual NER benchmark.

@Jamiil92

Word Translation Without Parallel Data

https://arxiv.org/pdf/1710.04087.pdf

Abstract:
State-of-the-art methods for learning cross-lingual word embeddings have relied on bilingual dictionaries or parallel corpora. Recent studies showed that the need for parallel data supervision can be alleviated with character-level information. While these methods showed encouraging results, they are not on par with their supervised counterparts and are limited to pairs of languages sharing a common alphabet. In this work, we show that we can build a bilingual dictionary between two languages without using any parallel corpora, by aligning monolingual word embedding spaces in an unsupervised way. Without using any character information, our model even outperforms existing supervised methods on cross-lingual tasks for some language pairs. Our experiments demonstrate that our method works very well also for distant language pairs, like English-Russian or English-Chinese. We finally describe experiments on the English-Esperanto low-resource language pair, on which there only exists a limited amount of parallel data, to show the potential impact of our method in fully unsupervised machine translation. Our code, embeddings and dictionaries are publicly available.
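
The end product of the method is a linear map W aligning the two monolingual embedding spaces; in the paper W is initialised adversarially and then refined by solving an orthogonal Procrustes problem over a synthetic dictionary. A sketch of just that refinement step, assuming a seed dictionary of paired vectors is already available (the adversarial initialisation and CSLS retrieval are omitted):

```python
import numpy as np

def procrustes(src_vecs, tgt_vecs):
    """Best orthogonal map W minimising ||src @ W.T - tgt|| for paired vectors.

    src_vecs, tgt_vecs: (n_pairs, d) embeddings of dictionary word pairs.
    """
    u, _, vt = np.linalg.svd(tgt_vecs.T @ src_vecs)
    return u @ vt                     # closed-form orthogonal solution

# Toy example: a random orthogonal map recovered from 50 "dictionary" pairs.
rng = np.random.default_rng(0)
d = 32
true_w, _ = np.linalg.qr(rng.normal(size=(d, d)))        # some orthogonal map
src = rng.normal(size=(50, d))
tgt = src @ true_w.T                                      # tgt_i = W @ src_i
w = procrustes(src, tgt)
print(np.allclose(w, true_w, atol=1e-6))                  # True
```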

@hadyelsahar

GPT-3: Language Models are Few-Shot Learners
https://arxiv.org/pdf/2005.14165.pdf

Are the computation costs worth it? I think this paper can raise interesting discussions beyond the hype.

Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation.
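
For discussion, it may help to see what "tasks specified purely via text interaction" means in practice: the few-shot demonstrations are simply concatenated into the prompt, and the model is conditioned on that string with no gradient updates. The formatting below is an illustrative sketch, not the exact prompts used in the paper:

```python
def few_shot_prompt(instruction, demonstrations, query):
    """Build a few-shot prompt: instruction, K worked examples, then the new query."""
    lines = [instruction, ""]
    for src, tgt in demonstrations:
        lines.append(f"Q: {src}")
        lines.append(f"A: {tgt}")
    lines.append(f"Q: {query}")
    lines.append("A:")                       # the model continues from here
    return "\n".join(lines)

prompt = few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("house", "maison")],
    "bread",
)
print(prompt)   # the whole "task" lives in this string; the model weights never change
```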

@keleog
Member

keleog commented May 31, 2020

Unsupervised Domain Adaptation for Neural Machine Translation with Iterative Back Translation
Link - https://arxiv.org/abs/2001.08140

Why? - I feel like this represents an easy way to possibly generalize our niche religious MT models.

Abstract:
State-of-the-art neural machine translation (NMT) systems are data-hungry and perform poorly on domains with little supervised data. As data collection is expensive and infeasible in many cases, unsupervised domain adaptation methods are needed. We apply an Iterative Back Translation (IBT) training scheme on in-domain monolingual data, which repeatedly uses a Transformer-based NMT model to create in-domain pseudo-parallel sentence pairs in one translation direction on the fly and then use them to train the model in the other direction. Evaluated on three domains of German-to-English translation task with no supervised data, this simple technique alone (without any out-of-domain parallel data) can already surpass all previous domain adaptation methods—up to +9.48 BLEU over the strongest previous method, and up to +27.77 BLEU over the unadapted baseline. Moreover, given available supervised out-of-domain data on German-to-English and Romanian-to-English language pairs, we can further enhance the performance and obtain up to +19.31 BLEU improvement over the strongest baseline, and +47.69 BLEU increment against the unadapted model.
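
The IBT scheme is essentially a loop in which each direction's current model back-translates in-domain monolingual text to create pseudo-parallel data for the other direction. A schematic sketch; `translate` and `train_step` are placeholders for whatever NMT toolkit is in use, not functions from the paper's code:

```python
def iterative_back_translation(model_fwd, model_bwd, mono_src, mono_tgt,
                               translate, train_step, rounds=3):
    """Alternate between the two translation directions on in-domain monolingual data.

    model_fwd: src -> tgt model, model_bwd: tgt -> src model.
    translate(model, sentences) -> translations; train_step(model, pairs) -> updated model.
    """
    for _ in range(rounds):
        # Pseudo-parallel data for the forward model: back-translate target-side monolingual text.
        pseudo_src = translate(model_bwd, mono_tgt)
        model_fwd = train_step(model_fwd, list(zip(pseudo_src, mono_tgt)))

        # And vice versa for the backward model.
        pseudo_tgt = translate(model_fwd, mono_src)
        model_bwd = train_step(model_bwd, list(zip(pseudo_tgt, mono_src)))
    return model_fwd, model_bwd

# Dummy stand-ins so the loop runs end to end.
noop_translate = lambda model, sents: [s.upper() for s in sents]
noop_train = lambda model, pairs: model
fwd, bwd = iterative_back_translation("fwd", "bwd", ["hallo welt"], ["hello world"],
                                      noop_translate, noop_train)
print(fwd, bwd)
```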

@poppingtonic

poppingtonic commented Jun 1, 2020

Understanding Cross-Lingual Syntactic Transfer in Multilingual Recurrent Neural Networks

Link: https://arxiv.org/abs/2003.14056
Abstract:
It is now established that modern neural language models can be successfully trained on multiple languages simultaneously without changes to the underlying architecture, providing an easy way to adapt a variety of NLP models to low-resource languages. But what kind of knowledge is really shared among languages within these models? Does multilingual training mostly lead to an alignment of the lexical representation spaces or does it also enable the sharing of purely grammatical knowledge? In this paper we dissect different forms of cross-lingual transfer and look for its most determining factors, using a variety of models and probing tasks. We find that exposing our language models to a related language does not always increase grammatical knowledge in the target language, and that optimal conditions for lexical-semantic transfer may not be optimal for syntactic transfer.

@keleog
Member

keleog commented Jun 2, 2020

Enhancing Machine Translation with Dependency-Aware Self-Attention

Link - https://arxiv.org/abs/1909.03149

Abstract:
Most neural machine translation models only rely on pairs of parallel sentences, assuming syntactic information is automatically learned by an attention mechanism. In this work, we investigate different approaches to incorporate syntactic knowledge in the Transformer model and also propose a novel, parameter-free, dependency-aware self-attention mechanism that improves its translation quality, especially for long sentences and in low-resource scenarios. We show the efficacy of each approach on WMT English↔German and English→Turkish, and WAT English→Japanese translation tasks.

@bduvenhage

Learning Paraphrastic Sentence Embeddings from Back-Translated Bitext

Link - https://www.aclweb.org/anthology/D17-1026.pdf
github - https://github.com/jwieting/emnlp2017

Abstract:
We consider the problem of learning general-purpose, paraphrastic sentence embeddings in the setting of Wieting et al. (2016b). We use neural machine translation to generate sentential paraphrases via back-translation of bilingual sentence pairs. We evaluate the paraphrase pairs by their ability to serve as training data for learning paraphrastic sentence embeddings. We find that the data quality is stronger than prior work based on bitext and on par with manually-written English paraphrase pairs, with the advantage that our approach can scale up to generate large training sets for many languages and domains. We experiment with several language pairs and data sources, and develop a variety of data filtering techniques. In the process, we explore how neural machine translation output differs from human-written sentences, finding clear differences in length, the amount of repetition, and the use of rare words.

@jaderabbit
Member Author

Balancing Training for Multilingual Neural Machine Translation

Abstract
When training multilingual machine translation (MT) models that can translate to/from multiple languages, we are faced with imbalanced training sets: some languages have much more training data than others. Standard practice is to up-sample less resourced languages to increase representation, and the degree of up-sampling has a large effect on the overall performance. In this paper, we propose a method that instead automatically learns how to weight training data through a data scorer that is optimized to maximize performance on all test languages. Experiments on two sets of languages under both one-to-many and many-to-one MT settings show our method not only consistently outperforms heuristic baselines in terms of average performance, but also offers flexible control over the performance of which languages are optimized.

Value
Because multilingual methods are so sensitive to sampling, I think an approach like this would be amazing.

https://arxiv.org/abs/2004.06748
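
For context on what the learned data scorer replaces: the standard heuristic the paper compares against is temperature-based up-sampling of the less-resourced languages. A sketch of that fixed-temperature baseline only (not the paper's learned scorer); the corpus sizes below are made up:

```python
import numpy as np

def sampling_probs(sizes, temperature=5.0):
    """Temperature-based sampling over languages: p_i ∝ (n_i / N) ** (1/T).

    T = 1 reproduces proportional sampling; larger T up-samples small languages.
    """
    sizes = np.asarray(sizes, dtype=float)
    p = (sizes / sizes.sum()) ** (1.0 / temperature)
    return p / p.sum()

corpus_sizes = {"sw": 50_000, "yo": 10_000, "en": 5_000_000}
for lang, p in zip(corpus_sizes, sampling_probs(list(corpus_sizes.values()))):
    print(lang, round(float(p), 3))
# with T=5 the low-resource languages get far more than their proportional share
```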

@hadyelsahar

What Kind of Language Is Hard to Language-Model? (ACL 2019)
https://arxiv.org/pdf/1906.04726.pdf

Are there some types of language that are easier to model with current methods? In prior work (Cotterell et al., 2018) we attempted to address this question for language modeling, and observed that recurrent neural network language models do not perform equally well over all the high-resource European languages found in the Europarl corpus. We speculated that inflectional morphology may be the primary culprit for the discrepancy. In this paper, we extend these earlier experiments to cover 69 languages from 13 language families using a multilingual Bible corpus.

@keleog
Member

keleog commented Jun 13, 2020

Transferring Inductive Biases through Knowledge Distillation

Having the right inductive biases can be crucial in many tasks or scenarios where data or computing resources are a limiting factor, or where training data is not perfectly representative of the conditions at test time. However, defining, designing and efficiently adapting inductive biases is not necessarily straightforward. In this paper, we explore the power of knowledge distillation for transferring the effect of inductive biases from one model to another. We consider families of models with different inductive biases, LSTMs vs. Transformers and CNNs vs. MLPs, in the context of tasks and scenarios where having the right inductive biases is critical. We study how the effect of inductive biases is transferred through knowledge distillation, in terms of not only performance but also different aspects of converged solutions.
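
The transfer mechanism itself is the standard distillation loss: the student is trained to match the teacher's softened output distribution alongside the hard labels. A minimal PyTorch sketch; the temperature and mixing weight are illustrative:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend hard-label cross-entropy with KL to the teacher's softened distribution."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    soft_student = F.log_softmax(student_logits / T, dim=-1)
    kd = F.kl_div(soft_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1.0 - alpha) * ce

# Toy batch: 4 examples, 10 classes; conceptually the teacher is e.g. an LSTM, the student an MLP.
student_logits = torch.randn(4, 10, requires_grad=True)
teacher_logits = torch.randn(4, 10)
labels = torch.randint(0, 10, (4,))
print(distillation_loss(student_logits, teacher_logits, labels))
```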

@hadyelsahar

Biases in Pretrained Language Models

The Woman Worked as a Babysitter: On Biases in Language Generation (EMNLP 2019)
https://www.aclweb.org/anthology/D19-1339.pdf

StereoSet: Measuring stereotypical bias in pretrained language models
https://arxiv.org/pdf/2004.09456.pdf
and a recent competition: https://stereoset.mit.edu/

@dnzengou

dnzengou commented Jun 28, 2020 via email

@chrisemezue
Member

Predicting Performance for Natural Language Processing Tasks

Link: https://www.aclweb.org/anthology/2020.acl-main.764.pdf

Abstract:
Given the complexity of combinations of tasks, languages, and domains in natural language processing (NLP) research, it is computationally prohibitive to exhaustively test newly proposed models on each possible experimental setting. In this work, we attempt to explore the possibility of gaining plausible judgments of how well an NLP model can perform under an experimental setting, without actually training or testing the model. To do so, we build regression models to predict the evaluation score of an NLP experiment given the experimental settings as input. Experimenting on 9 different NLP tasks, we find that our predictors can produce meaningful predictions over unseen languages and different modeling architectures, outperforming reasonable baselines as well as human experts. Going further, we outline how our predictor can be used to find a small subset of representative experiments that should be run in order to obtain plausible predictions for all other experimental settings.
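
Concretely, the predictor is a regression model from features of the experimental setting (data sizes, language features, model family, and so on) to the evaluation score. A toy scikit-learn sketch with made-up features and scores; the paper's actual feature set and regressor are richer than this:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Made-up experiment records: [training size (log10), is_transformer, language distance] -> BLEU.
X = np.array([
    [4.0, 1, 0.2], [4.5, 1, 0.4], [5.0, 0, 0.1], [5.5, 1, 0.6],
    [6.0, 0, 0.3], [6.5, 1, 0.5], [4.2, 0, 0.7], [5.8, 1, 0.2],
])
y = np.array([12.0, 18.5, 20.0, 24.0, 27.5, 33.0, 9.5, 31.0])

# Fit a regressor on past experiments, then predict the score of an unseen setting.
predictor = GradientBoostingRegressor(random_state=0).fit(X, y)
unseen_setting = np.array([[5.2, 1, 0.35]])   # a configuration we have not actually run
print(predictor.predict(unseen_setting))
```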

@hadyelsahar

Decolonial AI: Decolonial Theory as Sociotechnical Foresight in Artificial Intelligence #11
https://arxiv.org/pdf/2007.04068.pdf
proposed by @elevsev

Summary:
This paper explores the important role of critical science, and in particular of post-colonial and decolonial theories, in understanding and shaping the ongoing advances in artificial intelligence. Artificial Intelligence (AI) is viewed as amongst the technological advances that will reshape modern societies and their relations. Whilst the design and deployment of systems that continually adapt holds the promise of far-reaching positive change, they simultaneously pose significant risks, especially to already vulnerable peoples. Values and power are central to this discussion. Decolonial theories use historical hindsight to explain patterns of power that shape our intellectual, political, economic, and social world. By embedding a decolonial critical approach within its technical practice, AI communities can develop foresight and tactics that can better align research and technology development with established ethical principles, centring vulnerable peoples who continue to bear the brunt of negative impacts of innovation and scientific progress. We highlight problematic applications that are instances of coloniality, and using a decolonial lens, submit three tactics that can form a decolonial field of artificial intelligence: creating a critical technical practice of AI, seeking reverse tutelage and reverse pedagogies, and the renewal of affective and political communities. The years ahead will usher in a wave of new scientific breakthroughs and technologies driven by AI research, making it incumbent upon AI communities to strengthen the social contract through ethical foresight and the multiplicity of intellectual perspectives available to us; ultimately supporting future technologies that enable greater well-being, with the goal of beneficence and justice for all.

@orevaahia

Towards Ecologically Valid Research on Language User Interfaces

Link: https://arxiv.org/pdf/2007.14435.pdf

Abstract:
Language User Interfaces (LUIs) could improve human-machine interaction for a wide variety of tasks, such as playing music, getting insights from databases, or instructing domestic robots. In contrast to traditional hand-crafted approaches, recent work attempts to build LUIs in a data-driven way using modern deep learning methods. To satisfy the data needs of such learning algorithms, researchers have constructed benchmarks that emphasize the quantity of collected data at the cost of its naturalness and relevance to real-world LUI use cases. As a consequence, research findings on such benchmarks might not be relevant for developing practical LUIs. The goal of this paper is to bootstrap the discussion around this issue, which we refer to as the benchmarks’ low ecological validity. To this end, we describe what we deem an ideal methodology for machine learning research on LUIs and categorize five common ways in which recent benchmarks deviate from it. We give concrete examples of the five kinds of deviations and their consequences. Lastly, we offer a number of recommendations as to how to increase the ecological validity of machine learning research on LUIs.
