ClashLuke opened this issue on May 17, 2022 · 0 comments

Labels: core (Improves core model while keeping core idea intact) · ML (Requires machine-learning knowledge, can be built up on the fly) · research (Creative project that might fail but could give high returns)
DeepMind demonstrated in their recent RETRO paper that augmenting a language model's input with text retrieved from a corpus allows it to learn to copy relevant passages instead of storing them in its weights. This text retrieval is another solution to the problem mentioned in #8 and doesn't involve modifying the model. Instead, RETRO first retrieves similar text using BERT embeddings and then feeds that text into the cross-attention of their model together with the original prompt. This way, the decoder of their T5 model is aware of similar texts without storing them in its weights.
We could implement a similar architecture without cross-attention (#44) by using only autoregressive language modelling and retrieving chunks with BERT (or our own) embeddings. It would even be possible to test this approach without retraining a model by simply retrieving relevant chunks and feeding them into the model's context (instead of filling it with padding tokens), as sketched below.
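A minimal sketch of the no-retraining variant, assuming `transformers` for BERT embeddings and `faiss` for nearest-neighbour search; the model name, placeholder corpus, and helper functions are illustrative and not part of this repository:

```python
import faiss
import torch
from transformers import AutoModel, AutoTokenizer

# Illustrative encoder choice; any BERT-like (or our own) embedding model would do.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
encoder = AutoModel.from_pretrained("bert-base-uncased")

def embed(texts):
    # Mean-pool the last hidden state to get one vector per chunk.
    batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    with torch.no_grad():
        hidden = encoder(**batch).last_hidden_state
    mask = batch["attention_mask"].unsqueeze(-1)
    return ((hidden * mask).sum(1) / mask.sum(1)).numpy()

# Index the corpus chunks once (placeholder corpus).
corpus_chunks = ["chunk 0 text ...", "chunk 1 text ..."]
embeddings = embed(corpus_chunks)
faiss.normalize_L2(embeddings)
index = faiss.IndexFlatIP(embeddings.shape[1])  # cosine similarity via normalized inner product
index.add(embeddings)

def retrieve(prompt, k=2):
    query = embed([prompt])
    faiss.normalize_L2(query)
    _, ids = index.search(query, k)
    return [corpus_chunks[i] for i in ids[0]]

# Fill the context with retrieved chunks instead of padding tokens,
# then hand the concatenated text to the autoregressive LM as its input.
prompt = "The capital of France is"
context = "\n".join(retrieve(prompt)) + "\n" + prompt
```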
This issue tracks the initial proof-of-concept, its benchmarks against the baseline, and overall progress.
ClashLuke added the research, ML, and core labels and removed the engineering label on May 17, 2022.