Home
The goal of this work is to design an architecture for autoregressive modelling that has an inductive bias towards learning temporally compressed representations, one that retains the benefits of Transformers while preserving long-range interactions.
The fast stream has a short-term, high-capacity memory that reacts quickly to sensory input. It is modelled with Transformers.
The slow stream has a long-term memory that updates at a slower rate and summarizes the most important information in the input sequence.
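A minimal sketch of the two streams, assuming TensorFlow/Keras. The layer choices (causal self-attention for the fast stream, a cross-attention state update for the slow stream) and all hyperparameters are illustrative assumptions, not details taken from this page:

```python
# Illustrative sketch of the two streams; dims and heads are placeholders.
import tensorflow as tf
from tensorflow.keras import layers


class FastStream(layers.Layer):
    """Transformer block: self-attention restricted to a single chunk."""

    def __init__(self, dim=64, num_heads=4, **kwargs):
        super().__init__(**kwargs)
        self.attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=dim)
        self.norm1 = layers.LayerNormalization()
        self.ffn = tf.keras.Sequential(
            [layers.Dense(dim * 4, activation="gelu"), layers.Dense(dim)]
        )
        self.norm2 = layers.LayerNormalization()

    def call(self, chunk):
        # Causal self-attention over the tokens of this chunk only.
        x = self.norm1(chunk + self.attn(chunk, chunk, use_causal_mask=True))
        return self.norm2(x + self.ffn(x))


class SlowStream(layers.Layer):
    """Recurrent state update: the slow-stream state vectors
    cross-attend to the fast stream's chunk representation."""

    def __init__(self, dim=64, num_heads=4, **kwargs):
        super().__init__(**kwargs)
        self.attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=dim)
        self.norm = layers.LayerNormalization()

    def call(self, state, chunk_repr):
        # The state queries attend to the chunk representation; this
        # runs once per chunk, so the slow stream updates at a lower rate.
        return self.norm(state + self.attn(state, chunk_repr))
```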
- Divide the input into fixed-size chunks.
- The fast stream operates within each chunk.
- The slow stream consolidates and aggregates information across chunks (a sketch of the full loop follows this list).
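A minimal sketch of how the chunked forward pass could tie the two streams together, using the hypothetical FastStream and SlowStream layers above; the chunk size and state shape are assumed placeholders:

```python
# Hypothetical chunked forward pass; shapes and chunk_size are assumptions.
def forward(tokens, fast, slow, state, chunk_size=16):
    """tokens: (batch, seq_len, dim); state: (batch, num_state, dim)."""
    outputs = []
    for start in range(0, tokens.shape[1], chunk_size):
        chunk = tokens[:, start : start + chunk_size]
        # Fast stream: self-attention within this chunk only.
        chunk_repr = fast(chunk)
        outputs.append(chunk_repr)
        # Slow stream: one update per chunk, consolidating information
        # across chunks at a much lower rate than the fast stream.
        state = slow(state, chunk_repr)
    return tf.concat(outputs, axis=1), state


# Example usage with illustrative shapes.
fast, slow = FastStream(), SlowStream()
tokens = tf.random.normal((2, 64, 64))  # batch of 2, 64 tokens, dim 64
state = tf.zeros((2, 8, 64))            # 8 slow-stream state vectors
out, final_state = forward(tokens, fast, slow, state)
```

Because the slow-stream state is read and written only once per chunk, it acts as a temporally compressed bottleneck that carries long-range information between chunks.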