The goal of this work is to design an architecture for autoregressive modelling that has an inductive bias towards learning temporally compressed representations, retaining the benefits of Transformers while preserving long-range interactions.
The fast stream has a high-capacity short-term memory that reacts quickly to sensory input. It is modelled with Transformers.
The slow stream has a long-term memory which updates at a slower rate and summarizes the most important information in the input sequence.
- Divide the input into fixed-size chunks (see the chunking sketch after this list).
- The fast stream operates within each chunk.
- The slow stream consolidates and aggregates information across chunks.
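Chunking is essentially a reshape of the token sequence. Below is a minimal NumPy sketch, assuming a batch of token embeddings of shape `(batch, T, d)` and an illustrative chunk size `K`; the function name and shapes are assumptions for this example, not taken from the paper's code.

```python
import numpy as np

def chunk_sequence(x: np.ndarray, chunk_size: int) -> np.ndarray:
    """Split a (batch, T, d) sequence into (batch, num_chunks, chunk_size, d).

    Assumes T is a multiple of chunk_size; in practice the sequence
    would be padded up to the next multiple.
    """
    batch, seq_len, dim = x.shape
    num_chunks = seq_len // chunk_size
    return x.reshape(batch, num_chunks, chunk_size, dim)

# Example: a sequence of length T = 512 split into chunks of size K = 64.
x = np.random.randn(2, 512, 128)
chunks = chunk_sequence(x, chunk_size=64)
print(chunks.shape)  # (2, 8, 64, 128)
```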
The fast and slow streams induce an information asymmetry:
| Fast Stream | Slow Stream |
| --- | --- |
| fine grained | coarse grained |
| local information | distant information |
The fast and slow streams interact with each other through a bottleneck of attention.
- Given an input sequence $X = [x_{0}, x_{1}, \dots, x_{T}]$, it is divided into chunks of fixed size $K$. Each chunk is referred to as $X_{l}$, where $l = 0, 1, \dots, \lfloor\frac{T}{K}\rfloor$.
- Each chunk is processed by a perceptual module $\mathcal{F}$ (fast stream). Note: while processing a chunk, the perceptual module is also conditioned on information from the temporal latent bottleneck $\mathcal{G}$ (slow stream).
- The temporal latent bottleneck is recurrent in nature and has a hidden state of its own, $\mathcal{I}$, which is a set of $N$ vectors (a sketch of one chunk-level step follows the equation below).
$$\bar{X}_{l+1} = \mathcal{G}(\bar{X}_{l}, \mathcal{I}_{l})$$
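To make the interaction concrete, here is a minimal PyTorch sketch of one chunk-level step: the fast stream $\mathcal{F}$ applies self-attention within the chunk and cross-attends to (reads from) the bottleneck state, and the slow stream $\mathcal{G}$ updates its $N$ state vectors by cross-attending to the fast stream's output. The module layout, names, and the use of `nn.MultiheadAttention` are illustrative assumptions; the paper's exact block composition (feed-forward layers, number of layers, update frequency) is not reproduced here.

```python
import torch
import torch.nn as nn

class TLBStep(nn.Module):
    """One chunk-level step: the fast stream processes a chunk conditioned on
    the temporal latent bottleneck state, then the slow stream updates that state."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        # Fast stream F: self-attention within the chunk, plus cross-attention
        # that reads from the bottleneck state.
        self.self_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.read_state = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        # Slow stream G: the bottleneck state cross-attends to the chunk output.
        self.write_state = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, chunk: torch.Tensor, state: torch.Tensor):
        # chunk: (batch, K, dim) -- tokens of the current chunk X_l
        # state: (batch, N, dim) -- bottleneck state I_l (a set of N vectors)
        h, _ = self.self_attn(chunk, chunk, chunk)     # fine-grained, local
        h = self.norm(h + chunk)
        r, _ = self.read_state(h, state, state)        # condition F on I_l
        x_bar = self.norm(h + r)                       # fast-stream output for chunk l
        u, _ = self.write_state(state, x_bar, x_bar)   # G summarizes the chunk
        new_state = self.norm(state + u)               # updated bottleneck state
        return x_bar, new_state

# Example: T = 512 tokens, chunk size K = 64, N = 8 state vectors, dim = 128.
step = TLBStep(dim=128)
x = torch.randn(2, 512, 128)
state = torch.zeros(2, 8, 128)           # initial bottleneck state
outputs = []
for chunk in x.split(64, dim=1):         # fast stream operates within each chunk
    x_bar, state = step(chunk, state)    # slow stream carries state across chunks
    outputs.append(x_bar)
y = torch.cat(outputs, dim=1)            # (2, 512, 128)
```

The loop at the end reflects the asymmetry in the table above: the fast stream only ever sees one chunk of fine-grained, local tokens, while the slow stream's $N$ vectors are the only channel through which information travels across chunks.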