README.md
__init__.py
attention.py
attention_test.py
cls_head.py
cls_head_test.py
dense_einsum.py
dense_einsum_test.py
gated_feedforward.py
gated_feedforward_test.py
masked_lm.py
masked_lm_test.py
masked_softmax.py
masked_softmax_test.py
mat_mul_with_margin.py
mat_mul_with_margin_test.py
multi_channel_attention.py
multi_channel_attention_test.py
on_device_embedding.py
on_device_embedding_test.py
position_embedding.py
position_embedding_test.py
rezero_transformer.py
rezero_transformer_test.py
self_attention_mask.py
talking_heads_attention.py
talking_heads_attention_test.py
transformer.py
transformer_scaffold.py
transformer_scaffold_test.py
transformer_test.py
util.py


Layers

Layers are the fundamental building blocks for NLP models. They can be used to assemble new layers, networks, or models.

MultiHeadAttention implements an optionally masked attention between query, key, and value tensors as described in "Attention Is All You Need". If from_tensor and to_tensor are the same, then this is self-attention.
CachedAttention implements an attention layer with a cache, used for autoregressive decoding.
MatMulWithMargin implements a matrix multiplication with margin layer used for training retrieval / ranking tasks, as described in "Improving Multilingual Sentence Embedding using Bi-directional Dual Encoder with Additive Margin Softmax".
MultiChannelAttention implements a variant of multi-head attention which can be used to merge multiple streams for cross-attention.
TalkingHeadsAttention implements the talking heads attention, as described in "Talking-Heads Attention".
Transformer implements an optionally masked transformer as described in "Attention Is All You Need".
TransformerDecoderLayer is made up of multi-head self-attention, multi-head cross-attention, and a feedforward network.
ReZeroTransformer implements Transformer with ReZero described in "ReZero is All You Need: Fast Convergence at Large Depth".
OnDeviceEmbedding implements efficient embedding lookups designed for TPU-based models.
PositionalEmbedding creates a positional embedding as described in "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding".
SelfAttentionMask creates a 3D attention mask from a 2D tensor mask.
MaskedSoftmax implements a softmax with an optional masking input. If no mask is provided to this layer, it performs a standard softmax; however, if a mask tensor is applied (which should be 1 in positions where the data should be allowed through, and 0 where the data should be masked), the output will have masked positions set to approximately zero.
MaskedLM implements a masked language model. It assumes the embedding table variable is passed to it.
ClassificationHead implements a pooling head over a sequence of embeddings, commonly used by classification tasks.
GatedFeedforward implements the gated linear layer feedforward as described in "GLU Variants Improve Transformer".
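
The SelfAttentionMask and MaskedSoftmax descriptions above can be illustrated with a minimal sketch in plain Python. This is not the library's code: the function names `self_attention_mask` and `masked_softmax`, the `from_len` parameter, and the `large_negative` constant are assumptions chosen for illustration, and the real layers operate on batched tensors rather than nested lists.

```python
import math


def self_attention_mask(to_mask, from_len):
    """Expand a 2D mask [batch, to_len] into a 3D mask [batch, from_len, to_len].

    Every query position in an example gets the same copy of that
    example's key mask (1 = attend, 0 = masked).
    """
    return [[list(row) for _ in range(from_len)] for row in to_mask]


def masked_softmax(scores, mask=None, large_negative=-10000.0):
    """Softmax over one row of scores, with an optional 1/0 mask.

    Masked positions receive a large negative offset (an assumed constant),
    so their probability underflows to approximately zero.
    """
    if mask is not None:
        scores = [s + (1.0 - m) * large_negative for s, m in zip(scores, mask)]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# One example with three key positions; the last one is padding.
to_mask = [[1, 1, 0]]
mask_3d = self_attention_mask(to_mask, from_len=2)

# Attention scores for the first query position of the first example.
probs = masked_softmax([2.0, 1.0, 3.0], mask_3d[0][0])
```

With no mask, `masked_softmax` reduces to a standard softmax. With the mask above, the third position carries the highest raw score but ends up with a probability of approximately zero, and the remaining mass is shared between the first two positions.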
