A refined Conformer block with Rotary Position Embedding, modified from lucidrains' implementation.

- Use Rotary Position Embedding instead of relative position embedding (a minimal sketch of the rotary attention path follows this list).
- Use PyTorch's official implementation of the GLU and Swish activations, which is slightly faster.
- Use PyTorch's official implementation of `scaled_dot_product_attention`, which can automatically switch to flash attention or the xformers memory-efficient kernel when available.
- Remove the dependency on einops; only PyTorch is needed now.
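The sketch below illustrates how rotary embeddings can be applied to queries and keys before calling `torch.nn.functional.scaled_dot_product_attention`. It is a minimal, self-contained example; the helper names `rotary_embedding` and `apply_rotary` and the tensor shapes are illustrative assumptions, not this repo's internal API.

```python
import torch
import torch.nn.functional as F

def rotary_embedding(seq_len, dim, base=10000.0, device=None):
    # Per-pair rotation frequencies, following the standard RoPE formulation.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2, device=device).float() / dim))
    t = torch.arange(seq_len, device=device).float()
    freqs = torch.outer(t, inv_freq)  # [seq_len, dim // 2]
    return freqs.cos(), freqs.sin()

def apply_rotary(x, cos, sin):
    # x: [batch, heads, seq_len, head_dim]; rotate each (even, odd) channel pair
    # by a position-dependent angle.
    x1, x2 = x[..., 0::2], x[..., 1::2]
    rotated = torch.stack((x1 * cos - x2 * sin, x1 * sin + x2 * cos), dim=-1)
    return rotated.flatten(-2)

# Hypothetical shapes: hidden_channels=192 with n_heads=2 gives head_dim=96.
batch, heads, seq_len, head_dim = 2, 2, 35, 96
q = torch.randn(batch, heads, seq_len, head_dim)
k = torch.randn(batch, heads, seq_len, head_dim)
v = torch.randn(batch, heads, seq_len, head_dim)

cos, sin = rotary_embedding(seq_len, head_dim)
q, k = apply_rotary(q, cos, sin), apply_rotary(k, cos, sin)

# scaled_dot_product_attention dispatches to flash / memory-efficient kernels when available.
out = F.scaled_dot_product_attention(q, k, v)  # [batch, heads, seq_len, head_dim]
```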
```python
import torch
from conformer import Conformer

model = Conformer(
    n_layers=3,
    hidden_channels=192,
    filter_channels=768,
    n_heads=2,
    kernel_size=3,
)

# input tensor, shape: [batch_size, hidden_channels, time]
x = torch.randn([32, 192, 35])

# a float mask for the input tensor, where zero indicates padding
# shape: [batch_size, 1, time]
x_mask = torch.ones([32, 1, 35])

out = model(x, x_mask)  # output shape: [32, 192, 35]
```
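For batches with variable-length sequences, the float mask can be built from per-example lengths so that padded frames are zeroed out. This is a small sketch under the assumption that lengths are given in frames; the `lengths` values are hypothetical and `model` is the instance constructed above.

```python
import torch

lengths = torch.tensor([35, 20, 28])  # frames per example (hypothetical)
max_len = int(lengths.max())

# 1 marks valid frames, 0 marks padding, matching the mask convention above.
x_mask = (torch.arange(max_len)[None, :] < lengths[:, None])  # [batch_size, time], bool
x_mask = x_mask[:, None, :].float()                           # [batch_size, 1, time]

x = torch.randn(len(lengths), 192, max_len)
out = model(x, x_mask)  # [len(lengths), 192, max_len]
```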