Pre-trained models are available here. For data-privacy reasons (the models were pre-trained with an MLM task on private data), the classification heads have been removed from the released models, but the encoder weights remain.
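Because only the encoder ships, a fresh head has to be added before fine-tuning. A minimal sketch, assuming a plain PyTorch state-dict checkpoint; the file name, hidden size and label count below are placeholders, not the repository's actual artifacts:

```python
import torch
import torch.nn as nn

# Placeholder path: substitute the downloaded encoder checkpoint.
state_dict = torch.load("pretrained_encoder.pt", map_location="cpu")
print(sorted(state_dict.keys())[:5])  # inspect which encoder weights are present

# The classification head was stripped, so attach a new one for the downstream task.
hidden_size = 768   # placeholder: match the encoder's hidden dimension
num_labels = 5      # placeholder: number of downstream classes
classification_head = nn.Linear(hidden_size, num_labels)
```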
LayoutLM1 was pre-trained in 3 flavours with a maximum sequence length of 518 tokens. The flavours differ in the 2D relative attention bias applied to the input. These versions are referred to as SplitPage in the paper.
Linformer2 was pre-trained with a maximum sequence length of 2048 tokens.
Cosformer3 was pre-trained in 3 flavours with a maximum sequence length of 2048 tokens. The 2D relative attention biases are similar to those used for LayoutLM, but not identical.
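The maximum sequence lengths above bound how many tokens the encoder can process in one pass, so longer documents have to be split. A minimal sketch of window-based chunking; the `max_len` default and token ids are illustrative and not tied to a specific tokenizer:

```python
from typing import List

def chunk_token_ids(token_ids: List[int], max_len: int = 2048) -> List[List[int]]:
    """Split a long token id sequence into consecutive windows of at most max_len tokens."""
    return [token_ids[i:i + max_len] for i in range(0, len(token_ids), max_len)]

# Example: a 5000-token document yields chunks of 2048, 2048 and 904 tokens.
chunks = chunk_token_ids(list(range(5000)))
print([len(c) for c in chunks])  # [2048, 2048, 904]
```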
Cosformer is not compatible with fp16 inference or training. More investigation is needed to evaluate its compatibility with bf16.
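For reference, a minimal sketch of wrapping inference in PyTorch autocast to try bf16; the tiny `encoder` module and random `batch` below are stand-ins, not the released Cosformer model:

```python
import torch
import torch.nn as nn

# Stand-in module and input; replace with the loaded Cosformer encoder and a real batch.
encoder = nn.Linear(768, 768).cuda()
batch = torch.randn(1, 2048, 768, device="cuda")

# fp16 autocast (dtype=torch.float16) is known not to work with Cosformer;
# bf16 still needs to be validated.
with torch.inference_mode(), torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    outputs = encoder(batch)
print(outputs.dtype)  # torch.bfloat16 inside the autocast region
```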
Example coming soon :-)
@incollection{Douzon_2023,
  doi = {10.1007/978-3-031-41501-2_4},
  url = {https://doi.org/10.1007%2F978-3-031-41501-2_4},
  year = {2023},
  publisher = {Springer Nature Switzerland},
  pages = {47--64},
  author = {Thibault Douzon and Stefan Duffner and Christophe Garcia and J{\'{e}}r{\'{e}}my Espinas},
  title = {Long-Range Transformer Architectures for~Document Understanding},
  booktitle = {Document Analysis and Recognition {\textendash} {ICDAR} 2023 Workshops}
}