What is BigBird • How to Use • Pretraining • Evaluation Result • Docs • Citation
한국어 | English
## What is BigBird

KoBigBird is based on the **sparse-attention** model introduced in [BigBird: Transformers for Longer Sequences](https://arxiv.org/abs/2007.14062), and can handle longer sequences than a conventional BERT.
- 🦅 **Longer Sequence** - handles up to **4,096 tokens**, 8 times the 512-token limit of a standard BERT
- ⏱️ **Computational Efficiency** - replaces full attention with **sparse attention**, improving O(n²) to O(n) (see the sketch below)
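Both attention modes are exposed by the BigBird implementation in 🤗 Transformers via `set_attention_type`, so the trade-off can be tried directly. A minimal sketch, using the KoBigBird checkpoint introduced below:

```python
from transformers import AutoModel

# AutoModel resolves to BigBirdModel for this checkpoint.
model = AutoModel.from_pretrained("monologg/kobigbird-bert-base")

# Block-sparse attention (the default) scales roughly linearly with
# sequence length; full attention scales quadratically.
model.set_attention_type("block_sparse")

# For short inputs, falling back to standard full attention can be faster.
model.set_attention_type("original_full")
```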
## How to Use

- 🤗 The model uploaded to the Huggingface Hub can be used right away :)
- We recommend `transformers>=4.11.0`, which resolves some known issues (see the PR related to the MRC issue).
- Use `BertTokenizer` instead of `BigBirdTokenizer` (`AutoTokenizer` loads `BertTokenizer`).
- For detailed usage, see the BigBird Transformers documentation.
```python
from transformers import AutoModel, AutoTokenizer

model = AutoModel.from_pretrained("monologg/kobigbird-bert-base")  # BigBirdModel
tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")  # BertTokenizer
```
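Continuing from the snippet above, a forward pass over a long document works like any other encoder. A small sketch (the repeated sentence is just filler to produce a long input):

```python
import torch

# Filler text standing in for a long Korean document.
long_text = "한국어 문서의 긴 본문 예시입니다. " * 400

inputs = tokenizer(
    long_text,
    max_length=4096,  # 8x BERT's usual 512-token limit
    truncation=True,
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**inputs)

# Shape: (batch, seq_len, hidden); seq_len can be up to 4096.
print(outputs.last_hidden_state.shape)
```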
## Pretraining

For details, see [Pretraining BigBird].
|                     | Hardware | Max len | LR   | Batch | Train Step | Warmup Step |
| ------------------- | -------- | ------- | ---- | ----- | ---------- | ----------- |
| KoBigBird-BERT-Base | TPU v3-8 | 4096    | 1e-4 | 32    | 2M         | 20k         |
- Trained on a variety of data, including the Modu Corpus (모두의 말뭉치), Korean Wikipedia, Common Crawl, and news data
- Trained as an **ITC (Internal Transformer Construction)** model (ITC vs ETC); the config sketch below shows the resulting attention settings
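The block-sparse settings that realize the ITC layout can be read off the uploaded checkpoint's config. A small sketch; the attribute names come from the 🤗 Transformers `BigBirdConfig`:

```python
from transformers import AutoConfig

config = AutoConfig.from_pretrained("monologg/kobigbird-bert-base")

# ITC reuses tokens inside the sequence itself as global tokens (ETC would
# add extra ones); the sparse pattern is controlled by these attributes.
print(config.attention_type)           # e.g. "block_sparse"
print(config.block_size)               # tokens per attention block
print(config.num_random_blocks)        # random blocks attended per query block
print(config.max_position_embeddings)  # 4096
```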
## Evaluation Result

For details, see [Finetune on Short Sequence Dataset].
|                     | NSMC (acc) | KLUE-NLI (acc) | KLUE-STS (pearsonr) | Korquad 1.0 (em/f1) | KLUE MRC (em/rouge-w) |
| ------------------- | ---------- | -------------- | ------------------- | ------------------- | --------------------- |
| KoELECTRA-Base-v3   | 91.13      | 86.87          | 93.14               | 85.66 / 93.94       | 59.54 / 65.64         |
| KLUE-RoBERTa-Base   | 91.16      | 86.30          | 92.91               | 85.35 / 94.53       | 69.56 / 74.64         |
| KoBigBird-BERT-Base | 91.18      | 87.17          | 92.61               | 87.08 / 94.71       | 70.33 / 75.34         |
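For the short-sequence tasks above, KoBigBird plugs into the standard 🤗 Transformers finetuning path. A hedged sketch for binary sentiment classification in the NSMC style (the two example sentences and labels are made up):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
model = AutoModelForSequenceClassification.from_pretrained(
    "monologg/kobigbird-bert-base", num_labels=2  # BigBirdForSequenceClassification
)

# Hypothetical rows standing in for NSMC data (text, 0=negative / 1=positive).
texts = ["정말 재미있는 영화였어요.", "시간이 아까운 영화."]
labels = torch.tensor([1, 0])

# For inputs this short, BigBird internally falls back to full attention.
batch = tokenizer(texts, padding=True, truncation=True, max_length=512, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
optimizer.zero_grad()
loss = model(**batch, labels=labels).loss  # cross-entropy from the classification head
loss.backward()
optimizer.step()
```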
For details, see [Finetune on Long Sequence Dataset].
|                     | TyDi QA (em/f1) | Korquad 2.1 (em/f1) | Fake News (f1) | Modu Sentiment (f1-macro) |
| ------------------- | --------------- | ------------------- | -------------- | ------------------------- |
| KLUE-RoBERTa-Base   | 76.80 / 78.58   | 55.44 / 73.02       | 95.20          | 42.61                     |
| KoBigBird-BERT-Base | 79.13 / 81.30   | 67.77 / 82.03       | 98.85          | 45.42                     |
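For long-sequence extractive QA such as Korquad 2.1 or TyDi QA, the question-answering head works the same way with a 4,096-token window. A hedged sketch; the question and context are placeholders:

```python
import torch
from transformers import AutoModelForQuestionAnswering, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("monologg/kobigbird-bert-base")
model = AutoModelForQuestionAnswering.from_pretrained("monologg/kobigbird-bert-base")

question = "KoBigBird가 다룰 수 있는 최대 토큰 수는?"  # placeholder question
context = "아주 긴 위키 문서 본문 예시입니다. " * 300   # placeholder long document

inputs = tokenizer(
    question,
    context,
    truncation="only_second",  # truncate only the context, never the question
    max_length=4096,
    return_tensors="pt",
)

with torch.no_grad():
    outputs = model(**inputs)

# Greedy decode of the most likely answer span (a finetuned checkpoint is
# needed for meaningful output; the base model's QA head is untrained).
start = outputs.start_logits.argmax()
end = outputs.end_logits.argmax()
answer = tokenizer.decode(inputs["input_ids"][0, start : end + 1])
```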
## Docs

- Pretraining BigBird
- Finetune on Short Sequence Dataset
- Finetune on Long Sequence Dataset
- Download Tensorflow v1 checkpoint
- GPU Benchmark result
## Citation

If you use KoBigBird, please cite it as follows:
```bibtex
@software{jangwon_park_2021_5654154,
  author    = {Jangwon Park and Donggyu Kim},
  title     = {KoBigBird: Pretrained BigBird Model for Korean},
  month     = nov,
  year      = 2021,
  publisher = {Zenodo},
  version   = {1.0.0},
  doi       = {10.5281/zenodo.5654154},
  url       = {https://doi.org/10.5281/zenodo.5654154}
}
```
## Acknowledgement

KoBigBird was built with Cloud TPU support from the TensorFlow Research Cloud (TFRC) program.
We also thank Seyun Ahn for the wonderful logo.