[FEATURE] Develop a script for pre-processing the corpus #3

Open
seopbo opened this issue Sep 12, 2020 · 1 comment
Labels
enhancement New feature or request

Comments

@seopbo
Contributor

seopbo commented Sep 12, 2020

🚀 Feature

Develop a script that pre-processes the corpus for BART pre-training.

  • Apply the corruption (noising) ahead of time and save the results to disk as integer token IDs
  • Apply byte pair encoding using the tokenizers package
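A minimal sketch of the "corrupt ahead of time, save as integers" step, assuming the documents have already been BPE-encoded into integer IDs (for example with a `ByteLevelBPETokenizer` trained via the tokenizers package, which is not shown here). The noising function implements BART-style text infilling: spans with Poisson(λ=3) lengths, covering about 30% of the tokens, are each replaced by a single mask token. The mask ratio, `MASK_ID`, the `.bin`/`.idx` file layout and all function names are hypothetical choices for illustration, not this project's actual implementation.

```python
import array
import math
import random
from typing import List, Sequence

# Hypothetical ID of the <mask> token; the real value depends on the trained BPE vocabulary.
MASK_ID = 4


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Draw a Poisson(lam) integer with Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, rng.random()
    while p > threshold:
        k += 1
        p *= rng.random()
    return k


def text_infilling(ids: Sequence[int], mask_ratio: float = 0.3, lam: float = 3.0,
                   mask_id: int = MASK_ID, seed: int = 0) -> List[int]:
    """Replace Poisson-length spans covering ~mask_ratio of the tokens with one mask each."""
    n = len(ids)
    if n == 0:
        return []
    rng = random.Random(seed)
    budget = int(round(n * mask_ratio))
    removed = [False] * n      # tokens dropped from the output
    insert_mask = [False] * n  # zero-length spans: a mask inserted before token i
    count = 0
    while count < budget:
        start = rng.randrange(n)
        # Clip so the number of newly removed tokens never exceeds the budget.
        length = min(sample_poisson(lam, rng), n - start, budget - count)
        if length == 0:
            insert_mask[start] = True
            continue
        for i in range(start, start + length):
            if not removed[i]:
                removed[i] = True
                count += 1
    out: List[int] = []
    i = 0
    while i < n:
        if removed[i]:
            out.append(mask_id)  # one mask token for the whole contiguous removed run
            while i < n and removed[i]:
                i += 1
        else:
            if insert_mask[i]:
                out.append(mask_id)
            out.append(ids[i])
            i += 1
    return out


def save_examples(examples: Sequence[Sequence[int]], prefix: str) -> None:
    """Write all examples as one flat C-int array plus int64 offsets (hypothetical layout)."""
    tokens = array.array("i")
    offsets = array.array("q", [0])
    for ex in examples:
        tokens.extend(ex)
        offsets.append(len(tokens))
    with open(prefix + ".bin", "wb") as f:
        tokens.tofile(f)
    with open(prefix + ".idx", "wb") as f:
        offsets.tofile(f)


def load_examples(prefix: str) -> List[List[int]]:
    """Read back the examples written by save_examples."""
    tokens, offsets = array.array("i"), array.array("q")
    with open(prefix + ".bin", "rb") as f:
        tokens.frombytes(f.read())
    with open(prefix + ".idx", "rb") as f:
        offsets.frombytes(f.read())
    return [tokens[a:b].tolist() for a, b in zip(offsets, offsets[1:])]
```

In a pre-processing script this would run once per document, e.g. `save_examples([text_infilling(ids, seed=k) for k, ids in enumerate(encoded_docs)], "train")`, so training only has to read integers back. Seeding each document separately keeps the corruption reproducible. The trade-off of corrupting offline is that each document gets a fixed noise pattern unless several corrupted copies are written.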

Motivation

Pitch

Additional context

@seopbo seopbo added the enhancement New feature or request label Sep 12, 2020
@seopbo seopbo self-assigned this Sep 12, 2020
seopbo added a commit that referenced this issue Sep 12, 2020
@seopbo
Contributor Author

seopbo commented Sep 27, 2020

seopbo added a commit that referenced this issue Sep 28, 2020
seopbo added a commit that referenced this issue Sep 29, 2020
seopbo added a commit that referenced this issue Sep 29, 2020
seopbo added a commit that referenced this issue Sep 29, 2020
@seopbo seopbo mentioned this issue Sep 29, 2020
2 tasks
seopbo added a commit that referenced this issue Sep 30, 2020
chagmgang added a commit that referenced this issue Sep 30, 2020
Projects
None yet
Development

No branches or pull requests

1 participant