Mecab pretokenization method for vocab generation #43

Open
soonilbae opened this issue Jan 16, 2023 · 0 comments
Labels
question (Further information is requested)

Comments

@soonilbae

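I went through the closed issues about vocab generation and, as suggested in the answers there, read Section 4.1 of the KLUE paper. Does "pretokenizing with Mecab" simply mean splitting the entire corpus into morphemes and then training WordPiece on the result?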
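The pipeline as the question reads it can be sketched as below, assuming that WordPiece simply treats each morpheme as a "word". `toy_morphs` and its lookup table are hypothetical stand-ins for Mecab (in practice a morpheme analyzer such as konlpy's `Mecab().morphs` would be used), and `pretokenize_corpus` is an illustrative helper, not KLUE's code. `wordpiece_encode` is the standard greedy longest-match-first WordPiece encoding with `##` continuation pieces, included only to show that subword pieces never cross a morpheme boundary once the text is pretokenized. Whether KLUE applies any preprocessing beyond this is exactly what the question asks and is not settled by the sketch.

```python
from collections.abc import Callable, Iterable

# Hypothetical toy table standing in for Mecab; it only knows the words used below.
TOY_MORPHEMES: dict[str, list[str]] = {
    "한국어를": ["한국어", "를"],
    "공부합니다": ["공부", "합니다"],
}


def toy_morphs(text: str) -> list[str]:
    """Split on whitespace, then split each word into morphemes via the toy table."""
    out: list[str] = []
    for word in text.split():
        out.extend(TOY_MORPHEMES.get(word, [word]))
    return out


def pretokenize_corpus(
    lines: Iterable[str], morphs: Callable[[str], list[str]] = toy_morphs
) -> list[str]:
    """Rewrite each non-empty line as space-separated morphemes for a WordPiece trainer."""
    return [" ".join(morphs(line)) for line in lines if line.strip()]


def wordpiece_encode(token: str, vocab: set[str], unk: str = "[UNK]") -> list[str]:
    """Greedy longest-match-first WordPiece encoding of a single pretokenized token."""
    pieces: list[str] = []
    start = 0
    while start < len(token):
        end = len(token)
        piece = None
        # Shrink the candidate from the right until it is found in the vocab.
        while start < end:
            cand = token[start:end]
            if start > 0:
                cand = "##" + cand  # non-initial pieces carry the continuation prefix
            if cand in vocab:
                piece = cand
                break
            end -= 1
        if piece is None:
            return [unk]  # no piece matches: the whole token becomes [UNK]
        pieces.append(piece)
        start = end
    return pieces


if __name__ == "__main__":
    pre = pretokenize_corpus(["한국어를 공부합니다"])
    print(pre)  # ['한국어 를 공부 합니다']
    vocab = {"한국", "##어", "를", "공부", "합니다"}
    print([wordpiece_encode(m, vocab) for m in pre[0].split()])
```

In a real setup the pretokenized lines would be written to a file and passed to a WordPiece trainer (for example `BertWordPieceTokenizer.train` from the Hugging Face `tokenizers` library), so the learned vocabulary only contains pieces that fit inside morphemes.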

Judging from the example in the paper, I suspect that more preprocessing is needed than just splitting the text into morphemes. I'm a beginner, so this may be an overly simple question. Thank you.

soonilbae added the question (Further information is requested) label on Jan 16, 2023