
Crawled data post-processing issue #26

Open
WagyuShark opened this issue Nov 11, 2024 · 0 comments
WagyuShark commented Nov 11, 2024

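  • The token count of the crawled data is currently too high.
  • The data contains a lot of unnecessary content, and no post-processing is applied to reduce it.
  • Some public-institution sites publish their content as hwp or pdf documents and embed those documents on the page for reading.
  • In those cases the file has to be downloaded and preprocessed separately.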
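
One possible shape for the post-processing described above, as a minimal sketch and not the project's actual pipeline. It assumes the crawler stores raw HTML per page. The names `PageExtractor`, `postprocess`, `SKIP_TAGS`, and `DOC_EXTENSIONS` are hypothetical, and the choice of which tags count as noise is an assumption. The sketch drops text inside layout and script tags, collapses whitespace, removes repeated lines, and collects links to hwp/pdf files so they can be downloaded and handled in a separate step (text extraction from those files is not shown).

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumption: text inside these tags is treated as noise in this sketch.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "noscript"}
# Assumption: these extensions mark documents that need separate handling.
DOC_EXTENSIONS = (".hwp", ".pdf")


class PageExtractor(HTMLParser):
    """Collects visible text chunks and links to hwp/pdf documents."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.skip_depth = 0   # > 0 while inside a tag listed in SKIP_TAGS
        self.chunks = []      # raw text chunks outside skipped tags
        self.doc_links = []   # absolute URLs of hwp/pdf files

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        for name, value in attrs:
            if name in ("href", "src") and value:
                url = urljoin(self.base_url, value)
                # Compare only the path so query strings do not hide the extension.
                if urlparse(url).path.lower().endswith(DOC_EXTENSIONS):
                    self.doc_links.append(url)

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.chunks.append(data)


def postprocess(html, base_url):
    """Return (cleaned_text, document_links) for one crawled page."""
    parser = PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    seen = set()
    lines = []
    for chunk in parser.chunks:
        line = re.sub(r"\s+", " ", chunk).strip()
        # Dropping repeated lines reduces tokens but may remove legitimate repeats.
        if line and line not in seen:
            seen.add(line)
            lines.append(line)
    return "\n".join(lines), parser.doc_links
```

Calling `postprocess(html, page_url)` returns the reduced text together with the list of document URLs; pages where that list is non-empty are the ones that would go through the download-and-preprocess path.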