Find and upload Persian News Corpus #1
Comments
I recently found "sketchengine", a tool for building a corpus from the web. This tutorial explains how to use it: https://www.sketchengine.eu/quick-start-guide/create-your-corpus-lesson-4/ . I have made a sample corpus from Hamshahri news with this tool, which I'll attach here. Alternatively, we can just look for an existing corpus such as "Hamshahri" or "irBlog". |
Nice tool. I know that someone in our NLP lab has already collected a Persian news corpus. Also note that we should be aiming for more than 10 million sentences. If this tool can crawl all of that, let's build a fresh copy. |
It has a 1,000,000-word limit. |
If you are starting to work on this, please move it to "In progress". |
Any progress on this, @PoriNiki? |
Yes. I am trying to get the corpus from "Kanal e Khabar". |
I have recently been talking with them, but this could take time. |
Hello. |
Hello. On my end, the data in question has been cancelled for now, so I am stepping away from this issue. |