Find and upload Persian News Corpus #1

sehsanm · 2018-12-02T13:20:35Z

Find the Persian news corpus.
Define a corpus file standard (to be discussed with the other corpus builders), most probably one sentence per line.
Upload a zipped version of the corpus to the S3 bucket (contact @sehsanm for the access details).
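A minimal sketch of what the "one sentence per line" format could look like, assuming UTF-8 text and a naive splitter that breaks after `.`, `!`, `?` and the Persian question mark `؟`. The helper names (`split_sentences`, `write_zipped_corpus`, `read_zipped_corpus`) and the archive member name `corpus.txt` are hypothetical, not an agreed standard:

```python
import io
import re
import zipfile

# Assumed sentence terminators: ".", "!", "?" and the Arabic-script question
# mark U+061F used in Persian. This naive rule will mis-split abbreviations
# and decimal numbers; a real Persian sentence splitter would do better.
_SENTENCE_END = re.compile(r"(?<=[.!?\u061F])\s+")


def split_sentences(text):
    """Split raw text into a list of sentences (hypothetical helper)."""
    # Collapse all whitespace, including newlines, so no sentence spans two lines.
    flat = re.sub(r"\s+", " ", text).strip()
    if not flat:
        return []
    return [s.strip() for s in _SENTENCE_END.split(flat) if s.strip()]


def write_zipped_corpus(documents, zip_target, member_name="corpus.txt"):
    """Write each sentence of each document on its own line inside a zip.

    zip_target may be a path or a binary file-like object. Returns the
    number of sentences written.
    """
    count = 0
    with zipfile.ZipFile(zip_target, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        # Wrap the binary member stream so we can write UTF-8 text with "\n" endings.
        with io.TextIOWrapper(zf.open(member_name, "w"), encoding="utf-8", newline="\n") as out:
            for doc in documents:
                for sentence in split_sentences(doc):
                    out.write(sentence + "\n")
                    count += 1
    return count


def read_zipped_corpus(zip_source, member_name="corpus.txt"):
    """Yield the sentences back from the archive, one per line."""
    with zipfile.ZipFile(zip_source) as zf:
        with io.TextIOWrapper(zf.open(member_name), encoding="utf-8") as lines:
            for line in lines:
                line = line.rstrip("\n")
                if line:
                    yield line
```

The resulting archive could then be uploaded to the S3 bucket, for example with boto3's `upload_file`; the bucket name and credentials would come from @sehsanm.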

abb4s · 2018-12-03T04:29:09Z

Recently I found "sketchengine", a tool for building a corpus from the web. This tutorial explains how to use it: https://www.sketchengine.eu/quick-start-guide/create-your-corpus-lesson-4/ . I have made a sample corpus from Hamshahri news with this tool, which I'm attaching here.
Attachment: ham2_2(1).txt

We could also simply use existing corpora such as "Hamshahri" or "irBlog".

sehsanm · 2018-12-03T04:54:03Z

Nice tool. I know that someone in our NLP lab has already collected a Persian news corpus. Also note that we should be aiming for more than 10 million sentences. If this tool is able to crawl that much, let's build a fresh copy.

abb4s · 2018-12-03T06:01:17Z

It has a limit of 1,000,000 words.
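To see how far that limit is from the 10-million-sentence goal, one could extrapolate from a sample such as the attached Hamshahri file. A minimal sketch, assuming one sentence per line and whitespace-separated words; the tool may count words differently (Persian compounds joined with a zero-width non-joiner count as one token here), and `corpus_stats` / `estimate_batches` are hypothetical helpers:

```python
import math
from typing import Iterable, Tuple

TARGET_SENTENCES = 10_000_000  # goal stated earlier in this thread
TOOL_WORD_LIMIT = 1_000_000    # per-corpus word limit reported for sketchengine


def corpus_stats(lines: Iterable[str]) -> Tuple[int, int]:
    """Count sentences (non-blank lines) and whitespace-separated words."""
    sentences = words = 0
    for line in lines:
        tokens = line.split()
        if tokens:
            sentences += 1
            words += len(tokens)
    return sentences, words


def estimate_batches(sample_lines: Iterable[str],
                     target_sentences: int = TARGET_SENTENCES,
                     word_limit: int = TOOL_WORD_LIMIT) -> int:
    """Estimate how many limit-sized crawls the target would need,
    extrapolating the average sentence length seen in the sample."""
    sentences, words = corpus_stats(sample_lines)
    if sentences == 0:
        raise ValueError("sample contains no sentences")
    total_words = (words / sentences) * target_sentences
    return math.ceil(total_words / word_limit)
```

For example, a sample averaging 4 words per sentence would imply about 40 separate limit-sized crawls for 10 million sentences.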

sehsanm · 2018-12-04T13:00:44Z

If you are starting to work on this, please move it to In Progress.

sehsanm · 2018-12-19T14:07:41Z

Any progress on this, @PoriNiki?

FullDataAlchemist · 2018-12-19T14:34:31Z

Yes. I'm trying to get the corpus from "Kanal e Khabar".

FullDataAlchemist · 2018-12-19T14:39:20Z

I'm currently talking with them, but this could take time.

sehsanm · 2018-12-27T06:39:41Z

Hi. We are only one Readme.md away from fully completing this task.

FullDataAlchemist · 2018-12-27T07:39:17Z

Hi. On my side, the data mentioned above has been cancelled for now, so I'm stepping away from this task.

sehsanm added the CORPUS label Dec 2, 2018

sehsanm added this to the Assignment milestone Dec 2, 2018

sehsanm mentioned this issue in "Find and Upload Persian Weblog Corpus #2" (Closed) on Dec 3, 2018

sehsanm mentioned this issue in "Find, Upload and Cleanse Persian Wiki Dump #4" (Open) on Dec 3, 2018

zahramajd self-assigned this Dec 4, 2018

sehsanm mentioned this issue in "Use gensim to train a CBOW model #23" (Open) on Dec 4, 2018

sehsanm assigned zahramajd and unassigned zahramajd Dec 4, 2018

zahramajd removed their assignment Dec 4, 2018

maryambiabani self-assigned this Dec 4, 2018

FullDataAlchemist self-assigned this Dec 8, 2018

FullDataAlchemist removed their assignment Dec 27, 2018
