Urdu Poetry Generation

Project Description:

This project generates Urdu poetry with N-gram models trained on a scraped poetry dataset. The dataset is a collection of Urdu poems by various Urdu poets and serves as the basis for training the models and generating new verses. The project explores the following N-gram models:

Unigram
Bigram
Trigram
Backward Bigram
Bi-directional Bigram
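
To make the idea behind these models concrete, here is a minimal sketch of a forward and a backward bigram model in Python. It is not the code from the project notebook: the function names `build_bigram_counts` and `generate_line`, the whitespace tokenization, and the toy romanized lines are all assumptions made for illustration.

```python
import random
from collections import Counter, defaultdict


def build_bigram_counts(lines, backward=False):
    """Count which word follows each word (or precedes it, if backward=True)."""
    counts = defaultdict(Counter)
    for line in lines:
        tokens = line.split()  # assumption: plain whitespace tokenization
        if backward:
            tokens = tokens[::-1]
        for current, nxt in zip(tokens, tokens[1:]):
            counts[current][nxt] += 1
    return counts


def generate_line(counts, seed_word, max_words=8, rng=None):
    """Sample a line word by word, weighting each choice by its bigram count."""
    rng = rng or random.Random()
    words = [seed_word]
    while len(words) < max_words:
        followers = counts.get(words[-1])
        if not followers:
            break  # no continuation was observed for the last word
        choices, weights = zip(*followers.items())
        words.append(rng.choices(choices, weights=weights, k=1)[0])
    return " ".join(words)


# Toy romanized lines for illustration only; they are not taken from the dataset.
corpus = ["dil ki baat", "dil ka haal", "baat ki baat"]
forward = build_bigram_counts(corpus)
backward = build_bigram_counts(corpus, backward=True)
print(generate_line(forward, "dil", max_words=4))
```

A line generated from the backward counts comes out last word first, so its word order has to be reversed before display. A unigram model would sample words from overall frequencies alone, and a trigram model would condition on the two previous words instead of one.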

Data Collection:

The dataset was scraped from https://www.rekhta.org/. All poems of at least 25 Urdu poets were scraped from this website. The scraping was done with Scrapy (https://scrapy.org/). The scraped data is saved in scrapped_poems.csv. The spider used for this purpose is available in the I212705urdupoemsspider.py file.

scrapped_poems.csv:

The CSV file contains the following 3 columns:

Poem Line: A single verse (line) of a poem.
Nazm Name: The name of the poem the verse belongs to.
Author Name: The name of the poet who wrote the poem.
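
As a rough illustration of how these three columns can be consumed, the sketch below groups verses by poet and poem using Python's standard `csv` module. It assumes the header row uses exactly the column names listed above. The function name `group_lines_by_poem` and the sample rows are hypothetical placeholders, not content from the dataset.

```python
import csv
import io
from collections import defaultdict

# Placeholder rows standing in for scrapped_poems.csv; not real dataset content.
SAMPLE_CSV = """Poem Line,Nazm Name,Author Name
first line of poem A,Poem A,Poet X
second line of poem A,Poem A,Poet X
first line of poem B,Poem B,Poet Y
"""


def group_lines_by_poem(csv_file):
    """Return {(author, poem): [verse, ...]} from a file-like CSV object."""
    poems = defaultdict(list)
    for row in csv.DictReader(csv_file):
        key = (row["Author Name"], row["Nazm Name"])
        poems[key].append(row["Poem Line"])
    return dict(poems)


poems = group_lines_by_poem(io.StringIO(SAMPLE_CSV))
for (author, title), verses in poems.items():
    print(author, title, len(verses))
```

For the real file, the same function could be called on `open("scrapped_poems.csv", encoding="utf-8", newline="")`, provided the header matches the assumed column names.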

I212705urdupoemsspider.py:

This is the spider that was used to scrape the poems. To use the spider, follow these steps:

Create a virtual environment in Visual Studio Code.
Install Scrapy.
Create a Scrapy project inside the virtual environment.
Either copy and paste my spider's code into your own spider, or add my spider file to the spiders directory of your project.
Run the spider.
The data will automatically be stored in a new file called scrapped_poems.csv in the same directory.
The spider code is modifiable, so you can change it to suit your requirements.
The code can also serve as a base for scraping other websites, but knowledge of Scrapy is a must.

Conclusion:

Through this project, I aim to showcase the versatility and creativity of N-gram models in generating Urdu poetry while preserving the aesthetic and linguistic richness of the language. A PDF report on the project is also available (i21-2705_NLP.pdf).

