Web Scraping Times of India (TOI)

This project demonstrates how to scrape articles from the Times of India (TOI) website using Python. It extracts news content such as headlines, article bodies, publication dates, and authors, and stores it in a structured format for further analysis. It also uses Hugging Face Transformers to summarize articles and spaCy to filter the extracted text.
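
As a rough illustration of the structured format mentioned above, the sketch below models one scraped article as a record and converts a list of records into rows. The `Article` class, its field names, and the `to_rows` and `to_dataframe` helpers are assumptions made for this example; the notebook may organize its data differently.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional


@dataclass
class Article:
    """One scraped article (hypothetical schema, not taken from the notebook)."""

    headline: str
    body: str
    published: Optional[str] = None  # publication date as scraped, if found
    author: Optional[str] = None     # author name as scraped, if found


def to_rows(articles: List[Article]) -> List[Dict[str, Optional[str]]]:
    """Convert records to plain dicts, one dict per article."""
    return [asdict(article) for article in articles]


def to_dataframe(articles: List[Article]):
    """Build a pandas DataFrame with one row per article."""
    import pandas as pd  # third-party; imported here so to_rows works without it

    return pd.DataFrame(to_rows(articles))
```

Calling `to_dataframe(articles).to_csv("articles.csv", index=False)` would then write the rows to disk; the file name is only a placeholder.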

Project Overview

This notebook aims to scrape and analyze articles from TOI. It includes steps to:

Connect to the TOI website.
Extract headlines, article texts, and metadata.
Use spaCy to filter and process the extracted text.
Utilize Hugging Face Transformers to summarize articles.
Store the extracted data in a structured format for later use in data analysis or sentiment analysis.
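
The connection and extraction steps above might look roughly like the following. This is a minimal sketch, not the notebook's actual code: the `BASE_URL` value, the User-Agent string, the function names, and the "link text with at least four words" heuristic are all assumptions, because the real selectors depend on TOI's current page markup.

```python
import re
import time
from typing import List

# Assumed base URL; the notebook may target a different page or section.
BASE_URL = "https://timesofindia.indiatimes.com/"


def normalize_whitespace(text: str) -> str:
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def fetch_html(url: str, delay_seconds: float = 1.0, timeout: float = 10.0) -> str:
    """Download a page, pausing first so repeated calls stay polite."""
    import requests  # third-party; imported here so the pure helper works without it

    time.sleep(delay_seconds)
    response = requests.get(
        url,
        headers={"User-Agent": "toi-scraper-example/0.1"},  # hypothetical value
        timeout=timeout,
    )
    response.raise_for_status()  # raise on 4xx/5xx instead of parsing an error page
    return response.text


def extract_headlines(html: str, min_words: int = 4) -> List[str]:
    """Return de-duplicated link texts that are long enough to be headlines."""
    from bs4 import BeautifulSoup  # third-party (pip package: beautifulsoup4)

    soup = BeautifulSoup(html, "html.parser")
    headlines: List[str] = []
    for link in soup.find_all("a"):
        text = normalize_whitespace(link.get_text())
        # Heuristic (assumption): short link texts are navigation, not headlines.
        if len(text.split()) >= min_words and text not in headlines:
            headlines.append(text)
    return headlines
```

A typical call would be `extract_headlines(fetch_html(BASE_URL))`. In practice the heuristic would likely be replaced by selectors matched to the page structure.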

Installation

To run the code, clone this repository and install the required dependencies:

git clone <repository-url>
cd <repository-directory>
pip install -r requirements.txt

Usage

Open the Jupyter notebook file Scrapping_TOI.ipynb.
Run the cells in sequence to scrape articles from TOI.
Modify the base URL, if needed, to target different sections of TOI.
The extracted data will be filtered using spaCy and summarized using Transformers.
The final data will be displayed or saved based on the configuration in the notebook.
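
The filtering and summarization steps could be sketched as below. The README does not say what the spaCy filter removes, so dropping very short sentences is only one plausible reading. The function names, thresholds, and chunk size are likewise assumptions, and the sketch assumes a Transformers version that provides the `summarization` pipeline, which downloads a default model on first use.

```python
from typing import List


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def filter_sentences(text: str, min_alpha_tokens: int = 5) -> str:
    """Keep only sentences with at least min_alpha_tokens alphabetic tokens."""
    import spacy  # third-party

    nlp = spacy.blank("en")      # tokenizer only, no trained model required
    nlp.add_pipe("sentencizer")  # rule-based sentence boundary detection
    kept = []
    for sent in nlp(text).sents:
        if sum(token.is_alpha for token in sent) >= min_alpha_tokens:
            kept.append(sent.text.strip())
    return " ".join(kept)


def summarize(text: str, max_words: int = 400) -> str:
    """Summarize text chunk by chunk and join the partial summaries."""
    from transformers import pipeline  # third-party; slow to import and load

    summarizer = pipeline("summarization")
    parts = []
    for chunk in chunk_words(text, max_words):
        result = summarizer(
            chunk, max_length=60, min_length=10, do_sample=False, truncation=True
        )
        parts.append(result[0]["summary_text"])
    return " ".join(parts)
```

Chunking is used here because summarization models accept a limited input length; `summarize(filter_sentences(body))` would chain the two steps for a single article body.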

Example Usage

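# Launch the notebook from the repository root
jupyter notebook Scrapping_TOI.ipynb
# Or execute every cell non-interactively
jupyter nbconvert --to notebook --execute Scrapping_TOI.ipynb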

Requirements

Python 3.x
Jupyter Notebook
Libraries: requests, beautifulsoup4 (BeautifulSoup), pandas, transformers, spacy
The time and re modules are part of the Python standard library and need no installation.

You can install all the necessary libraries using the following command:

pip install -r requirements.txt
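
After installing, a quick check like the one below can confirm that the third-party libraries are importable. It is a small sketch using only the standard library; the `REQUIRED_MODULES` list and the `missing_modules` helper are names chosen for this example.

```python
from importlib.util import find_spec
from typing import Iterable, List

# Import names, not pip package names: BeautifulSoup is imported as "bs4".
REQUIRED_MODULES = ["requests", "bs4", "pandas", "transformers", "spacy"]


def missing_modules(names: Iterable[str]) -> List[str]:
    """Return the top-level module names that cannot be found."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    print("Missing modules:", ", ".join(missing) if missing else "none")
```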

Project Structure

├── Scrapping_TOI.ipynb  # Main notebook for scraping TOI articles
├── requirements.txt     # Required Python libraries
└── README.md            # Project documentation

Contributing

Feel free to submit a pull request or create an issue if you have suggestions for improving the project.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Repository: Khushangz/Latest_News_Summarization
