Feat: Implement Multi-Language Support for Scraping #1138

Ayushi-Choudhary22 · 2024-07-25T02:45:59Z

Describe the feature

Enhance the scraper to support multiple languages, enabling it to scrape content from non-English websites effectively. This will involve:

* Detecting the language of the website.
* Using appropriate libraries and methods to handle different character encodings.
* Adding translations for common scraping elements and error messages.
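As a rough illustration of the first two points, here is a minimal sketch using only the Python standard library. It is not taken from the project's code: `LangAttrParser`, `detect_declared_language`, and `decode_body` are hypothetical names, and the approach (reading the `lang` attribute of the `<html>` tag and trying a declared charset before falling back to UTF-8) is just one assumption about how this could work. Pages that do not declare a language would need a statistical detector instead.

```python
from html.parser import HTMLParser


class LangAttrParser(HTMLParser):
    """Hypothetical helper: records the lang attribute of the first <html> tag."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def decode_body(raw, declared=None):
    """Decode response bytes: declared charset first, then UTF-8, then lossy UTF-8."""
    for encoding in (declared, "utf-8"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown charset name: try the next candidate.
            continue
    # Last resort: keep going, replacing undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


def detect_declared_language(html_text):
    """Return the language the page declares for itself, or None."""
    parser = LangAttrParser()
    parser.feed(html_text)
    return parser.lang


if __name__ == "__main__":
    raw = '<html lang="hi"><body>नमस्ते</body></html>'.encode("utf-8")
    text = decode_body(raw, declared="utf-8")
    print(detect_declared_language(text), text)
```

In a real scraper the `declared` value would typically come from the HTTP `Content-Type` header or a `<meta charset>` tag.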

Add ScreenShots

Web Scraper

Overview

This project is a web scraper designed to extract and process data from websites. It is currently tested on English websites and is being enhanced to handle multi-language content seamlessly.

Current Scraper Output

English Site

Description: Screenshot showing the scraper working perfectly with an English website.

The scraper successfully extracts and displays data from English sites without any issues.

Potential Issues with Non-English Sites

Text Encoding or Parsing Issues

Description: Screenshot displaying the scraper encountering issues with text encoding or parsing on a non-English website.

The scraper may mis-decode or fail to parse non-English text, for example when a page's character encoding differs from the one the scraper assumes.
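One common form of this problem can be reproduced in a few lines. The sketch below is an illustration only, not output from the project: it decodes UTF-8 bytes as Latin-1, which never raises an error but silently produces garbled text. The sample string is made up for the demonstration.

```python
# UTF-8 bytes for a string containing accented and CJK characters.
raw = "café 東京".encode("utf-8")

# Decoding with the wrong charset succeeds but yields garbled text,
# because Latin-1 maps every byte to some character.
wrong = raw.decode("latin-1")

# Decoding with the correct charset recovers the original text.
right = raw.decode("utf-8")

# Because Latin-1 is a lossless byte-to-character mapping, the damage
# can be reversed if the mistake is known.
repaired = wrong.encode("latin-1").decode("utf-8")

print(repr(wrong))
print(repr(right))
print(repaired == right)
```

The garbled result begins with `cafÃ©`, which is the kind of output a screenshot of a failing non-English scrape would show.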

Expected Output with Multi-Language Support

Mock-Up of Expected Results

Description: Mock-up of the expected output where the scraper handles multiple languages seamlessly.

The scraper is expected to handle various languages correctly, with accurate data extraction and display.

Features

Language Support: Designed to work with multiple languages.
Error Handling: Includes mechanisms to manage text encoding and parsing issues.
Flexibility: Capable of adapting to different website structures and formats.
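For the translated error messages proposed in the feature description, a small message catalog with an English fallback might look like the sketch below. Everything here is hypothetical: the `MESSAGES` table, the `fetch_failed` key, and the `translate` function are illustrative names, not part of the existing project.

```python
# Hypothetical message catalog keyed by primary language subtag.
MESSAGES = {
    "en": {"fetch_failed": "Could not fetch {url}"},
    "es": {"fetch_failed": "No se pudo obtener {url}"},
}


def translate(key, lang, **params):
    """Look up a message by key, falling back to English when needed."""
    # Reduce tags such as "es-MX" to their primary subtag, "es".
    primary = (lang or "en").split("-")[0].lower()
    catalog = MESSAGES.get(primary, MESSAGES["en"])
    # Fall back to the English template if the key is not translated yet.
    template = catalog.get(key, MESSAGES["en"][key])
    return template.format(**params)


if __name__ == "__main__":
    print(translate("fetch_failed", "es-MX", url="https://example.com"))
    print(translate("fetch_failed", "de", url="https://example.com"))
```

Falling back to English keeps the scraper usable for languages that have not been translated yet.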

Record

I agree to follow this project's Code of Conduct
I'm a GSSoC'24 contributor
I want to work on this issue

github-actions · 2024-07-25T02:46:24Z

Hi there! Thanks for opening this issue. We appreciate your contribution to this open-source project. We aim to respond or assign your issue as soon as possible.

Ayushi-Choudhary22 · 2024-07-25T02:55:45Z

Hey @nikhil25803, I’m keen to contribute to the scraper project for GSSoC 2024 and would like to tackle this issue (#1138).
Could you assign it to me? Looking forward to getting started!
Thanks!

nikhil25803 closed this as completed Aug 9, 2024
