
Feat: Implement Multi-Language Support for Scraping #1138

Closed
3 tasks done
Ayushi-Choudhary22 opened this issue Jul 25, 2024 · 2 comments

@Ayushi-Choudhary22

Describe the feature

Enhance the scraper to support multiple languages, enabling it to scrape content from non-English websites effectively. This will involve:

  • Detecting the language of the website.
  • Using appropriate libraries and methods to handle different character encodings.
  • Adding translations for common scraping elements and error messages.
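
To make the first two points concrete, here is a minimal sketch using only the Python standard library. It is not the project's implementation: the names `pick_charset`, `decode_page`, and `_LangParser` are hypothetical, and it assumes the page declares its charset in the Content-Type header or a `<meta>` tag and its language in the `lang` attribute of `<html>`. Pages that declare neither would need a statistical detector, which is out of scope here.

```python
import re
from html.parser import HTMLParser


class _LangParser(HTMLParser):
    """Records the lang attribute of the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def pick_charset(content_type, body):
    """Return the declared charset, or 'utf-8' when none is declared."""
    # 1. charset parameter of the Content-Type header
    match = re.search(r"charset=([\w-]+)", content_type or "", re.I)
    if match:
        return match.group(1).lower()
    # 2. <meta ... charset=...> near the top of the document
    match = re.search(rb"<meta[^>]+charset=[\"']?([\w-]+)", body[:2048], re.I)
    if match:
        return match.group(1).decode("ascii").lower()
    return "utf-8"


def decode_page(content_type, body):
    """Decode raw bytes and return (text, charset, declared language)."""
    charset = pick_charset(content_type, body)
    try:
        text = body.decode(charset)
    except (LookupError, UnicodeDecodeError):
        # Unknown or wrong charset label: fall back without crashing.
        charset = "utf-8"
        text = body.decode(charset, errors="replace")
    parser = _LangParser()
    parser.feed(text)
    return text, charset, parser.lang
```

The fallback branch replaces undecodable bytes instead of raising, so a mislabelled page still yields text that can be logged and inspected.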

Add ScreenShots

[Attached images: W2MDyeo0zkf and three WhatsApp screenshots dated 2024-07-25]

Web Scraper

Overview

This project is a web scraper designed to extract and process data from websites. It is currently tested on English websites and is being enhanced to handle multi-language content seamlessly.

Current Scraper Output

English Site

Description: Screenshot showing the scraper working perfectly with an English website.

  • The scraper successfully extracts and displays data from English sites without any issues.

Potential Issues with Non-English Sites

Text Encoding or Parsing Issues

Description: Screenshot displaying the scraper encountering issues with text encoding or parsing on a non-English website.

  • Non-English text may cause encoding or parsing errors in the scraper.
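
The exact failure in the screenshot is not reproduced in the text, so the following is only an illustration of one common cause of such issues: decoding UTF-8 bytes with the wrong codec. The sample string is made up for the example.

```python
# Bytes as a server might send them for a Japanese page (UTF-8 encoded).
raw = "日本語のページ".encode("utf-8")

# Decoding with the wrong codec does not raise; it silently produces
# garbled text (mojibake), which is why the problem is easy to miss.
wrong = raw.decode("latin-1")

# Decoding with the correct codec recovers the original text.
right = raw.decode("utf-8")

# Because latin-1 maps every byte to one character, the damage is
# reversible as long as the garbled string has not been altered.
repaired = wrong.encode("latin-1").decode("utf-8")
```

Because the wrong decode succeeds without an exception, a scraper has to check the declared charset up front instead of relying on errors to reveal the problem.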

Expected Output with Multi-Language Support

Mock-Up of Expected Results

Description: Mock-up of the expected output where the scraper handles multiple languages seamlessly.

  • The scraper is expected to handle various languages correctly, with accurate data extraction and display.

Features

  • Language Support: Designed to work with multiple languages.
  • Error Handling: Includes mechanisms to manage text encoding and parsing issues.
  • Flexibility: Capable of adapting to different website structures and formats.
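
For the translated error messages proposed in the feature description, one possible shape is a small message catalog with an English fallback. This is a sketch under assumptions: `MESSAGES`, `translate`, and the `decode_failed` key are hypothetical names, and the Spanish entry is only a sample.

```python
# Hypothetical catalog: language code -> message key -> template.
MESSAGES = {
    "en": {"decode_failed": "Could not decode page {url} as {charset}."},
    "es": {"decode_failed": "No se pudo decodificar la página {url} como {charset}."},
}


def translate(key, lang="en", **params):
    """Return the message for `key` in `lang`, falling back to English."""
    # Reduce a tag such as "es-MX" to its base language "es".
    base = (lang or "en").split("-")[0].lower()
    catalog = MESSAGES.get(base, MESSAGES["en"])
    # Missing key in this language: use English; missing everywhere: use the key.
    template = catalog.get(key) or MESSAGES["en"].get(key, key)
    return template.format(**params)
```

Falling back to English keeps the scraper usable for languages that have no translation yet.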

Record

  • I agree to follow this project's Code of Conduct
  • I'm a GSSoC'24 contributor
  • I want to work on this issue

Hi there! Thanks for opening this issue. We appreciate your contribution to this open-source project. We aim to respond or assign your issue as soon as possible.

@Ayushi-Choudhary22

Hey @nikhil25803, I'm keen to contribute to the scraper project for GSSoC 2024 and would like to work on this issue (#1138).
Could you assign it to me? Looking forward to getting started!
Thanks!
