Feat: Implement Multi-Language Support for Scraping #1138
Comments
Hi there! Thanks for opening this issue. We appreciate your contribution to this open-source project. We aim to respond or assign your issue as soon as possible.
Hey @nikhil25803, I'm keen to contribute to the scraper project for GSSoC 2024. I'm interested in tackling this issue (#1138).
Describe the feature
Enhance the scraper to support multiple languages, enabling it to scrape content from non-English websites effectively. This will involve:
* Detecting the language of the website.
* Using appropriate libraries and methods to handle different character encodings.
* Adding translations for common scraping elements and error messages.
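The three points above could be sketched roughly as follows. This is only an illustration using the Python standard library, not the project's actual code: the names `decode_html`, `dominant_script`, `message`, `CANDIDATE_ENCODINGS`, and `MESSAGES` are all hypothetical, and the script check is a crude stand-in for real language detection, which would more likely rely on a dedicated library.

```python
import unicodedata
from collections import Counter
from typing import Tuple

# Hypothetical candidate list. A real scraper would first honor the HTTP
# Content-Type header or the page's <meta charset> declaration. Order matters:
# legacy encodings can "succeed" on bytes they were never meant for.
CANDIDATE_ENCODINGS = ("utf-8", "shift_jis", "cp1251")


def decode_html(raw: bytes, candidates=CANDIDATE_ENCODINGS) -> Tuple[str, str]:
    """Return (text, encoding) for the first candidate that decodes cleanly."""
    for enc in candidates:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            continue
    # Last resort: keep going, replacing undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace"), "utf-8"


def dominant_script(text: str) -> str:
    """Guess the dominant writing system (a script, not a language)."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # Unicode names start with the script, e.g. "CYRILLIC SMALL LETTER A".
            name = unicodedata.name(ch, "")
            if name:
                counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "UNKNOWN"


# Hypothetical message catalog for user-facing errors, keyed by language code.
MESSAGES = {
    "en": {"fetch_failed": "Could not fetch the page."},
    "es": {"fetch_failed": "No se pudo obtener la página."},
}


def message(key: str, lang: str = "en") -> str:
    """Look up a message, falling back to English for unknown languages or keys."""
    catalog = MESSAGES.get(lang, MESSAGES["en"])
    return catalog.get(key, MESSAGES["en"][key])


if __name__ == "__main__":
    text, enc = decode_html("こんにちは".encode("shift_jis"))
    print(enc, dominant_script(text))
    print(message("fetch_failed", "es"))
```

Falling back to English keeps the scraper usable when a translation is missing, and returning the encoding alongside the text makes it easier to log which pages needed a non-UTF-8 path.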
Add ScreenShots
Web Scraper
Overview
This project is a web scraper designed to extract and process data from websites. It is currently tested on English websites and is being enhanced to handle multi-language content seamlessly.
Current Scraper Output
English Site
Description: Screenshot showing the scraper working perfectly with an English website.
Potential Issues with Non-English Sites
Text Encoding or Parsing Issues
Description: Screenshot displaying the scraper encountering issues with text encoding or parsing on a non-English website.
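One common form of this problem is mojibake, where UTF-8 bytes are decoded with the wrong codec. The snippet below is a small, self-contained illustration with a made-up sample string; it is an assumption about the kind of failure the screenshot shows, not a reproduction of it.

```python
# Sample non-English text (hypothetical; not taken from the project).
original = "Привет, мир"

# What the server sends: UTF-8 bytes.
raw = original.encode("utf-8")

# Decoding with the wrong codec never raises for latin-1, because every byte
# maps to some character, but the result is unreadable.
garbled = raw.decode("latin-1")

# If the wrong codec is known, the damage is reversible.
repaired = garbled.encode("latin-1").decode("utf-8")

if __name__ == "__main__":
    print(repr(garbled))
    print(repaired)
```

Because the wrong decode does not raise an error, this kind of bug tends to surface only when someone looks at the scraped output.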
Expected Output with Multi-Language Support
Mock-Up of Expected Results
Description: Mock-up of the expected output where the scraper handles multiple languages seamlessly.
Features
Record