langID-NLP

Project: Language Identification

This project is about a Natural Language Processing (NLP) system that identifies the language of a given text. It is part of the examination for the lecture "Advanced Natural Language Processing" in the M.Sc. Cognitive Systems program at the University of Potsdam.

Our team consists of Bhuvanesh Verma, Ian Clotworthy and Arthur Hilbert.

This repository includes the WiLI-2018 dataset: Thoma, Martin. (2018). WiLI-2018 - Wikipedia Language Identification database (Version 1.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.841984
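
As a rough illustration of how such a dataset can be read outside the notebook, the sketch below pairs a text file with a label file that hold one entry per line. It is not taken from langID_NLP.ipynb. The function names `read_lines` and `load_split` are hypothetical, and the one-paragraph-per-line layout with a matching label file is an assumption about the files in WiLI-2018_data.

```python
def read_lines(path):
    """Return the lines of a UTF-8 text file without their trailing newline."""
    # Split on "\n" only. str.splitlines() would also break on Unicode
    # separators (e.g. U+2028) that can occur inside a paragraph and
    # would shift texts out of step with their labels.
    with open(path, encoding="utf-8", newline="\n") as handle:
        return [line.rstrip("\n") for line in handle]


def load_split(text_path, label_path):
    """Pair each paragraph with the language label on the same line number.

    Assumes (not confirmed by this README) that both files are
    line-aligned: line i of the label file labels line i of the text file.
    """
    texts = read_lines(text_path)
    labels = [label.strip() for label in read_lines(label_path)]
    if len(texts) != len(labels):
        raise ValueError(
            f"{len(texts)} texts but {len(labels)} labels; files are not aligned"
        )
    return list(zip(texts, labels))
```

Calling `load_split` with the paths of a text file and its label file inside WiLI-2018_data returns a list of `(text, label)` tuples. The exact file names depend on how the dataset is stored in this repository.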

To use the notebook for language identification:

1. Navigate to langID_NLP.ipynb.
2. Click the "Open in Colab" banner at the top of the file preview.
3. In the Colab menu, go to Edit > Notebook settings and set Hardware accelerator to GPU.
4. Run the notebook.
