Project: Language Identification
This project builds a Natural Language Processing (NLP) system that identifies the language of a given text. It is part of the examination for the lecture "Advanced Natural Language Processing" in the M.Sc. Cognitive Systems program at the University of Potsdam.
Our team consists of Bhuvanesh Verma, Ian Clotworthy and Arthur Hilbert.
This repository includes the WiLI-2018 dataset: Thoma, Martin. (2018). WiLI-2018 - Wikipedia Language Identification database (Version 1.0.0) [Data set]. Zenodo. http://doi.org/10.5281/zenodo.841984
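For readers who want to inspect the data outside the notebook, the following is a minimal sketch of a loader. It assumes the files follow the published WiLI-2018 layout, in which a text file (for example `x_train.txt`) holds one paragraph per line and a matching label file (for example `y_train.txt`) holds one language label per line. The function names `read_lines` and `load_split` are hypothetical and not taken from the notebook, and the location of the files inside this repository is not specified here.

```python
import os
import tempfile


def read_lines(path):
    """Read a UTF-8 text file and return its lines without trailing newlines."""
    with open(path, encoding="utf-8") as handle:
        return [line.rstrip("\n") for line in handle]


def load_split(x_path, y_path):
    """Pair each paragraph with its language label.

    Assumption: both files are line-aligned, i.e. line i of the label
    file is the language of line i of the text file.
    """
    texts = read_lines(x_path)
    labels = read_lines(y_path)
    if len(texts) != len(labels):
        raise ValueError(
            f"{len(texts)} texts but {len(labels)} labels: files are not aligned"
        )
    return list(zip(texts, labels))


if __name__ == "__main__":
    # Tiny stand-in files so the sketch runs without the real dataset.
    with tempfile.TemporaryDirectory() as tmp:
        x_path = os.path.join(tmp, "x_demo.txt")
        y_path = os.path.join(tmp, "y_demo.txt")
        with open(x_path, "w", encoding="utf-8") as handle:
            handle.write("Hallo Welt\nHello world\n")
        with open(y_path, "w", encoding="utf-8") as handle:
            handle.write("deu\neng\n")
        for text, label in load_split(x_path, y_path):
            print(label, text)
```

To use it with the real data, pass the paths of the dataset's text and label files to `load_split` in place of the demo files.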
To use the notebook for language identification:
- Open langID_NLP.ipynb in this repository
- Click the "Open in Colab" banner at the top of the file preview
- In the Colab menu, navigate to Edit -> Notebook settings and set Hardware accelerator to GPU
- Run the notebook
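
Before running the notebook, it can help to confirm that the GPU setting took effect. The following is a minimal sketch that could be pasted into a new cell. It assumes the Colab GPU runtime exposes the `nvidia-smi` command-line tool, and the helper name `gpu_available` is hypothetical and not part of the notebook.

```python
import shutil
import subprocess


def gpu_available():
    """Return True if `nvidia-smi` is installed and lists at least one GPU."""
    if shutil.which("nvidia-smi") is None:
        return False  # tool not on PATH, so no NVIDIA driver is visible
    try:
        # `nvidia-smi -L` prints one line per detected GPU.
        result = subprocess.run(
            ["nvidia-smi", "-L"],
            capture_output=True,
            text=True,
            timeout=10,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU available:", gpu_available())
```

If this reports that no GPU is available, recheck the hardware accelerator setting and reconnect the runtime before running the notebook.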