
Commit

Update projects.md
bplank authored Sep 1, 2024
1 parent 3118044 commit ae435f0
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion _pages/projects.md
@@ -150,7 +150,7 @@ MSc/BSc thesis research vectors:

* *Transfer or translate: how to better work with dialectal data.* Demands for generalizing NLP pipelines to dialectal data are on the rise. Given current LLMs trained in hundreds of languages, there are two common approaches. The first approach is to translate (or normalize) dialectal data to its mainstream counterpart and apply pipelines to the translated mainstream counterpart. Such an approach can benefit from the bigger amount of unannotated and annotated data in the mainstream variant but suffers from error propagation in the pipeline. The second transfer approach is to annotate a small amount of dialectal data and few-shot transfer (finetune) models on the dialect. This involves more dialectal annotation as well as collected unannotated dialectal data. Reference: [Zampieri et al. 2020](https://helda.helsinki.fi/server/api/core/bitstreams/dd1636da-66ef-4e2d-bdb7-19c0b27080f3/content). For a BSc thesis, you would choose an NLP task (e.g., syntactic or semantic parsing, sentiment or stance detection, QA or summarization, etc.) and a specific dialect, compare performances of fewshot versus translation approaches quantitatively, and conduct a qualitative error analysis on the difficult cases. For MSc, the research needs to scale up either to multiple dialects (in the same or across different language families) or to multiple NLP tasks. Level: BSc or MSc.

* *Language Modeling of Historical Non-Standard Language Documents.* Digitalisation can provide access to valuable historical information, especially for non-standard languages and dialects. In this project, you test and build a prototype for digitalisation of historical documents using recent visual representation-based methods. The project include: data gathering and annotation, model evaluation and improvement (e.g. by augmentation methods). References: [Salesky et al., 2021](https://aclanthology.org/2021.emnlp-main.576/), [Borenstein et al., 2023](https://aclanthology.org/2023.emnlp-main.7.pdf). Level: MSc.
* *Language Modeling of Historical Non-Standard Language Documents.* Digitalisation can provide access to valuable historical information, especially for non-standard languages and dialects. In this project, you test and build a prototype for digitalisation of historical documents using recent visual representation-based methods. The project includes: data gathering and annotation, model evaluation and improvement (e.g. by augmentation methods). References: [Salesky et al., 2021](https://aclanthology.org/2021.emnlp-main.576/), [Borenstein et al., 2023](https://aclanthology.org/2023.emnlp-main.7.pdf). Level: MSc.


<a name="v2"/>

