From ae435f0659dd3055224e9b2c2ed3d9918067939c Mon Sep 17 00:00:00 2001
From: Barbara Plank
Date: Sun, 1 Sep 2024 15:31:30 +0200
Subject: [PATCH] Update projects.md

---
 _pages/projects.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/_pages/projects.md b/_pages/projects.md
index 33bb924..3262597 100644
--- a/_pages/projects.md
+++ b/_pages/projects.md
@@ -150,7 +150,7 @@ MSc/BSc thesis research vectors:
  * *Transfer or translate: how to better work with dialectal data.* Demands for generalizing NLP pipelines to dialectal data are on the rise. Given current LLMs trained in hundreds of languages, there are two common approaches. The first approach is to translate (or normalize) dialectal data to its mainstream counterpart and apply pipelines to the translated mainstream counterpart. Such an approach can benefit from the bigger amount of unannotated and annotated data in the mainstream variant but suffers from error propagation in the pipeline. The second transfer approach is to annotate a small amount of dialectal data and few-shot transfer (finetune) models on the dialect. This involves more dialectal annotation as well as collected unannotated dialectal data. Reference: [Zampieri et al. 2020](https://helda.helsinki.fi/server/api/core/bitstreams/dd1636da-66ef-4e2d-bdb7-19c0b27080f3/content). For a BSc thesis, you would choose an NLP task (e.g., syntactic or semantic parsing, sentiment or stance detection, QA or summarization, etc.) and a specific dialect, compare performances of fewshot versus translation approaches quantitatively, and conduct a qualitative error analysis on the difficult cases. For MSc, the research needs to scale up either to multiple dialects (in the same or across different language families) or to multiple NLP tasks. Level: BSc or MSc.
- * *Language Modeling of Historical Non-Standard Language Documents.* Digitalisation can provide access to valuable historical information, especially for non-standard languages and dialects. In this project, you test and build a prototype for digitalisation of historical documents using recent visual representation-based methods. The project include: data gathering and annotation, model evaluation and improvement (e.g. by augmentation methods). References: [Salesky et al., 2021](https://aclanthology.org/2021.emnlp-main.576/), [Borenstein et al., 2023](https://aclanthology.org/2023.emnlp-main.7.pdf). Level: MSc.
+ * *Language Modeling of Historical Non-Standard Language Documents.* Digitalisation can provide access to valuable historical information, especially for non-standard languages and dialects. In this project, you test and build a prototype for digitalisation of historical documents using recent visual representation-based methods. The project includes: data gathering and annotation, model evaluation and improvement (e.g. by augmentation methods). References: [Salesky et al., 2021](https://aclanthology.org/2021.emnlp-main.576/), [Borenstein et al., 2023](https://aclanthology.org/2023.emnlp-main.7.pdf). Level: MSc.