Update projects.md remove some old projects
bplank authored Sep 9, 2023
1 parent c31391b commit 5bcff8c
Showing 1 changed file with 0 additions and 5 deletions.
5 changes: 0 additions & 5 deletions _pages/projects.md
@@ -112,15 +112,10 @@ Legend:

- *NLP methods for Folk Songs Lyrics.* Folk music is an essential element of any culture. This project seeks to apply NLP techniques to study folk music of the German-speaking countries with a special focus on song lyrics written in dialect. You will start with building a pipeline for large-scale lyrics collection. Next, you will conduct a comprehensive analysis of song lyrics including (but not limited to): discovering the most popular lyrical themes, and studying rhymes and figurative speech used in lyrics. Level: BSc or MSc.

- *Code-switching usage in German social media.* Code-switching is a language phenomenon that occurs when a multilingual speaker alternates between multiple languages in a single utterance. This project studies when and why people mix high-standard German with its dialects when writing on social media. The ultimate goal is to explore the contexts and grammar structures in which code-switching is most often used. To this end, a pipeline for user-generated data collection and labeling should be developed along with learnable approaches to code-switching detection. Level: MSc.

- *Large Language Models for low-resource NLP revisited.* At the moment, large language models (LLMs) are all the hype, but do we actually need them for low-resource tasks? In this project, the student compares LLM fine-tuning with computationally cheaper ways of training a model for a low-resource language variety and an NLP task (any classification task or a sequence labeling task). Level: BSc or MSc.

- *Learning Task Representations.* We are often interested in transferring NLP/IR models to datasets for which we have little or no label annotations available. In such a zero-shot setting, it is possible to transfer a model from a single related task or a set of related tasks. Representing tasks and measuring task similarity is an open challenge and an active research field. The goal of this thesis is to explore approaches for deriving task representations and evaluating their effectiveness in a multi-task setting. Level: MSc.

- *Machine Translation Error Propagation on Cross-lingual Retrieval.* Machine Translation (MT) is frequently used to bridge the language gap in cross-lingual information retrieval (CLIR). MT models are typically used to translate training data or to translate queries at test time. Recent work has focused on hallucinations of MT models, referring to the phenomenon where translations are, e.g., completely unrelated to the input text. The goal of this thesis is to systematically and broadly analyze different types of MT error propagation with respect to different languages and their impact on CLIR. (Proficiency in multiple languages is desired for students who would like to do this project.) Level: BSc or MSc.

- *Code-Switching in Cross-Lingual Information Retrieval.* When we train retrieval models on monolingual data, the model can learn to predict document relevance from keyword overlaps with the query or from semantic context. Arguably, keyword matching is an easier task than learning semantic concepts. In our [recent work](https://aclanthology.org/2023.findings-acl.193/) we show that retrieval models trained on English data are biased towards keyword matching, which is less problematic if we transfer the model to other monolingual setups. However, in cross-lingual information retrieval (CLIR) the query language vocabulary is different from the document language vocabulary, and relying on keyword overlap is suboptimal. To mitigate this bias and improve retrieval results, we propose code-switching the training data. The goal of this thesis is to experiment with more sophisticated code-switching approaches, additional languages and different domains. Level: BSc or MSc.

