Text Provenance Competition: Code Repository

This repository contains the code used for the Text Provenance competition. In this project, we implemented several text similarity methods, including:
- LDA (Latent Dirichlet Allocation)
- Doc2Vec
- Word2Vec
- Jaccard Distance
- Edit Distance
- TF-IDF (Term Frequency-Inverse Document Frequency)
These methods were used to calculate text similarity as part of the competition’s task.
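
As a rough illustration of three of the methods listed above (Jaccard distance, edit distance, and TF-IDF), here is a minimal, standard-library-only sketch. It is not the code from this repository: the function names, the smoothed IDF formula, and the assumption that documents are already split into tokens are all illustrative choices made for this example.

```python
import math
from collections import Counter


def jaccard_distance(a_tokens, b_tokens):
    """Jaccard distance: 1 - |A & B| / |A | B| over the two token sets."""
    a, b = set(a_tokens), set(b_tokens)
    if not a and not b:
        return 0.0  # convention chosen here: two empty texts are identical
    return 1.0 - len(a & b) / len(a | b)


def edit_distance(s, t):
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(t) + 1))  # distances from the empty prefix of s
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def tfidf_vectors(docs):
    """Return one {term: weight} dict per tokenized document.

    Assumption: raw term counts and a smoothed idf = ln((1 + N) / (1 + df)) + 1.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log((1 + n) / (1 + d)) + 1 for term, d in df.items()}
    return [
        {term: count * idf[term] for term, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine_similarity(u, v):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Hypothetical, pre-tokenized toy documents.
    docs = [["the", "cat", "sat"], ["the", "cat", "ran"], ["dogs", "bark"]]
    vecs = tfidf_vectors(docs)
    print(jaccard_distance(docs[0], docs[1]))  # 0.5
    print(edit_distance("kitten", "sitting"))  # 3
    print(cosine_similarity(vecs[0], vecs[1]))
    print(cosine_similarity(vecs[0], vecs[2]))  # 0.0 (no shared terms)
```

Jaccard and edit distance are distances (lower means more similar), while the TF-IDF cosine score is a similarity (higher means more similar), so the scores are not directly comparable without converting one form to the other.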
More detailed information can be found here: http://www.cips-smp.org/smp_data/4
Feel free to explore the code and adapt it for your own text analysis projects!