ICS02: 10. Text analysis with R
Gabriel Bodard edited this page Mar 14, 2019 · 17 revisions
Thursday Mar 14, 16:00 UK = 18:00 EET
Convenors: Maciej Eder (Kraków), Robert Gorman (University of Nebraska–Lincoln) & Christopher Ohge (University of London)
YouTube link: https://youtu.be/2Fo4HxGZ5o4
Notebooks:
- RGorman's notebook (HTML) and XML files (5 files).
- M Eder's notebook (HTML) and small corpus (22 files)
Slides: tba
This session will examine some specialist libraries in R for text analysis. We will review the tidytext package from the previous session, then examine in depth two crucial (and complementary) forms of text analysis: the first uses Stylo to work with larger datasets, and the second shows how to analyse and visualise texts encoded in XML.
- Review tidytext from previous session (Ohge).
- Stylometry: intro to the Stylo package (Eder). Stylometry, the application of statistical methods to trace stylistic differences between (literary) texts, is usually associated with the question of authorship attribution. It relies on the assumption that each author has a distinct lexical profile, reflected e.g. in idiosyncratic word frequencies. The R package ‘stylo’ provides a set of functions, conveniently supplemented by a graphical user interface for high-level exploratory analyses, which makes it especially suited to novice users without programming skills.
- XML library: treebanking and linguistic analyses of encoded texts (Gorman).
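The word-frequency assumption behind stylometry can be illustrated with a minimal base-R sketch. It is not the stylo package's implementation (there the analysis is started with `stylo()`), only a simplified version of the usual recipe: take the most frequent words, convert their relative frequencies to z-scores, and compare texts by the mean absolute difference (Burrows's Delta). The three snippets and the labels `A1`, `A2` and `B1` are invented for illustration and are far too short for a real analysis.

```r
# Toy corpus: invented snippets (hypothetical data, not the session's corpus)
texts <- c(
  A1 = "the cat sat on the mat and the dog sat by the door",
  A2 = "the dog lay on the rug and the cat lay by the fire",
  B1 = "a storm of rain fell upon a field of corn in a valley of mist"
)

# Split each text into lower-case word tokens
tokens <- lapply(texts, function(x) strsplit(tolower(x), "[^a-z]+")[[1]])

# Select the five most frequent words across the whole corpus
all_counts <- sort(table(unlist(tokens)), decreasing = TRUE)
mfw <- names(all_counts)[1:5]

# Relative frequency of each selected word in each text
rel <- vapply(
  tokens,
  function(tok) as.vector(table(factor(tok, levels = mfw))) / length(tok),
  numeric(length(mfw))
)
rownames(rel) <- mfw
freqs <- t(rel)  # rows = texts, columns = words

# Standardise each word's frequencies to z-scores across the corpus
z <- scale(freqs)

# Burrows's Delta: mean absolute z-score difference between two texts,
# i.e. Manhattan distance divided by the number of words used
delta <- as.matrix(dist(z, method = "manhattan")) / length(mfw)
print(round(delta, 2))
```

In this toy corpus the two `A` snippets share their function-word habits, so their Delta is smaller than the distance of either from `B1`. Real studies use hundreds of frequent words and much longer texts.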
- Kestemont, Mike & Justin A. Stover (2016), "The Authorship of the Historia Augusta: Two new computational studies." Bulletin of the Institute of Classical Studies 59.2. Pp. 140–157. Available: https://onlinelibrary.wiley.com/doi/epdf/10.1111/j.2041-5370.2016.12043.x
- (tba)
- Büchler, Marco, et al. (2013), "Measuring the Influence of a Work by Text-Reuse." In ed. Dunn/Mahony, The Digital Classicist 2013. Bulletin of the Institute of Classical Studies, Supplement 122. Pp. 63–79.
- Dye, Melody, Petar Milin, Richard Futrell and Michael Ramscar, “A Functional Theory of Gender Paradigms.” Available: http://web.mit.edu/futrell/www/papers/dye2017functional.pdf
- Eder, M. (2012). Computational stylistics and Biblical translation: how reliable can a dendrogram be? In T. Piotrowski & Ł. Grabowski (Eds.), The translator and the computer, pp. 155–170. Wrocław: WSF Press. Available: https://www.wsf.edu.pl/upload_module/wysiwyg/Wydawnictwo%20WSF/The%20Translator%20and%20the%20Computer_Piotrowski_Grabowski.pdf
- Eder, M. (2013). Mind your corpus: systematic errors in authorship attribution. Literary and Linguistic Computing, 28(4), 603–614. Pre-print available: https://github.com/computationalstylistics/preprints/blob/master/m-eder_mind_your_corpus.pdf
- Eder, M., Rybicki, J., Kestemont, M. 'Stylometry with R.' The R Journal 8/1 (Aug. 2016). Available: https://journal.r-project.org/archive/2016-1/eder-rybicki-kestemont.pdf
- Gulordava, Kristina (2018). Word order variation and dependency length minimisation: a cross-linguistic computational approach. Thèse de doctorat: Univ. Genève. Available: https://doi.org/10.13097/archive-ouverte/unige:106855. (Esp. chapter 3, “The DLM principle and word order variability at the language level,” pp. 64–106).
- Stover, J., & Kestemont, M. (2016). Reassessing the Apuleian corpus: A computational approach to authenticity. The Classical Quarterly, 66(2), 645–672.
- tba
- tba