OneDeadKey · fabi1cazenave · Dec 9, 2024
diff --git a/kalamine/www/corpus/LICENSE b/kalamine/www/corpus/LICENSE
diff --git a/kalamine/www/corpus/README.md b/kalamine/www/corpus/README.md
@@ -1,14 +1,17 @@
-# Corpus for layout analysis
+# Corpus for Layout Analysis
+
+All JSON files have been generated with [kalamine-corpus](https://github.com/OneDeadKey/kalamine-corpus?tab=readme-ov-file).
 
 ## `fr` / `en`
 
-Those corpora and stats come from Don Quixote
+These corpora and stats come from Don Quixote (Cervantes), via Project Gutenberg.
 
 ## `fra_mixed-typical_2012_1M-sentences`
 
 These stats come from [University of Leipzig](https://wortschatz.uni-leipzig.de/en/download/French#fra_mixed_2012)
 
 ### Sources
+
 French Mixed-Typical 2012, 1M sentences file has been extracted, and the
 sentence indices have been stripped with `awk '!($1="")' 
 fra_mixed-typical_2012_1M/fra_mixed-typical_2012_1M-sentences.txt > 

diff --git a/kalamine/www/corpus/chardict.py b/kalamine/www/corpus/chardict.py