Dear contributors, please note that several different languages were written in cuneiform script. Among the best known are Elamite, Babylonian and Old Persian; this project works on Old Persian. Below you can see the differences:
(Photo taken at the National Museum of Iran: the gold plate of King Darius)
/imagedata/
    /source/
        /king/
            source_king_001.jpg
    #example:
    /behistun/
        /darius_1/
            behistun_darius_1_001.jpg
/textdata/
    /eng_transcription_to_english/
        /metadata/
        eng_transcription_to_english_001.json
    /eng_transliteration_to_english/
        /metadata/
        eng_transliteration_to_english_001.json
    /single/
        /metadata/
        /eng_transliteration/
            eng_transliteration_001.json
# "single" refers to text data that has no accompanying translation
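The image naming convention above can be sketched as a small path helper. This is only an illustration: the function name `image_path` and the root folder name `data` are assumptions, and only the `/imagedata/` pattern `<source>/<king>/<source>_<king>_<index>.jpg` is taken from the layout shown.

```python
from pathlib import Path


def image_path(root, source, king, index):
    """Build <root>/imagedata/<source>/<king>/<source>_<king>_<index>.jpg.

    `root` is a hypothetical local checkout folder; the index is zero-padded
    to three digits, as in behistun_darius_1_001.jpg.
    """
    filename = f"{source}_{king}_{index:03d}.jpg"
    return Path(root) / "imagedata" / source / king / filename


# Reproduces the example file name from the layout above.
example = image_path("data", "behistun", "darius_1", 1)
print(example.as_posix())
```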
Old Persian text can be rendered in Latin script by different methods, for example transliteration (sign by sign) and transcription (the reading of whole words). Below you can see an example that shows the difference between them:
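One rough way to see what sign-by-sign transliteration means is to read each sign's value from its Unicode character name. This is a minimal sketch and not the method used to build this dataset: the helper name `sign_values` is an assumption, and the Unicode names are only an ASCII approximation of scholarly transliteration. Transcription, by contrast, needs linguistic knowledge and cannot be derived this way.

```python
import unicodedata

# Unicode names of Old Persian syllabic signs look like "OLD PERSIAN SIGN DA".
PREFIX = "OLD PERSIAN SIGN "


def sign_values(text):
    """Return the sign values of `text`, joined by hyphens.

    Characters that are not Old Persian signs (word dividers, numbers,
    Latin letters, spaces) are skipped.
    """
    values = []
    for ch in text:
        name = unicodedata.name(ch, "")
        if name.startswith(PREFIX):
            values.append(name[len(PREFIX):].lower())
    return "-".join(values)


# Three signs from the Old Persian Unicode block, written as escapes.
print(sign_values("\U000103A0\U000103AD\U000103B6"))
```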
Each directory includes a "source.metadata.csv" file that describes its data.
Explanation of the metadata columns:
imagedata:
source: The source the data was taken from.
abbreviation: The name of the inscription.
location: The main location where the inscription was discovered.
translation: 1 if a translation of the inscription is available, 0 if not.
collection: The place where the inscription is currently kept.
artifact_id: The artifact_id from the CDLI reference.
asset_number: The asset_number from the British Museum collection.
museum_number: The museum_number from the British Museum collection.
textdata:
abbreviation: The name of the inscription.
reference: The reference the data was taken from.
location: The main location where the inscription was discovered.
image: 1 if an image of the inscription is available, 0 if not.
artifact_id: The artifact_id from the CDLI reference.
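As a minimal sketch of how these columns might be used, the snippet below filters metadata rows by one of the 0/1 flag columns. It assumes the CSV is comma-separated and that its header uses the column names exactly as listed above; the helper name `rows_with_flag` and the sample rows are made up for illustration and are not real dataset entries.

```python
import csv
import io


def rows_with_flag(csv_file, column):
    """Return the rows whose 0/1 flag `column` is set to 1."""
    reader = csv.DictReader(csv_file)
    return [row for row in reader if (row.get(column) or "").strip() == "1"]


# Made-up rows standing in for a textdata "source.metadata.csv" file.
sample = io.StringIO(
    "abbreviation,reference,location,image,artifact_id\n"
    "EX1,example-reference,Persepolis,1,\n"
    "EX2,example-reference,Susa,0,\n"
)

with_images = rows_with_flag(sample, "image")
print([row["abbreviation"] for row in with_images])
```

With a real file, the same function can be called on `open(path, newline="", encoding="utf-8")` instead of the in-memory sample.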
- Book: The Inscriptions in Old Persian Cuneiform of the Achaemenian Emperors by Ralph Norman Sharp
- Personal photography from the National Museum of Iran and Takht-e-Jamshid (Persepolis)
In the first stage, an OCR model converts Old Persian cuneiform into English transcription text. In the second stage, that transcription text is the input to an NLP model or large language model (LLM), which acts as a machine translation model and converts it into modern languages.
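The two stages can be sketched as a simple function that chains two models. This is only an outline under stated assumptions: `translate_inscription`, `fake_ocr` and `fake_translate` are hypothetical names, and the two stand-in functions return placeholder strings where real OCR and translation models would be called.

```python
from typing import Callable, Dict


def translate_inscription(
    image_path: str,
    ocr: Callable[[str], str],
    translate: Callable[[str], str],
) -> Dict[str, str]:
    """Run stage 1 (image -> transcription), then stage 2 (transcription -> translation)."""
    transcription = ocr(image_path)
    translation = translate(transcription)
    return {
        "image": image_path,
        "transcription": transcription,
        "translation": translation,
    }


# Stand-ins for the real models (hypothetical, for illustration only).
def fake_ocr(image_path: str) -> str:
    return "placeholder transcription"


def fake_translate(text: str) -> str:
    return f"[translation of: {text}]"


result = translate_inscription("behistun_darius_1_001.jpg", fake_ocr, fake_translate)
print(result["translation"])
```

Passing the models in as callables keeps the two stages independent, so either one can be swapped without touching the other.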
Behistun:بیستون
Susa:شوش
Persepolis:پرسپولیس(تخت جمشید)
Elamite:ایلامی
Babylonian:بابِلی
Cyrus:کوروش
Xerxes:خشایار
Artaxerxes:اردشیر
𐎠𐎢𐎼𐎶𐏀𐎡𐎠:اهورامزدا
This repository is licensed under CC-BY-NC; any commercial use is prohibited.