Near Eastern Studies and Digital Scholarship @IAS Joint Event
Simtho: The Syriac Thesaurus
Launch of a Syriac textual corpus portal, hosted by Sabine Schmidtke and George A. Kiraz from the Institute for Advanced Study.
Simtho [simtho.bethmardutho.org] is a Syriac corpus search engine with a textual database spanning almost two millennia. A Beta version with over 6 million words was revealed at the 2018 AAR/SBL meeting in San Diego. The upcoming Beta II launch will unveil a textual database of over 13 million words with a new, responsive, and more attractive user interface.
Team members will discuss various digital humanities and computational linguistics techniques, including corpus building, the power of regular expressions, building OCR and HTR models, metadata, and part-of-speech tagging. While these techniques are applied to Syriac, they are easily transferable to other (especially Semitic) languages.