Saadi Foundation Corpus Linguistics


Linguistics is the scientific study of language, and knowledge of a language's linguistics is needed to teach that language. Many factors are taken into consideration when teaching a language. These include the characteristics of the learner (age, nationality, gender), the learner's circumstances (motivation and needs), and the features of the course (a focus on grammar or on communication). One important method that has gained prominence in recent years, especially with technological advances, is the corpus-based approach to linguistics and language education. A corpus is a collection of written or spoken texts that shows how a language is actually used. A learner corpus, in particular, collects texts produced by learners. It contains both the correct and the incorrect words and structures that learners use in the course of learning a language, which makes it valuable for understanding the language learning process. For several decades, researchers have been collecting learner corpora to help understand how students learn a language.
Saadi Foundation has been collecting a Persian learner corpus for several years. Known as the “Saadi Foundation Corpus”, it contains written and spoken texts produced by learners of Persian from around the world, and it is constantly growing. At present, the foundation has collected 14,700 written and 600 spoken texts, of which 5,000 written and 23 spoken texts have been entered into the database. A small portion of the collected texts has been annotated. The foundation plans to expand both the collection of texts and the tagged portion of the corpus. It also plans to annotate the collected texts according to the foundation's own error tagging system. This step, which Saadi Foundation's Corpus Experts Group is working toward, will help identify the errors made by learners of Persian. The results will assist educational planners and textbook authors and make it easier to determine learners' language levels. They will also contribute to better teaching and more credible assessments, and ultimately help promote the Persian language worldwide.
Several studies have been conducted on Persian language corpora. Two of them are available here as references, offering insight into how Persian corpora are collected and tagged.

Safari, Saeed (2012). Design & Creation of Persian Learner Corpus. Master's dissertation, Persian Language & Literature for Speakers of Other Languages, Allameh Tabataba'i University.

Safari, Saeed (2018). The Salam Farsi Learner Corpus – Introducing the Error Tagging System. Анали Филолошког факултета, 30(2), 249–263.

Persian language centers are requested to send the foundation photos of their learners' written texts or videos of their spoken production, to help enrich Persian language education. Before submitting, please read the guidelines at the links below. Please send the collected data, together with its metadata, to corpus@saadifoundation.ir.

Spoken Database Guidelines
Written Database Guidelines
Metadata of Databases
