Survey of Afghan (Dari) Language NLP for Building Afghan NLIDB System
Date
2022-09
Publisher
Brac University
Author
Karimi, Sadullah
Abstract
Technology adoption is extremely limited in Afghanistan, especially since people have limited access to the
Internet, smartphones, and computers due to power limitations and the high cost of Internet access. Internet
service in Afghanistan is provided by the private sector at high cost and with very low speed and
quality. Natural Language Processing (NLP) has various applications and improves access to information and
systems. To advance as a country, Afghanistan needs to be able to use existing databases and datasets,
create new ones, and maintain them. Initially, people need a system that lets them access databases providing
various kinds of guidance with the limited resources available to them. Later, they would benefit from
higher-level access for maintenance and crowdsourced contributions. This work first focuses on building a system through which
people in Afghanistan can access databases in their native language. The Afghan (Dari) language is one of the widely
used languages, with up to 110 million speakers worldwide. It is used in countries including Afghanistan, Azerbaijan,
Iran, Iraq, Russia, Tajikistan, Turkmenistan, and Uzbekistan. The Afghan language lacks resources and requires
higher-quality lexicon translation. The proposed Afghan Natural Language Interface to Database (NLIDB) is based on
a natural language query-response model, in which queries posed in the Afghan language are used to extract the desired
data from a database. Retrieving data from a database requires knowledge of SQL or a
very well-designed user interface. It is easy for domain experts to retrieve data from databases. However, it
is quite challenging for non-expert users to access a database using SQL queries in the absence of a proper and
friendly user interface. This work addresses that challenge, allowing those who speak the Afghan language worldwide
to access different databases and datasets. First, we surveyed the current state of Afghan NLP to find
research gaps for future researchers of the Afghan language, and identified NLIDB systems as one such
gap. Second, we surveyed non-English NLIDB systems and conducted a systematic review of the current
methods of non-English NLIDB. We then propose an NLIDB system for the Afghan language. Through our system,
users in Afghanistan can access the database through feature phones and landline calls, using an open-source
Interactive Voice Response (IVR) system in addition to smartphones and computers. The system can be easily
accessed by users without the need for high-speed Internet, reliable power, a computer, or a smartphone.
The system is designed around the limited technology available in Afghanistan. The Afghan
Spoken NLIDB is built on lexical analysis, semantic analysis, and syntax analysis, which together transform an
Afghan-language natural language query into Structured Query Language (SQL).
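
The pipeline named in the abstract (lexical analysis, semantic analysis, syntax analysis, then query generation from a data dictionary) can be illustrated with a very small sketch. This is not the thesis implementation: the dictionary entries, the `students` table, the function names, and the one-table rule are all assumptions made up for illustration, and the Dari sample query is only an example input.

```python
import sqlite3

# Hypothetical data dictionary (an assumption, not taken from the thesis):
# each Dari word maps to a role and a name in an illustrative schema.
DATA_DICTIONARY = {
    "شاگردان": ("table", "students"),  # "students"
    "نام": ("column", "name"),         # "name"
    "سن": ("column", "age"),           # "age"
}


def tokenize(query):
    """Lexical analysis: split the query into whitespace-separated tokens."""
    return query.split()


def map_tokens(tokens):
    """Semantic analysis: keep tokens that appear in the data dictionary."""
    return [DATA_DICTIONARY[t] for t in tokens if t in DATA_DICTIONARY]


def build_sql(mapped):
    """Syntax check and query generation: require exactly one table."""
    tables = [name for role, name in mapped if role == "table"]
    columns = [name for role, name in mapped if role == "column"]
    if len(tables) != 1:
        raise ValueError("expected exactly one table reference")
    # Identifiers come only from the dictionary, never from raw user text.
    return f"SELECT {', '.join(columns) or '*'} FROM {tables[0]}"


def dari_to_sql(query):
    """Run the three toy stages in sequence."""
    return build_sql(map_tokens(tokenize(query)))


def run_demo():
    """Execute a generated query against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE students (name TEXT, age INTEGER)")
    conn.execute("INSERT INTO students VALUES ('Ahmad', 20)")
    sql = dari_to_sql("نام شاگردان را نشان بده")  # "show the students' names"
    return sql, conn.execute(sql).fetchall()
```

Words that are not in the dictionary (here the object marker and the verb) are simply ignored, which is the simplest possible design; a real system would also need morphological handling, filtering conditions, and a speech front end for the IVR channel.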
Keywords
Natural Language Querying; Translating From Afghan to English; Lexical analysis; Syntax analysis; Semantic analysis; Query Generation; Python Library; Data dictionary; Natural language interface to database; NLIDB; Non-English NLIDB; Natural language interface; NLI; Natural language user interface; NLUI; Afghan NLP survey; Dari
Description
This thesis is submitted in partial fulfillment of the requirements for the degree of Master of Science in Computer Science and Engineering, 2022.
Department
Department of Computer Science and Engineering, Brac University
Type
Thesis