Bengali Stop Word and Phrase Detection Mechanism

Haque, Rakib; Mridha, M.F.; Hamid, Md.; Abdullah-Al-Wadud, M.; Islam, Saiful

doi:10.1007/s13369-020-04388-8

Haque R. U., Mridha M., Hamid M. A., Abdullah-Al-Wadud M., Islam M. S.

Arabian Journal for Science and Engineering, vol. 45, no. 4, pp. 3355-3368, 2020 (SCI-Expanded, Scopus)

Publication Type: Article / Full Article
Volume: 45 Issue: 4
Publication Date: 2020
DOI: 10.1007/s13369-020-04388-8
Journal Name: Arabian Journal for Science and Engineering
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Communication Abstracts, Metadex, Pollution Abstracts, zbMATH, Civil Engineering Abstracts
Pages: pp. 3355-3368
Keywords: Stop phrase, Stop word, Natural language processing, Finite automaton, Text processing
Affiliated with TED University: No

Abstract

Although stop word and stop phrase detection has been studied extensively, no prior work addresses Bengali stop words and stop phrases. This research introduces a definition and classification of Bengali stop words and phrases and implements two approaches to identify them. The first is a corpus-based approach; the second is based on a finite-state automaton. The performance of both approaches is measured and compared. The analysis shows that the corpus-based method outperforms the finite-state automaton-based method: the corpus-based and automaton-based methods achieve 90% and 80% accuracy, respectively, for stop word detection, and 80% and 70% accuracy, respectively, for stop phrase detection.
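
The abstract names the two families of techniques but does not describe how either is built. The sketch below is only a generic illustration of each idea, not the authors' implementation. It assumes that "corpus-based" can be approximated by ranking tokens by frequency, and that the automaton can be approximated by a deterministic automaton over token sequences of known stop phrases. The function names (`corpus_stop_words`, `build_automaton`, `find_stop_phrases`) and the placeholder tokens are hypothetical.

```python
from collections import Counter


def corpus_stop_words(documents, top_k):
    """Assumed corpus-based heuristic: the top_k most frequent tokens."""
    counts = Counter(token for doc in documents for token in doc.split())
    return {word for word, _ in counts.most_common(top_k)}


def build_automaton(phrases):
    """Build a deterministic automaton (as a trie) over token sequences.

    transitions[state] maps a token to the next state; state 0 is the start.
    """
    transitions = [{}]
    accepting = set()
    for phrase in phrases:
        state = 0
        for token in phrase.split():
            nxt = transitions[state].get(token)
            if nxt is None:
                transitions.append({})
                nxt = len(transitions) - 1
                transitions[state][token] = nxt
            state = nxt
        accepting.add(state)
    return transitions, accepting


def find_stop_phrases(tokens, transitions, accepting):
    """Return (start, end) token spans, taking the longest match at each start."""
    spans = []
    i = 0
    while i < len(tokens):
        state, j, last = 0, i, None
        while j < len(tokens) and tokens[j] in transitions[state]:
            state = transitions[state][tokens[j]]
            j += 1
            if state in accepting:
                last = j  # remember the longest accepted prefix so far
        if last is not None:
            spans.append((i, last))
            i = last
        else:
            i += 1
    return spans


# Placeholder tokens stand in for real Bengali words and phrases.
docs = ["x a y a", "a z x"]
frequent = corpus_stop_words(docs, top_k=1)
transitions, accepting = build_automaton(["p q", "p q r", "s"])
spans = find_stop_phrases("p q r t s p".split(), transitions, accepting)
```

In this sketch the frequency ranking needs only raw text, while the automaton needs a hand-listed set of phrases. That is a property of the sketch and says nothing about how the paper's two methods were constructed or why their accuracies differ.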