Bengali Stop Word and Phrase Detection Mechanism


Haque R. U., Mridha M., Hamid M. A., Abdullah-Al-Wadud M., Islam M. S.

Arabian Journal for Science and Engineering, vol. 45, no. 4, pp. 3355-3368, 2020 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 45, Issue: 4
  • Publication Date: 2020
  • DOI: 10.1007/s13369-020-04388-8
  • Journal Name: Arabian Journal for Science and Engineering
  • Journal Indexed In: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Communication Abstracts, Metadex, Pollution Abstracts, zbMATH, Civil Engineering Abstracts
  • Pages: pp. 3355-3368
  • Keywords: Stop phrase, Stop word, Natural language processing, Finite automaton, Text processing
  • TED University Affiliation: No

Abstract

Although considerable research has been done on stop word and stop phrase detection, no prior work has addressed Bengali stop words and stop phrases. This research introduces a definition and classification of Bengali stop words and stop phrases and implements two approaches to identify them: the first is corpus-based, and the second is based on a finite-state automaton. The performance of both approaches is measured and compared. The results show that the corpus-based method outperforms the finite-state automaton-based method: the corpus-based and finite-state automaton-based methods achieve 90% and 80% accuracy, respectively, for stop word detection, and 80% and 70% accuracy, respectively, for stop phrase detection.
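The abstract names the two approaches but does not describe their internals. The sketch below illustrates one generic form of each, under stated assumptions, and is not the authors' implementation. The corpus-based part flags tokens whose document frequency reaches a threshold; the function name `corpus_stop_words` and the parameter `min_doc_ratio` are hypothetical. The automaton part builds a trie-shaped deterministic finite automaton over tokens from a given list of stop phrases and scans text with longest-match semantics; the class `PhraseAutomaton`, whitespace tokenization, and the transliterated Bengali example tokens are likewise assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set, Tuple


def corpus_stop_words(documents: Iterable[str], min_doc_ratio: float = 0.5) -> Set[str]:
    """Flag tokens that occur in at least `min_doc_ratio` of the documents.

    Hypothetical corpus-based criterion (high document frequency),
    not taken from the paper. Assumes whitespace tokenization.
    """
    doc_freq: Counter = Counter()
    n_docs = 0
    for doc in documents:
        n_docs += 1
        # Count each token at most once per document (document frequency).
        doc_freq.update(set(doc.split()))
    if n_docs == 0:
        return set()
    return {tok for tok, df in doc_freq.items() if df / n_docs >= min_doc_ratio}


class PhraseAutomaton:
    """Hypothetical token-level DFA (a trie) built from a list of stop phrases."""

    def __init__(self, phrases: Iterable[Tuple[str, ...]]) -> None:
        # transitions[state][token] -> next state; state 0 is the start state.
        self.transitions: List[Dict[str, int]] = [{}]
        self.accepting: Set[int] = set()
        for phrase in phrases:
            state = 0
            for token in phrase:
                nxt = self.transitions[state].get(token)
                if nxt is None:
                    # Create a new state for an unseen transition.
                    nxt = len(self.transitions)
                    self.transitions.append({})
                    self.transitions[state][token] = nxt
                state = nxt
            # The state reached after the last token accepts this phrase.
            self.accepting.add(state)

    def find(self, tokens: List[str]) -> List[Tuple[int, int]]:
        """Return non-overlapping (start, end) spans of stop phrases.

        Scans left to right; at each position keeps the longest accepted match.
        """
        spans: List[Tuple[int, int]] = []
        i = 0
        while i < len(tokens):
            state, longest_end = 0, -1
            j = i
            # Follow transitions while the automaton has a move for the next token.
            while j < len(tokens) and tokens[j] in self.transitions[state]:
                state = self.transitions[state][tokens[j]]
                j += 1
                if state in self.accepting:
                    longest_end = j
            if longest_end != -1:
                spans.append((i, longest_end))
                i = longest_end
            else:
                i += 1
        return spans
```

For example, with the illustrative phrases `("ei", "jonno")` and `("tar", "por")`, calling `find` on the tokens of `"ei jonno ami tar por gelam"` returns the spans `(0, 2)` and `(3, 5)`. Longest match is one simple way to resolve phrases that share a prefix; the paper may resolve overlaps differently.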