Bengali Stop Word and Phrase Detection Mechanism

Haque R. U., Mridha M., Hamid M. A., Abdullah-Al-Wadud M., Islam M. S.

Arabian Journal for Science and Engineering, vol.45, no.4, pp.3355-3368, 2020 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 45 Issue: 4
  • Publication Date: 2020
  • Doi Number: 10.1007/s13369-020-04388-8
  • Journal Name: Arabian Journal for Science and Engineering
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Aerospace Database, Communication Abstracts, Metadex, Pollution Abstracts, zbMATH, Civil Engineering Abstracts
  • Page Numbers: pp.3355-3368
  • Keywords: Stop phrase, Stop word, Natural language processing, Finite automaton, Text processing
  • TED University Affiliated: No


Although much research has been done on stop word and stop phrase detection, no prior work has addressed Bengali stop words and stop phrases. This research introduces a definition and classification of Bengali stop words and phrases and implements two approaches to identify them. The first is a corpus-based approach, while the second is based on a finite-state automaton. The performance of both approaches is measured and compared. The analysis shows that the corpus-based method outperforms the finite-state automaton-based method: the corpus-based and finite-state automaton-based methods achieve 90% and 80% accuracy, respectively, for stop word detection, and 80% and 70% accuracy, respectively, for stop phrase detection.
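
The abstract names the two approaches but does not describe how either works. The following is a minimal, generic sketch of what each family of technique can look like, not the authors' implementation. Everything in it is an assumption made for illustration: the document-frequency criterion and its `min_doc_ratio` threshold, the whitespace tokenization, the trie used as a deterministic automaton over tokens, and the function names `corpus_stop_words`, `build_phrase_automaton`, and `find_stop_phrases`.

```python
from collections import Counter


def corpus_stop_words(documents, min_doc_ratio=0.8):
    """Corpus-based sketch: flag tokens found in at least
    min_doc_ratio of the documents (assumed criterion)."""
    doc_freq = Counter()
    for doc in documents:
        # Count each token once per document (document frequency).
        doc_freq.update(set(doc.split()))
    threshold = min_doc_ratio * len(documents)
    return {tok for tok, df in doc_freq.items() if df >= threshold}


def build_phrase_automaton(phrases):
    """Automaton sketch: build a trie whose nodes act as states of a
    deterministic finite automaton over tokens. The key None marks
    an accepting state."""
    root = {}
    for phrase in phrases:
        state = root
        for tok in phrase.split():
            state = state.setdefault(tok, {})
        state[None] = True
    return root


def find_stop_phrases(tokens, automaton):
    """Scan left to right and return (start, end) token spans of the
    longest known phrase starting at each unmatched position."""
    matches = []
    i = 0
    while i < len(tokens):
        state, j, last_accept = automaton, i, None
        while j < len(tokens) and tokens[j] in state:
            state = state[tokens[j]]
            j += 1
            if None in state:
                last_accept = j
        if last_accept is not None:
            matches.append((i, last_accept))
            i = last_accept  # resume after the matched phrase
        else:
            i += 1
    return matches
```

In this sketch the corpus-based function learns candidates from data, while the automaton only recognizes phrases it was built from, so the two are not interchangeable: the first needs a representative corpus, the second needs a hand-written or previously derived phrase list.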