Ranking Matters: A Comparative Study of BM25, BERT-E5, and Hybrid (BM25 + BERT-E5) Retrieval Systems on the SQuAD 2.0 Dataset

Akkuş Ş. G., Emekci H.

2nd International Conference on Artificial Intelligence, Computer, Data Sciences, and Applications, ACDSA 2025, Antalya, Türkiye, 7-9 August 2025 (Full-Text Paper)

Publication Type: Conference Paper / Full-Text Paper
DOI Number: 10.1109/acdsa65407.2025.11166004
City of Publication: Antalya
Country of Publication: Türkiye
Keywords: BERT, BM25, E5, Hybrid Information Retrieval (IR), Neural Information Retrieval (IR), Ranking Evaluation, SQuAD 2.0, Traditional Information Retrieval (IR)
Affiliated with TED University: Yes

Abstract

In Information Retrieval (IR) systems, the central task is retrieving the passages or documents relevant to a given query. Traditional IR models such as BM25 rely on a bag-of-words mechanism that excels at keyword matching but cannot capture semantic meaning. Neural IR models such as E5, which is built on the BERT architecture, can capture the relationships in meaning between words through their attention layers and encoding methods; however, employing a neural model incurs a high computational cost. To address these limitations, this study introduces a hybrid retrieval system that combines the strengths of a traditional model (BM25) and a neural model (E5), with the aim of achieving optimal performance on ranking tasks. The systems are evaluated on the Stanford Question Answering Dataset (SQuAD) version 2.0 using several ranking metrics. The hybrid architecture is a linear combination of the traditional and neural systems, and the weight parameter (α) is tuned to balance the contributions of BM25 and E5. The hybrid model outperforms BM25 and E5 individually on Mean Reciprocal Rank (MRR), Precision@5, Normalized Discounted Cumulative Gain (NDCG@5), and Accuracy, with optimal performance achieved at α = 0.1. The results show that hybrid retrieval systems can effectively combine the strengths of traditional and neural IR models, providing a new methodology for ranking tasks.
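
The linear combination described in the abstract can be illustrated with a minimal sketch. The abstract does not state which system α weights, how scores are normalized, or how ties are handled, so everything below is an assumption: α is taken to weight BM25 and (1 − α) to weight E5, both score sets are min-max normalized per query, and the names `min_max`, `hybrid_rank`, and `reciprocal_rank` are hypothetical helpers, not the authors' code. The input scores are made-up toy values, not results from the paper.

```python
from typing import Dict, List


def min_max(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale scores to [0, 1]; a constant score map becomes all zeros."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(bm25: Dict[str, float], dense: Dict[str, float], alpha: float) -> List[str]:
    """Rank passages by alpha * BM25 + (1 - alpha) * dense.

    Assumption: alpha weights the lexical (BM25) score; the source does not
    specify the convention.
    """
    b, d = min_max(bm25), min_max(dense)
    docs = set(b) | set(d)
    fused = {doc: alpha * b.get(doc, 0.0) + (1 - alpha) * d.get(doc, 0.0) for doc in docs}
    # Sort by descending fused score; break ties by passage id for determinism.
    return sorted(docs, key=lambda doc: (-fused[doc], doc))


def reciprocal_rank(ranking: List[str], relevant: str) -> float:
    """Return 1 / rank of the relevant passage, or 0.0 if it is absent."""
    for i, doc in enumerate(ranking, start=1):
        if doc == relevant:
            return 1.0 / i
    return 0.0


# Toy scores for one query (illustrative values only).
bm25_scores = {"p1": 12.0, "p2": 3.0, "p3": 0.0}
dense_scores = {"p1": 0.2, "p2": 0.9, "p3": 0.5}

ranking = hybrid_rank(bm25_scores, dense_scores, alpha=0.1)
```

Under this convention, a small α such as 0.1 lets the dense scores dominate while BM25 acts as a light lexical correction. Averaging `reciprocal_rank` over all queries would give MRR, one of the metrics named in the abstract.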