Hypothetica: A Multi-Agent System for AI-Powered Originality Assessment


Tan H. H., Alp Malkoc A., Becerir K., Erol B., Karakaya K. M.

5th International Conference on Informatics and Software Engineering, IISEC 2026, Ankara, Türkiye, 5-6 February 2026, pp. 525-530 (Full-Text Paper)

  • Publication Type: Conference Paper / Full-Text Paper
  • DOI: 10.1109/iisec69317.2026.11418474
  • City of Publication: Ankara
  • Country of Publication: Türkiye
  • Pages: pp. 525-530
  • Keywords: large language models, multi-agent systems, research originality, retrieval-augmented generation
  • Affiliated with TED University: Yes

Abstract

Researchers often spend substantial time manually reviewing prior studies to confirm that a proposed project does not repeat existing work. This paper introduces Hypothetica, an AI-powered system that evaluates the originality of proposed research ideas by comparing them with existing work in the arXiv literature. The system uses a multi-agent architecture based on large language models, combined with retrieval-augmented generation (RAG), to support explainable assessments of originality. It runs a semantic similarity search in ChromaDB using E5 embeddings, extracts text from PDFs, and then reviews the material across four dimensions: novelty of the technical problem, methodological innovation, overlap in the application area, and stated innovation claims. Based on this process, it identifies where a proposed research idea matches or overlaps with earlier studies. Hypothetica produces a numerical originality score and detailed reports with evidence-based feedback, mapping each sentence in the user's research idea that overlaps with prior work to the related passages in existing papers. The system's performance was assessed on 10 varied test cases drawn from several domains. In this evaluation, 40% of the test cases scored above 60/100, and the highest originality score observed was 91/100. These results indicate that the approach performs well on tasks related to structured data and on tasks that involve generating architectures. The approach allows researchers to refine their ideas using clear, practical feedback before they begin their research.
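
The sentence-to-passage mapping and the numerical score described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `map_overlaps` and `originality_score`, the 0.8 similarity threshold, and the equal weighting of the four dimensions are all assumptions made for illustration. Hand-written vectors stand in for E5 embeddings, and a brute-force cosine search stands in for the ChromaDB query.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Match:
    sentence: str      # sentence from the proposed research idea
    passage: str       # most similar passage from prior work
    similarity: float  # cosine similarity between their vectors


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def map_overlaps(sentences, passages, threshold=0.8):
    """Map each idea sentence to its closest prior passage.

    `sentences` and `passages` are lists of (text, vector) pairs.
    The threshold is a hypothetical cut-off, not a value from the paper.
    """
    matches = []
    for s_text, s_vec in sentences:
        best_text, best_sim = None, -1.0
        for p_text, p_vec in passages:
            sim = cosine(s_vec, p_vec)
            if sim > best_sim:
                best_text, best_sim = p_text, sim
        if best_text is not None and best_sim >= threshold:
            matches.append(Match(s_text, best_text, best_sim))
    return matches


def originality_score(dimension_overlap):
    """Turn per-dimension overlap values in [0, 1] into a 0-100 score.

    Equal weighting of the dimensions is an assumption; the abstract
    does not state how the four dimensions are combined.
    """
    mean_overlap = sum(dimension_overlap.values()) / len(dimension_overlap)
    return round(100 * (1 - mean_overlap))


if __name__ == "__main__":
    idea = [("We index papers with embeddings.", [1.0, 0.0]),
            ("We apply this to marine biology.", [0.0, 1.0])]
    prior = [("Papers are indexed using dense embeddings.", [1.0, 0.1])]
    for m in map_overlaps(idea, prior):
        print(f"{m.sentence!r} overlaps with {m.passage!r}")
    print(originality_score({
        "technical_problem": 0.5,
        "methodology": 0.5,
        "application_area": 0.5,
        "innovation_claims": 0.5,
    }))
```

In this toy input only the first sentence is close enough to the prior passage to be reported as an overlap; the second is left unmatched, which is the kind of per-sentence evidence the reports are described as providing.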