Intelligent Systems with Applications, vol. 30, 2026 (ESCI, Scopus)
Multimodal Emotion Recognition (MER) is an advanced field of AI and affective computing that enables machines to interpret human emotions by analyzing and integrating multiple behavioral and physiological signals. Unlike unimodal systems, MER captures emotional complexity through cross-modal synergy, significantly improving recognition accuracy in real-world settings. Advances in deep learning, particularly architectures that excel at modeling long-range dependencies and aligning diverse modalities, have propelled MER research forward. However, challenges remain, including effective fusion design, handling missing modalities, ensuring generalization across users, and addressing the scarcity of balanced multimodal datasets. This study reviews MER research from 2020 to 2025, covering emotion theories and taxonomies, preprocessing, feature extraction and selection, state-of-the-art learning methods, MER datasets of the past decade, fusion and representation techniques, recent state-of-the-art MER approaches, application areas, and open research challenges. Recent multimodal datasets, including EEG–audio–video resources released in 2024, are discussed in terms of their modalities, annotation schemes, and relevance to MER benchmarking. By consolidating insights across methodological, architectural, and practical dimensions, this review provides a unified reference for the state of the art and highlights the key challenges and research directions needed to achieve robust, scalable, real-world MER systems.