Optimal number of clusters in explainable data analysis of agent-based simulation experiments

Xie, Shengkun; Lawniczak, Anna; Gan, Chong

doi:10.1016/j.jocs.2022.101685

Xie S., Lawniczak A. T., Gan C.

Journal of Computational Science, vol. 62, 2022 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 62
Publication Date: 2022
DOI: 10.1016/j.jocs.2022.101685
Journal Name: Journal of Computational Science
Indexed in: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC
Keywords: Principal component analysis, K-means clustering, Explainable data analysis, Analysis of designed experiments, Agent-based simulations, Complex systems
TED University Affiliated: No

Abstract

© 2022 Elsevier B.V. Dimension reduction and visualization of data generated from a complex simulation model are essential for a better understanding of the behaviours of complex systems, and they are often required in many investigations besides computer simulation and modelling. Explainable data analysis, or explainable data analytics, is a new concept that has recently appeared in the literature of the machine learning community. Research on data explainability has become important in various fields where machine learning techniques are used. In this paper, we extend this concept to the analysis of agent-based simulations. We refer to explainability in the context of improving the visualization and analysis of this type of simulation data, so that a better understanding of the nature of a complex system can be achieved. In this work, principal component analysis is first used as a feature extractor: it reduces the dimension of the input data matrix and yields a feature vector consisting of a set of low-dimensional principal component scores. The K-means clustering method is then applied to the principal component scores, so that different treatments can be grouped according to their similarity. Since determining the number of clusters is crucial in a clustering analysis, two new algorithms are proposed to identify the optimal K value, with the aim of improving the clustering results and meeting our objective of reducing the number of treatments. The proposed methods are illustrated by applying them to data from an agent-based simulation model. The results demonstrate that the proposed methods are promising for determining the optimal number of clusters in K-means clustering and that they outperform the traditional methods.
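
The pipeline described in the abstract (principal component scores first, K-means on those scores second) can be illustrated with a minimal sketch. It assumes NumPy and a synthetic data matrix in which each row stands for one treatment; the function names `pca_scores` and `kmeans`, the data, and all parameter values are hypothetical and are not taken from the paper. The within-cluster sum of squares computed for each K is the quantity used by the traditional elbow heuristic; it is not one of the two algorithms proposed by the authors, which the abstract does not specify.

```python
import numpy as np

def pca_scores(X, n_components):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)  # centre each column
    # SVD of the centred matrix; the rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns labels and within-cluster sum of squares."""
    rng = np.random.default_rng(seed)
    # Initialise the centres with k distinct data points
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Squared Euclidean distance from every point to every centre
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Recompute centres; keep the old centre if a cluster became empty
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dist.argmin(axis=1)
    wss = dist[np.arange(len(Z)), labels].sum()
    return labels, wss

# Synthetic stand-in for simulation output: 60 treatments, 10 response variables,
# generated around 3 hypothetical group means.
rng = np.random.default_rng(42)
group_means = rng.normal(scale=5.0, size=(3, 10))
X = np.vstack([m + rng.normal(scale=0.5, size=(20, 10)) for m in group_means])

scores = pca_scores(X, n_components=2)
wss_by_k = {k: kmeans(scores, k)[1] for k in range(1, 7)}
for k, wss in wss_by_k.items():
    print(k, round(float(wss), 2))
```

Clustering the low-dimensional scores rather than the raw matrix keeps the result easy to plot and inspect, which is the sense of explainability the abstract refers to. How K is finally chosen from such a curve is where the paper's proposed algorithms would take the place of the elbow heuristic.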