Optimal number of clusters in explainable data analysis of agent-based simulation experiments


Xie S., Lawniczak A. T., Gan C.

Journal of Computational Science, vol.62, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 62
  • Publication Date: 2022
  • Doi Number: 10.1016/j.jocs.2022.101685
  • Journal Name: Journal of Computational Science
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Compendex, INSPEC
  • Keywords: Principal component analysis, K-means clustering, Explainable data analysis, Analysis of designed experiments, Agent-based simulations, Complex systems
  • TED University Affiliated: No

Abstract

© 2022 Elsevier B.V. Dimension reduction and visualization of data generated by a complex simulation model are essential for understanding the behaviour of complex systems, and they are often required in investigations beyond computer simulation and modelling. Explainable data analysis, or explainable data analytics, is a concept that has recently appeared in the machine learning literature, and research on data explainability has become important in the many fields where machine learning techniques are used. In this paper, we extend this concept to the analysis of agent-based simulations. Here, explainability refers to improving the visualization and analysis of this type of simulation data so that the nature of a complex system can be better understood. Principal component analysis is first used as a feature extractor to reduce the dimension of the input data matrix, yielding a feature vector that consists of a set of low-dimensional principal component scores. K-means clustering is then applied to the principal component scores so that treatments can be grouped according to their similarity. Because determining the number of clusters is crucial in a clustering analysis, two new algorithms are proposed to identify the optimal K value, with the aims of improving the clustering results and reducing the number of treatments. The proposed methods are illustrated on data from an agent-based simulation model, and the results indicate that they are promising for determining the optimal number of clusters in K-means clustering and that they outperform the traditional methods.
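The pipeline described in the abstract (principal component scores, then K-means on those scores, then a choice of K) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the synthetic "treatment" matrix, the function names, the farthest-first initialisation, and in particular the use of the mean silhouette width to pick K. The silhouette is one of the traditional criteria, not either of the two algorithms proposed in the paper, which the abstract does not describe.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)                       # centre each variable
    # SVD of the centred data: rows of Vt are the principal axes
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

def farthest_first_init(Z, k):
    """Deterministic start (an illustrative choice): greedily pick spread-out points."""
    idx = [0]
    d2 = ((Z - Z[0]) ** 2).sum(axis=1)
    for _ in range(k - 1):
        nxt = int(d2.argmax())                    # farthest from all chosen so far
        idx.append(nxt)
        d2 = np.minimum(d2, ((Z - Z[nxt]) ** 2).sum(axis=1))
    return Z[idx]

def kmeans(Z, k, n_iter=100):
    """Plain Lloyd iterations; returns one cluster label per row of Z."""
    centroids = farthest_first_init(Z, k)
    for _ in range(n_iter):
        d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new = centroids.copy()
        for j in range(k):
            if np.any(labels == j):               # keep old centroid if cluster is empty
                new[j] = Z[labels == j].mean(axis=0)
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels

def mean_silhouette(Z, labels):
    """Mean silhouette width, a traditional criterion for comparing K values."""
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2))
    clusters = np.unique(labels)
    scores = []
    for i in range(len(Z)):
        own = labels == labels[i]
        if own.sum() == 1:
            scores.append(0.0)                    # usual convention for singletons
            continue
        a = D[i, own].sum() / (own.sum() - 1)     # D[i, i] = 0, so self is excluded
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Synthetic stand-in for a treatments-by-variables matrix: 3 groups of 20 rows.
rng = np.random.default_rng(0)
centres = np.array([[0, 0, 0, 0, 0, 0],
                    [10, 10, 0, 0, 0, 0],
                    [0, 0, 10, 10, 0, 0]], dtype=float)
X = np.vstack([c + rng.normal(scale=1.0, size=(20, 6)) for c in centres])

Z = pca_scores(X, n_components=2)                 # low-dimensional PC scores
silhouette_by_k = {k: mean_silhouette(Z, kmeans(Z, k)) for k in range(2, 7)}
best_k = max(silhouette_by_k, key=silhouette_by_k.get)
labels = kmeans(Z, best_k)
print(best_k, silhouette_by_k)
```

Clustering the principal component scores rather than the raw matrix keeps the groups plottable in two dimensions, which fits the visualization goal stated in the abstract. On real simulation output the groups are rarely this well separated, and that is the situation in which the choice of criterion for K starts to matter.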