Feature extraction of auto insurance size of loss data using functional principal component analysis

Xie S.

Expert Systems with Applications, vol.198, 2022 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 198
  • Publication Date: 2022
  • Doi Number: 10.1016/j.eswa.2022.116780
  • Journal Name: Expert Systems with Applications
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Computer & Applied Sciences, INSPEC, Metadex, Public Affairs Index, Civil Engineering Abstracts
  • Keywords: Explainable data analysis, Data visualization, Functional principal component analysis, Loss distribution, Business analytics, Spline methods
  • TED University Affiliated: No


Visualization and interpretability remain ongoing challenges in understanding the major patterns and natural variation of statistical reporting data, particularly in complex statistical data analysis. In auto insurance rate regulation, the study of the size-of-loss distribution is critical because it serves as a benchmark for rate filings made by insurance companies. In this work, we propose dimension reduction and feature extraction approaches based on functional principal component analysis to analyse auto insurance statistical reporting data. Dimension reduction aims to estimate the overall functional pattern of the size-of-loss, while feature extraction studies the similarity and variability of the extracted features in the feature subspace. We also investigate the functionality of relative claim frequency and size-of-loss, and the pattern and variability of the extracted features, using loss data at the industry level. The proposed method helps improve data explainability and understanding of the overall pattern of the auto insurance size-of-loss distribution. We also found that the proposed method can capture the tail behaviour of the size-of-loss distribution. It is applicable to similar data analysis problems in other fields, including economics and finance. For rate regulation purposes, this research helps develop strategies to control and manage the rate changes proposed by insurance companies, particularly in aspects related to large loss loading for reinsurance pricing.
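
To make the idea of functional principal component analysis concrete, the following is a minimal sketch, not the paper's implementation. It assumes each curve is observed on a common, equally spaced grid, skips the spline smoothing step suggested by the keywords, and uses synthetic curves in place of real size-of-loss data. The function name `fpca` and all variable names are hypothetical.

```python
import numpy as np

def fpca(curves, n_components=2):
    """Discretized FPCA sketch; curves has shape (n_curves, n_grid_points)."""
    # The mean curve estimates the overall functional pattern.
    mean_curve = curves.mean(axis=0)
    centered = curves - mean_curve
    # SVD of the centered data matrix: rows of vt are the discretized
    # principal component functions (orthonormal on the grid).
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    # Scores are the coordinates of each curve in the feature subspace.
    scores = centered @ components.T
    explained = (s ** 2) / np.sum(s ** 2)
    return mean_curve, components, scores, explained[:n_components]

# Synthetic stand-in data (an assumption, not insurance data): a decaying
# baseline curve plus two random modes of variation.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
n_curves = 20
a = rng.normal(size=(n_curves, 1))
b = rng.normal(size=(n_curves, 1))
curves = (np.exp(-3.0 * grid)
          + 0.5 * a * np.sin(np.pi * grid)
          + 0.1 * b * np.cos(np.pi * grid))

mean_curve, components, scores, explained = fpca(curves, n_components=2)
# Low-dimensional reconstruction of every curve from its scores.
reconstructed = mean_curve + scores @ components
```

In this sketch the rows of `scores` play the role of the extracted features, so similarity between curves can be examined by comparing those rows, while `components` shows the shape of each mode of variation, including any concentrated in the tail of the grid.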