Blind video quality assessment via spatiotemporal statistical analysis of adaptive cube size 3D-DCT coefficients


Cemiloglu E., Nur Yılmaz G.

IET IMAGE PROCESSING, vol.14, no.5, pp.845-852, 2020 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 14 Issue: 5
  • Publication Date: 2020
  • DOI Number: 10.1049/iet-ipr.2019.0275
  • Journal Name: IET IMAGE PROCESSING
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, Business Source Elite, Business Source Premier, Communication Abstracts, Compendex, Computer & Applied Sciences, INSPEC, Metadex, Civil Engineering Abstracts
  • Page Numbers: pp.845-852
  • Keywords: discrete cosine transforms, distortion, regression analysis, video signal processing, video databases, feature extraction, spatiotemporal phenomena, blind video quality assessment, spatiotemporal statistical analysis, adaptive cube size 3D-DCT coefficients, robust video quality assessment model, video content, reference video, distorted video, spatiotemporal contents, human visual system properties, spatiotemporal frequency bands, EPFL-PoliMi video database, NR-VQA models, no reference model, three-dimensional-discrete cosine transform domain, HVS properties, linear regression analysis, NETWORKED VIDEO
  • TED University Affiliated: No

Abstract

There is an urgent need for a robust video quality assessment (VQA) model that can efficiently evaluate the quality of video content varying in distortion and content type in the absence of a reference video. Considering this need, a novel no-reference (NR) model relying on the spatiotemporal statistics of the distorted video in the three-dimensional discrete cosine transform (3D-DCT) domain is proposed in this study. While developing the model, as the first contribution, the video content is adaptively segmented into cubes of different sizes according to its spatiotemporal content, in line with human visual system (HVS) properties. The 3D-DCT is then applied to these cubes. As the second contribution, efficient features associated with the contents of these cubes (i.e. spectral behaviour, energy variation, distances between spatiotemporal frequency bands, and DC variation) are extracted. These features are then related to the subjective experimental results of the EPFL-PoliMi video database through linear regression analysis to build the model. The evaluation results show that the proposed model, unlike many top-performing NR-VQA models (e.g. V-BLIINDS, VIIDEO, and SSEQ), achieves high and stable performance across videos with different contents and distortions.
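
The sketch below illustrates the general pipeline described in the abstract: segment a video into cubes, apply a 3D-DCT to each cube, pool coefficient statistics, and fit a linear regression against subjective scores. It is a minimal sketch, not the authors' implementation: the adaptive cube sizing is replaced with a fixed cube size, the pooled statistics are simplified stand-ins for the paper's four feature groups, and all function names are illustrative.

import numpy as np
from scipy.fft import dctn
from sklearn.linear_model import LinearRegression

def cube_features(video, cube=8):
    """Split a (frames, H, W) grayscale video into non-overlapping cubes,
    apply a 3D-DCT to each cube, and pool simple coefficient statistics.
    NOTE: fixed cube size here; the paper selects sizes adaptively."""
    t, h, w = video.shape
    energies, dcs = [], []
    for i in range(0, t - cube + 1, cube):
        for j in range(0, h - cube + 1, cube):
            for k in range(0, w - cube + 1, cube):
                block = video[i:i+cube, j:j+cube, k:k+cube].astype(np.float64)
                coef = dctn(block, norm='ortho')   # 3D-DCT coefficients
                dcs.append(coef[0, 0, 0])          # DC term of the cube
                # AC energy: total coefficient energy minus the DC term
                energies.append(np.sum(coef ** 2) - coef[0, 0, 0] ** 2)
    # Crude pooled statistics standing in for the paper's feature groups
    # (spectral behaviour, energy variation, band distances, DC variation).
    return np.array([np.mean(energies), np.std(energies),
                     np.mean(dcs), np.std(dcs)])

def fit_quality_model(X, y):
    """Map pooled features to subjective scores (e.g. MOS) with linear
    regression, mirroring the model-building step in the abstract.
    X: (n_videos, n_features) feature matrix; y: (n_videos,) scores."""
    return LinearRegression().fit(X, y)

In practice, the feature matrix X would be built by running cube_features over each distorted video in the training database, and the fitted model would then predict quality scores for unseen videos from their features alone, with no reference video required.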