Recognizing activities in multiple views with fusion of frame judgments


Pehlivan Tort S., Forsyth D. A.

Image and Vision Computing, vol. 32, no. 4, pp. 237-249, 2014 (SCI-Expanded)

  • Publication Type: Article / Full Article
  • Volume: 32 Issue: 4
  • Publication Date: 2014
  • DOI: 10.1016/j.imavis.2014.01.006
  • Journal Name: Image and Vision Computing
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp. 237-249
  • Keywords: Video analysis, Human activity recognition, Multiple views, Multiple camera, RECOGNITION, REPRESENTATION, MOTION
  • TED University Affiliated: Yes

Abstract

This paper focuses on activity recognition when multiple views are available. In the literature, this is typically handled in one of two ways. In the first, systems build a 3D reconstruction and match against it. However, this methodology has practical disadvantages: a sufficient number of overlapping views is needed for reconstruction, and the cameras must be calibrated. A simpler alternative is to match the frames individually. This offers significant advantages in the system architecture (e.g., it is easy to incorporate new features, and camera dropouts can be tolerated). In this paper, the second approach is employed and a novel fusion method is proposed. Our fusion method collects the activity labels over frames and cameras, and then fuses these activity judgments into a single sequence label. It is shown that there is no performance penalty when a straightforward weighted voting scheme is used. In particular, when there are enough overlapping views to generate a volumetric reconstruction, our recognition performance is comparable with that produced by volumetric reconstructions. However, if the overlapping views are not adequate, the performance degrades fairly gracefully, even in cases where test and training views do not overlap. © 2014 Elsevier B.V.
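The weighted-voting fusion described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: the input format (per-frame `(camera_id, label, confidence)` tuples), the function name, and the per-camera weights are all hypothetical assumptions.

```python
from collections import defaultdict

def fuse_sequence_label(frame_judgments, camera_weights=None):
    """Fuse per-frame, per-camera activity labels into one sequence label
    via weighted voting (a sketch of the scheme the abstract describes).

    frame_judgments: list of (camera_id, label, confidence) tuples,
        one per classified frame (hypothetical input format).
    camera_weights: optional dict camera_id -> weight; cameras not
        listed default to weight 1.0.
    """
    camera_weights = camera_weights or {}
    votes = defaultdict(float)
    for camera_id, label, confidence in frame_judgments:
        weight = camera_weights.get(camera_id, 1.0)
        votes[label] += weight * confidence
    # The sequence label is the activity with the highest total vote.
    return max(votes, key=votes.get)

# Example: three cameras; camera "c2" drops out after one frame.
# The voting scheme tolerates the dropout without special handling.
judgments = [
    ("c0", "walk", 0.9), ("c0", "walk", 0.8),
    ("c1", "walk", 0.7), ("c1", "run", 0.6),
    ("c2", "run", 0.5),
]
print(fuse_sequence_label(judgments))  # → walk
```

Because each frame contributes an independent vote, a camera that stops delivering frames simply contributes fewer votes, which matches the abstract's point that camera dropouts are tolerated by frame-level matching.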