Recognizing activities in multiple views with fusion of frame judgments

Pehlivan Tort S., Forsyth D. A.

Image and Vision Computing, vol.32, no.4, pp.237-249, 2014 (SCI-Expanded)

  • Publication Type: Article
  • Volume: 32 Issue: 4
  • Publication Date: 2014
  • Doi Number: 10.1016/j.imavis.2014.01.006
  • Journal Name: Image and Vision Computing
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus
  • Page Numbers: pp.237-249
  • Keywords: Video analysis, Human activity recognition, Multiple views, Multiple camera, Recognition, Representation, Motion
  • TED University Affiliated: Yes


This paper focuses on activity recognition when multiple views are available. In the literature, this is typically done with one of two approaches. In the first, the system builds a 3D reconstruction and matches against it. This methodology has practical disadvantages: a sufficient number of overlapping views is needed for reconstruction, and the cameras must be calibrated. A simpler alternative is to match frames individually. This offers significant advantages in system architecture; for example, new features are easy to incorporate and camera dropouts can be tolerated. This paper employs the second approach and proposes a novel fusion method. The fusion method collects activity labels over frames and cameras, and then fuses these activity judgments into a single sequence label. We show that there is no performance penalty when a straightforward weighted voting scheme is used. In particular, when there are enough overlapping views to generate a volumetric reconstruction, our recognition performance is comparable with that produced by volumetric reconstructions. When the overlapping views are inadequate, performance degrades fairly gracefully, even in cases where test and training views do not overlap. © 2014 Elsevier B.V.
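The abstract does not give the paper's exact fusion formula, but the idea of collecting per-frame, per-camera labels and fusing them by weighted voting can be illustrated with a minimal sketch. The function name `fuse_sequence_label`, the data layout (a list of `(camera_id, label)` pairs), and the optional per-camera weights are illustrative assumptions, not the authors' implementation:

```python
from collections import defaultdict

def fuse_sequence_label(frame_judgments, camera_weights=None):
    """Fuse per-frame, per-camera activity labels into one sequence label
    by weighted voting.

    frame_judgments: list of (camera_id, label) pairs, one per classified frame.
    camera_weights: optional dict mapping camera_id -> vote weight
                    (hypothetical; defaults to equal weight for all cameras).
    """
    votes = defaultdict(float)
    for camera_id, label in frame_judgments:
        w = 1.0 if camera_weights is None else camera_weights.get(camera_id, 1.0)
        votes[label] += w
    # The sequence label is the activity with the largest total vote.
    return max(votes, key=votes.get)

# Example: three cameras, each frame labelled independently; a dropped
# camera simply contributes no pairs, so the scheme tolerates dropouts.
judgments = [
    ("cam0", "walk"), ("cam0", "walk"), ("cam0", "run"),
    ("cam1", "walk"), ("cam1", "walk"),
    ("cam2", "run"),
]
print(fuse_sequence_label(judgments))  # "walk" wins with 4 of 6 votes
```

Because each frame votes independently, adding a new camera or feature channel only adds more `(camera_id, label)` pairs; no recalibration or joint 3D reconstruction is required.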