Toward three-dimensional human action recognition using a convolutional neural network with correctness-vigilant regularizer

Jun Ren*, Napoleon Reyes, Andre Barczak, Chris Scogings, Mingzhe Liu

*Corresponding author for this work

Research output: Contribution to journal > Article > Research > peer-review

3 Citations (Scopus)

Abstract

Human action recognition is one of the raisons d'être of human-computer interaction research, as it is vital in meeting the demands of modern society, such as automatic video surveillance for security, patient monitoring for recovery, and content-based video retrieval. In line with this, deep learning systems are fast becoming the de facto standard for object recognition, video understanding, and pattern recognition due to their powerful ability to learn features from vast amounts of data. It makes sense to capitalize on their success and to improve them further for the complex task of action recognition. One of the contributions of this paper is an effective yet simple method for encoding the spatiotemporal information from skeleton sequences into what we call temporal kinematic images. In the input encoding scheme, we embed various geometric relational features derived from the skeleton sequence in the form of our proposed skeletal optical flows (SOFs). SOFs collectively represent the variations of kinetic energy, angles between limbs, and pair-wise displacements between joints over consecutive frames of skeleton data as color variations in the temporal kinematic images. Another contribution is our convolutional neural network with a correctness-vigilant regularizer, which is employed to exploit the discriminative features from the temporal kinematic image for human action recognition. Lastly, we investigated an adaptive label smoothing technique employed toward the end of training iterations.
Empirical results show that the proposed method outperforms existing works in terms of the generalizability of the generated model, training convergence speed, and the resulting classification accuracy on nine popular benchmark datasets: MHAD, MSR Activity 3D, HDM05, and MSR Daily Activity 3D, as well as the more recent and challenging UTKinect-Action, NTU RGB+D, Northwestern-UCLA, UWA3DII, and SBU Kinect Interaction datasets.
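
The encoding idea described in the abstract, turning frame-to-frame changes in a skeleton sequence into an image, can be illustrated with a minimal sketch. This is not the authors' SOF definition, which the abstract does not specify. The function name `temporal_kinematic_image`, the `ref_joint` parameter, the unit-mass kinetic energy, and the use of angles about a reference joint as a stand-in for angles between limbs are all assumptions made here for illustration.

```python
import numpy as np

def temporal_kinematic_image(skeleton, ref_joint=0):
    """Illustrative encoding only (hypothetical, not the paper's method).

    skeleton: array of shape (T, J, 3) with T frames and J joints.
    Returns a uint8 image of shape (J, T-1, 3): rows are joints,
    columns are consecutive-frame transitions, channels are features.
    """
    skeleton = np.asarray(skeleton, dtype=np.float64)

    # Channel 0: kinetic-energy-like term per joint, assuming unit mass.
    velocity = np.diff(skeleton, axis=0)                # (T-1, J, 3)
    kinetic = 0.5 * np.sum(velocity ** 2, axis=-1)      # (T-1, J)

    # Channel 1: change in each joint's distance to a reference joint.
    offsets = skeleton - skeleton[:, ref_joint:ref_joint + 1, :]
    dist = np.linalg.norm(offsets, axis=-1)             # (T, J)
    d_dist = np.diff(dist, axis=0)                      # (T-1, J)

    # Channel 2: angle swept by each offset vector between consecutive
    # frames (an assumed stand-in for "angles between limbs").
    dot = np.sum(offsets[:-1] * offsets[1:], axis=-1)
    norms = dist[:-1] * dist[1:]
    cos = np.divide(dot, norms, out=np.ones_like(dot), where=norms > 0)
    angle = np.arccos(np.clip(cos, -1.0, 1.0))          # (T-1, J)

    channels = np.stack([kinetic, d_dist, angle], axis=-1)

    # Min-max normalise each channel to 0..255 so that feature
    # variations appear as colour variations.
    lo = channels.min(axis=(0, 1), keepdims=True)
    hi = channels.max(axis=(0, 1), keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)
    image = np.round(255 * (channels - lo) / span).astype(np.uint8)
    return image.transpose(1, 0, 2)

# Usage: a random 10-frame, 5-joint sequence gives a 5 x 9 RGB image.
sequence = np.random.default_rng(0).normal(size=(10, 5, 3))
print(temporal_kinematic_image(sequence).shape)
```

An image of this kind could then be passed to a standard image classifier; the network architecture and the correctness-vigilant regularizer are not sketched here because the abstract gives no details on them.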

Original language: English
Article number: 043040
Journal: Journal of Electronic Imaging
Volume: 27
Issue number: 4
DOIs
Publication status: Published - 7 Aug 2018
Externally published: Yes
