Recognizing involuntary actions from 3D skeleton data using body states

Document Type: Article


Department of Electrical Engineering, Sharif University of Technology, Tehran, Iran


Human action recognition has been one of the most active research areas in computer vision in recent years. Two-dimensional action recognition methods face serious challenges such as occlusion and the loss of the third (depth) dimension of the data. The development of depth sensors has made it feasible to track the positions of human body joints over time. This paper proposes a novel action recognition method that uses temporal 3D skeletal data from the Kinect sensor. The method defines a set of body states and models every action as a sequence of these states. In the learning stage, Fisher Linear Discriminant Analysis (LDA) is used to construct a discriminant feature space that separates the body states. Moreover, this paper proposes the Mahalanobis distance as an appropriate distance metric for classifying the states of involuntary actions. A Hidden Markov Model (HMM) is then used to model the temporal transitions between the body states of each action. According to the results, the method significantly outperforms other popular methods, achieving a recognition (recall) rate of 88.64% on eight different actions and up to 96.18% when all fall actions are grouped into one class and classified against normal actions.
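The abstract describes a three-stage pipeline: project per-frame skeleton features with Fisher LDA, label each frame with the nearest body state under the Mahalanobis distance, and score the resulting state sequence with one HMM per action. The NumPy code below is a minimal sketch of those stages under stated assumptions, not the authors' implementation. The feature vectors, the small ridge term `reg`, the choice to treat body-state labels (integers `0..M-1`) as the discrete observation symbols of each action's HMM, and all function names (`fit_lda`, `fit_state_models`, `classify_states`, `hmm_log_likelihood`, `classify_action`) are hypothetical. HMM training (for example Baum-Welch) is omitted; the parameters `pi`, `A` and `B` are assumed to be already estimated.

```python
import numpy as np


def fit_lda(X, y, n_components, reg=1e-6):
    """Fisher LDA: return a projection matrix W of shape (d, n_components)."""
    d = X.shape[1]
    mean_all = X.mean(axis=0)
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in np.unique(y):
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve Sb w = lambda Sw w by whitening with the Cholesky factor of Sw
    # (hypothetical ridge `reg` keeps Sw invertible), giving a symmetric problem.
    L = np.linalg.cholesky(Sw + reg * np.eye(d))
    L_inv = np.linalg.inv(L)
    _, V = np.linalg.eigh(L_inv @ Sb @ L_inv.T)  # eigenvalues in ascending order
    top = V[:, ::-1][:, :n_components]  # most discriminant directions first
    return L_inv.T @ top


def fit_state_models(Z, y, reg=1e-6):
    """Per-body-state mean and inverse covariance in the LDA space."""
    models = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        cov = np.atleast_2d(np.cov(Zc, rowvar=False)) + reg * np.eye(Z.shape[1])
        models[c] = (Zc.mean(axis=0), np.linalg.inv(cov))
    return models


def classify_states(Z, models):
    """Label each frame with the body state at the smallest Mahalanobis distance."""
    labels = list(models)
    dists = np.empty((len(Z), len(labels)))
    for j, c in enumerate(labels):
        mu, inv_cov = models[c]
        diff = Z - mu
        # Squared Mahalanobis distance (z - mu)^T inv_cov (z - mu) for every row.
        dists[:, j] = np.einsum("ij,jk,ik->i", diff, inv_cov, diff)
    return np.array(labels)[dists.argmin(axis=1)]


def hmm_log_likelihood(obs, pi, A, B):
    """Scaled forward algorithm: log P(obs | pi, A, B) for a discrete HMM."""
    alpha = pi * B[:, obs[0]]
    scale = alpha.sum()
    log_lik = np.log(scale)
    alpha = alpha / scale
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # propagate one step, then emit symbol o
        scale = alpha.sum()
        log_lik += np.log(scale)
        alpha = alpha / scale  # rescale to avoid numerical underflow
    return log_lik


def classify_action(obs, action_hmms):
    """Pick the action whose HMM (pi, A, B) best explains the state sequence."""
    return max(action_hmms, key=lambda a: hmm_log_likelihood(obs, *action_hmms[a]))
```

In use, one would fit `W = fit_lda(X_train, y_train, n_components)` on labelled frames, build `models = fit_state_models(X_train @ W, y_train)`, label a new sequence with `classify_states(X_seq @ W, models)` and pass those labels to `classify_action`. Fisher LDA yields at most (number of body states - 1) informative directions, so `n_components` should not exceed that; the Mahalanobis step then uses each state's own covariance rather than a single Euclidean radius.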

