
This is an Open Access publication published under CSC-OpenAccess Policy.
Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classifiers
Nita Sanjay Patil, Sudhir D. Sawarkar
Pages - 13 - 28     |    Revised - 28-02-2019     |    Published - 01-04-2019
Volume - 13   Issue - 2    |    Publication Date - April 2019
Keywords - Semantic Concept Detection, SVM, CNN, Multi-label Classification, Deep Features, Imbalanced Dataset.
Abstract - In today's era of digitization and fast internet, large numbers of videos are uploaded to websites, so a mechanism is required to access these videos accurately and efficiently. Semantic concept detection achieves this task and is used in many applications, such as multimedia annotation, video summarization, indexing, and retrieval. Video retrieval based on semantic concepts is an efficient yet challenging research area. Semantic concept detection bridges the semantic gap between the low-level features extracted from a key-frame or shot of a video and the high-level interpretation of the same as semantics. It automatically assigns labels to videos from a predefined vocabulary, a task usually treated as a supervised machine learning problem. The support vector machine (SVM) emerged as the default classifier choice for this task, but recently deep convolutional neural networks (CNNs) have shown exceptional performance in this area; a CNN, however, requires a large dataset for training. In this paper, we present a framework for semantic concept detection using a hybrid model of SVM and CNN classifiers. Global features, namely color moments, HSV histogram, wavelet transform, grey-level co-occurrence matrix, and edge orientation histogram, are extracted as low-level features from the annotated ground-truth video dataset of TRECVID. In a second pipeline, deep features are extracted using a pretrained CNN. The dataset is partitioned into three segments to deal with the class-imbalance issue. The two classifiers are trained separately on all segments, and fusion of their scores is performed to detect the concepts in the test dataset. System performance is evaluated using mean average precision (MAP) for the multi-label dataset. The performance of the proposed hybrid SVM-CNN framework is comparable to existing approaches.
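The score-level fusion and MAP evaluation described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the equal-weight fusion (`w=0.5`), the toy scores, and the per-concept AP formulation are assumptions introduced here for clarity.

```python
import numpy as np

def average_precision(scores, labels):
    """AP for one concept: rank samples by score and average the
    precision at each rank where a true positive occurs."""
    order = np.argsort(-scores)          # rank key-frames, highest score first
    hits = labels[order]                 # 1 where the ranked item is relevant
    if hits.sum() == 0:
        return 0.0
    precision_at_rank = np.cumsum(hits) / np.arange(1, len(hits) + 1)
    return float((precision_at_rank * hits).sum() / hits.sum())

def fused_map(svm_scores, cnn_scores, labels, w=0.5):
    """Weighted score-level fusion of the two classifiers, then mean
    average precision over all concepts (columns are concepts)."""
    fused = w * svm_scores + (1.0 - w) * cnn_scores
    aps = [average_precision(fused[:, c], labels[:, c])
           for c in range(labels.shape[1])]
    return float(np.mean(aps))

# Toy example: 4 key-frames, 2 concepts (hypothetical scores and labels).
svm = np.array([[0.9, 0.2], [0.3, 0.7], [0.8, 0.1], [0.1, 0.6]])
cnn = np.array([[0.7, 0.1], [0.2, 0.9], [0.9, 0.3], [0.2, 0.5]])
truth = np.array([[1, 0], [0, 1], [0, 0], [1, 1]])
print(fused_map(svm, cnn, truth))  # → 0.75
```

Score-level (late) fusion keeps the two pipelines independent, so the SVM on global features and the CNN on deep features can each be trained on the three dataset segments before their outputs are combined.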
1 Google Scholar 
2 Academia 
3 refSeek 
4 Doc Player 
5 Scribd 
6 SlideShare 
Mr. Nita Sanjay Patil
Datta Meghe College of Engineering Airoli, Navi Mumbai - India
Mr. Sudhir D. Sawarkar
Datta Meghe College of Engineering Airoli, Navi Mumbai - India