Semantic Concept Detection in Video Using Hybrid Model of CNN and SVM Classifiers
Nita Sanjay Patil, Sudhir D. Sawarkar
Pages - 13 - 28     |    Revised - 28-02-2019     |    Published - 01-04-2019
Volume - 13   Issue - 2    |    Publication Date - April 2019
Semantic Concept Detection, SVM, CNN, Multi-label Classification, Deep Features, Imbalanced Dataset.
In today's era of digitization and fast internet, many videos are uploaded to websites, and a mechanism is required to access these videos accurately and efficiently. Semantic concept detection addresses this task and is used in many applications such as multimedia annotation, video summarization, indexing and retrieval. Video retrieval based on semantic concepts is an efficient yet challenging research area. Semantic concept detection bridges the semantic gap between low-level features extracted from a key-frame or shot of a video and the high-level interpretation of the same content as semantics. It automatically assigns labels to video from a predefined vocabulary. This task is treated as a supervised machine learning problem. The support vector machine (SVM) emerged as the default classifier choice for this task, but recently the deep convolutional neural network (CNN) has shown exceptional performance in this area. A CNN requires a large dataset for training. In this paper, we present a framework for semantic concept detection using a hybrid model of SVM and CNN. In the first pipeline, global features such as color moments, HSV histogram, wavelet transform, grey level co-occurrence matrix and edge orientation histogram are selected as low-level features and extracted from the annotated ground-truth video dataset of TRECVID. In the second pipeline, deep features are extracted using a pretrained CNN. The dataset is partitioned into three segments to deal with the data imbalance issue. The two classifiers are trained separately on all segments, and their scores are fused to detect the concepts in the test dataset. System performance is evaluated using Mean Average Precision for the multi-label dataset. The performance of the proposed framework using the hybrid model of SVM and CNN is comparable to existing approaches.
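The score fusion and Mean Average Precision steps named in the abstract can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted average below is an assumption, and all function names and the toy scores are hypothetical rather than taken from the paper.

```python
from typing import List, Sequence


def fuse_scores(svm_scores: Sequence[float],
                cnn_scores: Sequence[float],
                weight: float = 0.5) -> List[float]:
    """Late fusion of two classifiers' scores for one concept.

    Assumption: a weighted average; the paper may use a different rule.
    """
    if len(svm_scores) != len(cnn_scores):
        raise ValueError("score lists must cover the same shots")
    return [weight * s + (1.0 - weight) * c
            for s, c in zip(svm_scores, cnn_scores)]


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision for one concept.

    Shots are ranked by descending score; precision is accumulated at
    every rank where a relevant (label == 1) shot appears.
    """
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits = 0
    precision_sum = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(label_rows: Sequence[Sequence[int]],
                           score_rows: Sequence[Sequence[float]]) -> float:
    """Mean of per-concept average precision (one row per concept)."""
    aps = [average_precision(l, s) for l, s in zip(label_rows, score_rows)]
    return sum(aps) / len(aps)


if __name__ == "__main__":
    # Hypothetical toy data: 2 concepts, 4 test shots.
    labels = [[1, 0, 1, 0], [0, 1, 0, 0]]
    svm = [[0.9, 0.8, 0.3, 0.1], [0.2, 0.4, 0.6, 0.1]]
    cnn = [[0.7, 0.2, 0.9, 0.1], [0.1, 0.9, 0.3, 0.2]]
    fused = [fuse_scores(s, c) for s, c in zip(svm, cnn)]
    print("MAP (SVM only):", mean_average_precision(labels, svm))
    print("MAP (fused):   ", mean_average_precision(labels, fused))
```

In this toy setting each concept is scored independently, which matches the multi-label evaluation described above: a shot may be relevant to several concepts, and the per-concept average precisions are averaged at the end.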
Mr. Nita Sanjay Patil
Datta Meghe College of Engineering Airoli, Navi Mumbai - India
Mr. Sudhir D. Sawarkar
Datta Meghe College of Engineering Airoli, Navi Mumbai - India