Shallow vs. Deep Image Representations: A Comparative Study with Enhancements Applied For The Problem of Generic Object Recognition
Yasser Mohammed Abdullah, Mussa M. Ahmed
Pages - 78 - 102     |    Revised - 30-11-2019     |    Published - 31-12-2019
Volume - 8   Issue - 4    |    Publication Date - December 2019
Keywords: Shallow Models, Deep Learning Models, Encoding Methods, Object Recognition, BoVW.
The traditional approach to object recognition first extracts image representations and then feeds them to a learning model such as an SVM. These representations are handcrafted and heavily engineered: the object image is run through a sequence of pipeline steps, and designing those steps requires good prior knowledge of the problem domain. Moreover, because classification is done in a separate step, the handcrafted representations are not tuned by the learning model, which prevents it from learning complex representations that might give it more discriminative power. In end-to-end deep learning models, by contrast, the image representations and the classification decision boundary are learnt directly from the raw data, requiring no prior knowledge of the problem domain. These models learn the object image representation hierarchically, in multiple layers corresponding to multiple levels of abstraction, resulting in representations that are more discriminative and give better results on challenging benchmarks. Unlike traditional handcrafted representations, deep representations improve with more data and more learning layers (more depth), and they perform well on large-scale machine learning problems.
The purpose of this study is sixfold: (1) review the literature on the pipeline processes used in the previous state-of-the-art codebook model approach to generic object recognition, (2) introduce several enhancements in the local feature extraction and normalization steps of the recognition pipeline, (3) compare the proposed enhancements across different encoding methods and contrast them with previous results, (4) experiment with current state-of-the-art deep model architectures used for object recognition, (5) compare deep representations extracted from the deep learning model with shallow representations handcrafted through the recognition pipeline, and finally, (6) improve the results further by combining multiple different deep learning models into an ensemble and taking the maximum posterior probability.
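The ensemble step in (6) can be illustrated with a minimal sketch. It assumes one plausible reading of "taking the maximum posterior probability": each model outputs a posterior over the same classes, the per-class maximum across models is taken, and the class with the largest value wins. The function name `max_posterior_ensemble` and the example probabilities are hypothetical and are not taken from the paper.

```python
from typing import Sequence, Tuple


def max_posterior_ensemble(
    model_posteriors: Sequence[Sequence[float]],
) -> Tuple[int, float]:
    """Fuse per-model class posteriors with the max rule.

    model_posteriors[m][c] is model m's posterior probability for class c.
    Returns (predicted_class_index, fused_probability).
    This is an assumed interpretation, not the authors' exact procedure.
    """
    if not model_posteriors:
        raise ValueError("at least one model is required")
    num_classes = len(model_posteriors[0])
    if any(len(p) != num_classes for p in model_posteriors):
        raise ValueError("all models must score the same number of classes")

    # For each class, keep the highest posterior assigned by any model.
    fused = [max(p[c] for p in model_posteriors) for c in range(num_classes)]

    # Predict the class whose fused posterior is largest.
    best_class = max(range(num_classes), key=fused.__getitem__)
    return best_class, fused[best_class]


# Hypothetical posteriors from three models over three classes.
posteriors = [
    [0.60, 0.30, 0.10],
    [0.20, 0.70, 0.10],
    [0.40, 0.35, 0.25],
]
label, confidence = max_posterior_ensemble(posteriors)
print(label, confidence)
```

Under this rule the single most confident model decides the prediction, which differs from averaging the posteriors across models.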
Mr. Yasser Mohammed Abdullah
Faculty of Engineering/Department of IT, Aden University, Aden, Yemen - Yemen
Mr. Mussa M. Ahmed
Faculty of Engineering/Department of ECE, Aden University, Aden, Yemen - Yemen

