Scale Invariant Mask R-CNN for Pedestrian Detection


Ujwalla H. Gawande
Kamal O. Hajari
Yogesh G. Golhar

Pedestrian detection is a challenging and active research area in computer vision. Recognizing pedestrians helps in various applications, such as event detection in overcrowded areas and gender and gait classification. In this domain, the most recent research is based on instance segmentation using Mask R-CNN. Most pedestrian detection methods use features of different body parts to identify a person. This feature-based approach is not efficient enough to differentiate pedestrians in real time when the background is changing. To overcome this drawback, this paper proposes a combined approach for multiple pedestrian detection that pairs scale-invariant feature map generation, for detecting small pedestrians, with Mask R-CNN. A new database was created by recording the behavior of students at prominent places of an engineering institute. This database is comparatively new for pedestrian detection in the academic environment. The proposed Scale-invariant Mask R-CNN has been tested on the newly created database and has been compared with the Caltech [1], INRIA [2], MS COCO [3], ETH [4], and KITTI [5] databases. The experimental results show a significant performance improvement in pedestrian detection compared with existing approaches to pedestrian detection and instance segmentation. Finally, we conclude and investigate directions for future research.
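To make the scale problem concrete, the sketch below shows one common way detectors in the Mask R-CNN family handle objects of different sizes: the level-assignment heuristic from Feature Pyramid Networks, which sends small boxes to finer feature maps. This is an illustration only and is not taken from the paper; the abstract does not describe how the proposed scale-invariant feature maps are built. The function name `fpn_level` and the example box sizes are hypothetical.

```python
import math


def fpn_level(width, height, k0=4, canonical=224, k_min=2, k_max=5):
    """Pick a feature-pyramid level for a box of the given pixel size.

    Uses the FPN heuristic k = floor(k0 + log2(sqrt(w * h) / canonical)),
    clamped to [k_min, k_max]. Lower levels have higher resolution.
    This is NOT the paper's method, only a familiar reference point.
    """
    if width <= 0 or height <= 0:
        raise ValueError("box width and height must be positive")
    scale = math.sqrt(width * height)
    k = math.floor(k0 + math.log2(scale / canonical))
    return max(k_min, min(k_max, k))


# Hypothetical pedestrian boxes (width, height) in pixels.
boxes = {
    "far pedestrian": (20, 50),
    "mid pedestrian": (80, 200),
    "near pedestrian": (200, 480),
}
levels = {name: fpn_level(w, h) for name, (w, h) in boxes.items()}

for name, level in levels.items():
    print(f"{name}: pyramid level P{level}")
```

Under these assumptions, a distant pedestrian occupying only a few dozen pixels is routed to the finest level, where its features are least diluted by downsampling.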

Keywords

Convolutional neural network, Instance segmentation, Pedestrian Detection, Mask R-CNN


How to Cite
Gawande, Ujwalla H.; Hajari, Kamal O.; Golhar, Yogesh G. “Scale Invariant Mask R-CNN for Pedestrian Detection”. ELCVIA: electronic letters on computer vision and image analysis, 2020, Vol. 19, Num. 3, pp. 98-118, https://raco.cat/index.php/ELCVIA/article/view/375823.

References

Piotr Dollar, Christian Wojek, Bernt Schiele, and Pietro Perona. Pedestrian detection: An evaluation of the
state of the art, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 34(4):743-761,
2012. DOI: https://doi.org/10.1109/TPAMI.2011.155

Navneet Dalal and Bill Triggs. Histograms of Oriented Gradients for Human Detection, IEEE Conference
on Computer Vision and Pattern Recognition (CVPR), pp. 886-893, San Diego, CA, USA, 20th-25th June
2005. DOI: https://doi.org/10.1109/CVPR.2005.177

Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollar and C. Lawrence Zitnick. Microsoft COCO: Common Objects in Context, 13th European Conference on Computer Vision (ECCV), Springer, Zurich, Switzerland, pp. 1-15, 6th-12th September 2014. DOI:
https://doi.org/10.1007/978-3-319-10602-1_48

Andreas Ess, Bastian Leibe, and Luc Van Gool. Depth and appearance for mobile scene analysis, IEEE
International Conference on Computer Vision (ICCV), pp. 1-8, Rio de Janeiro, Brazil, 14th-21st October 2007.
DOI: https://doi.org/10.1109/iccv.2007.4409092

Andreas Geiger, Philip Lenz, and Raquel Urtasun. Are we ready for autonomous driving? the KITTI
vision benchmark suite, International Conference on Computer Vision and Pattern Recognition (CVPR),
pp. 3354-3361, RI, United States, 16th-21st June 2012. DOI: https://doi.org/10.1109/CVPR.2012.6248074

Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick. Mask R-CNN, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 42(2):386-397, Feb. 2020. DOI:
https://doi.org/10.1109/TPAMI.2018.2844175

Kaiming He, Georgia Gkioxari, Piotr Dollar and Ross Girshick. Mask R-CNN, IEEE International
Conference on Computer Vision (ICCV), pp. 2980-2988, Venice, Italy, 22nd-29th October 2017. DOI:
https://doi.org/10.1109/iccv.2017.322

Joseph Redmon, Santosh Divvala, Ross Girshick and Ali Farhadi. You Only Look Once: Unified, Real-Time Object Detection, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-10,
Las Vegas, Nevada, USA, 26th June-1st July 2016. DOI: https://doi.org/10.1109/CVPR.2016.91

Jifeng Dai, Yi Li, Kaiming He and Jian Sun. R-FCN: Object Detection via Region-based Fully Convolutional Networks, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1-11, Las
Vegas, Nevada, USA, 26th June-1st July 2016.

Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu and Alexander
C. Berg. SSD: Single Shot MultiBox Detector, 14th European Conference on Computer Vision (ECCV), pp.
1-17, Amsterdam, Netherlands, 11th-14th October 2016. DOI: https://doi.org/10.1007/978-3-319-46448-0_2

Jonathan Long, Evan Shelhamer and Trevor Darrell. Fully Convolutional Networks for Semantic Segmentation, Computer Vision and Pattern Recognition (CVPR), pp. 1-10, Boston, Massachusetts,
8th-10th June 2015. DOI: https://doi.org/10.1109/CVPR.2015.7298965

Shaoqing Ren, Kaiming He, Ross Girshick and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks, IEEE Transactions on Pattern Analysis and Machine Intelligence
(TPAMI), 39(6):1137-1149, June 2017. DOI: https://doi.org/10.1109/TPAMI.2016.2577031

Shaoqing Ren, Kaiming He, Ross Girshick and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks, Neural Information Processing Systems (NIPS), Montreal, pp. 1-9,
Quebec, Canada, 7th-12th December 2015.

Ross Girshick. Fast R-CNN, International Conference on Computer Vision (ICCV), pp. 1441-1448, Santiago, Chile, 7th-13th December 2015. DOI: https://doi.org/10.1109/ICCV.2015.169

Alec Radford, Luke Metz and Soumith Chintala. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, Computer Vision and Pattern Recognition (CVPR), pp. 1-10,
Boston, Massachusetts, 8th-10th June 2015.

Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun. Deep Residual Learning for Image Recognition,
IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770-778, Las Vegas, Nevada, USA, 26th June-1st July 2016.
DOI: https://doi.org/10.1109/CVPR.2016.90

Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke and Alex Alemi. Inception-v4, Inception-ResNet and
the Impact of Residual Connections on Learning, Computer Vision and Pattern Recognition (CVPR), pp.
1-10, Las Vegas, Nevada, USA, 26th June-1st July 2016.

Karen Simonyan and Andrew Zisserman. Very Deep Convolutional Networks for Large-Scale Image
Recognition, Computer Vision and Pattern Recognition (CVPR), pp. 1-10, Boston, Massachusetts, 8th-10th
June 2015.

Matthew D. Zeiler and Rob Fergus. Visualizing and Understanding Convolutional Networks, Computer
Vision and Pattern Recognition (CVPR), pp. 1-11, Portland, Oregon, USA, 23rd-28th June 2013. DOI:
https://doi.org/10.1007/978-3-319-10590-1_53

Alex Krizhevsky, Ilya Sutskever and Geoffrey E. Hinton. ImageNet Classification with Deep
Convolutional Neural Networks, 25th International Conference on Neural Information Processing
Systems (NIPS), pp. 1-9, Lake Tahoe, Nevada, United States, 3rd-6th December 2012. DOI:
https://doi.org/10.1145/3065386

Liu Kangming. Research on an improved pedestrian detection method based on Deep Belief Network
(DBN) classification algorithm, Journal of Information Systems and Technologies (RISTI), 17(3):77-87,
March 2016.

Y. LeCun, L. Bottou, Y. Bengio and P. Haffner. Gradient-based learning applied to document recognition,
Proceedings of the IEEE, 86(11):2278-2324, Nov. 1998. DOI: https://doi.org/10.1109/5.726791

Seonghoon Kang, Hyeran Byun and Seong-Whan Lee. Real-Time Pedestrian Detection Using Support Vector Machines, First International Workshop on SVM: Pattern Recognition with
Support Vector Machines, pp. 268-277, Niagara Falls, Canada, 10th August 2002. DOI:
https://doi.org/10.1142/S0218001403002435

David Geronimo, Angel D. Sappa, Antonio Lopez and Daniel Ponsa. Pedestrian detection using AdaBoost
learning of features and vehicle pitch estimation, International Conference on Visualization, Imaging, and
Image Processing, pp. 1-8, Spain, 28th-30th August 2006.

C. Wu, J. Yue, L. Wang and F. Lyu. Detection and Classification of Recessive Weakness in Superbuck
Converter Based on WPD-PCA and Probabilistic Neural Network, MDPI Electronics, 8(290):1-17, March
2019. DOI: https://doi.org/10.3390/electronics8030290

Alireza Asvadi, Mohammad Reza Karami-Mollaie, Yasser Baleghi and Hosein Seyyedi Andi. Improved Object Tracking Using Radial Basis Function Neural Networks, 7th Iranian Conference on Machine Vision and Image Processing (MVIP), Tehran, Iran, pp. 1-5, 16th-17th November 2011. DOI:
https://doi.org/10.1109/IranianMVIP.2011.6121604

Victor Emil Neagoe, Cristian Tudoran and Mihai Neghina. A neural network approach to pedestrian
detection, 13th WSEAS International Conference on COMPUTERS (ICCOMP), pp. 374-379, Wisconsin,
United States, 23rd July 2009.

Juncheng Wang and Guiying Li. Accelerate proposal generation in R-CNN methods for fast pedestrian
extraction, The Electronic Library, Emerald, 37(3):1-19, May 2019. DOI: https://doi.org/10.1108/EL-09-2018-0191

Bruno Artacho, Andreas Savakis. Waterfall Atrous Spatial Pooling Architecture for Efficient Semantic
Segmentation, Sensors, MDPI, 19, (24): 1-17, 2019. DOI: https://doi.org/10.3390/s19245361

Yunchao Gong, Liwei Wang, Ruiqi Guo, and Svetlana Lazebnik. Multiscale orderless pooling of deep
convolutional activation features, 13th European Conference on Computer Vision (ECCV), pp. 392-407,
Zurich, Switzerland, 6th-12th September 2014. DOI: https://doi.org/10.1007/978-3-319-10584-0_26

Jianan Li, Xiaodan Liang, Shengmei Shen, Tingfa Xu, Jiashi Feng, Shuicheng Yan. Scale-aware Fast
R-CNN for Pedestrian Detection, IEEE Transaction on Multimedia, 20(4):985-996, April 2018. DOI:
https://doi.org/10.1109/TMM.2017.2759508

Kelong Wang and Wei Zhou. Pedestrian and cyclist detection based on deep neural network fast
R-CNN, International Journal of Advanced Robotic Systems, SAGE, 16(2):1-10, April 2019. DOI:
https://doi.org/10.1177/1729881419829651

Miran Pobar and Marina Ivasic-Kos. Mask R-CNN and Optical flow-based method for detection and
marking of handball actions, 11th International Congress on Image and Signal Processing, BioMedical
Engineering and Informatics (CISP-BMEI 2018), pp. 1-6, Beijing, China, 13th-15th October 2018. DOI:
https://doi.org/10.1109/CISP-BMEI.2018.8633201

Asati Minkesh, Kraisittipong Worranitta and Miyachi Taizo. Human extraction and scene transition utilizing Mask R-CNN, International Conference on Computer Vision and Pattern Recognition (CVPR), pp.
1-6, California, United States, 16th-20th June 2019.

Ujwalla Gawande, Kamal Hajari and Yogesh Golhar. Pedestrian Detection and Tracking in Video
Surveillance System: Issues, Comprehensive Review, and Challenges, Recent Trends in Computational
Intelligence, IntechOpen, pp. 1-24, April 2020. DOI: https://doi.org/10.5772/intechopen.90810

Ujwalla Gawande, Kamal Hajari and Yogesh Golhar. Deep Learning Approach to Key Frame Detection
in Human Action Videos, Recent Trends in Computational Intelligence, IntechOpen, pp. 1-17, Feb. 2020. DOI:
https://doi.org/10.5772/intechopen.91188

Xiaoyu Wang, Tony X. Han and Shuicheng Yan. An HOG-LBP human detector with partial occlusion
handling, IEEE 12th International Conference on Computer Vision (ICCV), pp. 32-39, Kyoto, Japan, 29th
September-2nd October 2009. DOI: https://doi.org/10.1109/ICCV.2009.5459207

Piotr Dollar, Zhuowen Tu, Pietro Perona, and Serge Belongie. Integral channel features, British
Machine Vision Conference (BMVC), London, UK, pp. 1-11, 7th-10th September 2009. DOI:
https://doi.org/10.5244/C.23.91

Ujwalla Gawande, Yogesh Golhar. Biometric security system: a rigorous review of unimodal and multimodal biometrics techniques, International Journal of Biometrics (IJBM), InderScience, 10(2):142-175,
April 2018. DOI: https://doi.org/10.1504/IJBM.2018.10012749

U. Gawande, M. Zaveri and A. Kapur. Bimodal biometric system: feature level fusion of iris and fingerprint, 14th European Conference on Computer Vision (ECCV), ScienceDirect, Elsevier, 2013(2): 7-8, Feb.
2013. DOI: https://doi.org/10.1016/S0969-4765(13)70035-3

Nathan Silberman, David Sontag and Rob Fergus. Instance Segmentation of Indoor Scenes Using a Coverage
Loss, 13th European Conference on Computer Vision (ECCV), Zurich, Switzerland, 6th-12th September
2014. DOI: https://doi.org/10.1007/978-3-319-10590-1_40

Ziyu Zhang, Sanja Fidler, Raquel Urtasun. Instance-Level Segmentation for Autonomous Driving with
Deep Densely Connected MRFs, Computer Vision and Pattern Recognition (CVPR), pp. 1-10, Boston,
Massachusetts, 8th-10th June 2015. DOI: https://doi.org/10.1109/CVPR.2016.79

Xiaogang Wang, Meng Wang, and Wei Li. Scene-specific pedestrian detection for static video surveillance, IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 36(2):361-374, 2014.
DOI: https://doi.org/10.1109/TPAMI.2013.124

Yonglong Tian, Ping Luo, Xiaogang Wang, and Xiaoou Tang. Pedestrian detection aided by deep learning
semantic tasks, Computer Vision and Pattern Recognition (CVPR), pp. 1-10, Boston, Massachusetts, 8th-10th June 2015. DOI: https://doi.org/10.1109/CVPR.2015.7299143

Shanshan Zhang, Rodrigo Benenson, and Bernt Schiele. Filtered channel features for pedestrian detection,
Computer Vision and Pattern Recognition (CVPR), pp. 1751-1760, Boston, Massachusetts, 8th-10th June
2015. DOI: https://doi.org/10.1109/CVPR.2015.7298784

Zhaowei Cai, Mohammad Saberian and Nuno Vasconcelos. Learning complexity-aware cascades for deep
pedestrian detection, International Conference on Computer Vision (ICCV), pp. 1-10, Santiago, Chile, 7th-13th December 2015. DOI: https://doi.org/10.1109/ICCV.2015.384

Sakrapee Paisitkriangkrai, Chunhua Shen, and Anton van den Hengel. Strengthening the effectiveness of pedestrian detection with spatially pooled features, 13th European Conference on Computer Vision (ECCV), Springer, Zurich, Switzerland, pp. 546-561, 6th-12th September 2014. DOI: https://doi.org/10.1007/978-3-319-10593-2_36