Ghorayeb, Hicham (2007) Conception et mise en oeuvre d'algorithmes de vision temps-réel pour la vidéo surveillance intelligente. PhD thesis Informatique temps réel, robotique et automatique, CAOR- Centre de robotique, ENSMP p.197.
Abstract
In this dissertation, we present research carried out at the Center of Robotics (CAOR) of the Ecole des Mines de Paris on the problem of intelligent video analysis.
The primary objective of our research is to prototype a generic framework for intelligent video analysis. We optimized this framework and configured it to meet specific application requirements. As a case study, we consider a people-tracking application drawn from the PUVAME project. This application aims to improve the safety of people in urban areas near bus stations.
We then improved the generic framework for video analysis, mainly in background subtraction and visual object detection. We developed LibAdaBoost, a machine learning library specialized in boosting for visual object detection.
To the best of our knowledge, LibAdaBoost is the first library of its kind. We make LibAdaBoost available to the machine learning community under the LGPL license.
Finally, we adapted the boosting-based visual object detection algorithm so that it could run on graphics hardware. To the best of our knowledge, we were the first to implement visual object detection with the sliding-window technique on graphics hardware. The results were promising: the prototype performed three to nine times better than the CPU implementation.
The framework was successfully implemented and integrated into the RTMaps environment.
It was evaluated at the final session of the PUVAME project and demonstrated its reliability across various test scenarios designed specifically for that project.
| Item Type: | PhD Thesis (PhD) |
|---|---|
| PhD Supervisor: | Laurgeau, Claude |
| Date: | 12 September 2007 |
| Board of examiners: | Meyrueis, Patrick and Siarry, Patrick and Akil, Mohamed and Steux, Bruno and Laurgeau, Claude and Meyer, Fernand and Schwerdt, Karl |
| Discipline: | Informatique temps réel, robotique et automatique |
| Collection (Fonds): | Mines ParisTech (ENSMP) |
| Institution: | ENSMP |
| Department: | CAOR- Centre de robotique |
| Subjects: | 1. Mathematics and Applications |
| Uncontrolled Keywords: | Video surveillance, Boosting, Automatic pattern recognition, Intelligent transportation system, Machine learning, Moving object detection, Monte Carlo method |
| ID Code: | 3064 |
| Deposited By: | Claudine Abauzit |
| Deposited On: | 05 November 2007 |
Table of contents
I Introduction and state of the art
1 French Introduction
1.1 Algorithms
1.2 Architecture
1.3 Application
2 Introduction
2.1 Contributions
2.2 Outline
3 Intelligent video surveillance systems (IVSS)
3.1 What is IVSS?
3.1.1 Introduction
3.1.2 Historical background
3.2 Applications of intelligent surveillance
3.2.1 Real time alarms
3.2.2 User defined alarms
3.2.3 Automatic unusual activity alarms
3.2.4 Automatic Forensic Video Retrieval (AFVR)
3.2.5 Situation awareness
3.3 Scenarios and examples
3.3.1 Public and commercial security
3.3.2 Smart video data mining
3.3.3 Law enforcement
3.3.4 Military security
3.4 Challenges
3.4.1 Technical aspect
3.4.2 Performance evaluation
3.5 Choice of implementation methods
3.6 Conclusion
II Algorithms
4 Generic framework for intelligent visual surveillance
4.1 Object detection
4.2 Object classification
4.3 Object tracking
4.4 Action recognition
4.5 Semantic description
4.6 Personal identification
4.7 Fusion of data from multiple cameras
5 Moving object detection
5.1 Challenge of detection
5.2 Object detection system diagram
5.2.1 Foreground detection
5.2.2 Pixel level post-processing (Noise removal)
5.2.3 Detecting connected components
5.2.4 Region level post-processing
5.2.5 Extracting object features
5.3 Adaptive background differencing
5.3.1 Basic Background Subtraction (BBS)
5.3.2 W4 method
5.3.3 Single Gaussian Model (SGM)
5.3.4 Mixture Gaussian Model (MGM)
5.3.5 Lehigh Omni-directional Tracking System (LOTS)
5.4 Shadow and light change detection
5.4.1 Methods and implementation
5.5 High level feedback to improve detection methods
5.5.1 The modular approach
5.6 Performance evaluation
5.6.1 Ground truth generation
5.6.2 Datasets
5.6.3 Evaluation metrics
5.6.4 Experimental results
5.6.5 Comments on the results
6 Machine learning for visual object-detection
6.1 Introduction
6.2 The theory of boosting
6.2.1 Conventions and definitions
6.2.2 Boosting algorithms
6.2.3 AdaBoost
6.2.4 Weak classifier
6.2.5 Weak learner
6.3 Visual domain
6.3.1 Static detector
6.3.2 Dynamic detector
6.3.3 Weak classifiers
6.3.4 Genetic weak learner interface
6.3.5 Cascade of classifiers
6.3.6 Visual finder
6.4 LibAdaBoost: Library for Adaptive Boosting
6.4.1 Introduction
6.4.2 LibAdaBoost functional overview
6.4.3 LibAdaBoost architectural overview
6.4.4 LibAdaBoost content overview
6.4.5 Comparison to previous work
6.5 Use cases
6.5.1 Car detection
6.5.2 Face detection
6.5.3 People detection
6.6 Conclusion
7 Object tracking
7.1 Initialization
7.2 Sequential Monte Carlo tracking
7.3 State dynamics
7.4 Color distribution model
7.5 Results
7.6 Incorporating AdaBoost in the tracker
7.6.1 Experimental results
III Architecture
8 General purpose computation on the GPU
8.1 Introduction
8.2 Why GPGPU?
8.2.1 Computational power
8.2.2 Data bandwidth
8.2.3 Cost/Performance ratio
8.3 GPGPU’s first generation
8.3.1 Overview
8.3.2 Graphics pipeline
8.3.3 Programming language
8.3.4 Streaming model of computation
8.3.5 Programmable graphics processor abstractions
8.4 GPGPU’s second generation
8.4.1 Programming model
8.4.2 Application programming interface (API)
8.5 Conclusion
9 Mapping algorithms to GPU
9.1 Introduction
9.2 Mapping visual object detection to GPU
9.3 Hardware constraints
9.4 Code generator
9.5 Performance analysis
9.5.1 Cascade Stages Face Detector (CSFD)
9.5.2 Single Layer Face Detector (SLFD)
9.6 Conclusion
IV Application
10 Application: PUVAME
10.1 Introduction
10.2 PUVAME overview
10.3 Accident analysis and scenarios
10.4 ParkNav platform
10.4.1 The ParkView platform
10.4.2 The CyCab vehicle
10.5 Architecture of the system
10.5.1 Interpretation of sensor data relative to the intersection
10.5.2 Interpretation of sensor data relative to the vehicle
10.5.3 Collision Risk Estimation
10.5.4 Warning interface
10.6 Experimental results
V Conclusion and future work
11 Conclusion
11.1 Overview
11.2 Future work
12 French conclusion
VI Appendices
A Hello World GPGPU
B Hello World Brook
C Hello World CUDA