Ghorayeb, Hicham (2007) Conception et mise en oeuvre d'algorithmes de vision temps-réel pour la vidéo surveillance intelligente [Design and implementation of real-time vision algorithms for intelligent video surveillance]. Doctoral thesis, Real-time computing, robotics and automation, CAOR - Centre de robotique, ENSMP, 197 p.
Abstract
Our objective is to study the vision algorithms used at the different levels of an intelligent video processing chain. We prototyped a generic processing chain dedicated to analyzing the content of a video stream. Building on this chain, we developed a pedestrian detection and tracking application, which is an integral part of the PUVAME project.
This generic processing chain consists of several stages: object detection, classification, and tracking. Higher-level stages are also envisaged, such as action recognition, identification, semantic description, and the fusion of data from several cameras. We focused on the first two stages. We explored background segmentation algorithms for video streams from a fixed camera, and we implemented and compared algorithms based on adaptive background modeling.
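As an illustration of adaptive background modeling with a fixed camera, here is a minimal sketch of the simplest variant: a per-pixel running average with a fixed difference threshold. It is not taken from the thesis. The function names, the learning rate `alpha`, and the `threshold` value are assumptions chosen for the example, and frames are plain nested lists rather than real images.

```python
from typing import List

Frame = List[List[float]]  # grayscale image as rows of pixel intensities


def update_background(background: Frame, frame: Frame, alpha: float = 0.05) -> Frame:
    """Blend the new frame into the background model (exponential running average)."""
    return [
        [(1.0 - alpha) * b + alpha * f for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


def foreground_mask(background: Frame, frame: Frame, threshold: float = 25.0) -> List[List[bool]]:
    """Mark pixels whose intensity differs from the background by more than threshold."""
    return [
        [abs(f - b) > threshold for b, f in zip(b_row, f_row)]
        for b_row, f_row in zip(background, frame)
    ]


# Toy 2x3 scene: a static background, then a bright object appears at the top-right pixel.
background = [[10.0, 10.0, 10.0], [10.0, 10.0, 10.0]]
frame = [[10.0, 12.0, 200.0], [9.0, 10.0, 11.0]]

# Segment first, then adapt the model so slow lighting changes are absorbed over time.
mask = foreground_mask(background, frame)
background = update_background(background, frame)
```

A small `alpha` makes the model adapt slowly, so a briefly stationary object is not absorbed into the background too quickly. Richer models such as a per-pixel Gaussian or a mixture of Gaussians replace the single average with a distribution.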
We also explored visual object detection based on machine learning, using the boosting technique. To that end, we developed a library called LibAdaBoost, intended as a prototyping environment for machine learning algorithms, and we prototyped the boosting technique within it. LibAdaBoost is distributed under the LGPL license. The library is unique in the features it offers.
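To make the boosting idea concrete, the following is a minimal sketch of discrete AdaBoost with one-dimensional threshold classifiers (decision stumps) as weak classifiers. It does not reproduce LibAdaBoost or its API. All names (`best_stump`, `adaboost`, `predict`) and the toy data set are assumptions made for illustration.

```python
import math
from typing import List, Tuple

Stump = Tuple[float, int]  # (threshold, polarity)


def stump_predict(stump: Stump, x: float) -> int:
    """Return polarity if x is above the threshold, otherwise the opposite label."""
    threshold, polarity = stump
    return polarity if x > threshold else -polarity


def best_stump(xs: List[float], ys: List[int], weights: List[float]) -> Tuple[Stump, float]:
    """Weak learner: exhaustive search for the stump with the lowest weighted error."""
    values = sorted(set(xs))
    thresholds = [v - 0.5 for v in values] + [values[-1] + 0.5]
    best, best_err = (thresholds[0], 1), float("inf")
    for threshold in thresholds:
        for polarity in (1, -1):
            stump = (threshold, polarity)
            err = sum(w for x, y, w in zip(xs, ys, weights) if stump_predict(stump, x) != y)
            if err < best_err:
                best, best_err = stump, err
    return best, best_err


def adaboost(xs: List[float], ys: List[int], rounds: int) -> List[Tuple[float, Stump]]:
    """Train a weighted vote of stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump, err = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1.0 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, decrease the others, renormalize.
        weights = [w * math.exp(-alpha * y * stump_predict(stump, x))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble: List[Tuple[float, Stump]], x: float) -> int:
    """Strong classifier: sign of the alpha-weighted vote of the weak classifiers."""
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1


# Toy data that no single threshold can separate.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
predictions = [predict(ensemble, x) for x in xs]
```

On this toy set the best single stump misclassifies three of the ten samples, while the weighted vote of three stumps classifies all of them correctly. In visual object detection the stumps would typically operate on image features rather than raw scalars.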
We explored the use of graphics cards to accelerate vision algorithms. We ported the visual object detector, which is based on a classifier generated by boosting, so that it runs on the graphics processor. We were the first to carry out this port. We found that the graphics processor architecture is the best suited to this kind of algorithm.
The processing chain was implemented and integrated into the RTMaps environment. We evaluated these algorithms on well-defined scenarios, which were specified within the PUVAME project.
| EPrint type: | Thesis (Doctorate) |
|---|---|
| Thesis supervisor: | Laurgeau, Claude |
| Date: | 12 September 2007 |
| Thesis jury: | Meyrueis, Patrick; Siarry, Patrick; Akil, Mohamed; Steux, Bruno; Laurgeau, Claude; Meyer, Fernand; Schwerdt, Karl |
| Discipline: | Real-time computing, robotics and automation |
| Collection: | ENSMP |
| Institution: | ENSMP |
| Laboratory: | CAOR - Centre de robotique |
| Subjects: | 1. Mathematics and its applications |
| Free keywords: | Video surveillance, Boosting, Automatic pattern recognition, Intelligent transportation system, Machine learning, Moving object detection, Monte Carlo method |
| Code ID: | 3064 |
| Deposited by: | Claudine Abauzit |
| Deposited on: | 05 November 2007 |
Table of Contents
I Introduction and state of the art
1 French Introduction
1.1 Algorithms
1.2 Architecture
1.3 Application
2 Introduction
2.1 Contributions
2.2 Outline
3 Intelligent video surveillance systems (IVSS)
3.1 What is IVSS?
3.1.1 Introduction
3.1.2 Historical background
3.2 Applications of intelligent surveillance
3.2.1 Real time alarms
3.2.2 User defined alarms
3.2.3 Automatic unusual activity alarms
3.2.4 Automatic Forensic Video Retrieval (AFVR)
3.2.5 Situation awareness
3.3 Scenarios and examples
3.3.1 Public and commercial security
3.3.2 Smart video data mining
3.3.3 Law enforcement
3.3.4 Military security
3.4 Challenges
3.4.1 Technical aspect
3.4.2 Performance evaluation
3.5 Choice of implementation methods
3.6 Conclusion
II Algorithms
4 Generic framework for intelligent visual surveillance
4.1 Object detection
4.2 Object classification
4.3 Object tracking
4.4 Action recognition
4.5 Semantic description
4.6 Personal identification
4.7 Fusion of data from multiple cameras
5 Moving object detection
5.1 Challenge of detection
5.2 Object detection system diagram
5.2.1 Foreground detection
5.2.2 Pixel level post-processing (Noise removal)
5.2.3 Detecting connected components
5.2.4 Region level post-processing
5.2.5 Extracting object features
5.3 Adaptive background differencing
5.3.1 Basic Background Subtraction (BBS)
5.3.2 W4 method
5.3.3 Single Gaussian Model (SGM)
5.3.4 Mixture Gaussian Model (MGM)
5.3.5 Lehigh Omni-directional Tracking System (LOTS)
5.4 Shadow and light change detection
5.4.1 Methods and implementation
5.5 High level feedback to improve detection methods
5.5.1 The modular approach
5.6 Performance evaluation
5.6.1 Ground truth generation
5.6.2 Datasets
5.6.3 Evaluation metrics
5.6.4 Experimental results
5.6.5 Comments on the results
6 Machine learning for visual object-detection
6.1 Introduction
6.2 The theory of boosting
6.2.1 Conventions and definitions
6.2.2 Boosting algorithms
6.2.3 AdaBoost
6.2.4 Weak classifier
6.2.5 Weak learner
6.3 Visual domain
6.3.1 Static detector
6.3.2 Dynamic detector
6.3.3 Weak classifiers
6.3.4 Genetic weak learner interface
6.3.5 Cascade of classifiers
6.3.6 Visual finder
6.4 LibAdaBoost: Library for Adaptive Boosting
6.4.1 Introduction
6.4.2 LibAdaBoost functional overview
6.4.3 LibAdaBoost architectural overview
6.4.4 LibAdaBoost content overview
6.4.5 Comparison to previous work
6.5 Use cases
6.5.1 Car detection
6.5.2 Face detection
6.5.3 People detection
6.6 Conclusion
7 Object tracking
7.1 Initialization
7.2 Sequential Monte Carlo tracking
7.3 State dynamics
7.4 Color distribution model
7.5 Results
7.6 Incorporating AdaBoost in the tracker
7.6.1 Experimental results
III Architecture
8 General purpose computation on the GPU
8.1 Introduction
8.2 Why GPGPU?
8.2.1 Computational power
8.2.2 Data bandwidth
8.2.3 Cost/Performance ratio
8.3 GPGPU’s first generation
8.3.1 Overview
8.3.2 Graphics pipeline
8.3.3 Programming language
8.3.4 Streaming model of computation
8.3.5 Programmable graphics processor abstractions
8.4 GPGPU’s second generation
8.4.1 Programming model
8.4.2 Application programming interface (API)
8.5 Conclusion
9 Mapping algorithms to GPU
9.1 Introduction
9.2 Mapping visual object detection to GPU
9.3 Hardware constraints
9.4 Code generator
9.5 Performance analysis
9.5.1 Cascade Stages Face Detector (CSFD)
9.5.2 Single Layer Face Detector (SLFD)
9.6 Conclusion
IV Application
10 Application: PUVAME
10.1 Introduction
10.2 PUVAME overview
10.3 Accident analysis and scenarios
10.4 ParkNav platform
10.4.1 The ParkView platform
10.4.2 The CyCab vehicle
10.5 Architecture of the system
10.5.1 Interpretation of sensor data relative to the intersection
10.5.2 Interpretation of sensor data relative to the vehicle
10.5.3 Collision risk estimation
10.5.4 Warning interface
10.6 Experimental results
V Conclusion and future work
11 Conclusion
11.1 Overview
11.2 Future work
12 French conclusion
VI Appendices
A Hello World GPGPU
B Hello World Brook
C Hello World CUDA

