Clienti, Christophe (2009) Data flow architectures dedicated to image processing using Mathematical Morphology. PhD thesis, Mathematical Morphology, CMM - Centre de Morphologie Mathématique, ENSMP, 240 p.
Abstract
This thesis studies data flow accelerators dedicated to image processing using mathematical morphology. The main objective is to provide a programmable and efficient implementation of basic morphological operators, and to assemble them so as to build complex operators that execute quickly. In recent years, research on morphological algorithms has been oriented towards finding elegant algorithms to compute these complex operators, such as the watershed computed with priority queues. These algorithms often rely on specific data structures that are hard to deploy on platforms other than single-core, general-purpose processors. Meanwhile, these processors continue to evolve towards parallelism by increasing the number of cores, and because the frequency wall seems to have been reached, the best way to improve performance is to use parallelisation techniques. Consequently, we chose to build fast implementations of complex mathematical morphology operators from simpler, highly parallel operations. In the first part, we study existing computational kernels for neighbourhood processors and suggest new ones based on recent advances in mathematical morphology. In the second part, we use the neighbourhood processors as building blocks to generate and manage pipelines using high-level tools in a system-on-chip context. In the third part, we describe a basic VLIW processor with vector instructions, deployed in a dataflow context to exploit spatial and temporal parallelism. Finally, we compare the performance of our system with that of a multi-core workstation processor and a graphics processor to show the relevance of our approach.
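To illustrate what "assembling basic operators into complex ones" can mean, the following is a minimal software sketch, not the thesis's hardware design. It assumes a 3x3 square structuring element, grayscale images stored as lists of rows, and neighbourhoods clipped at the image border; the names `neighbourhood_op`, `erode`, `dilate` and `opening` are hypothetical and chosen for this example only.

```python
from typing import Callable, List

Image = List[List[int]]


def neighbourhood_op(img: Image, reduce_fn: Callable[[List[int]], int]) -> Image:
    """Apply reduce_fn to every pixel's 3x3 neighbourhood.

    Assumption: the neighbourhood is clipped at the image border, so edge
    pixels see fewer than nine values.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            out[y][x] = reduce_fn(window)
    return out


def erode(img: Image) -> Image:
    """Basic operator: minimum over the neighbourhood."""
    return neighbourhood_op(img, min)


def dilate(img: Image) -> Image:
    """Basic operator: maximum over the neighbourhood."""
    return neighbourhood_op(img, max)


def opening(img: Image) -> Image:
    """Composite operator: an erosion followed by a dilation.

    Each stage depends only on a local neighbourhood of the previous
    stage's output, which is what makes such chains natural candidates
    for pipelining.
    """
    return dilate(erode(img))
```

Every output pixel of `erode` or `dilate` is computed independently of the others, so the per-pixel work can in principle be spread across parallel units, while chaining the two stages in `opening` mirrors the pipeline idea described in the abstract.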
| Item Type: | PhD Thesis (PhD) |
|---|---|
| PhD Supervisor: | Beucher, Serge and Bilodeau, Michel |
| Date: | 30 September 2009 |
| Board of examiners: | Jeulin, Dominique and Mérigot, Alain and Paindavoine, Michel and Akil, Mohamed and Beucher, Serge and Bilodeau, Michel and Demigny, Didier and Lemonnier, Fabrice |
| Doctoral School: | ED 431 INFORMATION, COMMUNICATION, MODELISATION ET SIMULATION |
| Discipline: | Mathematical Morphology |
| Collection (Fonds): | Mines ParisTech (ENSMP) |
| Institution: | ENSMP |
| Department: | CMM - Centre de Morphologie Mathématique |
| Subjects: | 2. Information and Communication Sciences and Technologies; 1. Mathematics and Applications |
| Uncontrolled Keywords: | Image processing, Mathematical morphology, Parallel processing, High-performance processors, Very long instruction word (VLIW) processor, Vector processor, Pipeline processor, High performance computing |
| ID Code: | 5758 |
| Deposited By: | Claudine Abauzit |
| Deposited On: | 27 January 2010 |