Cuturi, Marco (2005) Etude de noyaux de semigroupe pour objets structurés dans le cadre de l'apprentissage statistique. PhD thesis Géostatistique, CG-Centre de Géostatistique, ENSMP p.114.
Full text available as:
|
|
Abstract
Kernel methods refer to a new family of data analysis tools which may be used in standardized learning contexts, such as classification or regression. Such tools are grounded on an \textit{a priori} similarity measure between the objects to be handled, which have been named ``kernels'' in the statistical learning and functional analysis literature. The simplicity of kernel methods comes from the fact that, given a learning task, such methods only require the definition of a kernel to compare the objects to yield practical results.
The problem of selecting the right kernel for a task is nonetheless tricky, notably when the objects have complex structures. We propose in this work various families of generic kernels for composite objects, such as strings, graphs or images. The kernels that we obtain are tailored to compare clouds of points, histograms or more generally positive measures. Our approach is mainly motivated by algebraic considerations on the sets of interests, which is why we make frequent use of the theory of harmonic functions on semigroups in this work. The theoretical justification for such kernels is further grounded on the use of reproducing kernel Hilbert spaces, in which the measures are embedded, along with elements of convex analysis and descriptors of the measures used in statistics and information theory, such as variance and entropy. By mapping any structured object to a cloud of components, \eg taking a string and turning it into a cloud or a histogram of substrings, we apply these kernels on composite objects coupled with discriminative methods, such as the support vector machine, to address classification problems encountered in bioinformatics or image analysis.
We extend this framework in the end of the thesis to propose a different viewpoint where objects are no longer seen as clouds of points but rather as nested clouds, where each cloud is labelled according to a set of events endowed with a hierarchy. We show how to benefit from such a description to apply a multiresolution comparison scheme between the objects.
| Item Type: | PhD Thesis (PhD) |
|---|---|
| Thesis Supervisor: | Vert, Jean-Philippe |
| Date: | 17 November 2005 |
| Board of examiners: | Boucheron, Stéphane and Bousquet, Olivier and Fukumizu, Kenji and Vert, Jean-Philippe and Lafferty, John and Catoni, Olivier |
| Discipline: | Géostatistique |
| Collection (Fonds): | ENSMP |
| Institution: | ENSMP |
| Department: | CG-Centre de Géostatistique |
| Subjects: | 1. Mathematics and Applications |
| Uncontrolled Keywords: | Apprentissage, Méthodes à noyaux, Bioinformatique |
Table of content
Foreword - iii
1 Introduction - 1
1.1 Nuts and Bolts of Kernel Methods - 3
1.1.1 The Multiple Facets of Kernels in Machine Learning - 4
1.1.2 Using Reproducing Kernel Hilbert Spaces in Practice - 8
1.1.3 Selecting and Designing the Kernel - 13
1.2 Blending Discriminative and Generative Approaches with Kernels - 15
1.2.1 Statistical Models as Feature Extractors - 15
1.2.2 Nonparametric Kernels on Measures - 18
1.3 Contribution of this Thesis - 20
1.3.1 A String Kernel Inspired by Universal Coding - 21
1.3.2 Semigroup Kernels on Measures - 21
1.3.3 Spectral Semigroup Kernels on Measures - 21
1.3.4 A Multiresolution Framework for Nested Measures - 22
2 The Context Tree Kernel for Strings 23
2.1 Introduction - 24
2.2 Probabilistic Models and Mutual Information Kernels - 26
2.3 A Mutual Information Kernel Based on Context-Tree Models - 28
2.3.1 Framework and notations - 28
2.3.2 Context-Tree Models - 29
2.3.3 Prior Distributions on Context-Tree Models - 29
2.3.4 Triple Mixture Context-Tree Kernel - 32
2.4 Kernel Implementation - 32
2.4.1 Defining Counters - 33
2.4.2 Recursive Computation of the Triple Mixture - 33
2.5 Source Coding and Compression Interpretation - 36
2.6 Experiments - 38
2.6.1 Protein Domain Homology Detection Benchmark - 38
2.6.2 Parameter Tuning and Comparison with Alternative String Kernels - 39
2.6.3 Mean Performances and Curves - 40
2.7 Closing remarks - 42
3 Semigroup Kernels on Measures - 45
3.1 Introduction - 46
3.2 Notations and Framework - 48
3.2.1 Measures on Basic Components - 48
3.2.2 Semigroups and Sets of Points - 49
3.3 The Entropy and Inverse Generalized Variance Kernels - 51
3.3.1 Entropy Kernel - 51
3.3.2 Inverse Generalized Variance Kernel - 53
3.4 Semigroup Kernels on Molecular Measures - 55
3.4.1 Entropy Kernel on Smoothed Estimates - 56
3.4.2 Regularized Inverse Generalized Variance of Molecular Measures - 56
3.5 Inverse Generalized Variance on the RKHS associated with a Kernel κ - 59
3.6 Integral Representation of p.d. Functions on a Set of Measures - 61
3.7 Projection on Exponential Families through Laplace's Approximation - 64
3.8 Experiments on images of the MNIST database - 66
3.8.1 Linear IGV Kernel - 67
3.8.2 Kernelized IGV - 68
3.8.3 Experiments on the SVM Generalization Error - 70
3.9 Closing remarks - 73
4 Semigroup Spectral Functions on Measures - 77
4.1 Introduction - 78
4.2 Comparing Measures on Arbitrary Spaces through Kernelized Estimates78
4.2.1 Measure Representations of Objects and Component Spaces - 79
4.2.2 Computing Kernels on Measures through Variance - 80
4.2.3 The Semigroup Framework for Measures - 81
4.3 Semigroup Spectral Functions of Measures - 82
4.3.1 An Extended Framework for Semigroup Kernels - 82
4.3.2 Characteristic Functions and s.s.p.d. Kernels - 83
4.3.3 A Few Examples of Semigroup Spectral Functions - 85
4.4 The Trace as a Semigroup Spectral Function - 88
4.4.1 The Trace Kernel on Molecular Measures - 89
4.4.2 Practical Formulations for the Trace Kernel - 91
5 Multiresolution Kernels - 95
5.1 Introduction - 96
5.2 Multiresolution Kernels - 98
5.2.1 Local Similarities Between Measures Conditioned by Sets of Events - 99
5.2.2 Resolution Specific Kernels - 99
5.2.3 Averaging Resolution Specific Kernels - 101
5.3 Kernel Computation - 101
5.3.1 Partitions Generated by Branching Processes - 102
5.3.2 Factorization - 102
5.3.3 Numerical Consistency of the Base Kernel - 103
5.4 Experiments - 104
| ID Code: | 1823 |
|---|---|
| Deposited By: | Céline Gueguen |
| Deposited On: | 30 June 2006 |
Repository Staff Only: edit this item

