Chotteau, Christophe (2003) Corrélation sémantique entre documents: application à la recherche d'information juridique sur le Web. PhD thesis, Informatique temps réel, robotique et automatique, Centre de recherche en informatique, ENSMP.
Alternative Locations: http://cri.ensmp.fr/classement/doc/A-353.ps
Abstract
There are many ways to find information on the Web, and search engines are the most frequently used tools. In this context, relative pages algorithms are a complementary technique: they provide more information about a specific document without requiring the user to formulate a query. The goal of our work is to define a new semantic relative pages algorithm for searching a law-oriented corpus.
To reach this goal, we defined a method that applies linguistic tools and techniques to previously selected documents. Relevant text units, called lexical signatures, are extracted from our document corpus. We submit these lexical signatures as queries to a search engine; the results form the pool of relative pages. Our relative pages algorithm is used and evaluated in an information retrieval context, as part of the development of a search engine.
The main contributions of our work are (1) a new perspective on building lexical signatures for relative pages searches, (2) the definition and evaluation of a new relative pages algorithm called Tifr, (3) a discussion of the semantic aspect of our method, and finally (4) a practical answer to the challenge of information retrieval in a law-oriented context.
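The pipeline described in the abstract (extract a lexical signature from a document, then submit it as a search query) can be illustrated with a minimal Python sketch. It is not the thesis's method: the Tifr weight is not defined on this page, so plain TF-IDF is swapped in as a stand-in, a regex tokenizer replaces the linguistic tools, and the function names `tokenize`, `lexical_signature`, and `signature_query` are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (crude stand-in for linguistic tools)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def lexical_signature(doc, corpus, k=3):
    """Return the k terms of `doc` with the highest TF-IDF score.

    Assumption: TF-IDF is used here only as a placeholder weighting;
    it is NOT the Tifr weight defined in the thesis.
    """
    doc_tokens = tokenize(doc)
    term_counts = Counter(doc_tokens)
    n_docs = len(corpus)
    token_sets = [set(tokenize(d)) for d in corpus]

    scores = {}
    for term, count in term_counts.items():
        # Document frequency: number of corpus documents containing the term.
        df = sum(1 for tokens in token_sets if term in tokens)
        # Smoothed inverse document frequency, always >= 1.
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        scores[term] = (count / len(doc_tokens)) * idf

    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(scores, key=lambda t: (-scores[t], t))
    return ranked[:k]


def signature_query(signature):
    """Join the signature terms into a query string for a search engine."""
    return " ".join(signature)


corpus = [
    "contract contract contract law",
    "law court",
    "law appeal",
]
signature = lexical_signature(corpus[0], corpus, k=2)
query = signature_query(signature)
print(query)
```

The resulting query string would then be sent to whatever search engine is available; the pages it returns play the role of the pool of relative pages.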
| Item Type: | PhD Thesis (PhD) |
|---|---|
| Thesis Supervisor: | Mahl, Robert |
| Date: | December 2003 |
| Board of examiners: | Constant, P. and Girardot, J.J. and Mahl, Robert and Roche, C. and Zweigenbaum, P. |
| Discipline: | Informatique temps réel, robotique et automatique |
| Collection (Fonds): | ENSMP |
| Institution: | ENSMP |
| Department: | Centre de recherche en informatique |
| Subjects: | 2. Information and Communication Sciences and Technologies |
| Uncontrolled Keywords: | Information retrieval, Relative pages algorithms, Lexical signatures, Tifr weight, Ingénierie des connaissances, Recherche d'information, Corrélation de documents, Signature lexicale, Pondération statistique, indice Tifr |
| ID Code: | 1080 |
| Deposited By: | Francine Masson |
| Deposited On: | 28 February 2005 |

