"Text classification for automatic theme detection in an automatic complaint-sorting system"

U.F.R. of Sciences and Techniques, University of Rouen

Supervisors

M. Thierry PAQUET

Professor, University of Rouen, France

and

M. Laurent HEUTTE

HDR (Habilitation à Diriger des Recherches), University of Rouen, France

 

A company's after-sales or customer service department receives a growing volume of correspondence from its customers, notably handwritten mail. In general, each letter concerns one very specific theme, and before the mail can be processed the company needs to group the letters by theme. Having human operators do this grouping is slow and tedious, so classifying the mail automatically would be a major advantage.

The goal of the project is to build an automatic text classification module that detects the theme of each document within a system for automatically sorting incoming mail. First, we will survey existing methods for information processing in general and for text categorization in particular. Next, we will study the SWT labeling system, which we will use to extract the documents that will serve to train our classifier. The classifier we adopt will be tested not only on the SWT document base but also on Reuters-21578, a freely available corpus on the Web. Initially, we will use the vocabulary provided by SWT as the feature space. We will then apply feature extraction methods to the documents to derive another vocabulary that is more discriminating than the first.
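As a rough illustration of the two stages described above, the sketch below first maps each document onto a fixed vocabulary (standing in for the vocabulary provided by SWT, which is not reproduced here) and then keeps the most discriminating terms. The selection criterion used here, a per-class chi-square statistic, is only one common choice and is an assumption of this sketch, not necessarily the method adopted in the project. The function names, the themes "billing" and "delivery", and the toy documents are all hypothetical.

```python
from collections import Counter


def bag_of_words(tokens, vocabulary):
    """Map a tokenized document onto a fixed vocabulary (term-frequency vector)."""
    counts = Counter(tokens)
    return [counts[term] for term in vocabulary]


def chi_square(term, label, docs):
    """Chi-square statistic between presence of `term` and membership in class `label`.

    docs: list of (set_of_tokens, label) pairs (hypothetical input format).
    """
    a = b = c = d = 0  # 2x2 contingency counts: term present/absent x in/out of class
    for tokens, doc_label in docs:
        has_term = term in tokens
        in_class = doc_label == label
        if has_term and in_class:
            a += 1
        elif has_term:
            b += 1
        elif in_class:
            c += 1
        else:
            d += 1
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return n * (a * d - c * b) ** 2 / denom if denom else 0.0


def select_features(vocabulary, docs, k):
    """Keep the k terms whose best per-class chi-square score is highest."""
    labels = {label for _, label in docs}
    scores = {t: max(chi_square(t, lab, docs) for lab in labels) for t in vocabulary}
    return sorted(vocabulary, key=lambda t: scores[t], reverse=True)[:k]


# Toy complaint mails with hypothetical themes.
docs_raw = [
    ("invoice amount wrong please refund".split(), "billing"),
    ("refund not received invoice".split(), "billing"),
    ("parcel delivery late".split(), "delivery"),
    ("delivery parcel damaged please".split(), "delivery"),
]
docs = [(set(tokens), label) for tokens, label in docs_raw]
vocabulary = ["invoice", "refund", "parcel", "delivery", "please"]

print(bag_of_words("refund refund invoice".split(), vocabulary))  # [1, 2, 0, 0, 0]
print(select_features(vocabulary, docs, 4))  # drops "please", which appears in both themes
```

In this toy example the word "please" occurs equally in both themes, so its chi-square score is zero and it is the term discarded; the reduced vocabulary is then used as the new feature space for the classifier.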
 

Captured by MemoWeb from http://www.regim.org/Membres/~koubaa_mohamed/abstract/abstract_m.htm on 20/01/2010