"Classification of texts for the automatic detection of themes in a system of sorts out automatic of complaints"
U.F.R. of Sciences and Techniques, University of Rouen
Supervisors
M. Thierry PAQUET
Professor, University of Rouen, France
and
M. Laurent HEUTTE
HDR, University of Rouen, France
The after-sales or customer service department of a company is solicited more and more by its customers, notably through handwritten mail. In general, each letter contains information relating to one specific theme, and before processing this mail the company must group it by theme. However, grouping the mail by hand with human operators is slow and tedious. Classifying this mail automatically would therefore be a considerable advantage.
The goal of the project is to build an automatic text classification module for
theme detection in a system that automatically sorts unsorted incoming mail. In
a first stage, we will carry out a bibliographic survey of existing methods of
information processing in general and of text categorization in particular.
Then we will study the SWT labeling system, which we will use to extract the
documents that will serve to train our classifier. The classifier we adopt will
be tested not only on the SWT document collection but also on the Reuters-21578
document collection, which is freely available on the Internet. We will first
use the vocabulary provided by SWT as the feature space. Then we will apply
feature extraction methods to the documents in order to obtain another
vocabulary that is more discriminating than the first.
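
To make the two vocabulary steps concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the method adopted in the project: the text does not name a feature extraction technique, so the chi-square statistic is used here only as one common way of ranking terms by how well they discriminate a theme. The function names, the toy letters and the theme labels are all hypothetical, and the SWT vocabulary is stood in for by a plain list of words.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(text, vocabulary):
    """Represent a text as term counts over a fixed vocabulary (the feature space)."""
    counts = Counter(tokenize(text))
    return [counts[term] for term in vocabulary]


def chi_square_scores(docs, labels, target):
    """Score every term by the chi-square statistic of (term present) vs (theme == target)."""
    n = len(docs)
    doc_terms = [set(tokenize(d)) for d in docs]
    scores = {}
    for term in set().union(*doc_terms):
        pairs = list(zip(doc_terms, labels))
        a = sum(1 for t, l in pairs if term in t and l == target)      # present, target theme
        b = sum(1 for t, l in pairs if term in t and l != target)      # present, other theme
        c = sum(1 for t, l in pairs if term not in t and l == target)  # absent, target theme
        d = n - a - b - c                                              # absent, other theme
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_vocabulary(docs, labels, target, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    scores = chi_square_scores(docs, labels, target)
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Invented toy data: four short letters with hypothetical theme labels.
docs = [
    "refund for broken phone",
    "invoice error on my bill",
    "phone screen broken again",
    "bill charged twice",
]
labels = ["repair", "billing", "repair", "billing"]

# Step 1: a given vocabulary (standing in for the SWT one) as the feature space.
given_vocabulary = ["phone", "broken", "bill"]
vector = bag_of_words("My phone is broken, the phone is dead", given_vocabulary)

# Step 2: a vocabulary extracted from the labeled documents themselves.
extracted_vocabulary = select_vocabulary(docs, labels, "billing", 3)
```

Note that the chi-square score is symmetric: a term that appears only in the other theme scores as high as one that appears only in the target theme, since both separate the two themes equally well. The resulting count vectors could then be passed to any classifier.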