Feature selection is an important task in data mining and machine learning processes. This task is well known in both supervised and unsupervised contexts. The semi-supervised feature selection is still under development and far from being mature. In general, machine learning has been well developed in order to deal with partially-labeled data. Thus, feature selection has obtained special importance in the semi-supervised context. It became more adapted with the real world applications where labeling process is costly to obtain. In this thesis, we present a literature review on semi-supervised feature selection, with regard to supervised and unsupervised contexts. The goal is to show the importance of compromising between the structure from unlabeled part of data, and the background information from their labeled part. In particular, we are interested in the so-called «small labeled-sample problem» where the difference between both data parts is very important. In order to deal with the problem of semi-supervised feature selection, we propose two groups of approaches. The first group is of «Filter» type, in which, we propose some algorithms which evaluate the relevance of features by a scoring function. In our case, this function is based on spectral-graph theory and the integration of pairwise constraints which can be extracted from the data in hand. The second group of methods is of «Embedded» type, where feature selection becomes an internal function integrated in the learning process. In order to realize embedded feature selection, we propose algorithms based on feature weighting. The proposed methods rely on constrained clustering. In this sense, we propose two visions, (1) a global vision, based on relaxed satisfaction of pairwise constraints. This is done by integrating the constraints in the objective function of the proposed clustering model; and (2) a second vision, which is local and based on strict control of constraint violation. Both approaches evaluate the relevance of features by weights which ...
|