In the management of digital document collections, automatic database analysis still has difficulty dealing with semantic queries and the abstract concepts that users are looking for. Although interactive learning strategies can improve search results, system performance still depends on the representation of the document collection. In this paper, we introduce a weakly supervised optimization of a set of feature vectors. Given an incomplete set of partial labels, the method improves the representation of the collection, even if the size, number, and structure of the concepts are unknown. Experiments on synthetic and real data have been carried out to validate our approach.

Key words: similarity, semantic, concept, learning, statistical, kernel, retrieval