Topic segmentation and topic identification are often tackled as separate problems, although both are part of topic analysis. In this article, we study how topic identification can...
The WindowDiff evaluation measure [12] is becoming the standard criterion for evaluating text segmentation methods. Nevertheless, this metric is not fair with regard to the...
Sylvain Lamprier, Tassadit Amghar, Bernard Levrat,...
Document similarity search (i.e. query by example) aims to retrieve a ranked list of documents similar to a query document in a text corpus or on the Web. Most existing approaches...
MMR (Maximum Marginal Relevance) is widely used in summarization for its simplicity and efficacy, and has been demonstrated to achieve comparable performance to other approaches ...
The problem of identifying approximately duplicate records in databases is an essential step for data cleaning and data integration processes. Most existing approaches have relied...