Repetition of layout structure is prevalent in document images. In document design, such repetition conveys the underlying logical and functional structure of the data. For exampl...
A major difficulty for designing a document image segmentation methodology is the proper value selection for all involved parameters. This is usually done after experimentations o...
Dataset shift from the training data in a source domain to the data in a target domain poses a great challenge for many statistical learning methods. Most algorithms can be viewed ...
In this paper, we review two techniques for topic discovery in collections of text documents (Latent Semantic Indexing and K-Means clustering) and present how we integrated them in...
Information extraction deals with extracting entities (such as people,organizations or locations) and named relations between entities (such as "People born-in Country")...