We consider the problem of modeling the content structure of texts within a specific domain, in terms of the topics the texts address and the order in which these topics appear. W...
Annotating genes and their products with Gene Ontology codes is an important area of research. One approach for doing this is to use the information available about these genes in...
Most traditional text clustering methods are based on "bag of words" (BOW) representation based on frequency statistics in a set of documents. BOW, however, ignores the ...
Jian Hu, Lujun Fang, Yang Cao, Hua-Jun Zeng, Hua L...
The World-Wide Web consists not only of a huge number of unstructured texts, but also a vast amount of valuable structured data. Web tables [2] are a typical type of structured in...
Cindy Xide Lin, Bo Zhao, Tim Weninger, Jiawei Han,...
There is a significant need to extract and analyse the text in images on Web documents, for effective indexing, semantic analysis and even presentation by non-visual means (e.g....