Current approaches to script identification rely on hand-selected features and often require processing a significant part of the document to achieve reliable identification. We p...
We introduce a simple and efficient method for clustering and identifying temporal trends in hyper-linked document databases. Our method can scale to large datasets because it ex...
Alexandrin Popescul, Gary William Flake, Steve Law...
: The quality of business software is more and more becoming a competitive factor. As complete testing is impossible, testers have to make decisions, e.g. to choose which parts of ...
As a sequence of two or more consecutive individual words inherent with contextual semantics of individual words, multi-word attracts much attention from statistical linguistics an...
Document collections evolve over time, new topics emerge and old ones decline. At the same time, the terminology evolves as well. Much literature is devoted to topic evolution in ...