We propose an unsupervised method for detecting spam documents in Web page data, based on equivalence relations on strings. We propose three measures for quantifying the alienness (...
Nearest-neighbor-based document skew detection methods do not require a predominant text area and are not subject to a skew-angle limitation. However, the accur...
Textual Entailment has recently been proposed as an application-independent task of recognising whether the meaning of one text may be inferred from another. This is pote...
Any large language-processing software relies, in its operation, on heuristic decisions concerning the processing strategy. These decisions are usually "hard-wired" int...
In e-business development, semantics-oriented document exchange is becoming important because it can support cross-domain user connection, business transactions, and collaboration. ...