This paper reports the estimated number of spam blogs in order to assess their current state in the blogosphere. To extract spam blogs, I developed a traversal method among co-cit...
This work aims to provide a novel, site-specific web page segmentation and section importance detection algorithm, which leverages structural, content, and visual information. The...
We present a novel approach to recognizing Textual nt. Structural features are constructed from abstract tree descriptions, which are automatically extracted from syntactic depend...
Knowledge-sharing communities like Wikipedia and automated extraction methods like those of DBpedia enable the construction of large machine-processible knowledge bases with relat...
There is no blank to mark word boundaries in Chinese text. As a result, identifying words is difficult, because of segmentation ambiguities and occurrences of unknown words. Conve...