Proposals for text classification and information retrieval have been recently presented making use of the WordNet ontology. Generally, this methodology requires statistical induc...
Background: Topic detection is a task that automatically identifies topics (e.g., "biochemistry" and "protein structure") in scientific articles based on infor...
Instant intercommunion techniques such as Instant Messaging (IM) are widely popularized. Aiming at such kind of large scale masscommunication media, clustering on its text conte...
– We describe a method to extract content text from diverse Web pages by using the HTML document’s Text-to-Tag Ratio rather than specific HTML cues that may not be constant acr...
In this paper, we propose a practical approach for extracting the most relevant paragraphs from the original document to form a summary for Thai text. The idea of our approach is ...