Sciweavers

» Web Mining
HIS
2003
13 years 11 months ago
Evolving Better Stoplists for Document Clustering and Web Intelligence
Text classification, document clustering and similar document analysis tasks are currently the subject of significant global research, since such areas underpin web intelligence,...
Mark P. Sinka, David Corne
CIKM
2009
Springer
14 years 5 months ago
Vetting the links of the web
Many web links mislead human surfers and automated crawlers because they point to changed content, out-of-date information, or invalid URLs. It is a particular problem for large, ...
Na Dai, Brian D. Davison
CICLING
2009
Springer
14 years 2 months ago
Language Identification on the Web: Extending the Dictionary Method
Automated language identification of written text is a well-established research domain that has received considerable attention in the past. By now, efficient and effecti...
Radim Rehurek, Milan Kolkus
KDD
2002
ACM
14 years 10 months ago
Automatic Categorization of Web Pages and User Clustering with Mixtures of Hidden Markov Models
We propose mixtures of hidden Markov models for modelling clickstreams of web surfers. Hence, the page categorization is learned from the data without the need for a (possibly cumb...
Alexander Ypma, Tom Heskes
PKDD
2007
Springer
14 years 4 months ago
Site-Independent Template-Block Detection
Detection of template and noise blocks in web pages is an important step in improving the performance of information retrieval and content extraction. Of the many approaches propos...
Aleksander Kolcz, Wen-tau Yih