Detecting nepotistic links by language model disagreement

In this short note we demonstrate the applicability of hyperlink downweighting by means of language model disagreement. The method filters out hyperlinks that have no relevance to the target page, without the need for whitelists, blacklists, or human interaction. We fight various forms of nepotism such as common maintainers, ads, link exchanges, and misused affiliate programs. Our method is tested on a 31 M page crawl of the .de domain with a manually classified 1000-page random sample.

Categories and Subject Descriptors: H.3.3 [Information Storage and Retrieval], I.7.5 [Document Capture]: Document analysis

General Terms: Algorithms, Measurement, Experimentation
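The idea of downweighting a hyperlink when the language models of its two ends disagree can be illustrated with a small sketch. The abstract does not say which language models or which disagreement measure the authors use, so everything below is an assumption made for illustration: unigram models with additive smoothing, Kullback-Leibler divergence as the disagreement score, and a hypothetical cutoff `threshold`. The function names (`unigram_model`, `link_disagreement`, `link_weight`) are likewise hypothetical.

```python
import math
from collections import Counter


def unigram_model(text, vocab, alpha=1.0):
    """Additively smoothed unigram distribution over a fixed vocabulary.

    Assumption: whitespace tokenization and add-alpha smoothing; the
    abstract does not specify either.
    """
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in nats.

    Both distributions must share the same support; smoothing in
    unigram_model guarantees that no probability is zero.
    """
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def link_disagreement(source_text, target_text):
    """Disagreement between the text around a link and the target page."""
    vocab = set(source_text.lower().split()) | set(target_text.lower().split())
    p = unigram_model(source_text, vocab)
    q = unigram_model(target_text, vocab)
    return kl_divergence(p, q)


def link_weight(source_text, target_text, threshold=0.15):
    """Return 0.0 for a link whose ends disagree strongly, else 1.0.

    Assumption: a hard cutoff with a hypothetical threshold; a real
    system might scale the weight continuously instead.
    """
    if link_disagreement(source_text, target_text) > threshold:
        return 0.0
    return 1.0


source = "cheap flights to berlin book hotels in berlin"
related_target = "berlin hotels and cheap flights"
unrelated_target = "online casino poker bonus"

# The unrelated target should disagree more with the source text.
print(link_disagreement(source, related_target))
print(link_disagreement(source, unrelated_target))
print(link_weight(source, related_target), link_weight(source, unrelated_target))
```

In this toy example the link to the topically unrelated page receives the larger disagreement score and is the one that gets downweighted. The resulting weights could then feed a link-based ranking step, which is outside the scope of this sketch.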

András A. Benczúr, István Bíró, Károly Csalogány, Máté Uher

1000-page Random Sample | Analysis General Terms | Internet Technology | Language Model Disagreement | WWW 2006 |

» Modeling SocioCultural Phenomena in Discourse

» Semantic Language Models for Topic Detection and Tracking

» NLP and IR Approaches to Monolingual and Multilingual Link Detection

» Crosslanguage linking of news stories on the web using interlingual topic modelling

» Combining anchor text categorization and graph analysis for paid link detection

» Modeling User Behaviour Aware WebSites with PRML

» Tracking news stories across different sources

» McErlang a model checker for a distributed functional programming language

Post Info

Added	22 Nov 2009
Updated	22 Nov 2009
Type	Conference
Year	2006
Where	WWW
Authors	András A. Benczúr, István Bíró, Károly Csalogány, Máté Uher
