Using Text Analysis to Understand the Structure and Dynamics of the World Wide Web as a Multi-Relational Graph

A representation of the World Wide Web as a directed graph, with vertices representing web pages and edges representing hypertext links, underpins the algorithms used by web search engines today. However, this representation involves a key oversimplification of the true complexity of the Web: an edge in the traditional Web graph represents only the existence of a hyperlink; information on the context (e.g., informational, adversarial, commercial, spam) behind the hyperlink is absent. In this work-in-progress paper, we describe an ongoing collaborative project between two teams, one specializing in network science and analysis and the other specializing in text analysis and machine learning, to address this oversimplification. Using techniques in natural language processing, text mining and machine learning to extract relevant features of hyperlinks and classify them into one of several types, this undertaking builds and analyzes a multi-relational web graph. A key aspect of this work i...
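As an illustration of the idea in the abstract, the following minimal sketch stores hyperlinks as typed edges and shows what is lost when they are collapsed into a traditional web graph. The four link types are taken from the abstract's examples. Everything else is an assumption made for illustration: the class name `MultiRelationalWebGraph`, the function `classify_link`, and the keyword cues are hypothetical, and the keyword lookup is only a toy stand-in for the natural language processing and machine learning classifiers the project describes, not the authors' method.

```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]  # (source page, target page)

# Link types named as examples in the abstract.
LINK_TYPES = ("informational", "adversarial", "commercial", "spam")

# Hypothetical keyword cues; a real system would use learned features.
KEYWORD_CUES: Dict[str, Set[str]] = {
    "commercial": {"buy", "price", "sale"},
    "adversarial": {"scam", "wrong", "misleading"},
    "spam": {"free", "winner", "click"},
}


def classify_link(context: str) -> str:
    """Toy classifier: pick the first type whose cue words appear in the
    text surrounding the hyperlink, defaulting to 'informational'."""
    words = set(context.lower().split())
    for link_type, cues in KEYWORD_CUES.items():
        if words & cues:
            return link_type
    return "informational"


class MultiRelationalWebGraph:
    """Directed graph with one edge set per link type."""

    def __init__(self) -> None:
        self._edges: Dict[str, Set[Edge]] = defaultdict(set)

    def add_link(self, src: str, dst: str, context: str) -> str:
        """Classify the hyperlink by its context and store it under that type."""
        link_type = classify_link(context)
        self._edges[link_type].add((src, dst))
        return link_type

    def edges_of_type(self, link_type: str) -> Set[Edge]:
        """Return a copy of the edges for a single relation."""
        return set(self._edges.get(link_type, set()))

    def collapsed_edges(self) -> Set[Edge]:
        """Traditional web graph: an edge only records that a link exists."""
        return set().union(*self._edges.values())


graph = MultiRelationalWebGraph()
graph.add_link("a.example", "b.example", "buy now at a low price")
graph.add_link("a.example", "b.example", "see the reference documentation")
```

In this sketch the two hyperlinks between the same pair of pages stay distinct in the typed view (one commercial, one informational), while `collapsed_edges()` merges them into a single untyped edge.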

Harish Sethu, Alexander Yates

Hyperlinks | Machine Learning | Security Privacy | SOCIALCOM 2010 | Web Graph |

Added	15 Feb 2011
Updated	15 Feb 2011
Type	Journal
Year	2010
Where	SOCIALCOM
Authors	Harish Sethu, Alexander Yates
