Mind the data skew: distributed inferencing by speeddating in elastic regions

Semantic Web data exhibits very skewed frequency distributions among terms. Efficient large-scale distributed reasoning methods should maintain load balance in the face of such highly skewed distributions of input data. We show that term-based partitioning, used by most distributed reasoning approaches, has limited scalability due to load-balancing problems. We address this problem with a method for data distribution based on clustering in elastic regions. Instead of assigning data to fixed peers, data flows semi-randomly in the network. Data items “speed-date” while being temporarily collocated in the same peer. We introduce a bias in the routing to allow semantically clustered neighborhoods to emerge. Our approach is self-organising, efficient and does not require any central coordination. We have implemented this method on the MaRVIN platform and have performed experiments on large real-world datasets, using a cluster of up to 64 nodes. We compute the RDFS closure over differ...
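The load-balancing claim in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' method or the MaRVIN implementation: it assumes Zipf-like term frequencies, uses a term id modulo the peer count as a stand-in for term-based partitioning, and compares it with purely random placement. All function names and parameters (`zipf_terms`, `term_based_loads`, `random_loads`, `imbalance`, the exponent `s`) are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def zipf_terms(n_items, n_terms, s=1.0, seed=0):
    """Draw n_items term ids where the term of rank r has weight 1 / r**s.

    Assumption: a Zipf-like law stands in for the skew described in the abstract.
    """
    rng = random.Random(seed)
    weights = [1.0 / (rank ** s) for rank in range(1, n_terms + 1)]
    return rng.choices(range(n_terms), weights=weights, k=n_items)


def term_based_loads(terms, n_peers):
    """Term-based partitioning: every term is owned by one fixed peer.

    Assumption: 'term id modulo peer count' plays the role of a hash function.
    """
    loads = Counter(term % n_peers for term in terms)
    return [loads.get(peer, 0) for peer in range(n_peers)]


def random_loads(terms, n_peers, seed=1):
    """Baseline: each item goes to a uniformly random peer, ignoring its term."""
    rng = random.Random(seed)
    loads = Counter(rng.randrange(n_peers) for _ in terms)
    return [loads.get(peer, 0) for peer in range(n_peers)]


def imbalance(loads):
    """Ratio of the busiest peer's load to the mean load (1.0 is perfect balance)."""
    return max(loads) / (sum(loads) / len(loads))


if __name__ == "__main__":
    terms = zipf_terms(n_items=100_000, n_terms=10_000)
    print("term-based imbalance:", imbalance(term_based_loads(terms, 64)))
    print("random imbalance:    ", imbalance(random_loads(terms, 64)))
```

Under these assumptions, the peer that owns the most frequent term receives several times the mean load, while random placement stays close to balanced. Random placement alone does not bring related items together, which is the gap the routing bias described in the abstract is meant to close.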

Spyros Kotoulas, Eyal Oren, Frank van Harmelen

Internet Technology | Large-scale Distributed Reasoning | Semantic Web Data | Skewed Frequency Distributions | WWW 2010

Post Info

Added	14 May 2010
Updated	14 May 2010
Type	Conference
Year	2010
Where	WWW
Authors	Spyros Kotoulas, Eyal Oren, Frank van Harmelen

Sciweavers