Grid-based Indexing of a Newswire Corpus


Download www.gridbus.org

In this paper we report our experience using computational grids for natural language processing, particularly information extraction, to create query indices for information retrieval tasks. Given the prevalence of large corpora in natural language processing, computational grids offer significant utility to researchers in the domain who are reaching the limits of computational efficiency. We leverage the affinities between the segmented data sources prevalent in natural language processing and the parallelisation model from the grid domain. The experiment reported here is a large-scale newswire corpus indexing task, with the goal of efficiently creating a queryable index of the entire corpus. By parallelising the indexing task and executing it on an Australian computational grid, we observe an overall 2.26x speedup over the same experiment on a single computational node. In addition to reporting the raw performan...

Baden Hughes, Srikumar Venugopal, Rajkumar Buyya


Computational Grid | GRID 2004 | Indexing Task | Natural Language Processing


Post Info

Added	01 Jul 2010
Updated	01 Jul 2010
Type	Conference
Year	2004
Where	GRID
Authors	Baden Hughes, Srikumar Venugopal, Rajkumar Buyya
