Scalable Text Retrieval for Large Digital Libraries

It is argued that digital libraries of the future will contain terabyte-scale collections of digital text and that full-text searching techniques will be required to operate over collections of this magnitude. Algorithms expected to be capable of scaling to these data sizes using clusters of modern workstations are described. First, basic indexing and retrieval algorithms operating at performance levels comparable to other leading systems over gigabytes of text on a single workstation are presented. Next, simple mechanisms for extending query processing capacity to much greater collection sizes are presented, to tens of gigabytes for single workstations and to terabytes for clusters of such workstations. Query-processing efficiency on a single workstation is shown to deteriorate dramatically when data size is increased above a certain multiple of physical memory size. By contrast, the number of clustered workstations necessary to maintain a constant level of service increases linearly wi...

David Hawking

Data Size | Education | ERCIMDL 1997 | Single Workstation | Workstation |

Added	07 Aug 2010
Updated	07 Aug 2010
Type	Conference
Year	1997
Where	ERCIMDL
Authors	David Hawking
