Sciweavers

53 search results - page 3 / 11
» Efficient linear text segmentation based on information retr...
IDEAL
2003
Springer
14 years 18 days ago
Towards a Terabyte Digital Library System
In the China-US Million Book Digital Library, the output of the digitalization process is more than one terabyte of text in OEB and PDF formats. To access these data quickly and accurately,...
Hao Ding, Yun Lin, Bin Liu
WWW
2009
ACM
14 years 8 months ago
Extracting article text from the web with maximum subsequence segmentation
Much of the information on the Web is found in articles from online news outlets, magazines, encyclopedias, review collections, and other sources. However, extracting this content...
Jeff Pasternack, Dan Roth
CIKM
2006
ACM
13 years 11 months ago
A document-centric approach to static index pruning in text retrieval systems
We present a static index pruning method, to be used in ad-hoc document retrieval tasks, that follows a document-centric approach to decide whether a posting for a given term shoul...
Stefan Büttcher, Charles L. A. Clarke
BMCBI
2006
13 years 7 months ago
Exploring supervised and unsupervised methods to detect topics in biomedical text
Background: Topic detection is a task that automatically identifies topics (e.g., "biochemistry" and "protein structure") in scientific articles based on infor...
Minsuk Lee, Weiqing Wang, Hong Yu
ICMLA
2008
13 years 8 months ago
Graph-Based Multilevel Dimensionality Reduction with Applications to Eigenfaces and Latent Semantic Indexing
Dimension reduction techniques have been successfully applied to face recognition and text information retrieval. The process can be time-consuming when the data set is large. Thi...
Sophia Sakellaridi, Haw-ren Fang, Yousef Saad