Sciweavers

317 search results - page 15 / 64
» Style-independent document labeling: design and performance ...
SIGIR
2004
ACM
14 years 1 month ago
Parameterized generation of labeled datasets for text categorization based on a hierarchical directory
Although text categorization is a burgeoning area of IR research, readily available test collections in this field are surprisingly scarce. We describe a methodology and system (...
Dmitry Davidov, Evgeniy Gabrilovich, Shaul Markovi...
IJCAI
2007
13 years 9 months ago
Semantic Smoothing of Document Models for Agglomerative Clustering
In this paper, we argue that the agglomerative clustering with vector cosine similarity measure performs poorly due to two reasons. First, the nearest neighbors of a document belo...
Xiaohua Zhou, Xiaodan Zhang, Xiaohua Hu
CIKM
2008
Springer
13 years 9 months ago
Identifying table boundaries in digital documents via sparse line detection
Most prior work on information extraction has focused on extracting information from text in digital documents. However, often, the most important information being reported in an...
Ying Liu, Prasenjit Mitra, C. Lee Giles
EMNLP
2008
13 years 9 months ago
Who is Who and What is What: Experiments in Cross-Document Co-Reference
This paper describes a language-independent, scalable system for both challenges of cross-document co-reference: name variation and entity disambiguation. We provide system results...
Alex Baron, Marjorie Freedman
PVLDB
2010
13 years 6 months ago
Transforming XML Documents as Schemas Evolve
Database systems often use XML schema to describe the format of valid XML documents. Usually, this format is determined when the system is designed. Sometimes, in an already funct...
Jarek Gryz, Marcin Kwietniewski, Stephanie Hazlewo...