Sciweavers

489 search results - page 30 / 98
» Combining linguistic and spatial information for document an...
ICPR
2004
IEEE
14 years 9 months ago
Coordinate Systems Reconstruction for Graphical Documents by Hough-feature Clustering and Geometric Analysis
Two-dimensional and three-dimensional coordinate systems are the basic graphics symbols in many graphical documents. A robust coordinate system detection scheme is needed in order...
Chew Lim Tan, Yan Ping Zhou
IJCAI
2007
13 years 9 months ago
Pseudo-Aligned Multilingual Corpora
In machine translation, document alignment refers to finding correspondences between documents which are exact translations of each other. We define pseudo-alignment as the task...
Fernando Diaz, Donald Metzler
DAS
2008
Springer
13 years 9 months ago
A Fast Preprocessing Method for Table Boundary Detection: Narrowing Down the Sparse Lines Using Solely Coordinate Information
With the rapid growth of PDF documents in digital libraries, recognizing the document structure and detecting specific document components are useful for document storage, classifica...
Ying Liu, Prasenjit Mitra, C. Lee Giles
CICLING
2005
Springer
14 years 1 month ago
Incremental Information Extraction Using Tree-Based Context Representations
The purpose of information extraction (IE) is to find desired pieces of information in natural language texts and store them in a form that is suitable for automatic pro...
Christian Siefkes
DOCENG
2005
ACM
13 years 10 months ago
Injecting information into atomic units of text
This paper presents a new approach to text processing, based on textemes. These are atomic text units generalising the concepts of character and glyph by merging them in a common ...
Yannis Haralambous, Gábor Bella