Sciweavers

849 search results - page 21 / 170
Modeling Content Identification from Document Images
ICDAR
2011
IEEE
12 years 7 months ago
OCR-Driven Writer Identification and Adaptation in an HMM Handwriting Recognition System
We present an OCR-driven writer identification algorithm in this paper. Our algorithm learns writer-specific characteristics more precisely from explicit character alignment usi...
Huaigu Cao, Rohit Prasad, Prem Natarajan
RIAO
2007
13 years 9 months ago
From Layout to Semantic: a Reranking Model for Mapping Web Documents to Mediated XML Representations
Many documents on the Web are formatted in a weakly structured format. Because of their weak semantics and because of the heterogeneity of their formats, the information conveyed by...
Guillaume Wisniewski, Patrick Gallinari
TSMC
2008
13 years 7 months ago
A New Model for Secure Dissemination of XML Content
The paper proposes an approach to content dissemination that exploits the structural properties of an Extensible Markup Language (XML) document object model in order to p...
Ashish Kundu, Elisa Bertino
DEXAW
2010
IEEE
13 years 8 months ago
Identifying Sentence-Level Semantic Content Units with Topic Models
Statistical approaches to document content modeling typically focus either on broad topics or on discourse-level subtopics of a text. We present an analysis of the perform...
Leonhard Hennig, Thomas Strecker, Sascha Narr, Ern...
DAS
2006
Springer
13 years 9 months ago
XCDF: A Canonical and Structured Document Format
Accessing the structured content of PDF documents is a difficult task, requiring pre-processing and reverse engineering techniques. In this paper, we first present different methods...
Jean-Luc Bloechle, Maurizio Rigamonti, Karim Hadja...