Sciweavers

1018 search results - page 129 / 204
» Document Representation in Natural Language Text Retrieval
DIS
2007
Springer
15 years 10 months ago
Unsupervised Spam Detection Based on String Alienness Measures
We propose an unsupervised method for detecting spam documents from Web page data, based on equivalence relations on strings. We propose 3 measures for quantifying the alienness (...
Kazuyuki Narisawa, Hideo Bannai, Kohei Hatano, Mas...
TDM
2004
15 years 5 months ago
Combining Indexing Schemes to Accelerate Querying XML on Content and Structure
This paper presents the advantages of combining multiple document representation schemes for query processing of XML queries on content and structure. We show how extending the Te...
Georgina Ramírez, Arjen P. de Vries
LREC
2008
15 years 6 months ago
Low-Density Language Bootstrapping: the Case of Tajiki Persian
Low-density languages raise difficulties for standard approaches to natural language processing that depend on large online corpora. Using Persian as a case study, we propose a no...
Karine Megerdoomian, Dan Parvaz
ECIS
2003
15 years 5 months ago
Hybrid XML data model architecture for efficient document management
XML has been known as a document standard in representation and exchange of data on the Internet, and is also used as a standard language for the search and reuse of scattered doc...
Eun-Young Kim, Jin-Ho Choi, Jhung-Soo Hong, Tae-Hu...
KDD
2003
ACM
16 years 4 months ago
Similarity analysis on government regulations
Government regulations are semi-structured text documents that are often voluminous, heavily cross-referenced between provisions and even ambiguous. Multiple sources of regulation...
Gloria T. Lau, Kincho H. Law, Gio Wiederhold