This paper presents a concept hierarchy-based approach to privacy-preserving data collection for data mining, called the P-level model. The P-level model allows data providers to d...
Many web documents (such as Java FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times....
Text documents often contain valuable structured data that is hidden in regular English sentences. This data is best exploited if available as a relational table that we could use...
Parameters of statistical distributions that are input to simulations are typically not known with certainty. For existing systems, or variations on existing systems, they are oft...
It is argued that digital libraries of the future will contain terabyte-scale collections of digital text and that full-text searching techniques will be required to operate over c...