We present a highly accurate method for classifying web pages based on link percentage, which is the percentage of text characters that are parts of links normalized by the number...
Recent progress in information extraction technology has enabled a vast array of applications that rely on structured data that is embedded in natural-language text. In particular...
Semistructured data is characterized by the lack of any fixed and rigid schema, although typically the data hassomeimplicitstructure. While thelack offixedschemamakesextracting ...
We have used a general purpose data mining tool to determine whether we can find any ‘golden nuggets’ in the web access logs of a large academic web site. Our goal was to use...
Abstract. In this work we propose a fuzzy technique to compare XML documents belonging to a semi-structured flow and sharing a common vocabulary of tags. Our approach is based on t...
Paolo Ceravolo, Maria Cristina Nocerino, Marco Viv...