We present a novel model for validating and improving the content and structure organization of a website. This model studies the website as a graph and evaluates its interconnect...
High throughput glycoproteomics, similar to genomics and proteomics, involves extremely large volumes of distributed, heterogeneous data as a basis for identification and quantifi...
Satya Sanket Sahoo, Christopher Thomas, Amit P. Sh...
Determining the similarity of short text snippets, such as search queries, works poorly with traditional document similarity measures (e.g., cosine), since there are often few, if...
Wide-area distributed applications are challenging to debug, optimize, and maintain. We present Wide-Area Project 5 (WAP5), which aims to make these tasks easier by exposing the c...
Patrick Reynolds, Janet L. Wiener, Jeffrey C. Mogu...
Today web search engines provide the easiest way to reach information on the web. In this scenario, more than 95% of Indian language content on the web is not searchable due to mu...
In this paper we describe preliminary work that examines whether statistical properties of the structure of websites can be an informative measure of their quality. We aim to deve...
Vaclav Petricek, Tobias Escher, Ingemar J. Cox, He...
Content personalization is a very important aspect in the field of e-learning, although current standards do not fully support it. In this paper, we outline an extension to the AD...
In this paper, we propose the use of records management principles to identify and manage Web site resources with enduring value as records. Current Web archiving activities, coll...
Currently, there is an increasing effort to provide various personalized services on museum web sites. This paper presents an approach for determining user interests in a museum c...
Classical logics and Datalog-related logics have both been proposed as underlying formalisms for the Semantic Web. Although these two different formalism groups have some commonal...