Over the past decade, the Internet has evolved into the largest public community in the world. It provides a wealth of data, content, and services in almost every field, including science, technology, medicine, business, leisure, and education. However, this exponential growth has come at a price: it has become increasingly difficult for end-users to categorize, prioritize, and select, in a customizable way, the information and services provided by millions of Web sites across the Internet. This paper presents the i-Cube environment, a toolset that allows Internet data and content originally available as HTML Web pages and programmatic scripts to be denoted, modeled, and represented as XML documents. These XML documents conform to specific Document Type Definitions (DTDs) and other structural constraints that are fully customizable by the end-user or the service provider. The approach is based on representing the data content of HTML documents in the form of annotated trees. Specif...