Sciweavers

92 search results - page 14 / 19
WWW
2010
ACM
13 years 7 months ago
Exploiting content redundancy for web information extraction
We propose a novel extraction approach that exploits content redundancy on the web to extract structured data from template-based web sites. We start by populating a seed database...
Pankaj Gulhane, Rajeev Rastogi, Srinivasan H. Seng...
WWW
2006
ACM
14 years 8 months ago
GoGetIt!: a tool for generating structure-driven web crawlers
We present GoGetIt!, a tool for generating structure-driven crawlers that requires a minimum effort from the users. The tool takes as input a sample page and an entry point to a W...
Altigran Soares da Silva, Edleno Silva de Moura, J...
SIGMOD
2007
ACM
14 years 7 months ago
Intel Mash Maker: join the web
Intel® Mash Maker is an interactive tool that tracks what the user is doing and tries to infer what information and visualizations they might find useful for their current task. M...
Robert Ennals, Eric A. Brewer, Minos N. Garofalaki...
WCRE
1999
IEEE
13 years 11 months ago
Chava: Reverse Engineering and Tracking of Java Applets
Java applets have been used increasingly on web sites to perform client-side processing and provide dynamic content. While many web site analysis tools are available, their focus ...
Jeffrey L. Korn, Yih-Farn Chen, Eleftherios Koutso...
ADC
2005
Springer
14 years 1 month ago
Discovering User Access Pattern Based on Probabilistic Latent Factor Model
There has been an increased demand for characterizing user access patterns using web mining techniques since the informative knowledge extracted from web server log files can not ...
Guandong Xu, Yanchun Zhang, Jiangang Ma, Xiaofang ...