Abstract. For the 2008 participation at GeoCLEF, we focused on improving the extraction of geographic signatures from documents and optimising their use for GIR. The results show t...
A long-standing goal of Web research has been to construct a unified Web knowledge base. Information extraction techniques have shown good results on Web inputs, but even most dom...
Michael J. Cafarella, Jayant Madhavan, Alon Y. Hal...
This paper reports the estimated number of spam blogs in order to assess their current state in the blogosphere. To extract spam blogs, I developed a traversal method among co-cit...
The ability to find tables and extract information from them is a necessary component of data mining, question answering, and other information retrieval tasks. Documents often c...
David Pinto, Andrew McCallum, Xing Wei, W. Bruce C...
Annotating training data for event extraction is tedious and labor-intensive. Most current event extraction tasks rely on hundreds of annotated documents, but this is often not en...