This paper proposes a method for creating a high-quality collection of researchers’ homepages. The proposed method consists of three phases: rough filtering of the possible web p...
Search facilitated with agglomerative hierarchical clustering methods was studied in a collection of Finnish newspaper articles (N = 53,893). To allow quick experiments, clustering...
Tuomo Korenius, Jorma Laurikkala, Martti Juhola, K...
Dyadic data refers to a domain with two finite sets of objects in which observations are made for dyads, i.e., pairs with one element from either set. This type of data arises natur...
This paper describes a method for asking statistical questions about a large text corpus. We exemplify the method by addressing the question, "What percentage of Federal Regi...
In web search, recency ranking refers to ranking documents by relevance which takes freshness into account. In this paper, we propose a retrieval system which automatically detect...
Anlei Dong, Yi Chang, Zhaohui Zheng, Gilad Mishne,...