In this paper we describe our approach to reconstructing the software architecture of J2EE web applications. We use the Siemens Four Views approach, separating the architecture in...
We present experiments in automatic genre classification on web corpora, comparing a wide variety of features on several different genre-annotated datasets (HGC, I-EN, KI-04, KRYS...
The technical and competence requirements for writing content on the web are still one of the major factors that widen the gap between authors and readers. Although tools that sup...
There is a wealth of information to be mined from narrative text on the World Wide Web. Unfortunately, standard natural language processing (NLP) extraction techniques expect full, grammat...
CzEng 0.9 is the third release of a large parallel corpus of Czech and English. For the current release, CzEng was extended by a significant amount of text from various types of so...