
COLING
2010

Varro: An Algorithm and Toolkit for Regular Structure Discovery in Treebanks

The Varro toolkit is a system for identifying and counting a major class of regularity in treebanks and other annotated natural language data in the form of tree structures: frequently recurring unordered subtrees. The software was designed for use in linguistics and aims to be applicable to existing treebanks and to other stores of natural language data that can be represented as trees. It minimizes memory use so that moderately large treebanks are tractable on commonly available computer hardware. This article introduces condensed canonically ordered trees as a data structure for efficiently discovering frequently recurring unordered subtrees.

Credits: This research is supported by the AMASS++ Project, directly funded by the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT) (SBO IWT 060051).
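Counting "unordered" subtrees depends on treating two subtrees as the same when they differ only in the order of siblings. The usual way to do that is to map each tree to a canonical form. The Python sketch below illustrates that general idea. It is not Varro's algorithm and does not implement the paper's condensed canonically ordered trees or its memory optimizations. It also counts only complete subtrees (a node together with all of its descendants), which may be a narrower class than the one the paper targets. The names `Node`, `canonical`, `count_complete_subtrees`, `frequent_subtrees` and `min_support` are hypothetical, and node labels are assumed never to contain `(`, `)` or `,`.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical labelled tree node; sibling order is treated as irrelevant."""
    label: str
    children: list[Node] = field(default_factory=list)


def _encode(label: str, child_codes: list[str]) -> str:
    # Sorting the children's codes makes the result independent of sibling order.
    # Assumption: labels never contain "(", ")" or ",", so codes stay unambiguous.
    if not child_codes:
        return label
    return f"{label}({','.join(sorted(child_codes))})"


def canonical(node: Node) -> str:
    """Canonical string for the unordered tree rooted at `node`."""
    return _encode(node.label, [canonical(child) for child in node.children])


def count_complete_subtrees(treebank: list[Node]) -> Counter[str]:
    """Count every complete subtree (a node plus all of its descendants),
    treating subtrees that differ only in sibling order as identical."""
    counts: Counter[str] = Counter()

    def visit(node: Node) -> str:
        # Post-order walk: each child's code is computed once and reused.
        code = _encode(node.label, [visit(child) for child in node.children])
        counts[code] += 1
        return code

    for tree in treebank:
        visit(tree)
    return counts


def frequent_subtrees(treebank: list[Node], min_support: int) -> dict[str, int]:
    """Keep only the subtrees that occur at least `min_support` times."""
    return {code: n for code, n in count_complete_subtrees(treebank).items()
            if n >= min_support}
```

For example, the trees S(NP(DT NN) VP(VB)) and S(VP(VB) NP(NN DT)) both receive the code `S(NP(DT,NN),VP(VB))`, so each of their six complete subtrees is counted twice. Storing full strings like this is simple but memory-hungry on large treebanks, which is the kind of cost the paper's data structure is meant to reduce.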
Scott Martens
Added 13 May 2011
Updated 13 May 2011
Type Conference
Year 2010
Where COLING
Authors Scott Martens