Varro: An Algorithm and Toolkit for Regular Structure Discovery in Treebanks



The Varro toolkit is a system for identifying and counting a major class of regularity in treebanks and in annotated natural language data in the form of tree structures: frequently recurring unordered subtrees. The software is designed for use in linguistics and aims to be applicable to as many existing treebanks and other stores of tree-structurable natural language data as possible. It minimizes memory use so that moderately large treebanks are tractable on commonly available computer hardware. This article introduces condensed canonically ordered trees as a data structure for efficiently discovering frequently recurring unordered subtrees.

Credits: This research is supported by the AMASS++ Project, directly funded by the Institute for the Promotion of Innovation by Science and Technology in Flanders (IWT) (SBO IWT 060051).
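The abstract's central idea, putting unordered trees into a canonical order so that recurring subtrees can be recognized and counted, can be illustrated with a small sketch. This is not the Varro implementation and it does not reproduce the paper's condensed data structure: it assumes trees are given as hypothetical `(label, children)` pairs, it assumes labels contain no parentheses, and it counts only complete (bottom-up) subtrees, which is a simplification. The function names `canonical` and `frequent_subtrees` are illustrative only.

```python
from collections import Counter

# Assumption: a tree is a (label, children) pair, where children is a list of trees.

def canonical(tree, counts=None):
    """Return an order-independent string for `tree`.

    If `counts` is given, tally the canonical form of every complete
    subtree rooted at each node visited.
    """
    label, children = tree
    # Sorting the children's canonical strings erases sibling order,
    # so two trees that differ only in child order get the same string.
    parts = sorted(canonical(child, counts) for child in children)
    form = "(" + label + "".join(parts) + ")"
    if counts is not None:
        counts[form] += 1
    return form

def frequent_subtrees(treebank, min_count=2):
    """Count complete subtrees over all trees and keep the frequent ones."""
    counts = Counter()
    for tree in treebank:
        canonical(tree, counts)
    return {form: n for form, n in counts.items() if n >= min_count}

# Toy treebank: the two trees differ only in the order of siblings.
t1 = ("S", [("NP", [("D", []), ("N", [])]), ("VP", [("V", [])])])
t2 = ("S", [("VP", [("V", [])]), ("NP", [("N", []), ("D", [])])])

if __name__ == "__main__":
    for form, n in sorted(frequent_subtrees([t1, t2]).items()):
        print(n, form)
```

Because sibling order is normalized before comparison, `t1` and `t2` map to the same canonical string, and every subtree they share is counted once per occurrence regardless of how its children were ordered in the source annotation.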

Scott Martens


COLING 2010 | Computational Linguistics | Natural Language Data | Tree-structurable Natural Language | Unordered Subtrees |


Post Info

Added	13 May 2011
Updated	13 May 2011
Type	Journal
Year	2010
Where	COLING
Authors	Scott Martens
