Parameterized generation of labeled datasets for text categorization based on a hierarchical directory