Stochastic Gradient Descent Training for L1-regularized Log-linear Models with Cumulative Penalty

Download www.aclweb.org

Stochastic gradient descent (SGD) uses approximate gradients estimated from subsets of the training data and updates the parameters in an online fashion. This learning framework is attractive because it often requires much less training time in practice than batch training algorithms. However, L1-regularization, which is becoming popular in natural language processing because of its ability to produce compact models, cannot be efficiently applied in SGD training, due to the large dimensions of feature vectors and the fluctuations of approximate gradients. We present a simple method to solve these problems by penalizing the weights according to cumulative values for L1 penalty. We evaluate the effectiveness of our method in three applications: text chunking, named entity recognition, and part-of-speech tagging. Experimental results demonstrate that our method can produce compact and accurate models much more quickly than a state-of-the-art quasi-Newton method for L1-regularized loglinea...
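
The abstract does not spell out the update rule, so the following is only a minimal Python sketch of one way a cumulative L1 penalty could be applied during SGD. It assumes a binary logistic model (a simple log-linear model) over sparse feature dictionaries rather than the sequence models used in the three applications. All names (`train`, `apply_cumulative_penalty`, `u`, `q`, `eta`, `c`) are illustrative assumptions, not taken from the listing: `u` tracks the total penalty every weight could have received so far, and `q[i]` tracks the penalty weight `i` has actually received. Only the features active in the current sample are touched, which is intended to keep the per-step cost independent of the full feature dimension.

```python
import math
from collections import defaultdict


def apply_cumulative_penalty(w, q, i, u):
    """Move w[i] toward zero by the penalty it still owes, clipping at zero."""
    z = w[i]
    if w[i] > 0:
        w[i] = max(0.0, w[i] - (u + q[i]))
    elif w[i] < 0:
        w[i] = min(0.0, w[i] + (u - q[i]))
    # Record how much penalty was actually applied to this weight.
    q[i] += w[i] - z


def train(data, epochs=20, eta=0.1, c=0.1):
    """data: list of (features: dict[str, float], label in {0, 1}).

    eta is a fixed learning rate and c the L1 strength (both assumed values).
    Returns only the non-zero weights.
    """
    w = defaultdict(float)  # model weights
    q = defaultdict(float)  # penalty actually applied per weight
    u = 0.0                 # penalty each weight could have received so far
    n = len(data)
    for _ in range(epochs):
        for x, y in data:
            u += eta * c / n
            score = sum(w[f] * v for f, v in x.items())
            p = 1.0 / (1.0 + math.exp(-score))
            for f, v in x.items():
                w[f] += eta * (y - p) * v             # approximate-gradient step
                apply_cumulative_penalty(w, q, f, u)  # lazy L1 step
    return {f: v for f, v in w.items() if v != 0.0}
```

Because the clipping stops at zero instead of letting a weight cross it, weights that the data do not support can settle at exactly zero, which is what makes the resulting model compact.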

Yoshimasa Tsuruoka, Jun-ichi Tsujii, Sophia Ananiadou

ACL 2009 | Approximate Gradients | Computational Linguistics | Gradients | Training |

Post Info

Added	16 Feb 2011
Updated	16 Feb 2011
Type	Journal
Year	2009
Where	ACL
Authors	Yoshimasa Tsuruoka, Jun-ichi Tsujii, Sophia Ananiadou

Sciweavers