Gradient boosting is a flexible machine learning technique that produces accurate predictions by combining many weak learners. In this work, we investigate its use in two applications and show the advantage of loss functions designed specifically to optimize application objectives. We also extend the original gradient boosting algorithm with the Newton-Raphson method to speed up learning. In our experiments, gradient boosting with application-specific loss functions yields a relative improvement of 0.8% over an 82.6% baseline on the CoNLL 2003 named entity recognition task. We also show that this novel framework is useful for identifying regions of high word error rate (WER), where it provides up to 20% relative improvement depending on the chosen operating point.
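As a rough illustration of what a Newton-Raphson step in gradient boosting can look like, the sketch below fits decision stumps to a one-dimensional feature under the binary logistic loss. Each leaf value is set to the second-order step -sum(g)/sum(h), where g and h are the first and second derivatives of the loss with respect to the current score. This is not the authors' implementation: the logistic loss, the stump learner, and every name in it (fit_stump_newton, newton_boost, predict_score, lam, lr) are assumptions chosen for illustration, and the application-specific losses studied in this work are not reproduced here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_stump_newton(x, g, h, lam=1e-6):
    """Choose a threshold on the 1-D feature x and Newton leaf values.

    g, h: per-example first and second derivatives of the loss.
    lam:  small constant (assumption) that keeps the division stable.
    Assumes x contains at least two distinct values.
    """
    best = None
    for t in np.unique(x)[:-1]:          # candidate split points
        left = x <= t
        gl, hl = g[left].sum(), h[left].sum()
        gr, hr = g[~left].sum(), h[~left].sum()
        # Decrease of the second-order approximation of the loss
        gain = 0.5 * (gl ** 2 / (hl + lam) + gr ** 2 / (hr + lam))
        if best is None or gain > best[0]:
            # Leaf values are Newton-Raphson steps: -sum(g) / sum(h)
            best = (gain, t, -gl / (hl + lam), -gr / (hr + lam))
    _, t, w_left, w_right = best
    return t, w_left, w_right

def newton_boost(x, y, n_rounds=20, lr=0.5):
    """Boost stumps on the logistic loss; y holds labels in {0, 1}."""
    F = np.zeros(len(x))                 # current additive score
    stumps = []
    for _ in range(n_rounds):
        p = sigmoid(F)
        g = p - y                        # dL/dF for the logistic loss
        h = p * (1.0 - p)                # d^2L/dF^2 for the logistic loss
        t, wl, wr = fit_stump_newton(x, g, h)
        wl, wr = lr * wl, lr * wr        # shrink the step by the learning rate
        F += np.where(x <= t, wl, wr)
        stumps.append((t, wl, wr))
    return stumps

def predict_score(stumps, x):
    """Sum the (already shrunken) leaf values of all stumps."""
    F = np.zeros(len(x))
    for t, wl, wr in stumps:
        F += np.where(x <= t, wl, wr)
    return F

# Toy usage: a separable 1-D problem
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
stumps = newton_boost(x, y)
scores = predict_score(stumps, x)
```

In a plain gradient step the leaf value would follow only the average of -g; dividing by the summed curvature h rescales each step to the local shape of the loss, which is the general mechanism by which a Newton-Raphson update can reduce the number of boosting rounds.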
Bin Zhang, Abhinav Sethy, Tara N. Sainath, Bhuvana