Stochastic gradient boosting
Section snippets
Gradient boosting
In the function estimation problem one has a system consisting of a random “output” or “response” variable $y$ and a set of random “input” or “explanatory” variables $\mathbf{x} = \{x_1, \ldots, x_n\}$. Given a “training” sample $\{y_i, \mathbf{x}_i\}_1^N$ of known $(y, \mathbf{x})$-values, the goal is to find a function $F^*(\mathbf{x})$ that maps $\mathbf{x}$ to $y$, such that over the joint distribution of all $(y, \mathbf{x})$-values, the expected value of some specified loss function $\Psi(y, F(\mathbf{x}))$ is minimized:
$$F^*(\mathbf{x}) = \arg\min_{F(\mathbf{x})} E_{y,\mathbf{x}}\, \Psi(y, F(\mathbf{x})). \tag{1}$$
Boosting approximates $F^*(\mathbf{x})$ by an “additive” expansion of the form
$$F(\mathbf{x}) = \sum_{m=0}^{M} \beta_m\, h(\mathbf{x}; \mathbf{a}_m), \tag{2}$$
where the functions $h(\mathbf{x}; \mathbf{a})$ (the “base learner”) are usually chosen to be simple functions of $\mathbf{x}$ with parameters $\mathbf{a}$.
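As a concrete illustration, the following is a minimal sketch of this fitting strategy under squared-error loss, with shallow regression trees playing the role of the base learner $h(\mathbf{x}; \mathbf{a})$. The function and parameter names (fit_gradient_boosting, n_stages, learning_rate) are illustrative choices, not notation from the paper.

```python
# Minimal gradient boosting sketch for squared-error loss.
# The negative gradient of (1/2)(y - F)^2 with respect to F is the
# ordinary residual y - F, so each stage fits the base learner to
# residuals and adds a shrunken copy to the expansion.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_stages=100, learning_rate=0.1, max_depth=3):
    """Return the initial constant F_0 and the list of fitted base learners."""
    F0 = y.mean()                           # best constant under squared error
    F = np.full(len(y), F0)                 # current approximation F_m(x_i)
    learners = []
    for _ in range(n_stages):
        residuals = y - F                   # pseudo-residuals (negative gradient)
        h = DecisionTreeRegressor(max_depth=max_depth)
        h.fit(X, residuals)                 # base learner h(x; a_m)
        F += learning_rate * h.predict(X)   # add the new term to the expansion
        learners.append(h)
    return F0, learners

def predict(F0, learners, X, learning_rate=0.1):
    return F0 + learning_rate * sum(h.predict(X) for h in learners)
```

Here the fixed learning_rate shrinkage stands in for the coefficients $\beta_m$; a fuller implementation would choose each $\beta_m$ (or the terminal-node values of each tree) by minimizing the loss directly.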
Stochastic gradient boosting
With his “bagging” procedure, Breiman (1996) introduced the notion that injecting randomness into function estimation procedures could improve their performance. Early implementations of AdaBoost (Freund and Schapire, 1996) also employed random sampling, but this was regarded as an approximation to deterministic weighting, used when the implementation of the base learner did not support observation weights, rather than as an essential ingredient. More recently, Breiman (1999) proposed a hybrid bagging–boosting procedure (“adaptive bagging”) intended for least-squares fitting of additive expansions.
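The stochastic modification to gradient boosting is small: at each iteration the base learner is fit to a subsample drawn at random, without replacement, from the full training set. Below is a hedged sketch under the same assumptions as before; the parameter name subsample and the tree base learner are illustrative, not prescribed by the paper.

```python
# Sketch of stochastic gradient boosting: each stage trains on a random
# fraction of the data, while the running approximation F is still
# updated on every observation.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_stochastic_gb(X, y, n_stages=100, learning_rate=0.1,
                      subsample=0.5, max_depth=3, seed=None):
    rng = np.random.default_rng(seed)
    n = len(y)
    F0 = y.mean()                           # initial constant approximation
    F = np.full(n, F0)                      # current F_m(x_i) on all observations
    learners = []
    for _ in range(n_stages):
        # Draw a random subset of the training observations (no replacement).
        idx = rng.choice(n, size=int(subsample * n), replace=False)
        residuals = y[idx] - F[idx]         # pseudo-residuals on the subsample only
        h = DecisionTreeRegressor(max_depth=max_depth)
        h.fit(X[idx], residuals)            # base learner sees only the subsample
        F += learning_rate * h.predict(X)   # the update still touches every point
        learners.append(h)
    return F0, learners
```

For reference, scikit-learn's GradientBoostingRegressor exposes this same idea through its subsample parameter; values below 1.0 give stochastic gradient boosting.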
Simulation studies
The effect of randomization on gradient TreeBoost procedures will likely depend on the particular problem at hand. Important characteristics of problems that affect performance include the training sample size $N$, the true underlying “target” function (1), and the distribution of the departures, $\varepsilon$, of $y$ from $F^*(\mathbf{x})$. In order to gauge the value of any estimation method it is necessary to accurately evaluate its performance over many different situations. This is most conveniently accomplished through Monte Carlo simulation.
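A Monte Carlo test bed of the kind described here draws inputs at random, evaluates a known target $F^*(\mathbf{x})$, and adds departures $\varepsilon$ from a chosen error distribution. The sketch below is a hedged illustration only: the particular target function is an arbitrary stand-in, not the paper's data generator, and the Bernoulli (classification) case is omitted for brevity.

```python
# Illustrative simulated-data generator: y = F*(x) + eps, with eps drawn
# from either a Gaussian or a heavy-tailed "slash" distribution.
import numpy as np

def simulate(N=500, n_inputs=10, noise="gaussian", seed=None):
    rng = np.random.default_rng(seed)
    X = rng.uniform(size=(N, n_inputs))
    # An arbitrary smooth target depending on a few of the inputs.
    F_star = np.sin(np.pi * X[:, 0]) + 2.0 * (X[:, 1] - 0.5) ** 2 + X[:, 2]
    if noise == "gaussian":
        eps = rng.normal(scale=0.5, size=N)
    elif noise == "slash":
        # Slash errors: a standard normal divided by an independent
        # uniform, giving a very heavy-tailed distribution.
        eps = 0.1 * rng.normal(size=N) / rng.uniform(size=N)
    else:
        raise ValueError(f"unknown noise model: {noise!r}")
    return X, F_star + eps
```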
Discussion
The results of the previous section indicate that the accuracy of gradient boosting can be substantially improved by introducing randomization through the simple expedient of training the base learner on a different randomly selected subset of the data at each iteration. The degree of improvement depends on the particular problem at hand in terms of the training sample size $N$, the true underlying target function (1), the distribution of the departures $\varepsilon$ (Gaussian, slash, Bernoulli), and the capacity of the base learner.
Acknowledgements
Helpful discussions with Leo Breiman are gratefully acknowledged. This work was partially supported by CSIRO Mathematical and Information Sciences, Australia, the Department of Energy under contract DE-AC03-76SF00515, and by Grant DMS9764431 of the National Science Foundation.
References
Breiman, L., 1996. Bagging predictors. Mach. Learning 24, 123–140.
Breiman, L., 1999. Using adaptive bagging to debias regressions. Technical Report, Department of Statistics, University…