
Understanding Gradient Boosting: A Data Scientist’s Guide
Gradient boosting machines (GBMs) are one of the most significant advances in machine learning and data science, enabling us as practitioners to use ensembles of models to tackle many domain-specific problems. While this tool is widely available in Python packages like scikit-learn and xgboost, as data scientists we should always look into the theory and mathematics behind the model instead of using it as a black box. In this blog, we will dive into the following areas:
- The key concepts underlying GBM
- A step-by-step walkthrough to recreate GBM
- Pros and cons
Fundamentals of Gradient Boosting
1. Weak learners and ensemble learning
Weak learners and ensemble learning are the two key concepts that make gradient boosting work. A weak learner is a model that is only slightly better than random guessing. Combined with many others, weak learners can form a robust ensemble model that makes accurate predictions.
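To make this concrete, here is a minimal sketch in scikit-learn. The toy dataset and hyperparameters are my own illustrative choices, not from any particular benchmark: a single depth-1 decision tree (a "stump") is a classic weak learner, while a gradient-boosted ensemble of hundreds of such stumps scores noticeably better.

```python
# A minimal sketch of the weak-learner idea using scikit-learn.
# The synthetic dataset and hyperparameters are illustrative assumptions.
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=1000, n_features=10, noise=10.0, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# A single depth-1 tree captures only one split, so it is barely
# better than predicting the mean -- a weak learner.
stump = DecisionTreeRegressor(max_depth=1).fit(X_train, y_train)
print("weak learner R^2:", stump.score(X_test, y_test))

# Gradient boosting combines hundreds of such stumps sequentially,
# each new stump correcting the errors of the ensemble so far.
gbm = GradientBoostingRegressor(n_estimators=300, max_depth=1, learning_rate=0.1)
gbm.fit(X_train, y_train)
print("ensemble R^2:", gbm.score(X_test, y_test))
```

On data like this, the ensemble's test score should substantially exceed the single stump's, which is the whole point of boosting weak learners.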
Too wordy, too complicated? Okay, imagine we are playing a 10,000-piece jigsaw puzzle with two other friends. (They must be excellent friends to sign up for this.) Each of us is responsible for piecing together one of the…