Monday, July 30, 2018

ML Quick Bites: XGBoost

XGBoost stands for eXtreme Gradient Boosting. It was developed by Tianqi Chen and is now part of a wider collection of open-source libraries developed by the Distributed Machine Learning Community (DMLC).

XGBoost is an implementation of gradient boosted decision trees designed for speed and performance.
Important features of the implementation include handling of missing values (Sparse Aware), a Block Structure that supports parallelization of tree construction, and the ability to fit and boost on new data added to an already trained model (Continued Training).

Algorithm

XGBoost implements the gradient boosted decision tree algorithm. Boosting is an ensemble technique in which new models are added to correct the errors made by the existing models. Models are added sequentially until no further improvement can be made.
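The sequential idea can be made concrete with a small pure-Python sketch. This is not XGBoost's own code. It assumes a single numeric feature, squared error, and one-split regression "stumps" as the base models. The names `fit_stump`, `boost`, `learning_rate` and `tol` are hypothetical. Each round fits a new stump to the errors the ensemble still makes. The loop stops once adding another model no longer lowers the training error by more than `tol`.

```python
def fit_stump(xs, targets):
    """Least-squares one-split stump: (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:  # candidate thresholds between distinct values
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def boost(xs, ys, learning_rate=0.5, max_models=100, tol=1e-6):
    base = sum(ys) / len(ys)  # the starting "model" just predicts the mean
    preds = [base] * len(ys)
    models = []
    while len(models) < max_models:
        # The errors the current ensemble still makes
        errors = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, errors)
        new_preds = [p + learning_rate * predict_stump(stump, x) for x, p in zip(xs, preds)]
        if mse(ys, preds) - mse(ys, new_preds) < tol:
            break  # adding another model no longer helps
        models.append(stump)
        preds = new_preds
    return base, models, preds


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.2, 5.0]
base, models, preds = boost(xs, ys)
print(len(models), round(mse(ys, preds), 4))
```

The learning rate shrinks each stump's contribution, so the ensemble approaches the data gradually instead of in one step.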

Gradient boosting is an approach in which each new model is trained to predict the errors (residuals) of the previous models, and the models are then added together to make the final prediction. It is called gradient boosting because it uses a gradient descent algorithm to minimize the loss when adding new models.
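The "gradient" becomes visible once a loss other than squared error is used. The plain-Python sketch below boosts stumps for binary classification under log loss. Each stump is fitted to the negative gradient of the loss, y - p (the "pseudo-residual"). Under squared loss this quantity reduces to the ordinary error y - prediction. The learning rate of 0.3, the 50 rounds and all helper names are illustrative assumptions. XGBoost itself also uses second-order (Hessian) information, which this sketch leaves out.

```python
import math

LEARNING_RATE = 0.3  # shrinkage applied to every stump (illustrative choice)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def negative_gradient(y, score):
    # Log loss L = -[y*log(p) + (1 - y)*log(1 - p)] with p = sigmoid(score)
    # has dL/dscore = p - y, so the negative gradient is y - p.
    return y - sigmoid(score)


def fit_stump(xs, targets):
    """Least-squares one-split stump: (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def gradient_boost(xs, ys, rounds=50):
    scores = [0.0] * len(xs)  # raw scores (log-odds); 0 means p = 0.5
    stumps = []
    for _ in range(rounds):
        # Each new stump is fitted to the negative gradient of the loss
        pseudo_residuals = [negative_gradient(y, s) for y, s in zip(ys, scores)]
        stump = fit_stump(xs, pseudo_residuals)
        stumps.append(stump)
        scores = [s + LEARNING_RATE * predict_stump(stump, x) for x, s in zip(xs, scores)]
    return stumps


def predict_proba(stumps, x):
    return sigmoid(sum(LEARNING_RATE * predict_stump(st, x) for st in stumps))


def log_loss(ys, probs):
    return -sum(y * math.log(p) + (1 - y) * math.log(1 - p) for y, p in zip(ys, probs)) / len(ys)


xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 1, 0, 1, 1, 1]
stumps = gradient_boost(xs, ys)
print(round(predict_proba(stumps, 1), 3), round(predict_proba(stumps, 8), 3))
```

Swapping `negative_gradient` for another differentiable loss is all it takes to change the task. The boosting loop itself stays the same.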

Tasks

1. Binary Classification
2. Multi-class Classification
3. Regression
4. Learning To Rank
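In the XGBoost library, the task is selected through the `objective` parameter. The mapping below is a sketch assuming the standard objective strings `binary:logistic`, `multi:softprob`, `reg:squarederror` and `rank:pairwise`. Older releases used `reg:linear` for regression. The `make_params` helper is hypothetical. The dict it returns is meant to be passed as the `params` argument of `xgboost.train`.

```python
# Hypothetical helper: map each task listed above to an XGBoost `objective` string.
OBJECTIVES = {
    "binary classification": "binary:logistic",      # predicted probability of class 1
    "multi-class classification": "multi:softprob",  # one probability per class
    "regression": "reg:squarederror",                # older releases: "reg:linear"
    "learning to rank": "rank:pairwise",             # pairwise ranking loss
}


def make_params(task, num_class=None):
    """Build a params dict (hypothetical helper) for xgboost.train()."""
    objective = OBJECTIVES[task.lower()]
    params = {"objective": objective}
    if objective.startswith("multi:"):
        # Multi-class objectives also require the number of classes
        if num_class is None:
            raise ValueError("multi-class objectives need num_class")
        params["num_class"] = num_class
    return params


print(make_params("Regression"))
print(make_params("Multi-class Classification", num_class=3))
```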

Pros

1. Execution Speed
XGBoost is very fast compared to other gradient boosting implementations.

2. Model Performance
XGBoost tends to dominate classification and regression predictive modeling problems on structured (tabular) datasets.

3. Handling Missing Values
XGBoost has a built-in routine to handle missing values. When it encounters missing values at a node, it tries sending them down each branch and learns which path (the default direction) gives the better split, so that missing values seen in the future follow that path.

4. Built-in Cross Validation
XGBoost allows the user to run cross-validation at each iteration of the boosting process, so it is easy to find the optimum number of boosting iterations in a single run.
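The missing-value handling from point 3 can be sketched for a single feature. The sketch follows the sparsity-aware split finding described in the XGBoost paper. Only non-missing values are enumerated as thresholds. The rows with a missing value (`None` here) are tried on the left and on the right, and the direction with the larger gain becomes the default direction. This is a simplified, hypothetical version. It uses the paper's gain formula with gradient sums G and Hessian sums H, sets λ to 1 (XGBoost's default `lambda`), and omits the γ penalty.

```python
def leaf_score(g, h, lam):
    # Structure score of a leaf: G^2 / (H + lambda)
    return g * g / (h + lam)


def split_gain(gl, hl, gr, hr, lam):
    return 0.5 * (leaf_score(gl, hl, lam) + leaf_score(gr, hr, lam)
                  - leaf_score(gl + gr, hl + hr, lam))


def best_split_with_default(values, grads, hess, lam=1.0):
    """Return (gain, threshold, default_direction) for one feature."""
    present = sorted((v, g, h) for v, g, h in zip(values, grads, hess) if v is not None)
    g_miss = sum(g for v, g in zip(values, grads) if v is None)
    h_miss = sum(h for v, h in zip(values, hess) if v is None)
    g_total = sum(g for _, g, _ in present)
    h_total = sum(h for _, _, h in present)
    best = None
    g_left = h_left = 0.0
    for i in range(len(present) - 1):
        v, g, h = present[i]
        g_left += g
        h_left += h
        next_v = present[i + 1][0]
        if next_v == v:
            continue  # no threshold separates equal values
        threshold = (v + next_v) / 2
        g_right, h_right = g_total - g_left, h_total - h_left
        # Try the missing rows on each side of the split
        candidates = (
            ("left", g_left + g_miss, h_left + h_miss, g_right, h_right),
            ("right", g_left, h_left, g_right + g_miss, h_right + h_miss),
        )
        for direction, gl, hl, gr, hr in candidates:
            gain = split_gain(gl, hl, gr, hr, lam)
            if best is None or gain > best[0]:
                best = (gain, threshold, direction)
    return best


# Squared error with current prediction 0: gradient = 0 - y, Hessian = 1
values = [1.0, 2.0, None, 8.0, 9.0, None]
ys = [0.0, 0.0, 5.0, 5.0, 5.0, 5.0]
grads = [0.0 - y for y in ys]
hess = [1.0] * len(ys)
best = best_split_with_default(values, grads, hess)
print(best)
```

In this toy data the rows with a missing value look like the large-value rows, so the learned default direction sends them right.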
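Point 4's "single run" can be illustrated with a plain-Python sketch. Each fold is boosted once, up to `max_rounds`. The held-out error is recorded after every round, and the minimum of the averaged curve gives the number of rounds to keep. The interleaved folds, the stump learner and all names here are hypothetical. In the library itself the corresponding function is `xgboost.cv`, which can also stop early via `early_stopping_rounds`.

```python
def fit_stump(xs, targets):
    """Least-squares one-split stump: (threshold, left_value, right_value)."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    t, lv, rv = stump
    return lv if x <= t else rv


def cv_boosting_rounds(xs, ys, n_folds=3, max_rounds=30, learning_rate=0.3):
    """Return (best_rounds, mean validation MSE after each round)."""
    n = len(xs)
    curve = [0.0] * max_rounds
    for k in range(n_folds):
        # Interleaved folds: every n_folds-th point is held out
        val = [i for i in range(n) if i % n_folds == k]
        train = [i for i in range(n) if i % n_folds != k]
        tx, ty = [xs[i] for i in train], [ys[i] for i in train]
        vx, vy = [xs[i] for i in val], [ys[i] for i in val]
        base = sum(ty) / len(ty)
        tp, vp = [base] * len(tx), [base] * len(vx)
        for r in range(max_rounds):
            stump = fit_stump(tx, [y - p for y, p in zip(ty, tp)])
            tp = [p + learning_rate * predict_stump(stump, x) for x, p in zip(tx, tp)]
            vp = [p + learning_rate * predict_stump(stump, x) for x, p in zip(vx, vp)]
            # Score the held-out fold after every round, within the same run
            curve[r] += sum((y - p) ** 2 for y, p in zip(vy, vp)) / len(vy)
    curve = [c / n_folds for c in curve]
    best_rounds = min(range(max_rounds), key=lambda r: curve[r]) + 1
    return best_rounds, curve


xs = list(range(12))
ys = [0.2, -0.1, 0.3, 0.0, 2.1, 1.8, 2.2, 1.9, 4.1, 3.8, 4.3, 3.9]
best_rounds, curve = cv_boosting_rounds(xs, ys)
print(best_rounds, round(curve[best_rounds - 1], 3))
```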

