Monday, August 27, 2018

Classification and Regression Trees

Classification and Regression Trees, also known as CART, refers to decision tree algorithms that can be used for classification or regression predictive modeling. The models are obtained by recursively partitioning the data space and fitting a simple prediction model within each partition. In other words, creating a CART model involves selecting input variables and split points on those variables until a suitable tree is constructed. A CART model is represented as a decision tree. One advantage of CART is that it does not require any special data preparation other than a good representation of the problem.

Classification trees are designed for dependent variables that take a finite number of unordered values, with prediction error measured in terms of misclassification cost.
For classification, the CART algorithm uses the Gini index as its cost function. It indicates how "pure" the leaf nodes are (that is, how mixed the training data assigned to each node is).
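
As a rough illustration of the Gini index, here is a minimal sketch in plain Python. The function names gini_impurity and split_gini are made up for this example and do not come from any library. It uses the common definition of one minus the sum of squared class proportions, so a node holding a single class scores 0 and a two-class node split evenly scores 0.5.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of one node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_gini(left, right):
    """Size-weighted Gini impurity of the two child nodes made by a split."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


pure = ["a", "a", "a", "a"]    # only one class in the node
mixed = ["a", "a", "b", "b"]   # two classes, evenly mixed

print(gini_impurity(pure))                  # 0.0
print(gini_impurity(mixed))                 # 0.5
print(split_gini(["a", "a"], ["b", "b"]))   # 0.0, the split separates the classes
```

When choosing a split, the candidate with the lowest weighted value is preferred, since it leaves the child nodes less mixed.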

Regression trees are designed for dependent variables that take continuous or ordered discrete values, with prediction error typically measured by the squared difference between the observed and predicted values.
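
To make the recursive partitioning idea concrete, here is a small sketch of a regression tree on a single input variable, using the sum of squared errors as the split criterion. It is a toy under stated assumptions, not a full CART implementation: the names sse, best_split, build_tree and predict, the dictionary layout of the tree, and the min_samples stopping rule are all choices made for this example, and there is no pruning.

```python
def sse(values):
    """Sum of squared differences between each value and the node mean."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) for the best split on one variable, or None."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between identical x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


def build_tree(xs, ys, min_samples=2):
    """Recursively partition until a node is too small or a split does not help."""
    mean = sum(ys) / len(ys)
    split = best_split(xs, ys) if len(ys) >= min_samples else None
    if split is None or split[1] >= sse(ys):
        return {"leaf": True, "value": mean}  # predict the node mean
    threshold = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= threshold]
    right = [(x, y) for x, y in zip(xs, ys) if x > threshold]
    return {
        "leaf": False,
        "threshold": threshold,
        "left": build_tree([x for x, _ in left], [y for _, y in left], min_samples),
        "right": build_tree([x for x, _ in right], [y for _, y in right], min_samples),
    }


def predict(tree, x):
    """Walk from the root down to a leaf and return its stored mean."""
    while not tree["leaf"]:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["value"]


xs = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
ys = [5.0, 5.0, 5.0, 20.0, 20.0, 20.0]
tree = build_tree(xs, ys)
print(tree["threshold"])                          # 6.5
print(predict(tree, 2.5), predict(tree, 11.5))    # 5.0 20.0
```

Each leaf predicts the mean of the training targets that reach it, which is the constant that minimizes squared error within that partition.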

Advantages of CART

  • Simple to understand, interpret, and visualize.
  • Decision trees implicitly perform variable screening or feature selection.
  • Can handle both numerical and categorical data. Can also handle multi-output problems.
  • Decision trees require relatively little effort from users for data preparation.
  • Nonlinear relationships between variables do not hurt tree performance, because splits make no assumption of linearity.

Resources

Tuesday, August 7, 2018

Essential Machine Learning Algorithms

"An algorithm must be seen to be believed."  ~Donald Knuth

For anyone new to data science, the first problem they face is deciding which algorithms to learn. There are a ton of machine learning algorithms to choose from, but first they need to decide where to start. Here is a list of the most essential machine learning algorithms to begin with. It is not a comprehensive list, but it's just enough to get you started.

Supervised Learning Algorithms

Unsupervised Learning Algorithms

  1. Clustering
  2. Visualization and Dimensionality Reduction

Some Remarks on The Corrections by Jonathan Franzen

In 2001 when The Corrections was published it was regarded as the most important book of the 21st century. Some of it was due to the tim...