A regression tree learns by splitting the training set into subsets based on an attribute value test. This process is repeated on each derived subset in a recursive manner called recursive partitioning.
Classification and Regression Tree techniques have a number of advantages over many of those alternative techniques.
- Simple to understand and interpret.
- In most cases, the interpretation of results summarized in a tree is very simple. This simplicity is useful not only for purposes of rapid classification of new observations but can also often yield a much simpler "model" for explaining why observations are classified or predicted in a particular manner.
- Able to handle both numerical and categorical data.
- Other techniques are usually specialized in analyzing datasets that have only one type of variable.
- Tree methods are nonparametric and nonlinear.
- The final results of using tree methods for classification or regression can be summarized in a series of (usually few) logical if-then conditions (tree nodes). Therefore, there is no implicit assumption that the underlying relationships between the predictor variables and the dependent variable are linear, follow some specific non-linear link function, or that they are even monotonic in nature. Thus, tree methods are particularly well suited for data mining tasks, where there is often little a priori knowledge nor any coherent set of theories or predictions regarding which variables are related and how. In those types of data analytics, tree methods can often reveal simple relationships between just a few variables that could have easily gone unnoticed using other analytic techniques.
One major problem with classification and regression trees is their high variance. Often a small change in the data can result in a very different series of splits, making interpretation somewhat precarious. Besides, decision-tree learners can create over-complex trees that cause over-fitting. Mechanisms such as pruning are necessary to avoid this problem. Another limitation of trees is the lack of smoothness of the prediction surface.