A single classification tree can have high variance; that is,
different subsets of data can lead to very different trees. Bagging –
short for “bootstrap aggregating” – can decrease that variance by
combining the results of multiple classification trees. In other words,
the average of multiple “guesses” tends to be closer to the real answer
than a single “guess”. The bagging procedure is as follows:
1. Take a random sample of the data with replacement (the
“bootstrap”)
2. Create a full unpruned tree using that sample, using all
predictors
3. Record the predicted outcomes from the tree; that is, the predicted
class of each observation
4. Repeat for the number of trees specified by the user, e.g. a bagged
model could have 10 or 100 trees
5. For a binary outcome, take the majority vote across the trees’
predictions; that is, if 7/10 trees say that observation 200 should be
Class A, then that is the prediction for observation 200 (the
“aggregating”)
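The five steps above can be sketched as a small, self-contained example. The analysis in this report uses full rpart-style trees; as a stand-in base learner, the sketch below fits a one-split "stump" on a single toy feature (all names here are illustrative, not from the original analysis):

```python
import random
from collections import Counter

def bootstrap_sample(X, y, rng):
    """Step 1: sample n observations with replacement (the "bootstrap")."""
    n = len(X)
    idx = [rng.randrange(n) for _ in range(n)]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_stump(X, y):
    """Stand-in for step 2's full unpruned tree: a single threshold split
    on one feature, choosing the split that minimises training error."""
    best = None
    for t in sorted(set(X)):
        for left, right in (("A", "B"), ("B", "A")):
            err = sum((left if x <= t else right) != label
                      for x, label in zip(X, y))
            if best is None or err < best[0]:
                best = (err, t, left, right)
    _, t, left, right = best
    return lambda x: left if x <= t else right

def bag(X, y, n_trees=25, seed=1):
    """Steps 1-4: repeat bootstrap + fit for the requested number of trees."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        Xb, yb = bootstrap_sample(X, y, rng)
        trees.append(fit_stump(Xb, yb))
    return trees

def predict(trees, x):
    """Step 5: majority vote across the trees' predictions (the "aggregating")."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Toy data: class "A" below 5, class "B" above.
X = [1, 2, 3, 4, 6, 7, 8, 9]
y = ["A", "A", "A", "A", "B", "B", "B", "B"]
trees = bag(X, y)
print(predict(trees, 2), predict(trees, 8))
```

Each individual stump sees a slightly different bootstrap sample, so individual predictions vary; the majority vote smooths that variance out.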
To find out how many repetitions of the bagging procedure are appropriate (i.e. how many individual classification trees should be created and aggregated into one model), models with different numbers of trees can be built, and the error measured. Too many trees won’t lead to overfitting but will take unnecessary computing time, so it’s better to use the smallest number of trees that leads to low error.
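The shape of that error curve can be illustrated with a simplified simulation. The sketch below treats each tree as an independent classifier that is correct with some fixed probability — real bagged trees are correlated, so this is only a caricature — but it shows why majority-vote error drops quickly and then plateaus as trees are added:

```python
import random

def ensemble_error(n_trees, p_correct=0.65, n_sims=2000, seed=0):
    """Estimate majority-vote error when each of n_trees independent
    'trees' classifies a case correctly with probability p_correct.
    Ties are counted as errors."""
    rng = random.Random(seed)
    wrong = 0
    for _ in range(n_sims):
        correct_votes = sum(rng.random() < p_correct for _ in range(n_trees))
        if correct_votes * 2 <= n_trees:
            wrong += 1
    return wrong / n_sims

# Error falls steeply at first, then flattens: extra trees stop helping.
for n in (1, 5, 25, 101, 251):
    print(n, round(ensemble_error(n), 3))
```

Under this (optimistic) independence assumption the plateau arrives within a few dozen trees; correlated real trees plateau later, which is consistent with the ~250 trees found below.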
The plot below shows that classification accuracy for this data increases as the number of trees in a bagged model increases, up to a point (the black line is mean accuracy across different subsets of data, and the grey lines are +/- 1 SD). Accuracy plateaus at about 250 trees, so 250 trees is a reasonable number to use.
The final model consisted of 250 individual trees, each allowed to grow to its full depth. Each tree used the same parameters as the single classification tree: a minsplit of 20 observations for splitting a node, and split = “gini”. Each tree was built from a bootstrapped sample of n=753, drawn with replacement from the original dataset of n=753. Because sampling is with replacement, each tree was built on roughly two-thirds (about 63%) of the unique observations. Downsampling was applied to the majority class (“No”) so that each tree was trained on an equal number of “Yes” and “No” cases. No cross-validation procedure was used for this final model, as the dataset was relatively small and each tree was already built on a different subset of the data.
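The "roughly two-thirds" figure comes from the mathematics of sampling with replacement: the expected fraction of distinct observations in a bootstrap sample of size n approaches 1 − 1/e ≈ 0.632 as n grows. A quick simulation at this dataset's size (n=753) confirms it:

```python
import random

def unique_fraction(n, n_boot=500, seed=0):
    """Average fraction of distinct original observations that appear
    in a bootstrap sample of size n drawn with replacement."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_boot):
        sample = {rng.randrange(n) for _ in range(n)}  # distinct indices drawn
        total += len(sample) / n
    return total / n_boot

print(round(unique_fraction(753), 3))  # close to 1 - 1/e ≈ 0.632
```

The remaining ~37% of observations left out of each bootstrap sample are what out-of-bag error estimates are built from, which is one reason bagging can be evaluated without a separate cross-validation loop.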
The table below shows the final model’s confusion matrix of correctly and incorrectly predicted cases. The model’s classification accuracy was 77.42% overall, with 32.21% sensitivity and 88.58% specificity.
|  | Actual No | Actual Yes |
|---|---|---|
| Predicted No | 535 | 101 |
| Predicted Yes | 69 | 48 |
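The reported percentages follow directly from the confusion matrix, taking "Yes" as the positive class:

```python
# Confusion matrix cells from the table above ("Yes" is the positive class).
tn, fn = 535, 101   # predicted No:  actual No, actual Yes
fp, tp = 69, 48     # predicted Yes: actual No, actual Yes

accuracy    = 100 * (tp + tn) / (tp + tn + fp + fn)
sensitivity = 100 * tp / (tp + fn)   # true positives among actual Yes
specificity = 100 * tn / (tn + fp)   # true negatives among actual No

print(round(accuracy, 2), round(sensitivity, 2), round(specificity, 2))
```

The low sensitivity relative to specificity reflects the class imbalance discussed below: the model predicts "No" far more readily than "Yes".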
Variable importance was calculated in the same way as for the single classification tree, except that importance is now summed across the 250 trees in the model. In the plot below, importance is scaled relative to the most important variable (HHIE).
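The sum-then-scale step can be sketched as follows. The per-tree scores here are made-up placeholder numbers (in the real model they come from each of the 250 rpart trees); only the aggregation mechanism matches the description above:

```python
# Hypothetical per-tree importance scores (variable name -> score).
per_tree = [
    {"HHIE": 10.0, "Age": 6.0, "PTA(BE)": 4.0},
    {"HHIE": 8.0,  "Age": 7.0, "PTA(BE)": 3.0},
    {"HHIE": 12.0, "Age": 5.0},   # a variable may go unused in some trees
]

# Sum importance across all trees in the model.
totals = {}
for imp in per_tree:
    for var, score in imp.items():
        totals[var] = totals.get(var, 0.0) + score

# Scale relative to the most important variable (100 = most important).
top = max(totals.values())
scaled = {var: round(100 * s / top, 1) for var, s in totals.items()}
print(scaled)
```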
A bagged model of 250 trees appeared more accurate than a single tree, but only because its predictions skewed towards the “No” class (the vast majority of cases), despite the use of downsampling; its sensitivity was much lower than the single tree’s.
HHIE, Age, and PTA(BE) were the most important variables in a bagged model, similar to a single classification tree.
| Model | Accuracy % | Sensitivity % | Specificity % | Area Under Curve |
|---|---|---|---|---|
| Logistic x=4 | 63.47 | 59.73 | 64.40 | 62.07 |
| Class tree cp=0.013 | 67.46 | 78.52 | 64.74 | 71.63 |
| Bagging (250 trees) | 77.42 | 32.21 | 88.58 | 60.40 |
| Random forest | * | * | * | * |
| Boosting | * | * | * | * |