Instance Reduction for Avoiding Overfitting in Decision Trees

Classification aims to categorize instances into their different
classes, decision boundary can be drawn halfway between two nearest
instances of different classes. Therefore, border instances are seen as
noisy instances or instances that do not agree with their neighbors. In
this study, the use of many instance reduction algorithms as pre-pruning
techniques to smooth decision trees’ decision boundaries is
investigated. Although this is a general approach that can be applied to
any machine learning algorithm, conducted experiments reported in this
work are limited to decision trees.

Conducted experiments prove that eliminating border instances
increases the classification accuracy of decision trees. Removing border
instances improves classification accuracy and produces smaller
decision trees, which means faster learning and classification. Noise
filters produced trees with balanced tree size and classification
accuracy as well as number of attributes. Although instance reducers,
DROP3 and DROP5 gave the smallest trees, but they have low
classification accuracy.

Reduced Error Pruning affected the size and the number of attributes
significantly; unfortunately, this is not true considering the
classification accuracy. Noise filters reduced the tree size and number
of attributes and improved classification accuracy. Using built-in
Reduced Error Pruning with noise filter optimized the learning process
but did not improve the classification accuracy.

In general, using noise filters reduction algorithms without pruning
outperformed other techniques in terms of classification accuracy. On
the other hand, pruning with and without prior use of reduction
algorithms produce the smallest trees with the smallest number of

According to the dataset size, results show that medium datasets
outperformed large and small datasets in terms of classification
accuracy. However, large datasets outperformed medium and small datasets
in terms of the tree size and the number of attributes.

All of the above conclusions are valid without and with deliberately
inserted noise, although the advantage of using noise filters was more
apparent with the increase in noise ratio as they show more resistance
to noise.

Other instance reduction algorithms may be tested in the future; also
other machine learning algorithms could be tested. This noise tested in
this research is classification noise; future research can evaluate the
effect of inserting attribute noise and the impact of missing values.
Our experiments rely heavily on the use of the Heterogeneous Value
Distance Metric (HVDM) distance function, and other distance functions
can be used, such as the ISCDM [11, 26].

Other datasets [27, 28] can be used in future studies with either decision trees or other machine learning techniques.

Feature selection based on correlation with the class but not with other features was proposed by hall [29].
This technique's effectiveness can be tested with the proposed use of
instance reduction techniques as pre-pruning, with or without
intentionally added noise.

Who Voted

Leave a comment