Decision trees are a fundamental machine learning algorithm used for both classification and regression tasks. Understanding their characteristics, capabilities, and limitations is crucial for applying them effectively to real-world problems.
Question:
Which of the following statements are true regarding the properties and behavior of Decision Trees?
Statements to Evaluate:
1. A decision tree makes no assumptions about the data.
2. A decision tree model can learn non-linear decision boundaries.
3. Decision trees cannot explain how the target will change if a variable is changed by 1 unit (the marginal effect).
4. Hyperparameter tuning is not required for decision trees.
5. In a decision tree, increasing entropy implies increasing purity.
6. In a decision tree, the entropy of a node decreases as we go down the tree.
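Statements 5 and 6 hinge on the relationship between entropy and purity, so it helps to see the numbers directly. As a minimal sketch (the entropy helper below is illustrative and not part of the question itself), entropy is 0 for a pure node and maximal when the classes are evenly mixed, so higher entropy means lower purity, not higher:

import numpy as np

def entropy(labels):
    # Shannon entropy of a 1-d label array: 0 for a pure node,
    # 1 bit for a 50/50 binary node (maximally impure)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy(np.array([1, 1, 1, 1])))  # 0.0   -> pure node, lowest entropy
print(entropy(np.array([0, 0, 1, 1])))  # 1.0   -> evenly mixed, highest entropy
print(entropy(np.array([0, 1, 1, 1])))  # ~0.81 -> in between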
As you know, a decision tree works by splitting nodes at successive levels, trying to classify the data as accurately as possible.

You are given a feature (1-d array) and a label (1-d array, the target), and you have to determine which value of the feature is best to split on at the root level when building a decision tree.

The feature contains continuous values, whereas the target is binary. So the main task is to determine which value (threshold) is best to split on for this classification task, using entropy as the loss and maximizing information gain.
import numpy as np

def best_split(features, labels):  # name is illustrative; the original signature was truncated
    '''
    Returns:
        float value determining best threshold for decision tree classification
    '''
    best_threshold = None
    best_info_gain = -1
    # For every unique value of that feature, try it as a threshold
    for threshold in np.unique(features):
        y_left = labels[features <= threshold]    # labels in left child
        y_right = labels[features > threshold]    # labels in right child
        if len(y_left) > 0 and len(y_right) > 0:  # skip splits with an empty child
            # Calculate the information gain; save the split if it beats the previous best
            gain = information_gain(labels, y_left, y_right)
            if gain > best_info_gain:
                best_info_gain = gain
                best_threshold = threshold
    return best_threshold
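The information_gain helper is not defined in this section. A minimal sketch consistent with the entropy loss described above, reusing the entropy helper from the earlier sketch, subtracts from the parent's entropy the children's entropies weighted by their share of the parent's samples:

def information_gain(parent, y_left, y_right):
    # IG = entropy(parent) - weighted average entropy of the two children
    n = len(parent)
    weighted_child_entropy = (len(y_left) / n) * entropy(y_left) \
                           + (len(y_right) / n) * entropy(y_right)
    return entropy(parent) - weighted_child_entropy

With these pieces in place, a quick check on a toy input (values chosen here purely for illustration):

features = np.array([2.0, 3.5, 1.0, 4.2, 5.1])
labels = np.array([0, 0, 0, 1, 1])
print(best_split(features, labels))  # 3.5, which separates the two classes perfectly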