
Machine Learning Programs

👉Data Preprocessing in Machine Learning

👉Data Preprocessing in Machine Learning (Handling Missing Values)

👉Linear Regression - ML Program - Weight Prediction

👉Naïve Bayes Classifier - ML Program

👉Logistic Regression - ML Program

👉KNN Machine Learning Program

👉Support Vector Machine (SVM) - ML Program

👉Decision Tree Classifier on Iris Dataset

👉Classification of Iris Flowers Using Random Forest

👉DBSCAN

👉Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

👉For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

👉Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

👉Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.

👉Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using a standard Heart Disease Data Set.

👉Write a program to implement the k-Nearest Neighbors algorithm to classify the iris data set. Print both correct and wrong predictions.

👉Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

👉Write a program to implement the SVM algorithm to classify the iris data set. Print both correct and wrong predictions.

👉Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering.

👉Write a program using scikit-learn to implement K-means Clustering

👉Program to calculate the entropy and the information gain

👉Program to implement a perceptron

Decision Tree Characteristics

Context:

Decision Trees are a fundamental machine learning algorithm used for both classification and regression tasks. Understanding their characteristics, capabilities, and limitations is crucial for effectively applying them to solve real-world problems.

Question:

Which of the following statements are true regarding the properties and behavior of Decision Trees?

Statements to Evaluate:

1. A decision tree makes no assumptions about the data.
2. The decision tree model can learn non-linear decision boundaries.
3. Decision trees cannot explain how the target will change if a variable is changed by 1 unit (marginal effect).
4. Hyperparameter tuning is not required in decision trees.
5. In a decision tree, increasing entropy implies increasing purity.
6. In a decision tree, the entropy of a node decreases as we go down the decision tree.


Choose the correct answer from below:

A) 1, 2, and 5

B) 3, 5 and 6

C) 2, 3, 4 and 5

D) 1, 2, 3 and 6

Ans: D) 1, 2, 3 and 6

Explanation: Statements 4 and 5 are the false ones. Decision trees do require hyperparameter tuning (e.g., maximum depth, minimum samples per split) to control overfitting, and increasing entropy implies decreasing purity, not increasing it. Statement 6 holds because each split is chosen to reduce impurity, so node entropy falls as you move down the tree.
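To see statements 5 and 6 concretely, here is a minimal sketch (plain NumPy; the helper name entropy is our own choice) showing that a 50/50 node has maximum entropy, i.e., minimum purity, and that entropy falls toward 0 as a node becomes pure, which is exactly what happens as splits push us down the tree:

import numpy as np

def entropy(labels):
    '''Shannon entropy H(S) = -sum(p * log2(p)) of a binary label list'''
    probabilities = np.bincount(labels) / len(labels)
    h = 0
    for p in probabilities:
        if p > 0:
            h -= p * np.log2(p)
    return h

print(entropy([1, 1, 0, 0]))  # 1.0   -> maximally impure 50/50 node
print(entropy([1, 1, 1, 0]))  # ~0.81 -> purer node, lower entropy
print(entropy([1, 1, 1, 1]))  # 0.0   -> pure node, entropy at its minimum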

Decision Tree Classification - Program

Q. Decision Tree Classification

Problem Description

A decision tree is built by repeatedly splitting nodes, at each level choosing the split that classifies the training data as accurately as possible.

You are given a feature (1-d array) and a label (1-d array, the target), and you must determine which feature value is best to split on at the root level of a decision tree.

The feature takes continuous values while the target is binary. The task is therefore to find the value/threshold that is best to split on for this classification task, using entropy as the impurity measure and maximizing Information Gain.

Input Format

Two inputs:

1. 1-d array of feature values

2. 1-d array of labels

Output Format

Return the threshold value

Example Input

feature: [0.58 0.9  0.45 0.18 0.5  0.12 0.31 0.09 0.24 0.83]

label: [1 0 0 0 0 0 1 0 1 1]

Example Output

0.18

Example Explanation

If you calculate the Information Gain for each candidate feature value, the results (threshold, Information Gain) are:

(0.09, 0.08) (0.12, 0.17) (0.18, 0.28) (0.24, 0.05) (0.31, 0.00) (0.45, 0.02) (0.5, 0.09) (0.58, 0.01) (0.83, 0.08) (0.9, 0.08)

Here the Information Gain is maximum at the 0.18 threshold, so that value is the required answer.
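As a quick sanity check of that table (a sketch in plain NumPy; the helper H is our own naming, not part of the graded solution), the 0.28 value for the 0.18 threshold can be reproduced by hand: the left child (feature <= 0.18) has labels [0, 0, 0] with entropy 0, the right child has labels [1, 0, 0, 0, 1, 1, 1] with entropy ≈ 0.985, and the parent entropy is ≈ 0.971:

import numpy as np

parent = np.array([1, 0, 0, 0, 0, 0, 1, 0, 1, 1])  # all labels
left = np.array([0, 0, 0])                          # labels where feature <= 0.18
right = np.array([1, 0, 0, 0, 1, 1, 1])             # labels where feature > 0.18

def H(y):
    '''Shannon entropy of a binary label array'''
    p = np.bincount(y) / len(y)
    return -sum(pi * np.log2(pi) for pi in p if pi > 0)

# IG = parent entropy minus the weighted child entropies
ig = H(parent) - (len(left) / len(parent)) * H(left) - (len(right) / len(parent)) * H(right)
print(round(ig, 2))  # 0.28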

 

Program:

import numpy as np

def entropy(s):
    '''
    Calculates the entropy given list of target(binary) variables
    '''
    # Write your code here
   
    # Calculate entropy
    entropy = 0


    # Your code ends here

    return entropy
   

def information_gain(parent, left_child, right_child):
   
    '''
    Compute information gain given left_child target variables (list), right_child target variables(list) and their parent targets(list)
    '''
   
    info_gain = None
    # Write your code here
   
   
    #Your code ends here
    return info_gain
   
def best_split(features, labels):
    '''
    inputs:
        features: nd-array
        labels: nd-array
    output:
        float value determining best threshold for decision tree classification
    '''
   
    best_threshold = None
    best_info_gain = -1
   
    # For every unique value of that feature
    for threshold in np.unique(features):
       
        y_left = _____________    # list of labels in the left child
        y_right = _____________    # list of labels in the right child
       
        if len(y_left) > 0 and len(y_right) > 0:
            # Calculate the information gain and save the split parameters
            # if the current split is better than the previous best
            gain = ____________

            if gain > best_info_gain:
                best_threshold = threshold
                best_info_gain = gain
   
    return best_threshold




Final Program:


import numpy as np


def entropy(s):
    '''
    Calculates the entropy given list of target(binary) variables
    '''
    # Count each class and convert counts to probabilities
    counts = np.bincount(s)
    probabilities = counts / len(s)

    # Calculate entropy: H(S) = -sum(p * log2(p)) over non-zero probabilities
    entropy = 0
    for p in probabilities:
        if p > 0:
            entropy -= p * np.log2(p)

    return entropy
   

def information_gain(parent, left_child, right_child):
   
    '''
    Compute information gain given left_child target variables (list), right_child target variables(list) and their parent targets(list)
    '''
   
    # Entropy of the parent node and of each child
    parent_entropy = entropy(parent)
    left_entropy = entropy(left_child)
    right_entropy = entropy(right_child)

    # Weighted average of the child entropies
    left_weight = len(left_child) / len(parent)
    right_weight = len(right_child) / len(parent)

    weighted_entropy = left_weight * left_entropy + right_weight * right_entropy

    # Information gain is the reduction in entropy after the split
    info_gain = parent_entropy - weighted_entropy

    return info_gain
   
def best_split(features, labels):
    '''
    inputs:
        features: nd-array
        labels: nd-array
    output:
        float value determining best threshold for decision tree classification
    '''
   
    best_threshold = None
    best_info_gain = -1
   
    # For every unique value of that feature
    for threshold in np.unique(features):
       
        y_left = labels[features <= threshold]    # labels in the left child
        y_right = labels[features > threshold]    # labels in the right child
       
        if len(y_left) > 0 and len(y_right) > 0:
            # Calculate the information gain and save the split parameters
            # if the current split is better than the previous best
            gain = information_gain(labels, y_left, y_right)

            if gain > best_info_gain:
                best_threshold = threshold
                best_info_gain = gain
   
    return best_threshold
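To verify the final program against the example input above (assuming the three functions are already defined in the session), it can be run like this; it should print the 0.18 threshold:

feature = np.array([0.58, 0.9, 0.45, 0.18, 0.5, 0.12, 0.31, 0.09, 0.24, 0.83])
label = np.array([1, 0, 0, 0, 0, 0, 1, 0, 1, 1])

print(best_split(feature, label))  # 0.18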
