Training a Binary Classifier

·       Let’s simplify the problem for now and only try to identify one digit, for example the number 5.

·       This “5-detector” will be an example of a binary classifier, capable of distinguishing between just two classes: 5 and not-5.

 



ü  Let’s create the target vectors for this classification task:

y_train_5 = (y_train == 5)

# True for all 5s, False for all other digits

y_test_5 = (y_test == 5)

·       Now let’s pick a classifier and train it.

·       A good place to start is with a Stochastic Gradient Descent (SGD) classifier, using Scikit-Learn’s SGDClassifier class.

·       This classifier has the advantage of being capable of handling very large datasets efficiently, in part because SGD deals with training instances independently, one at a time.

ü  Let’s create an SGDClassifier and train it on the whole training set:

from sklearn.linear_model import SGDClassifier

sgd_clf = SGDClassifier(random_state=42)

sgd_clf.fit(X_train, y_train_5)

ü  The SGDClassifier relies on randomness during training (hence the name “stochastic”).

ü  If you want reproducible results, you should set the random_state parameter.

ü  Now we can use it to detect images of the number 5:

sgd_clf.predict([some_digit])

            array([ True])

·       The classifier guesses that this image represents a 5 (True). 
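As a quick sanity check (a minimal sketch, not part of the original notes; it assumes X_train and y_train_5 from the cells above), we can compare the detector’s predictions on a few training images with their true labels:

sgd_clf.predict(X_train[:10])   # array of True/False predictions for the first 10 images

y_train_5[:10]                  # the corresponding true labels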


YouTube Link: https://youtu.be/AWI2qUUkPK8


MNIST Dataset Description

MNIST (Modified National Institute of Standards and Technology):

·       The MNIST dataset is a set of 70,000 small images of digits handwritten by

ü  high school students and

ü  employees of the US Census Bureau.

·       Each image is labeled with the digit it represents.



Figure 1. Digits from the MNIST dataset

 

·       This set has been studied so much that it is often called the “hello world” of Machine Learning:

ü  whenever people come up with a new classification algorithm,

ü  they are curious to see how it will perform on MNIST, and

ü   anyone who learns Machine Learning tackles this dataset sooner or later.

 

ü  Scikit-Learn provides many helper functions to download popular datasets.

ü  MNIST is one of them.

        The following code fetches the MNIST dataset:

from sklearn.datasets import fetch_openml

mnist = fetch_openml('mnist_784', version=1, as_frame=False)  # as_frame=False keeps the data as NumPy arrays

mnist.keys()

        dict_keys(['data', 'target', 'feature_names', 'DESCR', 'details', 'categories', 'url'])

·       Datasets loaded by Scikit-Learn generally have a similar dictionary structure, including the following:

ü  A DESCR key describing the dataset

ü  A data key containing an array with one row per instance and one column per feature

ü  A target key containing an array with the labels

 

        Let’s look at these arrays:

            X, y = mnist["data"], mnist["target"]

X.shape

            (70000, 784)

y.shape

            (70000,)

         There are

ü  70,000 images, and

ü  each image has 784 features.

·       This is because each image is 28 × 28 pixels, and each feature simply represents one pixel’s intensity, from 0 (white) to 255 (black).
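As a quick check of those numbers (an illustrative sketch, assuming X from the code above), we can inspect the array directly:

X.shape            # (70000, 784), i.e. 28 * 28 = 784 pixels per image

X.min(), X.max()   # pixel intensities range from 0 to 255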



 

·       Let’s take a peek at one digit from the dataset.

·       All you need to do is grab an instance’s feature vector, reshape it to a 28 × 28 array, and display it using Matplotlib’s imshow() function:

import matplotlib as mpl

import matplotlib.pyplot as plt

some_digit = X[0]

some_digit_image = some_digit.reshape(28, 28)

plt.imshow(some_digit_image, cmap="binary")

plt.axis("off")

plt.show()

·       This looks like a 5, and indeed that’s what the label tells us.

 

y[0]

'5'

·       Note that the label is a string.

·       Most ML algorithms expect numbers, so let’s cast y to integer:

import numpy as np

y = y.astype(np.uint8)

·       To give you a feel for the complexity of the classification task, Figure 1 shows a few more images from the MNIST dataset.

·       You should always

ü  create a test set and

ü  set it aside before inspecting the data closely.

·       The MNIST dataset is actually already split into

ü  a training set (the first 60,000 images)

ü  a test set (the last 10,000 images):

X_train, X_test, y_train, y_test = X[:60000], X[60000:], y[:60000], y[60000:]
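As a quick check (illustrative only), the resulting splits have the expected shapes:

X_train.shape, X_test.shape   # (60000, 784), (10000, 784)

y_train.shape, y_test.shape   # (60000,), (10000,)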

 

·       The training set is already shuffled for us, which is good because this guarantees that all cross-validation folds will be similar.


·       Moreover, some learning algorithms are sensitive to the order of the training instances, and they perform poorly if they get many similar instances in a row.

·       Shuffling the dataset ensures that this won’t happen.
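If you ever work with a training set that is not already shuffled, a minimal sketch of how you could shuffle it yourself (illustrative, not needed for MNIST) is:

import numpy as np

shuffle_index = np.random.permutation(60000)   # random ordering of the 60,000 training indices

X_train, y_train = X_train[shuffle_index], y_train[shuffle_index]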

 

YouTube Link: https://youtu.be/GaVUPdyOSyY


 

Decision Tree Characteristics

Context:

Decision Trees are a fundamental machine learning algorithm used for both classification and regression tasks. Understanding their characteristics, capabilities, and limitations is crucial for effectively applying them to solve real-world problems.

Question:

Which of the following statements are true regarding the properties and behavior of Decision Trees?

Statements to Evaluate:

1. Decision tree makes no assumptions about the data.
2. The decision tree model can learn non-linear decision boundaries.
3. Decision trees cannot explain how the target will change if a variable is changed by 1 unit (marginal effect).
4. Hyperparameter tuning is not required in decision trees.
5. In a decision tree, increasing entropy implies increasing purity.
6. In a decision tree, the entropy of a node decreases as we go down the decision tree.


Choose the correct answer from below:

A) 1, 2, and 5

B) 3, 5 and 6

C) 2, 3, 4 and 5

D) 1, 2, 3 and 6

Ans: D) 1, 2, 3 and 6. Statements 4 and 5 are false: decision trees do have hyperparameters that usually need tuning (e.g., max_depth, min_samples_split), and increasing entropy implies decreasing purity, not increasing purity.
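To make the entropy/purity relationship in statements 5 and 6 concrete, here is a small illustrative snippet (not part of the original question):

import numpy as np

def node_entropy(labels):
    # Binary entropy of a node's labels: 0 for a pure node, 1 for a 50/50 split.
    p = np.bincount(labels, minlength=2) / len(labels)
    return -sum(pi * np.log2(pi) for pi in p if pi > 0)

node_entropy(np.array([0, 0, 0, 0]))   # 0.0 -> pure node (lowest entropy)
node_entropy(np.array([0, 1, 0, 1]))   # 1.0 -> maximally impure node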

Decision Tree Classification: Program

Q. Decision Tree Classification

Problem Description

As you know, a Decision Tree is all about splitting nodes at different levels, trying to classify as accurately as possible.

You are given a feature (1-D array) and a label/target (1-D array), and you have to determine which value of the feature is best to split on at the root level when building a decision tree.

The feature has continuous values, whereas the target is binary. The main task is to determine which value/threshold is best to split on for this classification task, using entropy as the loss and maximizing Information Gain.

Input Format

Two inputs:

1. 1-d array of feature

2. 1-d array of label

Output Format

Return threshold value

Example Input

feature: [0.58 0.9  0.45 0.18 0.5  0.12 0.31 0.09 0.24 0.83]

label: [1 0 0 0 0 0 1 0 1 1]

Example Output

0.18

Example Explanation

If you calculate the Information Gain for all of the feature values, it would be computed as the following (threshold, Information Gain) pairs:

 

(0.09 0.08) (0.12 0.17) (0.18 0.28) (0.24 0.05) (0.31 0.00) (0.45 0.02) (0.5 0.09) (0.58 0.01) (0.83 0.08) (0.9 0.08)

Here the Information Gain is maximum for the 0.18 threshold, so that value is the required answer.
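To see where the 0.28 comes from, here is a rough hand calculation for the 0.18 threshold (assuming, as in the program below, that samples with feature <= threshold go to the left child):

Parent node: 4 ones and 6 zeros, so entropy = -(0.4*log2(0.4) + 0.6*log2(0.6)) ≈ 0.971.

Left child (feature <= 0.18): labels [0, 0, 0], entropy = 0.

Right child (feature > 0.18): labels [1, 0, 0, 0, 1, 1, 1], entropy ≈ 0.985.

Information Gain = 0.971 - (0.3*0 + 0.7*0.985) ≈ 0.28.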

 

Program:

import numpy as np

def entropy(s):
    '''
    Calculates the entropy given list of target(binary) variables
    '''
    # Write your code here
   
    # Calculate entropy
    entropy = 0
   
   
    #Your code ends here
   
    return entropy
   

def information_gain(parent, left_child, right_child):
   
    '''
    Compute information gain given left_child target variables (list), right_child target variables(list) and their parent targets(list)
    '''
   
    info_gain=None
    # Write your code here
   
   
    #Your code ends here
    return info_gain
   
def best_split(features,labels):
    '''
    inputs:
        features: nd-array
        labels: nd-array
    output:
        float value determining best threshold for decision tree classification
    '''
   
    best_threshold=None
    best_info_gain = -1
   
    # For every unique value of that feature
    for threshold in np.unique(features):
       
        y_left = _____________  #list of labels in left child
        y_right = _____________  #list of labels in right child
       
        if len(y_left) > 0 and len(y_right) > 0:
            gain = ____________                 # Calculate the information gain and save the split parameters if the current split is better than the previous best

            if gain > best_info_gain:
                best_threshold = threshold
                best_info_gain = gain
   
    return best_threshold




Final Program:


import numpy as np


def entropy(s):
    '''
    Calculates the entropy given list of target(binary) variables
    '''
    # Write your code here
    counts = np.bincount(s)
    probabilities = counts / len(s)
    # Calculate entropy
    entropy = 0
    for p in probabilities:
        if p > 0:
            entropy -= p * np.log2(p)
   
    #Your code ends here
   
    return entropy
   

def information_gain(parent, left_child, right_child):
   
    '''
    Compute information gain given left_child target variables (list), right_child target variables(list) and their parent targets(list)
    '''
   
    info_gain=None
    # Write your code here
    parent_entropy = entropy(parent)
    left_entropy = entropy(left_child)
    right_entropy = entropy(right_child)
   
    # Weighted average of the child entropies
    left_weight = len(left_child) / len(parent)
    right_weight = len(right_child) / len(parent)
   
    weighted_entropy = left_weight * left_entropy + right_weight * right_entropy
   
    # Information gain is the difference in entropy
    info_gain = parent_entropy - weighted_entropy
   
    #Your code ends here
    return info_gain
   
def best_split(features,labels):
    '''
    inputs:
        features: nd-array
        labels: nd-array
    output:
        float value determining best threshold for decision tree classification
    '''
   
    best_threshold=None
    best_info_gain = -1
   
    # For every unique value of that feature
    for threshold in np.unique(features):
       
        y_left = labels[features <= threshold]  #list of labels in left child
        y_right = labels[features > threshold]  #list of labels in right child
       
        if len(y_left) > 0 and len(y_right) > 0:
            gain = information_gain(labels, y_left, y_right)  # Calculate the information gain and save the split parameters if the current split is better than the previous best

            if gain > best_info_gain:
                best_threshold = threshold
                best_info_gain = gain
   
    return best_threshold
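A quick check against the example above (assuming the three functions are defined exactly as in the final program):

feature = np.array([0.58, 0.9, 0.45, 0.18, 0.5, 0.12, 0.31, 0.09, 0.24, 0.83])
label = np.array([1, 0, 0, 0, 0, 0, 1, 0, 1, 1])

print(best_split(feature, label))   # expected output: 0.18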
