Neural Network 5

 Q1. Backpropagation in MLP

Which of the following options are true with respect to Backpropagation?

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     In backpropagation, we calculate the error contribution of each neuron.

B.     In backpropagation, we calculate the loss gradients with respect to inputs.

C.      In backpropagation, we calculate the loss gradient with respect to weights and biases.

D.     In backpropagation, we update the weights of neurons in each iteration.

Ans: A, C, D

Correct options :

i) In backpropagation, we calculate the error contribution of each neuron.

ii) In backpropagation, we calculate the loss gradient with respect to weights and biases.

iii) In backpropagation, we update the weights of neurons in each iteration.

Explanation :

Only one statement is false: "In backpropagation, we calculate the loss gradients with respect to inputs." We do not need these gradients for training, because the inputs are fixed data and cannot be updated. All the other options are true.

Backpropagation calculates the rate at which the loss changes with respect to each weight and bias; these gradients are then used to update the weights and biases in order to minimize the loss function.
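To make this concrete, here is a minimal sketch (not from the question) of backpropagation for a single sigmoid neuron, computing the loss gradients with respect to the weight and bias and then updating both; all values are illustrative:

import numpy as np

def sigmoid(z):
  return 1.0 / (1.0 + np.exp(-z))

x, y_true = 2.0, 1.0            # one training example (illustrative)
w, b, lr = 0.5, 0.0, 0.1        # initial parameters and learning rate

y_pred = sigmoid(w * x + b)             # forward pass
loss = 0.5 * (y_pred - y_true) ** 2     # squared-error loss

# backward pass: the chain rule gives dL/dw and dL/db
dL_dpred = y_pred - y_true
dpred_dz = y_pred * (1 - y_pred)        # derivative of the sigmoid
dL_dw = dL_dpred * dpred_dz * x         # gradient w.r.t. the weight
dL_db = dL_dpred * dpred_dz             # gradient w.r.t. the bias

w -= lr * dL_dw                         # gradient-descent update
b -= lr * dL_db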

Q2. Complete the updating code




We want to use the above code snippet for a simple binary classification task where if the model() returns 1 for an observation, then the observation will be classified as '+'(1) otherwise '-'(0).

model() will return 1 only if the weighted sum is greater than or equal to the threshold thresh. In the above code snippet, the fit function will be used for getting a weight matrix for the classification task.

Complete the updating syntax for weights [?] and threshold [??].


Note: 
The inputs are always positive

Choose the correct answer from below:

A.     w = w + lr * x, thresh = thresh + lr

B.     w = w - lr * x, thresh = thresh + lr

C.      w = w + lr * x, thresh = thresh - lr * x.

D.     w = w - lr * x, thresh = thresh - lr * x.

Ans: A

Correct Answer:

  • w = w + lr * x, thresh = thresh + lr

Explanation

The code snippet is a simple implementation of the perceptron, where:

  • we update the weights and threshold with the same intuition as in SGD, but without the explicit gradient formulation.

If the expected output is ”+” and the predicted one is ”-“, then :

  • we should increase the weights in order to increase the weighted sum i.e. w.x
  • and decrease the threshold (thresh)

If the expected output is ”-“ and the predicted one is ”+”, then :

  • we should decrease the weights in order to decrease the weighted sum
  • and increase the threshold (thresh)

The code for model() is as follows:

import numpy as np

def model(x, w, thresh):
  return 1 if np.dot(w, x) >= thresh else 0
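The fit function itself appears only as an image in the question, so the following is a hypothetical reconstruction consistent with the explanation above. The two updates from answer A land in different branches: w = w + lr * x where '+' was expected but '-' predicted, and thresh = thresh + lr where '-' was expected but '+' predicted:

def fit(X, Y, lr, epochs):
  w = np.zeros(X.shape[1])
  thresh = 0.0
  for _ in range(epochs):
    for x, y in zip(X, Y):
      y_pred = model(x, w, thresh)
      if y == 1 and y_pred == 0:    # expected '+', predicted '-'
        w = w + lr * x              # increase the weighted sum  [?]
        thresh = thresh - lr        # and lower the threshold
      elif y == 0 and y_pred == 1:  # expected '-', predicted '+'
        w = w - lr * x              # decrease the weighted sum
        thresh = thresh + lr        # and raise the threshold    [??]
  return w, thresh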

Q3. Weight's value

Consider a neural network as shown in the image below:




The initial values of x1, x2, and x3 are [10, 5, 5]. The true value of the output is 4. If the loss function is mean squared error, then what is the value of w1 after the first epoch?

Consider the initial value of all of w1, w2, w3, w4, and w5 as 0.1 and the learning rate as 0.01.

Choose the correct answer from below:

A.     0.550

B.     0.252

C.      0.111

D.     0.340

Ans: B

Correct option : 0.252

Explanation :

Let o1 be the output of the first neuron in the hidden layer and o2 be the output of the second neuron in the hidden layer.

From the computation in the question, the first hidden neuron applies a square activation, F1(x) = x^2, to its weighted input x = w1.x1, while the second neuron is linear, with weighted input x = w2.x2 + w3.x3:

o1 = (w1.x1)^2 = (0.1).(0.1).(10).(10) = 1
o2 = w2.x2 + w3.x3 = (0.1).(5) + (0.1).(5) = 1

Similarly, the prediction is:

y_hat = w4.o1 + w5.o2 = w4.(w1^2.x1^2) + w5.(w2.x2 + w3.x3)

According to the question, the loss function is:

loss = (y - y_hat)^2

Using the chain rule of differentiation:

d(loss)/dw1 = d(loss)/do1 . d(o1)/dw1

Thus,

d(loss)/do1 = d((y - w4.o1 - w5.o2)^2)/do1 = (-2).(w4).(y - w4.o1 - w5.o2)

= (-2).(0.1).(4 - (0.1).(1) - (0.1).(1)) = (-0.2).(3.8) = -0.76

Similarly,

d(o1)/dw1 = d(w1^2.x1^2)/dw1 = 2.w1.x1^2 = (2).(0.1).(10).(10) = 20

Finally,

d(loss)/dw1 = (-0.76).(20) = -15.2

Updating the weight:

w1 ← w1 - α.d(loss)/dw1

where α is the learning rate

w1 = 0.1 - (0.01).(-15.2) = 0.1 + 0.152 = 0.252
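A quick numerical check of this update (all values from the question):

w1, w4, w5, x1, lr, y = 0.1, 0.1, 0.1, 10.0, 0.01, 4.0
o1 = (w1 * x1) ** 2                               # = 1
o2 = 1.0                                          # w2*x2 + w3*x3 = 1
dloss_do1 = -2 * w4 * (y - w4 * o1 - w5 * o2)     # = -0.76
do1_dw1 = 2 * w1 * x1 ** 2                        # = 20
print(w1 - lr * dloss_do1 * do1_dw1)              # ~0.252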

Q4. Convergence

Fill in the blank :

In a multi-layered perceptron architecture, gradient descent ______ .

Choose the correct answer from below:

A.     always converges to the global minimum.

B.     doesn't converge to the global minimum.

C.      may or may not converge to the global minimum.

D.     will always converge to the global minimum if the learning rate is appropriate.

Ans: C

Correct option : may or may not converge to the global minimum

Explanation :

Gradient descent may or may not converge to the global minimum, depending on the initial weights and the learning rate. The loss function of a multi-layered perceptron is neither convex nor concave, so it can have multiple local minima, and gradient descent can get stuck in one of them. Hence convergence to the global minimum is not guaranteed.

Q5. Calculate the loss

Given the dataset, calculate the loss after completing the code snippet. Blanks are [?] .

import numpy as np
import pandas as pd

def hypothesis(w,b,x):                           #Section 1
  return 1.0/(1.0 + np.exp(-(w*x + b)))

def error(w,b):                                  #Section 2       
  err=0.0
  for x,y in zip(train,label):
    fx = hypothesis(w,b,x)
    err += 0.5 * (fx-y) ** 2
  return err

def grad_w(w,b,x,y):                              #Section 3
  fx=hypothesis(w,b,x)
  return (fx-y)*fx*(1-fx)*x

def grad_b(w,b,x,y):                              #Section 4
  fx=hypothesis(w,b,x)
  return (fx-y)*fx*(1-fx)

def gradient_descent(train,label,w,b,lr,max_epochs):    #Section 5
  dw=0
  db=0
  for i in range(max_epochs):
    for x,y in zip(train,label):
      dw+=grad_w(w,b,x,y)
      db+=grad_b(w,b,x,y)
    w = w [?] lr*dw         # [?] is an arithmetic sign
    b = b [?] lr*db         # [?] is an arithmetic sign
    print("For Epoch {}, the loss is {}".format(i+1, error(w,b)))
  return w,b

df=pd.read_csv("filepath")
train=df['X']
label=df['Y']
initial_w = 1
initial_b = 1
lr=0.01
max_epochs=50
w,b = gradient_descent(train,label,initial_w,initial_b,lr,max_epochs)

Choose the correct option for the loss and arithmetic signs.


Choose the correct answer from below:

A.     0.016, -, -

B.     0.28, +, +

C.      0.028, -, -

D.     0.050, +, +

Ans: A

Correct option : 0.016, -, -

Explanation :

The rule for updating parameters in gradient descent is:

w ← w - α.(∂L/∂w)
b ← b - α.(∂L/∂b),
where α is the learning rate

Thus the signs in place of [?] will be -,-

The loss after 50 epochs comes out to be around 0.016

Q6. Fully connected neural network

Which, if any, of the given propositions is true about fully-connected neural networks (FCNN)?

Choose the correct answer from below:

A.     In a FCNN, there are connections between neurons of the same layer.

B.     In a FCNN, the most common weight initialization scheme is the zero initialization, because it leads to faster and more robust training.

C.      The neurons of one layer are connected to every neuron of its preceding layer.

D.     None of the options

Ans: C

Correct option : The neurons of one layer are connected to every neuron of its preceding layer.

Explanation :

  • In an FCNN, neurons of one layer are connected to every neuron of the preceding layer, but there are no connections between neurons of the same layer.
  • Zero initialization leads to weight symmetry and undermines training: if all the weights are the same, they all receive the same update in each training round, so no learning can occur. The sketch below illustrates this symmetry.
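A minimal numpy sketch of this symmetry (the network shape and values are illustrative, not from the question):

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)          # one input example
W1 = np.zeros((4, 3))           # zero-initialized hidden weights
h = np.maximum(W1.T @ x, 0)     # hidden activations: all identical (zeros)

# With identical activations, every hidden unit receives the same upstream
# gradient, so dL/dW1 = outer(x, grad_h) has identical columns and every
# hidden unit gets exactly the same weight update.
grad_h = np.ones(3)             # identical upstream gradients (illustrative)
grad_W1 = np.outer(x, grad_h)
print(np.allclose(grad_W1[:, 0], grad_W1[:, 1]))   # True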

Q7. Compare the learnings

The following graph shows the learning speeds versus the number of epochs for the four hidden layers where Hidden layer 1 and Hidden layer 4 are the first and last hidden layers respectively.





Considering the graph mark the correct option.

Choose the correct answer from below:

A.     The initial layers learn slower since the weights of the initial layers are always higher

B.     The gradients for initial layers are smaller than those of later ones, causing slow learning in them

C.      The slow learning in initial layers is due to the dead activation of sigmoid in the initial layers

D.     The slow learning in initial layers is due to faster learning at initial stages

Ans: B

Correct option: The gradients for initial layers are smaller than those of later ones, causing slow updating of weights and biases in them and hence slow learning.

Explanation:

  • Recall that the backpropagation algorithm makes heavy use of the chain rule, which means the gradient received by the initial layers is a long series of multiplications. During backpropagation, if the gradients at the final layers are fairly small, the initial layers receive a very tiny gradient, so their weights get updated very slowly. A numerical sketch of this effect follows this list.
  • One option suggests that the weights of the initial layers are always higher, which is wrong: the weights are initialized randomly, and the initial layers do require significant updating.
  • One option suggests that the initial layers learn slower later because they learn faster in the initial epochs, which is not true.
  • One option says it is due to dead activation of sigmoid in the initial layers, which is not true; there is no such concept.
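A quick numerical sketch of the effect (illustrative: the sigmoid derivative is at most 0.25, and the weight factors are ignored):

# upper bound on the gradient factor contributed by n sigmoid layers
for n_layers in [1, 4, 8]:
  print(n_layers, 0.25 ** n_layers)
# 1 0.25
# 4 0.00390625
# 8 1.52587890625e-05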

TensorFlow and Keras 3

 Q1. Functional model

Complete the code snippet in order to get the following model summary.

from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.models import Model

def create_model_functional():
  inp = Input(shape=(28, ))
  h1 = Dense(64, activation="relu", name="hidden_1")(inp)
  h2 = Dense(_a_ , activation="relu", name="hidden_2")(h1)
  out = Dense(4, activation="softmax", name="output")(_b_)
  model = Model(inputs=inp, outputs=out, name="simple_nn")

  return model

model_functional = create_model_functional()
model_functional.summary()




Choose the correct answer from below:

A.     512, b - h2

B.     64, b - h2

C.      10, b - h1

D.     512, b – inp

Ans: A

Correct Option: a - 512, b - h2

Explanation:

  • To get the model summary as shown in the question, the value of a should be 512 and the value of b should be h2. This will create a neural network model with 2 hidden layers, the first hidden layer with 64 neurons and the second hidden layer with 512 neurons.
  • Here's an explanation of the code:
    • The 'create_model_functional' function creates a functional neural network model using the Keras API from TensorFlow.
    • The model has an input layer with shape (28,), which means it expects input data with 28 features. The first hidden layer has 64 neurons and uses the ReLU activation function.
    • The second hidden layer has 'a' neurons and uses the ReLU activation function. In this case, we want 'a' to be 512, so that the second hidden layer has 512 neurons.
    • The output layer has 4 neurons and uses the softmax activation function, which is suitable for multiclass classification problems.
    • The 'b' placeholder is used to connect the output of the second hidden layer to the input of the output layer. In this case, we want to connect it to 'h2', the output of the second hidden layer. The completed code is shown below.
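For reference, a completed version of the snippet with the blanks filled in (a = 512, b = h2), reusing the imports shown above:

def create_model_functional():
  inp = Input(shape=(28, ))
  h1 = Dense(64, activation="relu", name="hidden_1")(inp)
  h2 = Dense(512, activation="relu", name="hidden_2")(h1)
  out = Dense(4, activation="softmax", name="output")(h2)
  model = Model(inputs=inp, outputs=out, name="simple_nn")

  return model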

 

Q2. Customized loss function

For a certain sequential regression model predicting two outputs, we implemented a loss function that penalizes the prediction error for the second output(y2) more than the first one(y1) because y2 is more important and we want it to be really close to the target value.

import numpy as np
def custom_mse(y_true, y_pred):
  loss = np.square(y_pred - y_true)
  loss = loss * [0.5, 0.5]             #y
  loss = np.sum(loss, axis=0)          #x
  return loss
model.compile(loss=custom_mse, optimizer='adam')

Which of the following option is correct with respect to the above implementation of a custom-made loss function?

Note: The shape of y_pred is (batch_size, 2) in the implementation.

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     Custom_mse function's output should have a shape (batch_size, 2)

B.     Custom_mse function's output should have a shape (batch_size, )

C.      The axis for the sum of loss in line x should be 1

D.     The multiplication of [0.5, 0.5] in line y won't be helpful for our requirement

Ans: B, C, D

 

Correct options :

  • Custom_mse function's output should have a shape (batch_size, )
  • The axis for the sum of loss in line x should be 1
  • The multiplication of [0.5, 0.5] in line y won't be helpful for our requirement

Explanation :

  • Custom_mse function's output should have a shape (batch_size, ): The first dimension of arguments y_true and y_pred is always the same as batch size. The loss function should always return a vector of length batch_size.
  • The axis for the sum of loss in line x should be 1: Here we need the loss values for each observation's two outputs to be summed up therefore axis=1 should be used.
  • The multiplication of [0.5, 0.5] in line y won't be helpful for our requirement: because we want to penalize the error for y2 more, we should use weights where the weight for y2 is larger, e.g. [0.3, 0.7]. A corrected sketch follows this list.
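Putting the three fixes together, here is a sketch of a loss that meets the requirement, written with TensorFlow ops so it also works in graph mode (the weights [0.3, 0.7] are illustrative, and model is assumed to be defined as in the question):

import tensorflow as tf

def weighted_mse(y_true, y_pred):
  loss = tf.square(y_pred - y_true)        # shape (batch_size, 2)
  loss = loss * tf.constant([0.3, 0.7])    # penalize y2's error more
  return tf.reduce_sum(loss, axis=1)       # shape (batch_size,)

model.compile(loss=weighted_mse, optimizer='adam')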

 

TensorFlow and Keras 2

 Q1. Sigmoid and softmax functions

Which of the following statements is true for a neural network having more than one output neuron ?

Choose the correct answer from below:

A.     In a neural network where the output neurons have the sigmoid activation, the sum of all the outputs from the neurons is always 1.

B.     In a neural network where the output neurons have the sigmoid activation, the sum of all the outputs from the neurons is 1 if and only if we have just two output neurons.

C.     In a neural network where the output neurons have the softmax activation, the sum of all the outputs from the neurons is always 1.

D.     The softmax function is a special case of the sigmoid function

Ans: C

Correct option : In a neural network where the output neurons have the softmax activation, the sum of all the outputs from the neurons is always 1.

Explanation :

  • For the sigmoid activation, when we have more than one neuron, it is possible to have the sum of outputs from the neurons to have any value.
  • The softmax classifier outputs the probability distribution for each class, and the sum of the probabilities is always 1.
  • The sigmoid function is a special case of the softmax function where the number of classes is 2, not the other way around. A quick numerical demonstration follows this list.
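A quick numpy sketch of the difference (the logits are illustrative):

import numpy as np

z = np.array([2.0, -1.0, 0.5])              # outputs of three neurons
sigmoid = 1 / (1 + np.exp(-z))              # element-wise sigmoid
softmax = np.exp(z) / np.exp(z).sum()
print(sigmoid.sum())    # ~1.77, not constrained to 1
print(softmax.sum())    # 1.0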

Q2. Type of loss

We want to classify credit card transactions as fraudulent or normal, which loss type is appropriate for this use case?

 

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     Categorical crossentropy

B.     Binary crossentropy

C.     Adam

D.     SGD

Ans: A, B

Correct Option:

  • Categorical crossentropy
  • Binary crossentropy

Explanation:
If you have one neuron at the end of the classification NN, you need to use sigmoid, and in that case the loss will be binary cross-entropy. If you take 2 neurons at the end, you have to one-hot encode the target and use softmax with categorical cross-entropy (CCE). Adam and SGD are optimizers, not loss functions. Both valid setups are sketched below.
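A sketch of the two valid setups (the input size of 10 features and the absence of hidden layers are illustrative simplifications):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# (1) one output neuron + sigmoid + binary cross-entropy; labels are 0/1
model_bce = Sequential([Dense(1, activation='sigmoid', input_shape=(10,))])
model_bce.compile(loss='binary_crossentropy', optimizer='adam')

# (2) two output neurons + softmax + categorical cross-entropy;
#     labels are one-hot encoded, e.g. [1, 0] and [0, 1]
model_cce = Sequential([Dense(2, activation='softmax', input_shape=(10,))])
model_cce.compile(loss='categorical_crossentropy', optimizer='adam')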

 

Q3. Callbacks in tensorflow

Which method gets called after each epoch in tensorflow callback?

Choose the correct answer from below:

A.     on_epoch_end

B.     on_epoch_finished

C.     on_end

D.     on_training_complete

Ans: A

Correct Option: on_epoch_end

Explanation:

  • The TensorFlow callback method on_epoch_end contains the functionality to be run at the end of each epoch.
  • tf.keras.callbacks.Callback can be inherited by custom classes, in which methods like on_train_begin and on_epoch_begin can be overridden to hook into other points of the training loop. A minimal example is sketched below.
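A minimal sketch of a custom callback that overrides on_epoch_end:

import tensorflow as tf

class PrintLossCallback(tf.keras.callbacks.Callback):
  def on_epoch_end(self, epoch, logs=None):
    logs = logs or {}
    print("Epoch {} finished, loss = {}".format(epoch + 1, logs.get('loss')))

# usage: model.fit(X, y, epochs=10, callbacks=[PrintLossCallback()])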

Q4. Avoid overfitting

Jack was asked to create a classifier for a two-class, non-linearly separable dataset of 100 observations. He did not know how complex the separation boundary was, so he created a model with 500 nodes in the only hidden layer with ReLU activation and used sigmoid in the output layer.

Jack was aware that his model can overfit the data so he implemented a function that can stop the training as soon as the model starts overfitting.

from keras.callbacks import EarlyStopping
es = EarlyStopping(monitor = 'val_loss',
 min_delta = 0,
 patience = 3,
 restore_best_weights = True)

Now Ryan also wanted to implement such a function and he made the observations given in the options. Which of Ryan's observation(s) are incorrect?

 

Choose the correct answer from below, please note that this question may have multiple correct answers

 

A.     The training process will be monitored according to the validation loss.

B.     The training process will stop as soon as the difference between validation loss of two consecutive epochs is greater than 0.

C.     The training process will be stopped if there are more than 3 epochs having val_loss smaller than the latest minimum val_loss value.

D.     The best model weights according to val_loss, will be saved after training.

 

Ans: B, C

Correct options :

  • The training process will stop if the difference between validation loss of two consecutive epochs is greater than 0.
  • The training process will be stopped if there are more than 3 epochs having val_loss smaller than the latest minimum val_loss value.


Explanation :

  • Actually, min_delta is set to 0, implying that any decrease in val_loss greater than 0 is counted as an improvement; training does not stop simply because the difference between two consecutive validation losses is greater than 0.
  • Actually, patience is set to 3, implying that training is stopped only after three consecutive epochs with no improvement according to min_delta (i.e. three epochs in a row where val_loss failed to decrease by at least min_delta), not when epochs have val_loss smaller than the latest minimum.
  • monitor: the 'monitor' argument takes the value or metric based on which the training is evaluated.
  • min_delta: the 'min_delta' argument takes the minimum change in the monitored value between two consecutive epochs required to qualify as an improvement.
  • patience: the 'patience' argument takes the maximum number of epochs for which no improvement is tolerated before training is stopped.
  • restore_best_weights: setting 'restore_best_weights' to True restores, at the end of training, the model weights from the epoch with the best value of the monitored metric. Usage with model.fit is sketched after this list.
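To actually use the callback, it has to be passed to fit, and a validation set must be provided so that val_loss exists (the variable names here are illustrative):

model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=100,
          callbacks=[es])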

Q5. Adding callbacks

We are trying to train a model on a training dataset for 20 epochs.

model.fit(x_train, y_train, epochs=20, callbacks=callback)

Add callbacks to the above model based on the conditions given below:

Cond1. If the validation accuracy at an epoch is less than the previous epoch's accuracy, we have to decrease the learning rate by 10%.

The options for Cond1 are:

a. reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.9,
                              patience=1)
   callback=[reduce_lr]

b. reduce_lr = ReduceLROnPlateau(monitor='val_acc', factor=0.9,
                              patience=0)
   callback=[reduce_lr]

Cond2. For every 3rd epoch, decay the learning rate by 5%.

The options for Cond2 are:

c.  def step_decay(epoch):
      initial_lrate = 0.1
      drop = 0.95
      epochs_drop = 3
      lrate = initial_lrate * math.pow(drop,math.floor((epoch)/epochs_drop))
      return lrate

   lrate = LearningRateScheduler(step_decay)
   callback = [lrate]

d. initial_learning_rate = 0.1

   lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
   initial_learning_rate,
   decay_steps = 3,
   decay_rate = 0.95,
   staircase = True)

   model.compile(optimizer = tf.keras.optimizers.SGD(learning_rate = lr_schedule),
              loss ='sparse_categorical_crossentropy',
              metrics = ['accuracy'])

Which of the above options will be correct for requirements in Cond1 and Cond2?

Choose the correct answer from below:

A.     a, c and d

B.     b, c and d

C.     a, b and c

D.     a, b and d

Ans: B

Correct option : b, c and d

Explanation :

  • If we set patience = 1, the model will wait one more epoch of lower accuracy before decreasing the learning rate. Setting patience = 0 decreases the learning rate as soon as the accuracy drops, as Cond1 requires, so b is correct and a is not.
  • If patience = 1 had instead been applied to the metric `loss`, the model would have waited one more epoch for a loss higher than the minimum encountered loss before decreasing the learning rate.
  • Both c and d satisfy Cond2: c decays the learning rate via a LearningRateScheduler callback, and d via the optimizer's ExponentialDecay schedule.

 

TensorFlow and Keras 1

 Q1. Binary classification

In order to perform binary classification on a dataset (class 0 and 1) using a neural network, which of the options is correct regarding the outcomes of code snippets a and b? Here the labels of the observations are of the form [0, 0, 1...].

Common model:

import tensorflow
from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import SGD
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
opt = SGD(learning_rate=0.01)

Code snippet a:

model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])

Code snippet b:

model.add(Dense(1, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

The term "Required results" in the options means that the accuracy of the model should be above 60%.

Note: 40% of the dataset is from class 0.

Choose the correct answer from below:

A.     Both a and b will give required results.

B.     Only b will give the required results.

C.     Only a will give the required results.

D.     Both a and b will fail to give required results.

Ans: C

Correct option: only a will give the required results.

Explanation :

  • The task requires the output layer to be configured with a single node and a 'sigmoid' activation function in order to predict the probability of the positive class. With a softmax activation on a single output neuron, the output is always 1 (see the sketch below), so snippet b predicts class 1 for every observation and cannot exceed 60% accuracy on this dataset.
  • In order to get the required results using the softmax function, we would need 2 neurons in the output layer, and the labels would have to be in one-hot encoded format.
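A one-line check that softmax over a single neuron always outputs 1 (the logit value is arbitrary):

import numpy as np

z = np.array([[-3.7]])                     # any single logit
print(np.exp(z) / np.exp(z).sum(axis=1))   # [[1.]] for every input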

 

Q2. Sequential classification model

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.optimizers import SGD

model = Sequential()
model.add(Dense(64, activation = 'y', input_dim=50))
model.add(Dense(64, activation = 'y'))
model.add(Dense(x, activation = 'z'))

model.compile(loss ='categorical_crossentropy',
 optimizer = SGD(lr = 0.01),
 metrics = ['accuracy'])

model.fit(X_train, y_train,
 epochs=20)

Ram wants to create a model for the classification of types of malware into 10 different categories. He asked Shyam for help, and Shyam gave him the incomplete code shown above in the snippet. Help Ram complete the code for classification, given that the data has 50 input features. Choose the best-suited option for filling out x, y, and z.

Choose the correct answer from below:

A.     x = len(np.unique(y_train)), y = softmax, z = softmax

B.     x = 2 * len(np.unique(y_train)), y = relu, z = relu

C.     x = len(np.unique(y_train)), y = relu, z = softmax

D.     x = 0.5 * len(np.unique(y_train)), y = relu, z = relu

Ans: C

Correct option :

  • x = len(np.unique(y_train))
  • y = relu
  • z = softmax

Explanation :

  • z : For multiclass classification, softmax activation is used in the output layer.
  • x : With softmax activation, the output layer has the same number of neurons as the number of classes.
  • y : ReLU can definitely be used in the intermediate layers. ReLU is not used in the output layer of classification: because of its unbounded range, it is difficult to determine thresholds. (ReLU can be used in regression tasks where negative values don't make sense, like predicting prices.) The completed code is sketched below.
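For reference, the completed layers with the blanks filled in (reusing the imports and training data from the snippet above):

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=50))             # y = relu
model.add(Dense(64, activation='relu'))                           # y = relu
model.add(Dense(len(np.unique(y_train)), activation='softmax'))   # x and z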

Q3. Multi target output

For a multi-output regression model:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense

def get_model(n_inputs):
  model = keras.Sequential()
  model.add(Dense(20, input_dim = n_inputs, kernel_initializer='he_uniform', activation='relu'))
  model.add(______)
  model.compile(loss = 'mae', optimizer = 'adam')
  return model

We want to build a neural network for a multi-output regression problem. For each observation, we have 2 outputs. Complete the code snippet to get the desired output.

Choose the correct answer from below:

A.     Dense(2)

B.     Dense(3)

C.     activation('sigmoid')

D.     activation('relu')

Ans:  A

Correct option: Dense(2).

Explanation:
As we have 2 outputs, the output layer of the model should have 2 neurons, one per target value.

 

Q4. Number of parameters

Consider the following neural network model :

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

The number of parameters in this model is:

Choose the correct answer from below:

A.     120

B.     96

C.     108

D.     121

Ans: D

Correct option : 121

Explanation :

Number of nodes in the input layer(i) = 8
Number of nodes in the hidden layer(h) = 12
Number of nodes in the output layer(o) = 1
So,
Number of parameters = weights + biases = (8×12 + 12×1) + (12 + 1) = 121
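A quick check with Keras, whose count_params() sums weights and biases across all layers:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))   # 8*12 + 12 = 108
model.add(Dense(1, activation='sigmoid'))              # 12*1 + 1  = 13
print(model.count_params())                            # 121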

 

Q5. Model summary

Complete the following code snippet in order to get a model with the attached model summary.

import tensorflow as tf
model = tf.keras.models.Sequential()

# Create model
model.add(tf.keras.layers.Input(shape=(_a_, )))
model.add(tf.keras.layers._b_( 512 , activation='relu'))
model.add(tf.keras.layers.Dense( _c_, activation='softmax'))

model.summary()




Choose the correct answer from below:

A.     a - 32, b - Dense, c - 10

B.     a - 12, b - Dense, c - 10

C.     a - 10, b - Dense, c - 5

D.     a - Dense(33), b - Dense, c - 50

Ans: A

Correct Option:
a - 32, b - Dense, c - 10

Explanation:

  • The key for getting a: the first Dense layer has (number of input features × neurons in the first layer) + neurons in the first layer parameters, i.e. 32 × 512 + 512 = 16896, which matches the summary.
  • The summary also shows that the first layer is Dense, giving b. Similarly, for c, the number of neurons in the output layer can be read from the output shape of dense_1 in the summary. The completed code is shown below.
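Completed snippet (a = 32, b = Dense, c = 10):

import tensorflow as tf
model = tf.keras.models.Sequential()

model.add(tf.keras.layers.Input(shape=(32, )))
model.add(tf.keras.layers.Dense(512, activation='relu'))    # 32*512 + 512 = 16896 params
model.add(tf.keras.layers.Dense(10, activation='softmax'))  # 512*10 + 10 = 5130 params

model.summary()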

 

Q6. Logistic regression model

Which of these neural networks would be most appropriately representing a logistic regression model structure for binary classification?

a.

model = Sequential()
model.add(Dense(units=32, input_shape=(2,), activation='relu'))
model.add(Dense(units=64, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

b.

model = Sequential()
model.add(Dense(units=1, input_shape=(2,), activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

c.

model = Sequential()
model.add(Dense(units=1, input_shape=(2,), activation='sigmoid'))
model.add(Dense(units=1, input_shape=(2,), activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

d.

model = Sequential()
model.add(Dense(units=16))
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=64,activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

 

Choose the correct answer from below:

A.     a

B.     b

C.     c

D.     d

Ans: B

Correct Option: b

Explanation:

  • Option b would be most appropriate for representing a logistic regression model structure for binary classification: a single Dense layer with one neuron and a sigmoid activation computes exactly a weighted sum of the inputs passed through the sigmoid, which maps the output to a probability between 0 and 1.

  • Option a has a hidden ReLU layer and 64 output neurons, so it is a multi-layer network rather than a plain logistic regression model.

  • Option c stacks two sigmoid layers, which makes it a deeper network for more complex problems, not a single logistic regression unit.

  • Option d has three layers and 64 output neurons; like option a, it is a deeper network and does not represent logistic regression.

 

Q7. Model hyperparameters

Complete the following model to get the training output attached to the image.

model.compile(optimizer='sgd',
  loss='sparse_categorical_crossentropy',
  metrics=['_a_'])

# train model
model.fit(x=X_train,
          y=y_train,
          epochs = _b_ ,
          validation_data=(X_test, y_test))




Choose the correct answer from below:

A.     a - loss, b - 5

B.     a - accuracy, b - 100

C.     a - loss, b - 25

D.     a - val_acc, b - 100

Ans: B

Correct option:
a - accuracy, b - 100

Explanation:
The training log in the image reports accuracy, so the metric has to be 'accuracy'.
The log in the image also shows training running for 100 epochs.

 

Q8. Model prediction

We want to use our trained binary classification (trained with binary cross entropy and sigmoid activation function) model 'model', in order to get the label for the first observation in our test dataset of shape (m x n).

Mark the correct option which has the code to meet our requirements.

Note: m represents the number of observations and n represents the number of independent variables.

Choose the correct answer from below:

A.     model.predict(test_data[0])

B.     1 if model.predict(test_data[0].reshape(1,-1)) < 0.5 else 0

C.     model.predict(test_data[0].reshape(1,-1))

D.     1 if model.predict(test_data[0].reshape(1,-1)) > 0.5 else 0

Ans: D

Correct Answer: 1 if model.predict(test_data[0].reshape(1,-1)) > 0.5 else 0

Explanation:

  • As the model is trained with a sigmoid activation function, it outputs a probability between 0 and 1, so we need the ternary operator to threshold it at 0.5 and obtain a class label.
  • We also need to reshape test_data[0] from shape (n,) to (1, n); otherwise the API throws an error asking us to reshape the data, since it contains a single sample. The shapes are sketched below.
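A shape sketch of the requirement (illustrative values m = 100 and n = 8; model is the trained classifier from the question):

import numpy as np

test_data = np.random.rand(100, 8)          # shape (m, n)
sample = test_data[0]                       # shape (8,)  -> rejected by predict
batch_of_one = sample.reshape(1, -1)        # shape (1, 8) -> accepted
prob = model.predict(batch_of_one)[0][0]    # sigmoid probability in (0, 1)
label = 1 if prob > 0.5 else 0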

 

 
