
Neural network 4

Q1. Tanh and Leaky ReLU

Which of the following statements with respect to Leaky ReLU and Tanh are true?

a. For negative inputs, the derivative of ReLU is zero and no learning happens there; Leaky ReLU rectifies this.

b. Tanh is a zero-centered activation function.

c. Tanh produces normalized inputs for the next layer which makes training easier.

d. Tanh also has the vanishing gradient problem.

Choose the correct answer from below:

A.     All the mentioned statements are true.

B.     All the mentioned statements are true except c.

C.      All the mentioned statements are true except b.

D.     All the mentioned statements are true except d.

Ans: A

Correct options: All the mentioned statements are true.

Explanation :

1) The problem of no learning in the case of ReLU is called the dying ReLU problem, which Leaky ReLU takes care of.

2) Yes, tanh is a zero-centered activation function.

3) As Tanh is symmetric and its mean is around zero, it produces normalized inputs (between -1 and 1) for the next layer, which makes training easier.

4) As Tanh is also a sigmoidal (S-shaped) function, it also faces the vanishing gradient problem.
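A quick NumPy sketch (purely illustrative; the 0.01 negative slope for Leaky ReLU is an assumed value) that checks these points numerically:

import numpy as np

x = np.linspace(-5, 5, 11)

# Tanh: zero-centered output in (-1, 1); its derivative 1 - tanh^2(x) shrinks toward 0 for large |x|
tanh_out = np.tanh(x)
tanh_grad = 1 - np.tanh(x) ** 2
print(tanh_out.min(), tanh_out.max())   # outputs stay between -1 and 1
print(tanh_grad[[0, -1]])               # near-zero gradients at the tails (vanishing gradient)

# Leaky ReLU: small non-zero slope (assumed 0.01 here) for negative inputs, so learning never fully stops
alpha = 0.01
leaky_grad = np.where(x > 0, 1.0, alpha)
print(leaky_grad)                       # never exactly zero, unlike plain ReLU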

 

Q2. Dog and cat classifier

You are building a binary classifier for recognizing dogs (y=1) vs. cats (y=0). Which one of these is the best activation function for the output layer?

Choose the correct answer from below:

A.     ReLU

B.     Leaky ReLU

C.      sigmoid

D.     Tanh

Ans: C

Correct option : sigmoid
Explanation : Sigmoid function outputs a value between 0 and 1 which makes it a very good choice for binary classification. You can classify as 0 if the output is less than 0.5 and classify as 1 if the output is more than 0.5. We can also change this threshold value.
It can be done with tanh as well but it is less convenient as the output is between -1 and 1.
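A minimal sketch of such an output layer, assuming an arbitrary hidden layer of 16 units and 10 input features; the 0.5 threshold is the default choice mentioned above:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(16, input_dim=10, activation='relu'))   # assumed hidden layer
model.add(Dense(1, activation='sigmoid'))               # probability of class 1 (dog)
model.compile(loss='binary_crossentropy', optimizer='adam', metrics=['accuracy'])

# After training, convert probabilities to labels with a threshold (0.5 here, but adjustable)
# probs = model.predict(X_test)
# labels = (probs > 0.5).astype(int)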

Q3. Maximum value of derivates

The question shows two columns: one listing activation functions (1-4) and the other listing maximum values of first-order derivatives (a-d). Map each function to the correct maximum value of its derivative.

Choose the correct answer from below:

A.     1-d, 2-c, 3-b, 4-a

B.     1-b, 2-c, 3-d, 4-a

C.      1-c, 2-b, 3-d, 4-a

D.     1-b, 2-d, 3-d, 4-d

Ans: D

Correct option : 1-b, 2-d, 3-d, 4-d.

Explanation :

The derivative of the sigmoid function is sigmoid(x)·(1 − sigmoid(x)); its maximum value is 0.25, attained at x = 0 where sigmoid(x) = 0.5.

The derivative of tanh is 1 − tanh²(x); its maximum value is 1, attained at x = 0 where tanh(x) = 0.

The derivative of ReLU is 1 for all positive values of x and 0 for all negative values.

The derivative of Leaky ReLU is 1 for all positive values. For negative values, suppose Leaky ReLU outputs 0.5 × input; then the slope, and hence the derivative, is 0.5.


For both ReLU and leaky ReLU, the maximum derivative value is 1.
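A small NumPy sketch that confirms these maximum values numerically (the 0.5 negative slope for Leaky ReLU follows the example above):

import numpy as np

x = np.linspace(-10, 10, 100001)

sigmoid = 1 / (1 + np.exp(-x))
print(np.max(sigmoid * (1 - sigmoid)))          # ~0.25, at x = 0

print(np.max(1 - np.tanh(x) ** 2))              # ~1.0, at x = 0

relu_grad = np.where(x > 0, 1.0, 0.0)
print(np.max(relu_grad))                        # 1.0

leaky_grad = np.where(x > 0, 1.0, 0.5)          # assumed negative slope of 0.5, as in the example above
print(np.max(leaky_grad))                       # 1.0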

Q4. Leaky relu advantages

What are the advantages of using Leaky Rectified Linear Units (Leaky ReLU) over normal ReLU in deep learning?

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     It fixes the “dying ReLU” problem, as it doesn’t have zero-slope parts.

B.     Leaky ReLU always slows down training.

C.      It increases the “dying ReLU” problem, as it doesn’t have zero-slope parts.

D.     Leaky ReLU helps the gradients flow more easily through the architecture.

Ans: A, D

Correct options:

  • It fixes the “dying ReLU” problem, as it doesn’t have zero-slope parts.
  • Leaky ReLU helps the gradients flow more easily through the architecture.

Explanation:

  • Leaky ReLU is a variant of the ReLU activation function, which is commonly used in deep learning. The key advantage of using Leaky ReLU over normal ReLU is that it can avoid the "dying ReLU" problem, which occurs when a large number of neurons in a network become inactive and stop responding to inputs.
  • As for the impact on training, it depends on the context and the specific problem. In some cases, Leaky ReLU speeds up training by preventing the dying-ReLU problem, which can be useful with sparse data or a highly imbalanced dataset. In other cases it may slow training down by introducing a slightly more complex non-linearity, which makes optimization harder. It therefore does not always slow down training.
  • The dying-ReLU problem can happen when the input to a neuron is negative, since ReLU sets negative inputs to zero. In contrast, Leaky ReLU allows a small, non-zero gradient for negative input values, which helps prevent neurons from becoming inactive, improves the overall performance of the network, and lets gradients flow more easily through the architecture.
  • Additionally, Leaky ReLU has been shown to outperform other variants of ReLU on some benchmarks, so it may be a better choice in some cases.

 

Q5. No Activation Function

What if we do not use any activation function(s) between the hidden layers in a neural network?

Choose the correct answer from below:

A.     It will still capture non-linear relationships.

B.     It will just be a simple linear equation.

C.      It will not affect.

D.     Can't be determined.

Ans: B

Correct option : It will just be a simple linear equation.

Explanation :

The main aim of this question is to understand why we need activation functions in a neural network.

Following are the steps performed in a neural network:

Step 1: Calculate the weighted sum of all the inputs (X) and add the bias term:
Z = (weights × X) + bias

Step 2: Apply an activation function to calculate the expected output:
Y=Activation(Z)

Steps 1 and 2 are performed at each layer. This is a forward propagation.

Now, what if there is no activation function?
Our equation for Y becomes:
Y = Z = (weights × X) + bias

This is just a simple linear equation. A linear equation will not be able to capture the complex patterns in the data.
To capture non-linear relationships, we use activation functions.
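A small NumPy sketch (layer sizes chosen arbitrarily) showing that two layers without an activation in between collapse into a single linear map:

import numpy as np

np.random.seed(0)
X = np.random.randn(4, 3)            # 4 samples, 3 features

W1, b1 = np.random.randn(3, 5), np.random.randn(5)
W2, b2 = np.random.randn(5, 2), np.random.randn(2)

# Two "layers" with no activation in between...
out_two_layers = (X @ W1 + b1) @ W2 + b2

# ...are exactly equivalent to one linear layer with combined weights and bias
W, b = W1 @ W2, b1 @ W2 + b2
out_one_layer = X @ W + b

print(np.allclose(out_two_layers, out_one_layer))   # True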

Q6. Trainable parameters

What is the number of trainable parameters in the neural network given below:


Note: The network is not fully connected and the trainable parameters include biases as well.

Choose the correct answer from below:

A.     17

B.     15

C.      10

D.     20

Ans: B

Correct option : 15

Explanation :

The network is not fully connected, and hence the weight terms can be seen by the connections between neurons which are 10 in total.
For biases, we have 4 for neurons in the hidden layer and 1 for the neuron in the output layer which in total gives us 15.

Note : The network shown in the image is purely for teaching purposes. We won’t encounter any neural networks like these in real life.

 Q7. Number of connections

The number of nodes in the input layer of a fully connected neural network is 10 and in the hidden layer is 7. The maximum number of connections from the input layer to the hidden layer is:

Choose the correct answer from below:

A.     70

B.     less than 70

C.      more than 70

D.     It is an arbitrary value

Ans: A

Correct option : 70.

Explanation :

  • Since an MLP is a fully connected directed graph, the maximum number of connections is the product of the number of nodes in the input layer and the hidden layer.
  • The total number of connections = 10 × 7 = 70

 Q8. How many parameters?

For a neural network consisting of an input layer, 2 hidden layers, and one output layer, what will be the number of parameters if each layer is dense and has a bias associated with it?


Choose the correct answer from below:

A.     24

B.     44

C.      51

D.     32

 Ans: B

Correct option : 44

Explanation :

The number of parameters contributed by a dense layer is (number of inputs to the layer × number of neurons in the layer) + number of neurons in the layer, where the last term accounts for the biases. Summing this over all layers gives the total.

For the input layer, each of the 5 inputs is connected to each of the 3 units of the first hidden layer, so there are 5 × 3 + 3 (one bias per unit in the hidden layer) = 18 parameters for the first layer.

For the second hidden layer, 3 x 4 + 4 = 16
For the output layer, 4 x 2 + 2 = 10
Therefore total = 18 + 16 + 10 = 44.
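The count can be double-checked in Keras, assuming the layer sizes described above (5 inputs, hidden layers of 3 and 4 units, 2 outputs):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(3, input_dim=5, activation='relu'))   # 5*3 + 3 = 18 parameters
model.add(Dense(4, activation='relu'))                # 3*4 + 4 = 16 parameters
model.add(Dense(2))                                   # 4*2 + 2 = 10 parameters

model.summary()                                       # Total params: 44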

TensorFlow & Keras 3

 Q1. Functional model

Complete the code snippet in order to get the following model summary.

from tensorflow.keras.layers import Dense, Flatten, Input
from tensorflow.keras.models import Model

def create_model_functional():
  inp = Input(shape=(28, ))
  h1 = Dense(64, activation="relu", name="hidden_1")(inp)
  h2 = Dense(_a_ , activation="relu", name="hidden_2")(h1)
  out = Dense(4, activation="softmax", name="output")(_b_)
  model = Model(inputs=inp, outputs=out, name="simple_nn")

  return model

model_functional = create_model_functional()
model_functional.summary()




Choose the correct answer from below:

A.     512, b - h2

B.     64, b - h2

C.      10, b - h1

D.     512, b – inp

Ans: A

Correct Option: a- 512, b - h2

Explanation:

  • To get the model summary as shown in the question, the value of a should be 512 and the value of b should be h2. This will create a neural network model with 2 hidden layers, the first hidden layer with 64 neurons and the second hidden layer with 512 neurons.
  • Here's an explanation of the code:
    • The 'create_model_functional' function creates a functional neural network model using the Keras API from TensorFlow.
    • The model has an input layer with shape (28,), which means it expects input data with 28 features. The first hidden layer has 64 neurons and uses the ReLU activation function.
    • The second hidden layer has 'a' neurons and uses the ReLU activation function. In this case, we want 'a' to be 512, so that the second hidden layer has 512 neurons.
    • The output layer has 4 neurons and uses the softmax activation function, which is suitable for multiclass classification problems.
    • The 'b' placeholder is used to connect the output of the second hidden layer to the input of the output layer. In this case, we want to connect it to 'h2', which is the output of the second hidden layer.
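For reference, a completed version of the snippet with the blanks filled in according to the answer:

from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model

def create_model_functional():
  inp = Input(shape=(28, ))
  h1 = Dense(64, activation="relu", name="hidden_1")(inp)
  h2 = Dense(512, activation="relu", name="hidden_2")(h1)   # a = 512
  out = Dense(4, activation="softmax", name="output")(h2)   # b = h2
  model = Model(inputs=inp, outputs=out, name="simple_nn")

  return model

create_model_functional().summary()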

 

Q2. Customized loss function

For a certain sequential regression model predicting two outputs, we implemented a loss function that penalizes the prediction error for the second output (y2) more than for the first one (y1), because y2 is more important and we want it to be really close to the target value.

import numpy as np
def custom_mse(y_true, y_pred):
  loss = np.square(y_pred - y_true)
  loss = loss * [0.5, 0.5]            #y
  loss = np.sum(loss, axis=0)          #x
  return loss
model.compile(loss=custom_mse, optimizer='adam')

Which of the following option is correct with respect to the above implementation of a custom-made loss function?

Note: The shape of y_pred is (batch_size, 2) in the implementation.

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     Custom_mse function's output should have a shape (batch_size, 2)

B.     Custom_mse function's output should have a shape (batch_size, )

C.      The axis for the sum of loss in line x should be 1

D.     The multiplication of [0.5, 0.5] in line y won't be helpful for our requirement

Ans: B,C,D

 

Correct options :

  • Custom_mse function's output should have a shape (batch_size, )
  • The axis for the sum of loss in line x should be 1
  • The multiplication of [0.5, 0.5] in line y won't be helpful for our requirement

Explanation :

  • Custom_mse function's output should have a shape (batch_size, ): the first dimension of y_true and y_pred is always the batch size, and the loss function should return a vector of length batch_size (one loss value per observation).
  • The axis for the sum of loss in line x should be 1: we need the loss values for each observation's two outputs to be summed up, so axis=1 should be used.
  • The multiplication of [0.5, 0.5] in line y won't be helpful for our requirement: because we want to penalize the error for y2 more, we should use weights where y2 gets the larger value, e.g. [0.3, 0.7], as in the sketch below.
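A sketch of how the corrected loss could look, written with TensorFlow ops so it also works on tensors during training; the weights [0.3, 0.7] are only an example choice that penalizes y2 more:

import tensorflow as tf

def weighted_mse(y_true, y_pred):
    # per-output squared error, weighted so that errors on y2 cost more than on y1
    loss = tf.square(y_pred - y_true)
    loss = loss * [0.3, 0.7]              # assumed weights; y2 gets the larger penalty
    return tf.reduce_sum(loss, axis=1)    # one loss value per observation -> shape (batch_size,)

# model.compile(loss=weighted_mse, optimizer='adam')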

 

TensorFlow and Keras -1

 Q1. Binary classification

In order to perform binary classification on a dataset (class 0 and 1) using a neural network, which of the options is correct regarding the outcomes of code snippets a and b? Here the labels of the observations are of the form: [0, 0, 1, ...].

Common model:

import tensorflow
from keras.models import Sequential
from keras.layers import Dense
from tensorflow.keras.optimizers import SGD
model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
opt = SGD(learning_rate=0.01)

Code snippet a:

model.add(Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', optimizer=opt, metrics=['accuracy'])

Code snippet b:

model.add(Dense(1, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer=opt, metrics=['accuracy'])

The term "Required results" in the options means that the accuracy of the model should be above 60%.

Note: 40% of the dataset is from class 0.

Choose the correct answer from below:

A.     Both a and b will give required results.

B.     Only b will give the required results.

C.     Only a will give the required results.

D.     Both a and b will fail to give required results.

Ans: C

Correct option: only a will give the required results.

Explanation :

  • The task requires that the output layer is configured with a single node and a 'sigmoid' activation function in order to predict the probability of the required class. To apply the softmax function to binary classification, the output layer would need 2 neurons, predicting the probability of each of the two classes individually.
  • In order to get the required results using the softmax function, we would need 2 neurons in the output layer and the labels in one-hot encoded format, as in the sketch below.
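For illustration, a softmax-based sketch of the common model (the 2-neuron output layer and one-hot labels are the changes needed for snippet b to work):

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD
from tensorflow.keras.utils import to_categorical

model = Sequential()
model.add(Dense(50, input_dim=2, activation='relu', kernel_initializer='he_uniform'))
model.add(Dense(2, activation='softmax'))            # 2 neurons: one per class
model.compile(loss='categorical_crossentropy', optimizer=SGD(learning_rate=0.01), metrics=['accuracy'])

# Labels must be one-hot encoded for categorical_crossentropy, e.g. [0, 0, 1] -> [[1, 0], [1, 0], [0, 1]]
# y_train_onehot = to_categorical(y_train, num_classes=2)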

 

Q2. Sequential classification model

import numpy as np
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import Dense, Dropout, Activation
from tensorflow.keras.optimizers import SGD

model = Sequential()
model.add(Dense(64, activation = 'y', input_dim=50))
model.add(Dense(64, activation = 'y'))
model.add(Dense(x, activation = 'z'))

model.compile(loss ='categorical_crossentropy',
 optimizer = SGD(lr = 0.01),
 metrics = ['accuracy'])

model.fit(X_train, y_train,
 epochs=20)

Ram wants to create a model for the classification of types of malware into 10 different categories. He asked Shyam for help, who gave him the incomplete code shown in the snippet above. Help Ram complete the code for classification, given that the data has 50 input features. Choose the best-suited option for filling in x, y, and z.

Choose the correct answer from below:

A.     x = len(np.unique(y_train)), y = softmax, z = softmax

B.     x = 2 * len(np.unique(y_train)), y = relu, z = relu

C.     x = len(np.unique(y_train)), y = relu, z = softmax

D.     x = 0.5 * len(np.unique(y_train)), y = relu, z = relu

Ans: C

Correct option :

  • x = len(np.unique(y_train))
  • y = relu
  • z = softmax

Explanation :

  • z : For multiclass classification, softmax activation is used.
  • x : For the softmax activation, the output layer has the same number of neurons as the number of different classes.
  • y : The ReLU activation function can definitely be used in the intermediate layers. ReLU is not used in the output layer of a classifier: because of its unbounded range, it is difficult to interpret the outputs or set thresholds. ReLU can, however, be used in the output layer of regression tasks where negative values don't make sense, such as predicting prices. A completed version of the snippet is sketched below.
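A completed version of the snippet with x, y, and z filled in (the class count 10 corresponds to len(np.unique(y_train)) for the 10 malware categories):

from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.optimizers import SGD

model = Sequential()
model.add(Dense(64, activation='relu', input_dim=50))   # y = relu in the hidden layers
model.add(Dense(64, activation='relu'))
model.add(Dense(10, activation='softmax'))              # x = len(np.unique(y_train)) = 10, z = softmax

model.compile(loss='categorical_crossentropy',
              optimizer=SGD(learning_rate=0.01),
              metrics=['accuracy'])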

Q3. Multi target output

For a multi-output regression model:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras.layers import Dense

def get_model(n_inputs):
  model = keras.Sequential()
  model.add(Dense(20, input_dim = n_inputs, kernel_initializer='he_uniform', activation='relu'))
  model.add(______)
  model.compile(loss = 'mae', optimizer = 'adam')
  return model

We want to build a neural network for a multi-output regression problem. For each observation, we have 2 outputs. Complete the code snippet to get the desired output.

Choose the correct answer from below:

A.     Dense(2)

B.     Dense(3)

C.     activation('sigmoid')

D.     activation('relu')

Ans:  A

Correct option: Dense(2).

Explanation:
As we have 2 outputs, the output layer of the model should have 2 neurons, as in the completed sketch below.
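A completed sketch of the snippet; the 5 input features used in the final call are only for illustration:

from tensorflow import keras
from tensorflow.keras.layers import Dense

def get_model(n_inputs):
  model = keras.Sequential()
  model.add(Dense(20, input_dim=n_inputs, kernel_initializer='he_uniform', activation='relu'))
  model.add(Dense(2))                       # 2 output neurons, one per target
  model.compile(loss='mae', optimizer='adam')
  return model

get_model(5).summary()                      # assumes 5 input features, for illustration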

 

Q4. Number of parameters

Consider the following neural network model :

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))
model.add(Dense(1, activation='sigmoid'))

The number of parameters in this model is:

Choose the correct answer from below:

A.     120

B.     96

C.     108

D.     121

Ans: D

Correct option : 121

Explanation :

Number of nodes in the input layer(i) = 8
Number of nodes in the hidden layer(h) = 12
Number of nodes in the output layer(o) = 1
So,
Number of parameters = (8×12+12×1)+12+1 = 121
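The count can be verified directly from the model summary:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(12, input_dim=8, activation='relu'))   # 8*12 + 12 = 108 parameters
model.add(Dense(1, activation='sigmoid'))              # 12*1 + 1 = 13 parameters

model.summary()                                        # Total params: 121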

 

Q5. Model summary

Complete the following code snippet in order to get a model with the attached model summary.

import tensorflow as tf
model = tf.keras.models.Sequential()

# Create model
model.add(tf.keras.layers.Input(shape=(_a_, )))
model.add(tf.keras.layers._b_( 512 , activation='relu'))
model.add(tf.keras.layers.Dense( _c_, activation='softmax'))

model.summary()




Choose the correct answer from below:

A.     a - 32, b - Dense, c - 10

B.     a - 12, b - Dense, c - 10

C.     a - 10, b - Dense, c - 5

D.     a - Dense(33), b - Dense, c – 50

Ans: A

Correct Option: a - 32, b - Dense, c - 10

Explanation:

  • The key for getting a is that the first layer has (number of input features × neurons in the first layer) + neurons in the first layer parameters, i.e. 32 × 512 + 512 = 16896.
  • The layer type for b can be read from the summary, which lists the first layer as Dense. Similarly, for c, the number of neurons in the output layer can be read from the output shape of dense_1. A completed version of the snippet is sketched below.
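For reference, a completed version of the snippet under this answer (a = 32, b = Dense, c = 10):

import tensorflow as tf
model = tf.keras.models.Sequential()

model.add(tf.keras.layers.Input(shape=(32, )))               # a = 32 input features
model.add(tf.keras.layers.Dense(512, activation='relu'))     # b = Dense: 32*512 + 512 = 16896 params
model.add(tf.keras.layers.Dense(10, activation='softmax'))   # c = 10 output classes

model.summary()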

 

Q6. Logistic regression model

Which of these neural networks would be most appropriately representing a logistic regression model structure for binary classification?

a.

model = Sequential()
model.add(Dense(units=32, input_shape=(2,), activation='relu'))
model.add(Dense(units=64, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

b.

model = Sequential()
model.add(Dense(units=1, input_shape=(2,), activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

c.

model = Sequential()
model.add(Dense(units=1, input_shape=(2,), activation='sigmoid'))
model.add(Dense(units=1, input_shape=(2,), activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

d.

model = Sequential()
model.add(Dense(units=16))
model.add(Dense(units=32, activation='relu'))
model.add(Dense(units=64,activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])

 

Choose the correct answer from below:

A.     a

B.     b

C.     c

D.     d

Ans: B

Correct Option: b

Explanation:

  • Option b would be the most appropriate representation of a logistic regression model for binary classification. It consists of a single Dense layer with one neuron and a sigmoid activation: a weighted sum of the inputs passed through the sigmoid, which maps the output to a probability between 0 and 1. This is exactly the structure of logistic regression.

  • Option a has a hidden ReLU layer followed by a 64-unit sigmoid output layer. A multi-layer network with a 64-unit output goes beyond logistic regression.

  • Option c stacks two sigmoid layers, which makes it a small neural network rather than plain logistic regression.

  • Option d has two hidden layers followed by a 64-unit sigmoid output layer, again going beyond the structure of logistic regression.

 

Q7. Model hyperparameters

Complete the following model to get the training output attached to the image.

model.compile(optimizer='sgd',
  loss='sparse_categorical_crossentropy',
  metrics=['_a_'])

# train model
model.fit(x=X_train,
          y=y_train,
          epochs = _b_ ,
          validation_data=(X_test, y_test))




Choose the correct answer from below:

A.     a - loss, b - 5

B.     a - accuracy, b - 100

C.     a - loss, b - 25

D.     a - val_acc, b - 100

Ans: B

Correct option:
a - accuracy, b - 100

Explanation:
As the image shows accuracy being tracked, the metric has to be 'accuracy'.
The image also shows training running for 100 epochs.

 

Q8. Model prediction

We want to use our trained binary classification model 'model' (trained with binary cross-entropy loss and a sigmoid output activation) to get the label for the first observation in our test dataset of shape (m x n).

Mark the correct option which has the code to meet our requirements.

Note: m represents the number of observations and n represents the number of independent variables.

Choose the correct answer from below:

A.     model.predict(test_data[0])

B.     1 if model.predict(test_data[0].reshape(1,-1)) < 0.5 else 0

C.     model.predict(test_data[0].reshape(1,-1))

D.     1 if model.predict(test_data[0].reshape(1,-1)) > 0.5 else 0

Ans: D

Correct Answer: 1 if model.predict(test_data[0].reshape(1,-1)) > 0.5 else 0

Explanation:

  • As the model is trained with a sigmoid activation function, it outputs a probability between 0 and 1, so we need to threshold it (the ternary expression) to obtain a class label.
  • We also need to reshape test_data[0] to shape (1, n); otherwise the API will throw an error asking us to reshape the data, since it contains a single sample.
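A minimal usage sketch, assuming model and test_data from the question are already defined:

# test_data is assumed to be a NumPy array of shape (m, n)
sample = test_data[0].reshape(1, -1)     # reshape the single observation from (n,) to (1, n)
prob = model.predict(sample)             # sigmoid output: probability of class 1
label = 1 if prob[0][0] > 0.5 else 0
print(prob, label)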

 

 
