Neural network 4
Q2. Tanh and Leaky ReLU
Which of the following statements with respect to Leaky ReLU and Tanh are true?
a. In ReLU, the derivative becomes zero for negative values, so no learning happens; this is rectified in Leaky ReLU.
b. Tanh is a zero-centered activation function.
c. Tanh produces normalized inputs for the next layer, which makes training easier.
d. Tanh also has the vanishing gradient problem.
Choose the correct answer from below:
A. All the mentioned statements are true.
B. All the mentioned statements are true except c.
C. All the mentioned statements are true except b.
D. All the mentioned statements are true except d.
Ans: A
Correct option: All the mentioned statements are true.
Explanation :
1) The problem of no learning in ReLU for negative inputs is called the dying ReLU problem, which Leaky ReLU takes care of.
2) Yes, tanh is a zero-centered activation function.
3) As tanh is symmetric and its mean is around zero, it produces normalized inputs (between -1 and 1) for the next layer, which makes training easier.
4) As tanh is also a sigmoidal function, it too faces the vanishing gradient problem.
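As a quick illustration, here is a minimal NumPy sketch (the 0.01 negative slope for Leaky ReLU is an assumed, commonly used default) showing that tanh outputs are zero-centered and saturate, while Leaky ReLU keeps a small non-zero slope for negative inputs:
```python
import numpy as np

def tanh(x):
    return np.tanh(x)

def leaky_relu(x, alpha=0.01):
    # alpha is the assumed slope for negative inputs
    return np.where(x > 0, x, alpha * x)

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])

print(tanh(x))            # zero-centered outputs in (-1, 1)
print(leaky_relu(x))      # negative inputs keep a small non-zero output
print(1 - tanh(x) ** 2)   # tanh derivative shrinks toward 0 for large |x| (vanishing gradient)
```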
Q2. Dog and cat classifier
You are building a binary classifier for recognizing dogs
(y=1) vs. cats (y=0). Which one of these is the best activation function for
the output layer?
Choose the correct answer from below:
A. ReLU
B. Leaky ReLU
C. sigmoid
D. Tanh
Ans: C
Correct option : sigmoid
Explanation : The sigmoid function outputs a value between 0 and 1, which makes it a very good choice for binary classification. You can classify the input as 0 if the output is less than 0.5 and as 1 if the output is more than 0.5. This threshold value can also be changed.
It can be done with tanh as well, but it is less convenient as the output is between -1 and 1.
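For concreteness, a minimal sketch (using NumPy, with hypothetical raw scores from the final layer) of sigmoid-based thresholding:
```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

logits = np.array([-2.0, 0.3, 1.5])   # hypothetical raw outputs of the final layer
probs = sigmoid(logits)               # probabilities in (0, 1)
labels = (probs >= 0.5).astype(int)   # 1 = dog, 0 = cat; the 0.5 threshold can be tuned

print(probs)   # approximately [0.119 0.574 0.818]
print(labels)  # [0 1 1]
```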
Q3. Maximum value of derivatives
The image shows two columns: one listing activation functions and the other listing maximum values of first-order derivatives. Map each function to the correct maximum value of its derivative.
Choose the correct answer from below:
A. 1-d, 2-c, 3-b, 4-a
B. 1-b, 2-c, 3-d, 4-a
C. 1-c, 2-b, 3-d, 4-a
D. 1-b, 2-d, 3-d, 4-d
Ans: D
Correct option : 1-b, 2-d, 3-d, 4-d.
Explanation :
The derivative of the sigmoid function is sigmoid(x)(1 − sigmoid(x)); its maximum value is 0.25, reached at sigmoid(x) = 0.5, i.e. at x = 0.
The derivative of tanh is 1 − tanh²(x); its maximum is 1, reached at tanh(x) = 0, i.e. at x = 0.
The derivative of ReLU is 1 for all positive values of x and 0 for all negative values of x.
For Leaky ReLU, the derivative is 1 for all positive values. For negative values, if Leaky ReLU outputs 0.5 × (input), the slope is 0.5, and hence the derivative is 0.5.
For both ReLU and Leaky ReLU, the maximum derivative value is 1.
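A minimal NumPy sketch (assuming a 0.01 negative slope for Leaky ReLU) that numerically confirms these maximum derivative values:
```python
import numpy as np

x = np.linspace(-10, 10, 100001)   # dense grid that includes x = 0

sigmoid = 1 / (1 + np.exp(-x))
d_sigmoid = sigmoid * (1 - sigmoid)      # peaks at 0.25 (x = 0)
d_tanh = 1 - np.tanh(x) ** 2             # peaks at 1 (x = 0)
d_relu = np.where(x > 0, 1.0, 0.0)       # 1 for x > 0, 0 otherwise
d_leaky = np.where(x > 0, 1.0, 0.01)     # assumed 0.01 slope for x <= 0

print(d_sigmoid.max(), d_tanh.max(), d_relu.max(), d_leaky.max())
# 0.25 1.0 1.0 1.0
```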
Q4. Leaky ReLU advantages
What are the advantages of using Leaky Rectified Linear Units (Leaky ReLU) over normal ReLU in deep learning?
Choose the correct answer from below; please note that this question may have multiple correct answers.
A. It fixes the “dying ReLU” problem, as it doesn’t have zero-slope parts.
B. Leaky ReLU always slows down training.
C. It increases the “dying ReLU” problem, as it doesn’t have zero-slope parts.
D. Leaky ReLU helps the gradients flow more easily through the architecture.
Ans: A, D
Correct options:
- It fixes the “dying ReLU” problem, as it doesn’t have zero-slope parts.
- Leaky ReLU helps the gradients flow more easily through the architecture.
Explanation:
- Leaky ReLU is a variant of the ReLU activation function and is commonly used in deep learning. The key advantage of using Leaky ReLU over normal ReLU is that it can avoid the “dying ReLU” problem, which occurs when a large number of neurons in a network become inactive and stop responding to inputs.
- As for the impact on training, it depends on the context and the specific problem you are trying to solve. In some cases, using Leaky ReLU has been observed to speed up training by preventing the dying ReLU problem; this can be useful when you have sparse data or a highly imbalanced dataset. In other cases, it may slow down training by introducing more complex non-linearity into the model, which results in a more difficult optimization process.
- The dying ReLU problem can happen when the input to a neuron is negative and the ReLU activation function is used, since ReLU sets negative inputs to zero. In contrast, Leaky ReLU allows a small, non-zero gradient for negative input values, which can help prevent neurons from becoming inactive, improve the overall performance of the network, and make the gradients flow more easily through the architecture.
- Additionally, Leaky ReLU has been shown to outperform other variants of ReLU on some benchmarks, so it may be a better choice in some cases.
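A minimal sketch (NumPy, assuming a 0.01 negative slope) contrasting the gradients of ReLU and Leaky ReLU for negative pre-activations:
```python
import numpy as np

def relu_grad(z):
    return np.where(z > 0, 1.0, 0.0)

def leaky_relu_grad(z, alpha=0.01):
    # alpha is the assumed slope for negative inputs
    return np.where(z > 0, 1.0, alpha)

z = np.array([-4.0, -0.5, 2.0])
print(relu_grad(z))        # [0.   0.   1.  ] -> no gradient for negative inputs ("dying ReLU")
print(leaky_relu_grad(z))  # [0.01 0.01 1.  ] -> a small gradient keeps learning alive
```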
Q5. No Activation Function
What if we do not use any activation function(s) between the
hidden layers in a neural network?
Choose the correct answer from below:
A. It will still capture non-linear relationships.
B. It will just be a simple linear equation.
C. It will not affect.
D. Can't be determined.
Ans: B
Correct option : It will just be a simple linear
equation.
Explanation :
The main aim of this question is to understand why we need
activation functions in a neural network.
The following steps are performed in a neural network:
Step 1: Calculate the weighted sum of all the inputs (X) and add the bias term:
Z = (weights ∗ X) + bias
Step 2: Apply an activation function to calculate the expected output:
Y = Activation(Z)
Steps 1 and 2 are performed at each layer. This is forward propagation.
Now, what if there is no activation function?
Our equation for Y becomes:
Y = Z = (weights ∗ X) + bias
This is just a simple linear equation. A linear equation
will not be able to capture the complex patterns in the data.
To capture non-linear relationships, we use activation functions.
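A minimal NumPy sketch (with arbitrary, assumed layer sizes) showing that two stacked layers without an activation collapse into a single linear map:
```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                             # 4 samples, 3 features

W1, b1 = rng.normal(size=(3, 5)), rng.normal(size=5)    # "hidden" layer, no activation
W2, b2 = rng.normal(size=(5, 2)), rng.normal(size=2)    # output layer

Y_two_layers = (X @ W1 + b1) @ W2 + b2

# The same mapping expressed as a single linear layer:
W = W1 @ W2
b = b1 @ W2 + b2
Y_one_layer = X @ W + b

print(np.allclose(Y_two_layers, Y_one_layer))  # True
```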
Q6. Trainable parameters
What is the number of trainable parameters in the neural
network given below:
Note: The network is not fully connected and the trainable parameters include biases as well.
Choose the correct answer from below:
A. 17
B. 15
C. 10
D. 20
Ans: B
Correct option : 15
Explanation :
The network is not fully connected, hence the weight terms correspond to the connections between neurons, which are 10 in total.
For biases, we have 4 for the neurons in the hidden layer and 1 for the neuron in the output layer, which in total gives us 10 + 4 + 1 = 15.
Note : The network shown in the image is purely for teaching purposes. We won’t encounter any neural networks like these in real life.
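For the arithmetic above, a tiny sketch (using the counts stated in the explanation: 10 connections, 4 hidden neurons, 1 output neuron):
```python
n_connections = 10      # weights (the network is not fully connected)
n_hidden_biases = 4     # one bias per hidden neuron
n_output_biases = 1     # one bias for the output neuron

print(n_connections + n_hidden_biases + n_output_biases)  # 15
```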
Q7. Number of connections
The number of nodes in the input layer of a fully connected
neural network is 10 and the hidden layer is 7.
The maximum number of connections from the input layer to the hidden layer is:
Choose the correct answer from below:
A. 70
B. less than 70
C. more than 70
D. It is an arbitrary value
Ans: A
Correct option : 70.
Explanation :
- Since an MLP is a fully connected directed graph, the maximum number of connections is the product of the number of nodes in the input layer and the hidden layer.
- The total number of connections = 10 × 7 = 70.
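A tiny sketch of the same count (the weight matrix of a dense layer from 10 inputs to 7 units has 10 × 7 = 70 entries; biases are not counted as connections here):
```python
import numpy as np

n_inputs, n_hidden = 10, 7
W = np.zeros((n_inputs, n_hidden))   # one weight per input-to-hidden connection

print(W.size)  # 70
```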
Q8. Number of parameters
For a neural network consisting of an input layer, 2 hidden layers, and one output layer, what will be the number of parameters if each layer is dense and has a bias associated with it?
Choose the correct answer from below:
A. 24
B. 44
C. 51
D. 32
Ans: B
Correct option : 44
Explanation :
For each dense layer, the number of trainable parameters is (number of inputs to the layer × number of units in the layer) + number of units (one bias per unit).
For the first hidden layer, each of the 5 inputs is connected to each of the 3 units, so there are 5 × 3 + 3 (for the bias of each unit) = 18 parameters.
For the second hidden layer: 3 × 4 + 4 = 16.
For the output layer: 4 × 2 + 2 = 10.
Therefore, total = 18 + 16 + 10 = 44.
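A tiny sketch of the same count, assuming the layer sizes 5 → 3 → 4 → 2 implied by the explanation:
```python
layer_sizes = [5, 3, 4, 2]   # input, hidden 1, hidden 2, output (assumed from the explanation)

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    total += n_in * n_out + n_out   # weights + one bias per unit in each dense layer

print(total)  # 44
```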