Neural Network 5
Q1. Backpropagation in MLP
Which of the following options are true with respect to backpropagation? Choose the correct answer from below; note that this question may have multiple correct answers.
A. In backpropagation, we calculate the error contribution of each neuron.
B. In backpropagation, we calculate the loss gradients with respect to inputs.
C. In backpropagation, we calculate the loss gradient with respect to weights and biases.
D. In backpropagation, we update the weights of neurons in each iteration.
Ans: A, C, D
Correct options:
i) In backpropagation, we calculate the error contribution of each neuron.
ii) In backpropagation, we calculate the loss gradient with respect to weights and biases.
iii) In backpropagation, we update the weights of neurons in each iteration.
Explanation:
Only one statement is false: "In backpropagation, we calculate the loss gradients with respect to inputs." We cannot update the inputs, so there is no need for gradients with respect to them. All the other options are true.
Backpropagation calculates the rate at which the loss changes with respect to the weights and biases, and the weights and biases are then updated in order to minimize the loss function. A minimal sketch is shown below.
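As an illustrative sketch (not from the original), here is backpropagation for a single sigmoid neuron with squared-error loss; the data values are assumptions chosen for the example:

import numpy as np

x, y = np.array([1.0, 2.0]), 1.0        # one training example (assumed values)
w, b, lr = np.zeros(2), 0.0, 0.1

for _ in range(100):
    z = np.dot(w, x) + b                # forward pass: weighted sum
    fx = 1.0 / (1.0 + np.exp(-z))       # sigmoid activation
    dz = (fx - y) * fx * (1 - fx)       # dL/dz for L = 0.5*(fx - y)^2
    dw, db = dz * x, dz                 # loss gradients w.r.t. weights and bias
    w, b = w - lr * dw, b - lr * db     # update parameters to minimize the loss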
Q2. Complete the updating code
We want to use the above code snippet for a simple binary classification task where, if model() returns 1 for an observation, the observation will be classified as '+' (1), and otherwise as '-' (0).
model() will return 1 only if the weighted sum is greater than or equal to the threshold thresh. In the above code snippet, the fit function will be used for getting a weight matrix for the classification task.
Complete the updating syntax for weights [?] and threshold [??].
Note: The inputs are always positive
Choose the correct answer from below:
A. w = w + lr * x, thresh = thresh + lr
B. w = w - lr * x, thresh = thresh + lr
C. w = w + lr * x, thresh = thresh - lr * x
D. w = w - lr * x, thresh = thresh - lr * x
Ans: A
Correct Answer: w = w + lr * x, thresh = thresh + lr
Explanation
The code snippet is basically a simple implementation of the perceptron, where we update the weights and threshold with the same intuition as in SGD, but without the explicit gradient formulation.
If the expected output is "+" and the predicted one is "-", then:
- we should increase the weights in order to increase the weighted sum, i.e. w·x,
- and decrease the threshold (thresh).
If the expected output is "-" and the predicted one is "+", then:
- we should decrease the weights in order to decrease the weighted sum,
- and increase the threshold (thresh).
The code for model() is as follows:

import numpy as np

def model(x, w, thresh):
    # Fire (return 1) only when the weighted sum reaches the threshold
    return 1 if (np.dot(w, x) >= thresh) else 0
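Since the fit function itself is not shown, here is a minimal sketch of what it might look like; the [?] and [??] blanks would sit in the two mistake branches below, and the signature, defaults, and data layout (a 2-D array of positive inputs) are assumptions:

import numpy as np

def fit(train, label, lr=0.1, epochs=10):
    # Hypothetical trainer for the perceptron model() above
    w = np.zeros(train.shape[1])
    thresh = 0.0
    for _ in range(epochs):
        for x, y in zip(train, label):
            y_pred = model(x, w, thresh)
            if y == 1 and y_pred == 0:
                w = w + lr * x        # expected '+', predicted '-': raise w.x
                thresh = thresh - lr  # ...and lower the threshold
            elif y == 0 and y_pred == 1:
                w = w - lr * x        # expected '-', predicted '+': lower w.x
                thresh = thresh + lr  # ...and raise the threshold
    return w, thresh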
Q3. Weight's value
Consider a neural network as shown in the image below:
The initial values of x1, x2 and x3 are [10, 5, 5]. The true value of the output is 4. If the loss function is mean squared error, then what is the value of w1 after the first epoch?
Consider the initial value of each of w1, w2, w3, w4 and w5 as 0.1, and a learning rate of 0.01.
Choose the correct answer from below:
A. 0.550
B. 0.252
C. 0.111
D. 0.340
Ans: B
Correct option: 0.252
Explanation:
Let o1 be the output coming out of the first neuron in the hidden layer and o2 be the output coming out of the second neuron in the hidden layer.
Now, o1 = F(x), where x = w1·x1 and (from the figure) F squares its input, while o2 = w2·x2 + w3·x3.
o1 = w1²·x1² = (0.1)·(0.1)·(10)·(10) = 1
o2 = w2·x2 + w3·x3 = (0.1)·(5) + (0.1)·(5) = 1
Similarly,
ŷ = w4·o1 + w5·o2 = w4·(w1²·x1²) + w5·(w2·x2 + w3·x3)
According to the question, the loss function is:
loss = (y − ŷ)²
Using the chain rule of differentiation:
∂(loss)/∂w1 = ∂(loss)/∂o1 · ∂o1/∂w1
Thus,
∂(loss)/∂o1 = ∂/∂o1 (y − w4·o1 − w5·o2)² = (−2)·(w4)·(y − w4·o1 − w5·o2)
= (−2)·(0.1)·(4 − (0.1)·(1) − (0.1)·(1)) = (−0.2)·(3.8) = −0.76
Similarly,
∂o1/∂w1 = ∂/∂w1 (w1²·x1²) = 2·w1·x1² = (2)·(0.1)·(10)·(10) = 20
Finally,
∂(loss)/∂w1 = (−0.76)·(20) = −15.2
Updating the weight:
w1 ← w1 − α·∂(loss)/∂w1, where α is the learning rate
w1 = 0.1 − 0.01·(−15.2) = 0.1 + 0.152 = 0.252
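As a quick numeric check of the derivation (not from the original; the squaring activation on the first hidden neuron is read off the figure):

x1, x2, x3, y = 10.0, 5.0, 5.0, 4.0
w1 = w2 = w3 = w4 = w5 = 0.1
lr = 0.01

o1 = (w1 * x1) ** 2                 # first hidden neuron squares its input: 1.0
o2 = w2 * x2 + w3 * x3              # second hidden neuron is linear: 1.0
y_hat = w4 * o1 + w5 * o2           # linear output neuron: 0.2

dloss_do1 = -2 * w4 * (y - y_hat)   # -0.76
do1_dw1 = 2 * w1 * x1 ** 2          # 20.0
dloss_dw1 = dloss_do1 * do1_dw1     # chain rule: -15.2

print(w1 - lr * dloss_dw1)          # 0.252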
Q4. Convergence
Fill in the blank:
In a multi-layered perceptron architecture, gradient descent ______ .
Choose the correct answer from below:
A. always converges to the global minimum.
B. doesn't converge to the global minimum.
C. may or may not converge to the global minimum.
D. will always converge to the global minimum if the learning rate is appropriate.
Ans: C
Correct option: may or may not converge to the global minimum
Explanation:
Gradient descent may or may not converge to a global minimum, depending on the initial weights and the learning rate. The loss function of a multi-layered perceptron is neither convex nor concave, so it can have multiple local minima. Hence there is no guarantee that gradient descent will converge to the global minimum. The toy run below illustrates this.
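As a toy illustration (not from the original), gradient descent on the non-convex function f(w) = w^4 − 3w^2 + w settles in different minima depending on the starting point:

def grad(w):
    return 4 * w ** 3 - 6 * w + 1    # derivative of f(w) = w^4 - 3w^2 + w

for w0 in (-2.0, 2.0):
    w = w0
    for _ in range(1000):
        w -= 0.01 * grad(w)          # plain gradient descent
    print(f"start at {w0:+.1f} -> converged to w = {w:.3f}")
# Starting at -2.0 reaches the global minimum (~ -1.30);
# starting at +2.0 gets stuck in a local minimum (~ +1.13).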
Q5. Calculate the loss
Given the dataset, calculate the loss after completing the code snippet. The blanks are marked [?].
import numpy as np
import pandas as pd

def hypothesis(w, b, x):  # Section 1: sigmoid neuron
    return 1.0 / (1.0 + np.exp(-(w * x + b)))

def error(w, b):  # Section 2: total squared-error loss over the dataset
    err = 0.0
    for x, y in zip(train, label):
        fx = hypothesis(w, b, x)
        err += 0.5 * (fx - y) ** 2
    return err

def grad_w(w, b, x, y):  # Section 3: dL/dw for one example
    fx = hypothesis(w, b, x)
    return (fx - y) * fx * (1 - fx) * x

def grad_b(w, b, x, y):  # Section 4: dL/db for one example
    fx = hypothesis(w, b, x)
    return (fx - y) * fx * (1 - fx)

def gradient_descent(train, label, w, b, lr, max_epochs):  # Section 5
    for i in range(max_epochs):
        dw = 0  # accumulated gradients are reset at the start of each epoch
        db = 0
        for x, y in zip(train, label):
            dw += grad_w(w, b, x, y)
            db += grad_b(w, b, x, y)
        w = w [?] lr * dw  # [?] is an arithmetic sign
        b = b [?] lr * db  # [?] is an arithmetic sign
        print("For Epoch {}, the loss is {}".format(i + 1, error(w, b)))
    return w, b

df = pd.read_csv("filepath")
train = df['X']
label = df['Y']
initial_w = 1
initial_b = 1
lr = 0.01
max_epochs = 50
w, b = gradient_descent(train, label, initial_w, initial_b, lr, max_epochs)
Choose the correct option for the loss and the arithmetic signs.
Choose the correct answer from below:
A. 0.016, -, -
B. 0.28, +, +
C. 0.028, -, -
D. 0.050, +, +
Ans: A
Correct option: 0.016, -, -
Explanation:
The parameter update rule in gradient descent is:
w ← w − α·(∂L/∂w)
b ← b − α·(∂L/∂b),
where α is the learning rate.
Thus the signs in place of [?] will be -, -.
The loss after 50 epochs comes out to be around 0.016.
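With the blanks filled in, the update step in Section 5 reads:

w = w - lr * dw  # step against the gradient to reduce the loss
b = b - lr * db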
Q6. Fully connected neural network
Which, if any, of the given propositions is true about fully-connected neural networks (FCNN)?
Choose the correct answer from below:
A. In a FCNN, there are connections between neurons of the same layer.
B. In a FCNN, the most common weight initialization scheme is zero initialization, because it leads to faster and more robust training.
C. The neurons of one layer are connected to every neuron of its preceding layer.
D. None of the options
Ans: C
Correct option: The neurons of one layer are connected to every neuron of its preceding layer.
Explanation:
- In a FCNN, neurons of one layer are connected to every neuron of the preceding layer, but there are no connections between neurons of the same layer.
- Zero initialization leads to weight symmetry and undermines training: if all the weights are the same, they all receive the same update in each training round, so no learning can occur. The snippet below demonstrates this.
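As a small demonstration (not from the original), a constant initialization (of which zero is a special case) makes every hidden neuron compute the same output and receive the same gradient, so the neurons never differentiate; the input values here are assumptions:

import numpy as np

x = np.array([1.0, -2.0, 0.5, 3.0])   # one input sample (assumed values)
y = 1.0                               # target

W1 = np.full((3, 4), 0.5)             # symmetric init: every weight identical
w2 = np.full(3, 0.5)

h = np.tanh(W1 @ x)                   # all hidden activations identical
y_hat = w2 @ h

dy = 2 * (y_hat - y)                  # squared-error gradient at the output
dw2 = dy * h                          # every entry equal
dh = dy * w2
dW1 = ((1 - h ** 2) * dh)[:, None] * x[None, :]

print(dW1)                            # all rows identical: neurons stay clones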
Q7. Compare the learnings
The following graph shows the learning speeds versus the number of epochs for the four hidden layers, where Hidden layer 1 and Hidden layer 4 are the first and last hidden layers respectively.
Considering the graph, mark the correct option.
Choose the correct answer from below:
A. The initial layers learn slower since the weights of the initial layers are always higher
B. The gradients for initial layers are smaller than those of later ones, causing slow learning in them
C. The slow learning in initial layers is due to the dead activation of sigmoid in the initial layers
D. The slow learning in initial layers is due to faster learning at initial stages
Ans: B
Correct option: The gradients for initial layers are smaller than those of later ones, causing slow updating of weights and biases in them, and hence slow learning.
Explanation:
- Recall that the backpropagation algorithm relies heavily on the chain rule, which means that the gradient received by the initial layers is a long series of multiplications. During backpropagation, if the gradients at the final layers are fairly small, the initial layers will receive a very tiny gradient, so their weights get updated very slowly. A numeric sketch follows this list.
- One option suggests that the initial layers don't require much updating, which is wrong: the weights are initialized randomly and do require significant updating, so it can't be true that the earlier layers need less updating.
- One option suggests that the initial layers learn slower later because they learn faster in the initial epochs, which is not true.
- One option says the slow learning is due to dead activation in the initial layers, which is not true; no such effect is at play here.
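As a numeric sketch (not from the original), each sigmoid layer contributes a factor of at most 0.25 (the maximum of the sigmoid derivative) times a weight, so the gradient shrinks geometrically on its way back to the first layer; the pre-activation value and weight used here are assumptions:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z, w = 0.5, 0.8       # assumed pre-activation and weight, same at every layer
grad = 1.0            # gradient arriving at the output layer
for layer in (4, 3, 2, 1):
    s = sigmoid(z)
    grad *= s * (1 - s) * w   # one chain-rule factor per layer crossed
    print(f"gradient reaching hidden layer {layer}: {grad:.2e}")
# Shrinks by roughly 5x per layer: hidden layer 1 receives a far smaller
# gradient than hidden layer 4, hence its slower learning.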