Neural Network 2

Q1. Neuron

Which of the following is true about a single artificial neuron?

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     It is loosely inspired by biological neurons

B.     It computes weighted sum

C.      It applies an activation function

D.     It is capable of performing multi class classification

Ans: A,B,C

Correct Options:-

  • It is loosely inspired by biological neurons
  • It computes weighted sum
  • It applies an activation function

Explanation:-

  • The basic inspiration for artificial neurons did come from biological neurons.
    Biological neurons form a network among themselves.
    Each connection, like a synapse in a biological brain, can transmit a signal to other neurons.
    An artificial neuron receives signals, processes them, and can signal the neurons connected to it.
  • A neuron does its computation in 2 steps:
    1. First it computes the weighted sum: $z = w_1 x_1 + w_2 x_2 + \dots + w_d x_d + b$
    2. Then it applies an activation function on top of this sum: $a = f(z)$
  • A single neuron can perform binary classification if its activation is the sigmoid function.
    However, it cannot perform multi-class classification on its own. We would need a network of multiple neurons to do that.
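The two steps above can be sketched in NumPy. The input, weight, and bias values here are made up purely for illustration, and the sigmoid is only one possible choice for f.

```python
import numpy as np

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def neuron(x, w, b):
    """Forward pass of one neuron: weighted sum, then activation."""
    z = np.dot(w, x) + b      # step 1: z = w1*x1 + ... + wd*xd + b
    return sigmoid(z)         # step 2: a = f(z)

# Hypothetical values, for illustration only
x = np.array([1.0, 2.0, 3.0])
w = np.array([0.5, -0.25, 0.1])
b = 0.2

a = neuron(x, w, b)
label = int(a >= 0.5)         # a single output supports only a binary decision
print(a, label)
```

Because the neuron emits a single number, thresholding it can separate two classes at most, which is why multi-class problems need several output neurons.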

Q2. Sigmoid and softmax functions

Which of the following statements is true for a neural network having more than one output neuron ?

Choose the correct answer from below:

A.     In a neural network where the output neurons have the sigmoid activation, the sum of all the outputs from the neurons is always 1.

B.     In a neural network where the output neurons have the sigmoid activation, the sum of all the outputs from the neurons is 1 if and only if we have just two output neurons.

C.      In a neural network where the output neurons have the softmax activation, the sum of all the outputs from the neurons is always 1.

D.     The softmax function is a special case of the sigmoid function

Ans: C

Correct option : In a neural network where the output neurons have the softmax activation, the sum of all the outputs from the neurons is always 1.

Explanation :

  • For the sigmoid activation, each output lies between 0 and 1 independently of the others, so with more than one neuron the sum of the outputs can take any value between 0 and the number of neurons. It is not constrained to be 1.
  • The softmax classifier outputs a probability distribution over the classes, and the sum of the probabilities is always 1.
  • The sigmoid function is a special case of the softmax function where the number of classes is 2, not the other way around.
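The three points above can be checked numerically. This is a small sketch with made-up pre-activation values; the helper names `sigmoid` and `softmax` are defined here and are not taken from any particular library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - np.max(z))   # shift by the max for numerical stability
    return e / e.sum()

z = np.array([2.0, -1.0, 0.5])  # hypothetical pre-activations of 3 output neurons

sig_sum = sigmoid(z).sum()      # not constrained to be 1
soft_sum = softmax(z).sum()     # 1 up to floating-point rounding

# With two classes, softmax over [z1, z2] matches sigmoid(z1 - z2)
z2 = np.array([1.3, 0.4])
two_class = softmax(z2)[0]
as_sigmoid = sigmoid(z2[0] - z2[1])
print(sig_sum, soft_sum, two_class, as_sigmoid)
```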

 

Q3. Forward propagation

Given the independent and dependent variables in X and y, complete the code to calculate the results of the forward propagation for a single layer of neurons (one neuron per class) on each observation of the dataset.

The code should print the calculated labels for each observation of the given dataset i.e. X.

Input Format:

Two lists are taken as inputs. The first list is the independent variable (X) and the second list is the dependent variable (y).

Output Format:

A numpy array consisting of labels for each observation.

Sample Input:

X = [[100, 129, 157, 133], [168, 150, 30, 19], [4, 148, 106, 74], [123, 195, 60, 93], [169, 40, 188, 179], [40, 59, 29, 94], [165, 126, 16, 99], [167, 157, 65, 23], [128, 87, 37, 111], [191, 154, 89, 134], [101, 41, 145, 112], [43, 110, 197, 118], [147, 22, 109, 139], [11, 161, 135, 119], [26, 48, 199, 182], [96, 100, 82, 87], [149, 2, 8, 10], [5, 38, 166, 100], [193, 117, 59, 164], [133, 5, 38, 163], [88, 177, 84, 114], [9, 132, 177, 24], [94, 130, 83, 131], [77, 11, 141, 81], [154, 198, 175, 98], [21, 148, 170, 122], [185, 145, 101, 183], [100, 196, 111, 11], [97, 147, 112, 11], [25, 97, 95, 45], [6, 89, 88, 38], [51, 16, 151, 3], [90, 174, 122, 157], [2, 133, 121, 199], [15, 78, 163, 180], [103, 118, 7, 179], [102, 179, 157, 183], [113, 139, 195, 122], [55, 88, 68, 117], [115, 185, 93, 102], [139, 82, 3, 165], [135, 29, 78, 11], [11, 16, 60, 123], [103, 191, 187, 129], [146, 181, 28, 192], [85, 73, 136, 139], [117, 179, 81, 183], [15, 131, 106, 28], [58, 78, 111, 65], [76, 11, 25, 103], [11, 90, 162, 129], [144, 1, 16, 33], [33, 172, 40, 72], [106, 83, 160, 151], [68, 159, 150, 64], [31, 79, 83, 15], [51, 140, 173, 10], [105, 80, 70, 21], [195, 80, 64, 129], [50, 96, 107, 82], [185, 150, 15, 143], [28, 71, 27, 57], [58, 13, 146, 78], [20, 71, 183, 44], [91, 44, 15, 87], [77, 157, 95, 110], [132, 28, 193, 49], [177, 87, 57, 41], [194, 175, 17, 20], [166, 64, 134, 150], [79, 74, 162, 168], [166, 149, 34, 117], [160, 170, 127, 44], [99, 41, 103, 155], [48, 127, 138, 68], [17, 3, 101, 94], [29, 102, 123, 158], [194, 60, 135, 179], [73, 192, 145, 168], [21, 94, 154, 143], [17, 10, 145, 131], [73, 29, 195, 199], [132, 189, 90, 100], [134, 32, 81, 119], [118, 37, 119, 27], [51, 78, 187, 86], [95, 8, 56, 29], [156, 162, 186, 127], [126, 111, 144, 59], [7, 140, 32, 75], [40, 0, 109, 92], [165, 175, 61, 103], [178, 68, 185, 119], [132, 105, 36, 80], [165, 117, 35, 176], [128, 49, 185, 9], [50, 176, 12, 198], [124, 164, 99, 102], [36, 30, 114, 147], [166, 172, 35, 14]]
y = [1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0]

Sample Output:

[1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1]


import numpy as np

np.random.seed(2)

 

#independent variables

X = np.array(eval(input()))

#dependent variable

y = np.array(eval(input()))

 

m = X.shape[__]  #no. of samples

n = X.shape[__]  #no. of features

c = ____   #no. of classes in the data and therefore no. of neurons in the layer

 

#weight vector of dimension (number of features, number of neurons in the layer)

w = np.random.randn(___, ___)

 

#bias vector of dimension (1, number of neurons in the layer)

b = np.zeros((___, ___))

 

#(weighted sum + bias) of dimension (number of samples, number of classes)

z = ____

 

#exponential transformation of z

a = np.exp(z)

 

#Perform the softmax on a

a = ____

 

#calculate the label for each observation

y_hat = ____

 

print(y_hat)

Ans:

import numpy as np

np.random.seed(2)

 

#independent variables

X = np.array(eval(input()))

#dependent variable

y = np.array(eval(input()))

‘m’ and ‘n’ refer to the number of rows and columns in the dataset respectively. ‘c’ refers to the number of classes in y.

m = X.shape[0]  #no. of samples

n = X.shape[1]  #no. of features

c = len(np.unique(y))   #no. of classes in the data and therefore no. of neurons in the layer

Initializing weights randomly

#weight vector of dimension (number of features, number of neurons in the layer)

w = np.random.randn(n, c)

Initializing biases as zero

#bias vector of dimension (1, number of neurons in the layer)

b = np.zeros((1, c))

Finding the output ‘z’

#(weighted sum + bias) of dimension (number of samples, number of classes)

z = np.dot(X, w) + b

Applying the softmax activation function on the output

#exponential transformation of z

a = np.exp(z)

a = a/np.sum(a, axis = 1, keepdims = True)

Calculating the label for each observation

y_hat = np.argmax(a, axis = 1)

print(y_hat)
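One caveat about the solution above: with unscaled features and random weights, z can become large and np.exp(z) may overflow. A common remedy, shown here as a sketch rather than as part of the expected answer, is to subtract each row's maximum before exponentiating, which leaves the softmax result unchanged. The function name `softmax_rows` and the values of z are assumptions made for illustration.

```python
import numpy as np

def softmax_rows(z):
    """Row-wise softmax that subtracts each row's max before exponentiating."""
    shifted = z - np.max(z, axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / np.sum(e, axis=1, keepdims=True)

# Hypothetical pre-activations, large enough that np.exp(z) alone would overflow
z = np.array([[1000.0, 999.0],
              [-5.0, 3.0]])

a = softmax_rows(z)
y_hat = np.argmax(a, axis=1)
print(a.sum(axis=1), y_hat)
```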

 

Q4. Same layer still different output

Why do two neurons in the same layer produce different outputs even after using the same kind of function (i.e. wT.x + b)?

Choose the correct answer from below:

A.     Because the weights are not the same for the neurons.

B.     Because the input for each neuron is different.

C.      Because weights of all neurons are updated using different learning rates.

D.     Because only biases (b) of all neurons are different, not the weights.

Ans: A

 

Correct option: Because the weights are not the same for the neurons.

Explanation :

  • The weights for each neuron in a layer are different. Thus the output of each neuron ( wT.x + b ) will be different.
  • The input for each neuron in a layer is the same. In a fully connected network, each neuron in a particular layer gets inputs from each neuron in the previous layer.
  • There may be different learning rates for each model weight depending on the type of optimizer used, but that is not the reason for the neurons to give different outputs.
  • In a fully connected network, each neuron has two kinds of trainable parameters: a weight vector (one weight per input) and a bias. The values of the bias and the weights for any two neurons in a layer need not be the same, since they keep changing during model training.
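A small sketch of the first two points: two neurons receive the same input but hold different weights, so their outputs differ. The numbers are hypothetical, and the biases are set equal on purpose to isolate the effect of the weights.

```python
import numpy as np

x = np.array([1.0, 2.0])            # the same input reaches every neuron in the layer

# One column of weights per neuron (hypothetical values)
W = np.array([[0.5, -1.0],
              [0.25, 2.0]])
b = np.array([0.0, 0.0])            # identical biases, to isolate the effect of the weights

z = x @ W + b                       # z[k] = w_k^T x + b_k for neuron k
print(z)
```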

 

Q5. Will he watch the movie?

We want to predict whether a user would watch a movie or not. Each movie has a certain number of features, each of which is explained in the image.









Now take the case of the movie Avatar having the features vector as [9,1,0,5]. According to an algorithm, these features are assigned the weights [0.8,0.2,0.5,0.4] and bias=-10. For a user X, predict whether he will watch the movie or not if the threshold value(θ) is 10?

Note: If the output of the neuron is greater than θ then the user will watch the movie otherwise not.

 

Choose the correct answer from below:

A.     Yes, the user will watch the movie with neuron output = -0.6

B.     No, the user will not watch the movie with neuron output = -0.6

C.      No, the user will watch the movie with neuron output = 2.5

D.     Yes, the user will watch the movie with neuron output = 2.5

Ans: B

Correct option : No, the user will not watch the movie with neuron output = -0.6

Explanation :

The output of a neuron is obtained by taking the weighted sum of inputs and adding the bias term to it.

The output of the neuron is:
$(0.8)(9) + (0.2)(1) + (0.5)(0) + (0.4)(5) - 10 = 7.2 + 0.2 + 0 + 2 - 10 = -0.6 < 10$,
therefore he won’t watch the movie.
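The same arithmetic can be reproduced in a few lines of NumPy, using only the values given in the question:

```python
import numpy as np

x = np.array([9, 1, 0, 5])            # feature vector for Avatar
w = np.array([0.8, 0.2, 0.5, 0.4])    # weights from the question
b = -10
theta = 10                            # threshold from the question

output = np.dot(w, x) + b             # weighted sum plus bias
will_watch = output > theta
print(round(output, 2), will_watch)
```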

Q6. And perceptron

 

We want to design a perceptron that performs the AND operation. Refer to the table below:-

x1   x2   x1 AND x2
0    0    0
0    1    0
1    0    0
1    1    1

For this, first, a weighted sum is calculated: $z = w_1 x_1 + w_2 x_2 + b$

Here, $w_1$ and $w_2$ are weights, and $b$ is the bias for the neuron.

The activation function applied on this is as follows:-

f(z)=0, if z<0

f(z)=1, otherwise

Which of the following values of weights and bias will give the desired results?

Choose the correct answer from below:

A.     w1=1, w2=1, b=-2

B.     w1=1, w2=1, b=2

C.      w1=1, w2=2, b=-2

D.     w1=2, w2=1, b=4

Ans: A

Correct answer: w1=1, w2=1, b=-2

Explanation:

  • For the input (0,0) the perceptron performs the following calculation:
    $z = w_1 x_1 + w_2 x_2 + b = (1)(0) + (1)(0) + (-2) = -2$
    Therefore $f(z) = 0$
  • Similarly for the inputs (0,1) and (1,0):
    For both cases, $z = -1$,
    Therefore, $f(z) = 0$
  • But for the input (1,1), $z = w_1 x_1 + w_2 x_2 + b = (1)(1) + (1)(1) + (-2) = 0$
    Therefore, $f(z) = 1$
  • Therefore, among the options, w1=1, w2=1, and b=-2 give the required results.
  • More generally, any values of w1, w2, and b that reproduce the AND table would work.
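As a quick check of the reasoning above, the sketch below evaluates all four options against the AND truth table using the step activation defined in the question. The dictionary layout and variable names are illustrative choices.

```python
import numpy as np

def step(z):
    """Activation from the question: 0 if z < 0, else 1."""
    return np.where(z < 0, 0, 1)

inputs = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
and_target = np.array([0, 0, 0, 1])

# (w1, w2, b) for options A to D
options = {"A": (1, 1, -2), "B": (1, 1, 2), "C": (1, 2, -2), "D": (2, 1, 4)}

matches = {}
for name, (w1, w2, b) in options.items():
    z = w1 * inputs[:, 0] + w2 * inputs[:, 1] + b
    matches[name] = bool(np.array_equal(step(z), and_target))

print(matches)
```

Only option A reproduces the table: options B and D output 1 for every input, and option C outputs 1 for the input (0,1).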

 

Q7. Vectorize

Consider the following code snippet:





How do you vectorize this?

Note: All x, y and z are NumPy arrays.

Choose the correct answer from below:

A.     z = x + y

B.     z= x * y.T

C.      z = x + y.T

D.     z = x.T + y.T

 

Ans: C

Correct option : z = x + y.T

Explanation :

The shape of x is (10,6)
The shape of y is (6,1)

We can observe from the question that y[j] is added to each element in the jth column of x to produce z.
For this to happen in a single vectorized step, we have to convert the y array to shape (1,6) and then add it to x, which results in broadcasting of y into shape (10,6).

Thus the answer is z=x+y.T
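The equivalence can be checked with a short sketch. The loop form used below (z[i][j] = x[i][j] + y[j]) is an assumption reconstructed from the explanation, since the original snippet is not reproduced here, and the array contents are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((10, 6))
y = rng.standard_normal((6, 1))

# Assumed loop form of the computation: y[j] is added to every entry of column j
z_loop = np.zeros((10, 6))
for i in range(10):
    for j in range(6):
        z_loop[i, j] = x[i, j] + y[j, 0]

z_vec = x + y.T                 # y.T has shape (1, 6) and broadcasts across the 10 rows
print(z_vec.shape, np.allclose(z_loop, z_vec))
```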

 

 

 

 

 

 

 

Neural Network 1

Q1. Weights impact



For the neural network shown above, which of these statements is true?

Choose the correct answer from below:

A.     -5 weight is bad for the neural network.

B.     The neuron with weight 10 will have the most impact on the output.

C.      The neuron with weight -5 will have the most impact on the output.

D.     The neuron with weight 2 will have the most impact on the output.

Ans: B

Correct option : The neuron with weight 10 will have the most impact on the output.

Explanation :
A negative weight is not inherently bad for the output. The sign of a weight simply determines whether the neuron has an increasing or decreasing effect on the output value. For inputs of comparable scale, the neuron whose weight has the largest magnitude will have the most significant effect on the output value.

Q2. Calculate Forward Pass

The neuron n has the weights 1,2,3,4, and 5. The values of inputs are 4,10,5,20, and 0. We are using a linear activation function with the constant of proportionality being equal to 2 here.





The output will be:

Choose the correct answer from below:

A.     193.5

B.     59.5

C.      238

D.     119

Ans: C

Correct option : 238

Explanation :

Multiplying the weights by their corresponding inputs, and then adding everything together:

-> $\text{Output} = (1 \cdot 4 + 2 \cdot 10 + 3 \cdot 5 + 4 \cdot 20 + 5 \cdot 0) = 119$

But since the activation is linear with a proportionality constant of 2:

-> $\text{Final output} = 2 \cdot 119 = 238$
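The calculation can be reproduced in NumPy. No bias term is included, since the question does not mention one.

```python
import numpy as np

w = np.array([1, 2, 3, 4, 5])
x = np.array([4, 10, 5, 20, 0])

z = np.dot(w, x)        # weighted sum: 4 + 20 + 15 + 80 + 0
output = 2 * z          # linear activation f(z) = 2z
print(z, output)
```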

 

Q3. Need for NN

Are classic ML Algorithms not powerful enough? Why exactly do we need to use Neural Networks?

Check all that apply.

 

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     Unlike Classic ML Algos, NN does not require us to do manual feature engineering.

B.     NNs can work with both structured and unstructured data

C.      For a large dataset, classic ML Algos can outperform NNs

D.     NNs are able to work better with sparse data

Ans: A,B,D

Correct Options:-

  • Unlike Classic ML Algos, NN does not require us to do manual feature engineering.
  • NNs can work with both structured and unstructured data
  • NNs are able to work better with sparse data

Explanation:-

  • In order to create a complex decision boundary, classic ML Algorithms require us to do heavy feature engineering, manually.
    On the other hand, NN are able to find complex relations between features on their own.
  • NN are excellent at working with unstructured data (image / text / audio data), whereas classic ML algos generally struggle with such data unless features are extracted by hand.
  • The performance of NN might be comparable to that of classic ML Algos for small datasets.
    However, given a big enough dataset, NNs will generally give better performance.

 

Q4. Scale drives DL

Refer to the given plot.




Which of the following generally does not hurt an algorithm's performance, and may help significantly?

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     Decreasing the size of a NN

B.     Increasing the size of a NN

C.      Decreasing the training set size

D.     Increasing the training set size

 

Ans:  B, D

Correct Options:-

  • Increasing the size of a NN
  • Increasing the training set size

Explanation:-

  • According to the trends in the given figure, big networks usually perform better than small networks.
  • Also, bringing more data to a NN model is almost always beneficial.

 

Q5. Factors of DL performance

Which of the following factors can help achieve high performance with Deep Learning algorithms?

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     Large amount of data

B.     Smaller models

C.      Better designed features to use

D.     Large models

 

Ans: A, D

Correct Options:-

  • Large amount of data
  • Large models

Explanation:-

  • Over the last 20 years, we have accumulated a lot of data. Traditional algorithms were not able to benefit from this. Whereas, this large amount of data has been the fundamental reason why DL took off in the past decade.
  • In order to take advantage of this large amount of data available to us, we need a big enough model also.
  • Smaller models will not be able to yield very high performance, as they will not be able to take advantage of the large amount of data.
  • One main difference between classical ML algos and DL algos is that DL models are able to “figure out” the best features using hidden layers.

 

Q6. NN true false

Mark the following statement as true or false:-

"Neural networks are good at figuring out functions, relating an input x to an output y, given enough examples."

 

Choose the correct answer from below:

A.     True

B.     False

Ans:  A

Correct Option: True

Explanation:

  • With NN, we don’t need to design features by ourselves.
  • The NN figures out the necessary relations given enough data.

 

 

 

 

 

 

 

Neural Network 5

Q1. Backpropagation in MLP

Which of the following options are true with respect to Backpropagation?

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     In backpropagation, we calculate the error contribution of each neuron.

B.     In backpropagation, we calculate the loss gradients with respect to inputs.

C.      In backpropagation, we calculate the loss gradient with respect to weights and biases.

D.     In backpropagation, we update the weights of neurons in each iteration.

Ans: A, C, D

Correct options :

i) In backpropagation, we calculate the error contribution of each neuron.

ii) In backpropagation, we calculate the loss gradient with respect to weights and biases.

iii) In backpropagation, we update the weights of neurons in each iteration.

Explanation :

Only the statement “In backpropagation, we calculate the loss gradients with respect to inputs.” is false, as we can’t update the inputs. All the other options are true.

Backpropagation calculates the rates at which the loss changes with respect to the weights and biases; the weights and biases are then updated in order to minimize the loss function.

Q2. Complete the updating code




We want to use the above code snippet for a simple binary classification task where if the model() returns 1 for an observation, then the observation will be classified as '+'(1) otherwise '-'(0).

model() will return 1 only if the weighted sum is greater than or equal to the threshold thresh. In the above code snippet, the fit function will be used for getting a weight matrix for the classification task.

Complete the updating syntax for weights [?] and threshold [??].


Note: 
The inputs are always positive

Choose the correct answer from below:

A.     w = w + lr * x, thresh = thresh + lr

B.     w = w - lr * x, thresh = thresh + lr

C.      w = w + lr * x, thresh = thresh - lr * x.

D.     w = w - lr * x, thresh = thresh - lr * x.

Ans: A

Correct Answer:

  • w = w + lr * x, thresh = thresh + lr

Explanation

The code snippet is basically a simple version of the implementation of perceptron where :

  • we are updating the weights and threshold with the same intuition as in SGD but without formulation.

If the expected output is ”+” and the predicted one is ”-“, then :

  • we should increase the weights in order to increase the weighted sum i.e. w.x
  • and decrease the threshold (thresh)

If the expected output is ”-“ and the predicted one is ”+”, then :

  • we should decrease the weights in order to decrease the weighted sum
  • and increase the threshold (thresh)

The code for model() is as follows:

import numpy as np

def model(x, w, thresh):
    # return 1 only if the weighted sum reaches the threshold
    return 1 if (np.dot(w, x) >= thresh) else 0

Q3. Weight's value

Consider a neural network as shown in the image below:




The initial values of x1, x2 and x3 are [10,5,5]. The true value of the output is 4. If the loss function is mean squared error, then what is the value of w1 after the first epoch?

Consider the initial value of all of w1, w2, w3, w4 and w5 as 0.1 and the learning rate as 0.01

Choose the correct answer from below:

A.     0.550

B.     0.252

C.      0.111

D.     0.340

Ans: B

Correct option : 0.252

Explanation :

Let $o_1$ be the output of the first neuron in the hidden layer and $o_2$ be the output of the second neuron in the hidden layer.

Now, $o_1 = F(x^2)$, where $x = w_1 x_1$
and $o_2 = F(x)$, where $x = w_2 x_2 + w_3 x_3$

$o_1 = w_1^2 x_1^2 = (0.1)(0.1)(10)(10) = 1$
$o_2 = w_2 x_2 + w_3 x_3 = (0.1)(5) + (0.1)(5) = 1$

Similarly,
$\hat{y} = w_4 o_1 + w_5 o_2 = w_4 (w_1^2 x_1^2) + w_5 (w_2 x_2 + w_3 x_3)$

According to the question, the loss function is:
$\text{loss} = (y - \hat{y})^2$

Using the chain rule of differentiation:

$\frac{d(\text{loss})}{dw_1} = \frac{d(\text{loss})}{do_1} \cdot \frac{d(o_1)}{dw_1}$

Thus,

$\frac{d(\text{loss})}{do_1} = \frac{d(y - w_4 o_1 - w_5 o_2)^2}{do_1} = (-2)(w_4)(y - w_4 o_1 - w_5 o_2)$

$= (-2)(0.1)(4 - (0.1)(1) - (0.1)(1)) = (-0.2)(4 - 0.2) = (-0.2)(3.8) = -0.76$

Similarly,

$\frac{d(o_1)}{dw_1} = \frac{d(w_1^2 x_1^2)}{dw_1} = 2 w_1 x_1 x_1 = (2)(0.1)(10)(10) = 20$

Finally,

$\frac{d(\text{loss})}{dw_1} = (-0.76)(20) = -15.2$

Updating the weight:

$w_1 \leftarrow w_1 - \alpha \cdot \frac{d(\text{loss})}{dw_1}$

where $\alpha$ is the learning rate

$= 0.1 - 0.01 \cdot (-15.2)$

$= 0.1 + 0.152 = 0.252$
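The derivation can be verified numerically. The sketch below follows the explanation step by step; the squared first hidden neuron and the identity second neuron are taken from the explanation, since the network diagram is not reproduced here.

```python
# Values from the question
x1, x2, x3 = 10.0, 5.0, 5.0
w1 = w2 = w3 = w4 = w5 = 0.1
y, lr = 4.0, 0.01

# Forward pass, following the explanation above
o1 = (w1 * x1) ** 2
o2 = w2 * x2 + w3 * x3
y_hat = w4 * o1 + w5 * o2

# Chain rule: d(loss)/d(w1) = d(loss)/d(o1) * d(o1)/d(w1)
dloss_do1 = -2 * w4 * (y - y_hat)
do1_dw1 = 2 * w1 * x1 * x1
grad_w1 = dloss_do1 * do1_dw1

# Gradient descent step on w1
w1_new = w1 - lr * grad_w1
print(round(w1_new, 3))
```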

Q4. Convergence

Fill in the blank :

In a multi-layered perceptron architecture, gradient descent ______ .

Choose the correct answer from below:

A.     always converges to the global minimum.

B.     doesn't converge to the global minimum.

C.      may or may not converge to the global minimum.

D.     will always converge to the global minimum if the learning rate is appropriate.

Ans: C

Correct option : may or may not converge to the global minimum

Explanation :

Gradient descent may or may not converge to the global minimum, depending on the initial weights and the learning rate. The loss function for a multi-layered perceptron is non-convex, so it can have multiple local minima. Therefore, gradient descent is not guaranteed to converge to the global minimum.
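The dependence on initialization can be illustrated on a toy problem. The sketch below uses a hand-picked one-dimensional non-convex function as a stand-in for an MLP loss surface (an assumption made for simplicity, not an actual network loss) and runs plain gradient descent from two starting points.

```python
def f(x):
    """A 1-D non-convex function with one local and one global minimum."""
    return x**4 - 3 * x**2 + x

def grad_f(x):
    return 4 * x**3 - 6 * x + 1

def gradient_descent(x0, lr=0.01, steps=2000):
    x = x0
    for _ in range(steps):
        x -= lr * grad_f(x)
    return x

x_left = gradient_descent(-2.0)    # ends in the deeper (global) minimum
x_right = gradient_descent(2.0)    # ends in the shallower (local) minimum
print(x_left, f(x_left))
print(x_right, f(x_right))
```

Both runs stop at a point where the gradient is essentially zero, but only the run started on the left reaches the lower of the two minima.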

Q5. Calculate the loss

Given the dataset, calculate the loss after completing the code snippet. Blanks are [?] .

import numpy as np
import pandas as pd

def hypothesis(w,b,x):                           #Section 1
  return 1.0/(1.0 + np.exp(-(w*x + b)))

def error(w,b):                                  #Section 2       
  err=0.0
  for x,y in zip(train,label):
    fx = hypothesis(w,b,x)
    err += 0.5 * (fx-y) ** 2
  return err

def grad_w(w,b,x,y):                              #Section 3
  fx=hypothesis(w,b,x)
  return (fx-y)*fx*(1-fx)*x

def grad_b(w,b,x,y):                              #Section 4
  fx=hypothesis(w,b,x)
  return (fx-y)*fx*(1-fx)

def gradient_descent(train,label,w,b,lr,max_epochs):    #Section 5
  dw=0
  db=0
  for i in range(max_epochs):
    for x,y in zip(train,label):
      dw+=grad_w(w,b,x,y)
      db+=grad_b(w,b,x,y)
    w = w [?] lr*dw         # [?] is here arithmetic sign
    b = b [?] lr*db         # [?] is here arithmetic sign
    print("For Epoch {}, the loss is {}".format(i+1, error(w,b)))
  return w,b

df=pd.read_csv("filepath")
train=df['X']
label=df['Y']
initial_w = 1
initial_b = 1
lr=0.01
max_epochs=50
w,b = gradient_descent(train,label,initial_w,initial_b,lr,max_epochs)

Choose the correct option for the loss and arithmetic signs.


Choose the correct answer from below:

A.     0.016, -, -

B.     0.28, +, +

C.      0.028, -, -

D.     0.050, +, +

Ans: A

Correct option : 0.016, -, -

Explanation :

The parameter update rule in gradient descent is:

$w \leftarrow w - \alpha \cdot (\partial L / \partial w)$
$b \leftarrow b - \alpha \cdot (\partial L / \partial b)$,
where $\alpha$ is the learning rate

Thus the signs in place of [?] will be -,-

The loss after 50 epochs comes out to be around 0.016

Q6. Fully connected neural network

Which, if any, of the given propositions is true about fully-connected neural networks (FCNN)?

Choose the correct answer from below:

A.     In a FCNN, there are connections between neurons of the same layer.

B.     In a FCNN, the most common weight initialization scheme is the zero initialization, because it leads to faster and more robust training.

C.      The neurons of one layer are connected to every neuron of its preceding layer.

D.     None of the options

Ans: C

Correct option : The neurons of one layer are connected to every neuron of its preceding layer.

Explanation :

  • In a FCNN, the neurons of one layer are connected to every neuron of its preceding layer, but there are no connections between neurons of the same layer.
  • Zero initialization leads to weight symmetry and undermines training. (If all the weights in a layer are the same, every neuron in that layer receives the same update in each training round, so the neurons stay identical and cannot learn different features.)
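The symmetry problem can be seen in a tiny two-layer network. This sketch uses a nonzero constant (0.5) instead of exactly zero so that the gradients are visibly nonzero; zero is the special case of the same symmetry. The layer sizes, the tanh activation, and the squared-error loss are assumptions chosen for illustration.

```python
import numpy as np

x = np.array([[1.0, 2.0, 3.0]])     # one hypothetical sample with 3 features
y = np.array([[1.0]])

# Every weight starts at the same constant; biases are omitted for brevity
W1 = np.full((3, 4), 0.5)           # input -> 4 hidden neurons
W2 = np.full((4, 1), 0.5)           # hidden -> 1 output

# Forward pass with a tanh hidden layer and a linear output
h = np.tanh(x @ W1)
y_hat = h @ W2

# Backward pass for the loss 0.5 * (y_hat - y)^2
d_out = y_hat - y
dW2 = h.T @ d_out
d_h = (d_out @ W2.T) * (1 - h**2)
dW1 = x.T @ d_h

# Each column of dW1 belongs to one hidden neuron
same_updates = np.allclose(dW1, dW1[:, [0]])
print(same_updates)
```

Since every hidden neuron gets an identical gradient column, the columns of W1 remain equal after the update, and the four neurons keep computing the same feature.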

Q7. Compare the learnings

The following graph shows the learning speeds versus the number of epochs for the four hidden layers where Hidden layer 1 and Hidden layer 4 are the first and last hidden layers respectively.





Considering the graph mark the correct option.

Choose the correct answer from below:

A.     The initial layers learn slower since the weights of the initial layers are always higher

B.     The gradients for initial layers are smaller than those of later ones, causing slow learning in them

C.      The slow learning in initial layers is due to the dead activation of sigmoid in the initial layers

D.     The slow learning in initial layers is due to faster learning at initial stages

Ans: B

Correct option: The gradients for initial layers are smaller than those of later ones, causing slow updating of weights and biases in them hence slow learning.

Explanation:

  • Recall that the backpropagation algorithm involves heavy usage of the chain rule, which means that the gradient received by the initial layers is a long product of terms. During backpropagation, if the gradients at the final layers are fairly small, the initial layers will get a very tiny gradient. Hence, weights will get updated very slowly in these layers.
  • One option suggests that the initial layers don't require much updating which is wrong as the weights are initialized randomly and they do require significant updating. Therefore it can't be true that the earlier layers require lesser updating.
  • One option suggests that the initial layers learn slower later because they learn faster in the initial epochs which is not true.
  • One option attributes the slowdown to dead sigmoid activations in the initial layers, which is not the cause here; the slow learning comes from the shrinking gradients described above.
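The shrinking product of chain-rule terms can be illustrated with a chain of four sigmoid neurons, one per layer. The weights and input below are hypothetical, and the scalar chain is a simplification of a real multi-layer network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# A chain of 4 hidden neurons, one per layer, with hypothetical weights of 1.0
weights = [1.0, 1.0, 1.0, 1.0]
a = 0.5                                  # hypothetical input
activations, local_derivs = [a], []

for w in weights:
    z = w * a
    a = sigmoid(z)
    activations.append(a)
    local_derivs.append(a * (1 - a))     # sigmoid'(z), never larger than 0.25

# Gradient of the final activation with respect to each layer's weight
grads = []
for k in range(len(weights)):
    g = local_derivs[k] * activations[k]             # d(output of layer k) / d(w_k)
    for j in range(k + 1, len(weights)):
        g *= local_derivs[j] * weights[j]            # chain rule through later layers
    grads.append(g)

print(grads)   # magnitudes grow from the first layer to the last
```

Each extra layer between a weight and the output multiplies its gradient by a factor of at most 0.25 times the next weight, so the first layer ends up with the smallest gradient.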
