Neural Network 4

 Q1. Tanh and Leaky ReLu

Which of the following statements with respect to Leaky ReLu and Tanh are true?

a. When the derivative becomes zero for negative values in ReLU, no learning happens; this is rectified in Leaky ReLU.

b. Tanh is a zero-centered activation function.

c. Tanh produces normalized inputs for the next layer which makes training easier.

d. Tanh also has the vanishing gradient problem.

Choose the correct answer from below:

A.     All the mentioned statements are true.

B.     All the mentioned statements are true except c.

C.      All the mentioned statements are true except b.

D.     All the mentioned statements are true except d.

Ans: A

Correct options: All the mentioned statements are true.

Explanation :

1) The problem of no learning for negative inputs in ReLU is called the dying ReLU problem, which Leaky ReLU takes care of.

2) Yes, tanh is a zero-centered activation function.

3) As Tanh is symmetric and its mean is around zero, it produces normalized inputs (between -1 and 1) for the next layer, which makes training easier.

4) As Tanh is also a saturating, sigmoidal function, it also faces the vanishing gradient problem (see the sketch below).
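The following minimal NumPy sketch (not part of the original question; the 0.01 slope for Leaky ReLU is an assumption) illustrates the points above: tanh is zero-centered with outputs in (-1, 1), its gradient vanishes for large |x|, and Leaky ReLU keeps a small non-zero gradient for negative inputs.

import numpy as np

x = np.array([-10.0, -1.0, 0.0, 1.0, 10.0])

tanh = np.tanh(x)
tanh_grad = 1 - np.tanh(x) ** 2            # derivative of tanh
alpha = 0.01                               # assumed Leaky ReLU slope
leaky_grad = np.where(x > 0, 1.0, alpha)   # never exactly zero, so no "dying" units

print(tanh)        # values stay inside (-1, 1), centered around 0
print(tanh_grad)   # ~0 at x = -10 and x = 10, i.e. vanishing gradient
print(leaky_grad)  # 0.01 for negative x, 1 for positive x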

 

Q2. Dog and cat classifier

You are building a binary classifier for recognizing dogs (y=1) vs. cats (y=0). Which one of these is the best activation function for the output layer?

Choose the correct answer from below:

A.     ReLU

B.     Leaky ReLU

C.      sigmoid

D.     Tanh

Ans: C

Correct option : sigmoid
Explanation : The sigmoid function outputs a value between 0 and 1, which makes it a very good choice for binary classification. You can classify an input as 0 if the output is less than 0.5 and as 1 if the output is more than 0.5. This threshold value can also be changed.
This could be done with tanh as well, but it is less convenient since its output lies between -1 and 1.
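A minimal sketch of this thresholding, assuming a 0.5 cut-off and some made-up raw outputs:

import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

logits = np.array([-2.0, -0.1, 0.3, 4.0])    # made-up raw outputs of the output neuron
probs = sigmoid(logits)                      # values in (0, 1)
labels = (probs > 0.5).astype(int)           # 1 = dog, 0 = cat

print(probs)    # approx [0.12 0.48 0.57 0.98]
print(labels)   # [0 0 1 1]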

Q3. Maximum value of derivatives

The image shows two columns: one listing activation functions and the other listing the maximum values of first-order derivatives. Map each function on the left to the maximum value of its derivative on the right.

Choose the correct answer from below:

A.     1-d, 2-c, 3-b, 4-a

B.     1-b, 2-c, 3-d, 4-a

C.      1-c, 2-b, 3-d, 4-a

D.     1-b, 2-d, 3-d, 4-d

Ans: D

Correct option : 1-b, 2-d, 3-d, 4-d.

Explanation :

The derivative of the sigmoid function is sigmoid(x)·(1 − sigmoid(x)); its maximum value is 0.25, reached where sigmoid(x) = 0.5, i.e. at x = 0.

The derivative of tanh is 1 − tanh²(x); its maximum value is 1, reached where tanh(x) = 0, i.e. at x = 0.

The derivative of ReLU is 1 for all positive values of x and 0 for all negative values.

The derivative of Leaky ReLU is 1 for all positive values. For negative values, if Leaky ReLU outputs, say, 0.5 × (input), the slope is 0.5 and hence the derivative is also 0.5.


For both ReLU and leaky ReLU, the maximum derivative value is 1.
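A quick numerical check of these maxima (not from the original question; the grid and the 0.5 negative-side slope for Leaky ReLU are assumptions):

import numpy as np

x = np.linspace(-10, 10, 100001)

sig = 1 / (1 + np.exp(-x))
d_sigmoid = sig * (1 - sig)
d_tanh = 1 - np.tanh(x) ** 2
d_relu = np.where(x > 0, 1.0, 0.0)
d_leaky = np.where(x > 0, 1.0, 0.5)

print(d_sigmoid.max())  # ~0.25, at x = 0
print(d_tanh.max())     # ~1.0,  at x = 0
print(d_relu.max())     # 1.0
print(d_leaky.max())    # 1.0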

Q4. Leaky relu advantages

What are the advantages of using Leaky Rectified Linear Units (Leaky ReLU) over normal ReLU in deep learning?

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     It fixes the “dying ReLU” problem, as it doesn’t have zero-slope parts.

B.     Leaky ReLU always slows down training.

C.      It increases the “dying ReLU” problem, as it doesn’t have zero-slope parts.

D.     Leaky ReLU helps the gradients flow more easily through the architecture.

Ans: A, D

Correct options:

  • It fixes the “dying ReLU” problem, as it doesn’t have zero-slope parts.
  • Leaky ReLU helps the gradients flow more easily through the architecture.

Explanation:

  • Leaky ReLU is a variant of the ReLU activation function, which is commonly used in deep learning. The key advantage of using Leaky ReLU over normal ReLU is that it can avoid the "dying ReLU" problem, which occurs when a large number of neurons in a network become inactive and stop responding to inputs.
  • As for the impact on training, it depends on the context and the specific problem you are trying to solve. In some cases, using Leaky ReLU can speed up training by preventing the dying ReLU problem; this can be useful when you have sparse data or a highly imbalanced dataset. In other cases, it may slow down training by introducing a more complex non-linearity, which makes the optimization process more difficult.
  • The dying ReLU problem can happen when the input to a neuron is negative, since ReLU sets negative inputs to zero. In contrast, Leaky ReLU allows a small, non-zero gradient for negative input values, which helps prevent neurons from becoming inactive, can improve the overall performance of the network, and makes the gradients flow more easily through the architecture (see the sketch after this list).
  • Additionally, Leaky ReLU has been shown to outperform other variants of ReLU on some benchmarks, so it may be a better choice in some cases.
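A small sketch (the 0.01 slope is an assumption) contrasting how the two activations treat a negative pre-activation during backpropagation: ReLU blocks the gradient entirely, while Leaky ReLU lets a scaled-down gradient reach the weights.

z = -3.0              # a negative pre-activation for some neuron
upstream_grad = 1.0   # gradient arriving from the next layer

relu_grad = upstream_grad * (1.0 if z > 0 else 0.0)
leaky_grad = upstream_grad * (1.0 if z > 0 else 0.01)

print(relu_grad)    # 0.0  -> no update reaches the neuron ("dying ReLU")
print(leaky_grad)   # 0.01 -> a small update still flows back to the weights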

 

Q5. No Activation Function

What if we do not use any activation function(s) between the hidden layers in a neural network?

Choose the correct answer from below:

A.     It will still capture non-linear relationships.

B.     It will just be a simple linear equation.

C.      It will not affect.

D.     Can't be determined.

Ans: B

Correct option : It will just be a simple linear equation.

Explanation :

The main aim of this question is to understand why we need activation functions in a neural network.

The following steps are performed in a neural network:

Step 1: Calculate the weighted sum of all the inputs (X) and add the bias term:
Z = (weights · X) + bias

Step 2: Apply an activation function to calculate the expected output:
Y=Activation(Z)

Steps 1 and 2 are performed at each layer. This is called forward propagation.

Now, what if there is no activation function?
Our equation for Y becomes:
Y = Z = (weights · X) + bias

This is just a simple linear equation. A linear equation will not be able to capture the complex patterns in the data.
To capture non-linear relationships, we use activation functions.
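A minimal NumPy sketch (layer sizes and random values are arbitrary) showing that stacking layers without activations collapses into a single linear equation:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))                      # 4 samples, 5 features

W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=(1, 3))
W2, b2 = rng.normal(size=(3, 2)), rng.normal(size=(1, 2))

out = (X @ W1 + b1) @ W2 + b2                    # two "hidden layers", no activation

W = W1 @ W2                                      # one equivalent linear layer
b = b1 @ W2 + b2
print(np.allclose(out, X @ W + b))               # True: depth added no extra power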

Q6. Trainable parameters

What is the number of trainable parameters in the neural network given below:


Note: The network is not fully connected and the trainable parameters include biases as well.

Choose the correct answer from below:

A.     17

B.     15

C.      10

D.     20

Ans:B

Correct option : 15

Explanation :

The network is not fully connected, and hence the weights correspond to the connections between neurons, which are 10 in total.
For biases, we have 4 for the neurons in the hidden layer and 1 for the neuron in the output layer, which gives 10 + 5 = 15 in total.

Note : The network shown in the image is purely for teaching purposes. We won’t encounter neural networks like this in real life.

 Q7. Number of connections

The number of nodes in the input layer of a fully connected neural network is 10 and in the hidden layer is 7. The maximum number of connections from the input layer to the hidden layer is:

Choose the correct answer from below:

A.     70

B.     less than 70

C.      more than 70

D.     It is an arbitrary value

Ans: A

Correct option : 70.

Explanation :

  • Since an MLP is fully connected, the maximum number of connections is the product of the number of nodes in the input layer and the hidden layer.
  • The total number of connections = 10 × 7 = 70

 Q8. How many parameters?

For a neural network consisting of an input layer, 2 hidden layers, and one output layer, what will be the number of parameters if each layer is dense and has a bias associated with it?


Choose the correct answer from below:

A.     24

B.     44

C.      51

D.     32

 Ans: B

Correct option : 44

Explanation :

For each dense layer, the number of parameters is (number of inputs × number of neurons) + number of neurons, i.e. one weight per connection plus one bias per neuron. Summing this over all layers gives the total.

For the first hidden layer, each of the 5 inputs is connected to each of the 3 units, so there are 5 × 3 + 3 (one bias per unit) = 18 parameters.

For the second hidden layer: 3 × 4 + 4 = 16
For the output layer: 4 × 2 + 2 = 10
Therefore total = 18 + 16 + 10 = 44.
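A short sketch that reproduces this arithmetic for the assumed architecture 5 → 3 → 4 → 2 (dense layers, one bias per output neuron):

layer_sizes = [5, 3, 4, 2]

total = 0
for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
    params = n_in * n_out + n_out        # weights + biases of this dense layer
    print(n_in, "->", n_out, ":", params, "parameters")
    total += params

print("total:", total)                   # 18 + 16 + 10 = 44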

Neural Network 2

Q1. Neuron

Which of the following is true about a single artificial neuron?

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     It is loosely inspired from biological neurons

B.     It computes weighted sum

C.      It applies an activation function

D.     It is capable of performing multi class classification

Ans: A,B,C

Correct Options:-

  • It is loosely inspired from biological neurons
  • It computes weighted sum
  • It applies an activation function

Explanation:-

  • The basic inspiration for artificial neurons did come from biological neurons.
    Biological neurons form a network within themselves.
    Each connection, like the synapses in a biological brain, can transmit a signal to other neurons.
    An artificial neuron receives signals, processes them, and can signal the neurons connected to it.
  • A neuron does its computation in 2 steps (a minimal sketch follows this list):
    1. First it computes the weighted sum: z = w1x1 + w2x2 + … + wdxd + b
    2. Then it applies an activation function on top of this sum: a = f(z)
  • A single neuron can perform binary classification if its activation is the sigmoid function.
    However, it cannot perform multi-class classification on its own. We would need a network of multiple neurons to do that.
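A minimal sketch of these two steps, with the sigmoid as the activation f (the weights, inputs, and bias are made-up values):

import numpy as np

x = np.array([0.5, -1.2, 3.0])     # inputs x1..xd
w = np.array([0.4, 0.1, -0.7])     # weights w1..wd
b = 0.2                            # bias

z = np.dot(w, x) + b               # step 1: weighted sum
a = 1 / (1 + np.exp(-z))           # step 2: activation f(z), here the sigmoid

print(z, a)                        # a lies in (0, 1), usable for binary classification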

Q2. Sigmoid and softmax functions

Which of the following statements is true for a neural network having more than one output neuron ?

Choose the correct answer from below:

A.     In a neural network where the output neurons have the sigmoid activation, the sum of all the outputs from the neurons is always 1.

B.     In a neural network where the output neurons have the sigmoid activation, the sum of all the outputs from the neurons is 1 if and only if we have just two output neurons.

C.      In a neural network where the output neurons have the softmax activation, the sum of all the outputs from the neurons is always 1.

D.     The softmax function is a special case of the sigmoid function

Ans: C

Correct option : In a neural network where the output neurons have the softmax activation, the sum of all the outputs from the neurons is always 1.

Explanation :

  • For the sigmoid activation, when we have more than one output neuron, the sum of the outputs from the neurons can take any value.
  • The softmax classifier outputs a probability distribution over the classes, and the sum of the probabilities is always 1 (see the check below).
  • The sigmoid function is a special case of the softmax function where the number of classes is 2.
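A quick NumPy check of the first two points, using made-up scores for three output neurons:

import numpy as np

z = np.array([2.0, -1.0, 0.5])              # made-up raw scores from 3 output neurons

sigmoid_out = 1 / (1 + np.exp(-z))
softmax_out = np.exp(z) / np.sum(np.exp(z))

print(sigmoid_out.sum())   # ~1.77, not constrained to 1
print(softmax_out.sum())   # 1.0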

 

Q3. Forward propagation

Given the independent and dependent variables in X and y, complete the code to calculate the results of the forward propagation for a single neuron on each observation of the dataset.

The code should print the calculated labels for each observation of the given dataset i.e. X.

Input Format:

Two lists are taken as the inputs. First list should be the independent variable(X) and the second list should be the dependent variable(y)

Output Format:

A numpy array consisting of labels for each observation.

Sample Input:

X = [[100, 129, 157, 133], [168, 150, 30, 19], [4, 148, 106, 74], [123, 195, 60, 93], [169, 40, 188, 179], [40, 59, 29, 94], [165, 126, 16, 99], [167, 157, 65, 23], [128, 87, 37, 111], [191, 154, 89, 134], [101, 41, 145, 112], [43, 110, 197, 118], [147, 22, 109, 139], [11, 161, 135, 119], [26, 48, 199, 182], [96, 100, 82, 87], [149, 2, 8, 10], [5, 38, 166, 100], [193, 117, 59, 164], [133, 5, 38, 163], [88, 177, 84, 114], [9, 132, 177, 24], [94, 130, 83, 131], [77, 11, 141, 81], [154, 198, 175, 98], [21, 148, 170, 122], [185, 145, 101, 183], [100, 196, 111, 11], [97, 147, 112, 11], [25, 97, 95, 45], [6, 89, 88, 38], [51, 16, 151, 3], [90, 174, 122, 157], [2, 133, 121, 199], [15, 78, 163, 180], [103, 118, 7, 179], [102, 179, 157, 183], [113, 139, 195, 122], [55, 88, 68, 117], [115, 185, 93, 102], [139, 82, 3, 165], [135, 29, 78, 11], [11, 16, 60, 123], [103, 191, 187, 129], [146, 181, 28, 192], [85, 73, 136, 139], [117, 179, 81, 183], [15, 131, 106, 28], [58, 78, 111, 65], [76, 11, 25, 103], [11, 90, 162, 129], [144, 1, 16, 33], [33, 172, 40, 72], [106, 83, 160, 151], [68, 159, 150, 64], [31, 79, 83, 15], [51, 140, 173, 10], [105, 80, 70, 21], [195, 80, 64, 129], [50, 96, 107, 82], [185, 150, 15, 143], [28, 71, 27, 57], [58, 13, 146, 78], [20, 71, 183, 44], [91, 44, 15, 87], [77, 157, 95, 110], [132, 28, 193, 49], [177, 87, 57, 41], [194, 175, 17, 20], [166, 64, 134, 150], [79, 74, 162, 168], [166, 149, 34, 117], [160, 170, 127, 44], [99, 41, 103, 155], [48, 127, 138, 68], [17, 3, 101, 94], [29, 102, 123, 158], [194, 60, 135, 179], [73, 192, 145, 168], [21, 94, 154, 143], [17, 10, 145, 131], [73, 29, 195, 199], [132, 189, 90, 100], [134, 32, 81, 119], [118, 37, 119, 27], [51, 78, 187, 86], [95, 8, 56, 29], [156, 162, 186, 127], [126, 111, 144, 59], [7, 140, 32, 75], [40, 0, 109, 92], [165, 175, 61, 103], [178, 68, 185, 119], [132, 105, 36, 80], [165, 117, 35, 176], [128, 49, 185, 9], [50, 176, 12, 198], [124, 164, 99, 102], [36, 30, 114, 147], [166, 172, 35, 14]]
y = [1, 1, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0, 1, 0, 0, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 1, 1, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0]

Sample Output:

[1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 0 1 1 1 1 0 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1]


 import numpy as np

np.random.seed(2)

 

#independent variables

X = np.array(eval(input()))

#dependent variable

y = np.array(eval(input()))

 

m = X.shape[__]  #no. of samples

n = X.shape[__]  #no. of features

c =    #no. of classes in the data and therefore no. of neurons in the layer

 

#weight vector of dimension (number of features, number of neurons in the layer)

w = np.random.randn(___, ___)

 

#bias vector of dimension (1, number of neurons in the layer)

b = np.zeros((___, ___))

 

#(weighted sum + bias) of dimension (number of samples, number of classes)

z = ____

 

#exponential transformation of z

a = np.exp(z)

 

#Perform the softmax on a

a = ____

 

#calculate the label for each observation

y_hat = ____

 

print(y_hat)

Ans:

import numpy as np

np.random.seed(2)

 

#independent variables

X = np.array(eval(input()))

#dependent variable

y = np.array(eval(input()))

‘m’ and ‘n’ refer to the number of rows and columns in the dataset respectively; ‘c’ refers to the number of classes in y.

m = X.shape[0]  #no. of samples

n = X.shape[1]  #no. of features

c = len(np.unique(y))   #no. of classes in the data and therefore no. of neurons in the layer

Initializing weights randomly

#weight vector of dimension (number of features, number of neurons in the layer)

w = np.random.randn(n, c)

Initializing biases as zero

#bias vector of dimension (1, number of neurons in the layer)

b = np.zeros((1, c))

Finding the output ‘z’

#(weighted sum + bias) of dimension (number of samples, number of classes)

z = np.dot(X, w) + b

Applying the softmax activation function on the output

#exponential transformation of z

a = np.exp(z)

a = a/np.sum(a, axis = 1, keepdims = True)

Calculating the label for each observation

y_hat = np.argmax(a, axis = 1)

print(y_hat)

 

Q4. Same layer still different output

Why do two neurons in the same layer produce different outputs even after using the same kind of function (i.e. wT.x + b)?

Choose the correct answer from below:

A.     Because the weights are not the same for the neurons.

B.     Because the input for each neuron is different.

C.      Because weights of all neurons are updated using different learning rates.

D.     Because only biases (b) of all neurons are different, not the weights.

Ans: A

 

Correct option: Because the weights are not the same for the neurons.

Explanation :

  • The weights for each neuron in a layer are different. Thus the output of each neuron (wT.x + b) will be different.
  • The input for each neuron in a layer is the same. In a fully connected network, each neuron in a particular layer gets inputs from each neuron in the previous layer.
  • There may be different learning rates for each model weight depending on the type of optimizer used, but that is not the reason for the neurons giving different outputs.
  • In a fully connected network, each neuron has its own trainable parameters: one weight for each of its inputs and a bias. The values of these weights and biases for any two neurons in a layer need not be the same, since they keep changing during model training.

 

Q5. Will he watch the movie?

We want to predict whether a user would watch a movie or not. Each movie has a certain number of features, each of which is explained in the image.









Now take the case of the movie Avatar having the features vector as [9,1,0,5]. According to an algorithm, these features are assigned the weights [0.8,0.2,0.5,0.4] and bias=-10. For a user X, predict whether he will watch the movie or not if the threshold value(θ) is 10?

Note: If the output of the neuron is greater than θ then the user will watch the movie otherwise not.

 

Choose the correct answer from below:

A.     Yes, the user will watch the movie with neuron output = -0.6

B.     No, the user will not watch the movie with neuron output = -0.6

C.      No, the user will watch the movie with neuron output = 2.5

D.     Yes, the user will watch the movie with neuron output = 2.5

Ans: B

Correct option : No, the user will not watch the movie (neuron output = -0.6).

Explanation :

The output of a neuron is obtained by taking the weighted sum of inputs and adding the bias term to it.

The output of the neuron is:
(0.8)(9) + (0.2)(1) + (0.5)(0) + (0.4)(5) − 10 = −0.6 < 10,
therefore he won’t watch the movie.
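The same calculation written out in NumPy for clarity:

import numpy as np

features = np.array([9, 1, 0, 5])
weights = np.array([0.8, 0.2, 0.5, 0.4])
bias = -10
theta = 10                                 # threshold

output = np.dot(weights, features) + bias
print(output, output > theta)              # ≈ -0.6, False -> the user will not watch the movie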

Q6. And perceptron

 

We want to design a perceptron that performs the AND operation. Refer to the table below:






For this, first, a weighted sum is calculated: z = w1x1 + w2x2 + b

Here, w1 and w2 are weights, and b is the bias for the neuron.

The activation function applied on this is as follows:-

f(z)=0, if z<0

f(z)=1, otherwise

Which of the following values of weights and bias will give the desired results?

Choose the correct answer from below:

A.     w1 = 1, w2 = 1, b = -2

B.     w1 = 1, w2 = 1, b = 2

C.      w1 = 1, w2 = 2, b = -2

D.     w1 = 2, w2 = 1, b = 4

Ans: A

Correct answer: w1 = 1, w2 = 1, b = −2

Explanation:

  • For the input (0,0) the perceptron will perform a calculation like this:
    z = w1x1 + w2x2 + b = (1)(0) + (1)(0) + (−2) = −2
    Therefore f(z) = 0
  • Similarly, for the inputs (0,1) and (1,0),
    in both cases z = −1,
    therefore f(z) = 0
  • But for the input (1,1), the value of z = w1x1 + w2x2 + b = (1)(1) + (1)(1) + (−2) = 0,
    therefore f(z) = 1
  • Therefore, among the options, w1 = 1, w2 = 1, and b = −2 give the required results (verified in the check below).
  • We can use any values for w1, w2, and b which satisfy the required conditions and outputs.
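A short check of the chosen parameters over all four rows of the AND truth table, using the step activation defined above:

w1, w2, b = 1, 1, -2

for x1, x2 in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    z = w1 * x1 + w2 * x2 + b
    f = 0 if z < 0 else 1           # f(z) = 0 if z < 0, 1 otherwise
    print(x1, x2, "->", f)          # matches x1 AND x2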

 

Q7. Vectorize

Consider the following code snippet:





How do you vectorize this?

Note: All x, y and z are NumPy arrays.

Choose the correct answer from below:

A.     z = x + y

B.     z= x * y.T

C.      z = x + y.T

D.     z = x.T + y.T

 

Ans: C

Correct option : z = x + y.T

Explanation :

The shape of x is (10,6)
The shape of y is (6,1)

We can observe from the question that all elements of y are added element-wise to x to produce z; y[j] is added to each element of the jth column of x.
For this to happen, we have to convert the y array to shape (1,6) and then add it to x, which broadcasts y across the rows into shape (10,6).

Thus the answer is z=x+y.T
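A minimal sketch (random data with the shapes stated above) comparing the loop-based addition with the broadcast version:

import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 6))
y = rng.normal(size=(6, 1))

z_loop = np.empty_like(x)
for i in range(10):                  # loop version: add y[j] to column j
    for j in range(6):
        z_loop[i, j] = x[i, j] + y[j, 0]

z_vec = x + y.T                      # y.T has shape (1, 6) and broadcasts to (10, 6)
print(np.allclose(z_loop, z_vec))    # True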

 

 

 

 

 

 

 

Neural Network 1

 Q1. Weights impact



For the neural network shown above, which of these statements is true?

Choose the correct answer from below:

A.     -5 weight is bad for the neural network.

B.     The neuron with weight 10 will have the most impact on the output.

C.      The neuron with weight -5 will have the most impact on the output.

D.     The neuron with weight 2 will have the most impact on the output.

Ans: B

Correct option : The neuron with weight 10 will have the most impact on the output.

Explanation :
There is no such thing as a negative weight being bad for the neural network. The negative or positive sign of a weight simply indicates whether it has a decreasing or increasing effect on the output value. The neuron whose weight has the largest magnitude will have the most significant effect on the output value.

Q2. Calculate Forward Pass

The neuron n has the weights 1,2,3,4, and 5. The values of inputs are 4,10,5,20, and 0. We are using a linear activation function with the constant of proportionality being equal to 2 here.





The output will be:

Choose the correct answer from below:

A.     193.5

B.     59.5

C.      238

D.     119

Ans: C

Correct option : 238

Explanation :

Multiplying each weight with its corresponding input, and then adding everything together:

-> Output = (1×4 + 2×10 + 3×5 + 4×20 + 5×0) = 119

But since the activation is linear with a proportionality constant of 2:

-> Final output = 2 × 119 = 238
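The same arithmetic in NumPy, with the linear activation written as 2 · z:

import numpy as np

weights = np.array([1, 2, 3, 4, 5])
inputs = np.array([4, 10, 5, 20, 0])

z = np.dot(weights, inputs)   # 4 + 20 + 15 + 80 + 0 = 119
output = 2 * z                # linear activation with constant of proportionality 2
print(output)                 # 238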

 

Q3. Need for NN

Are classic ML Algorithms not powerful enough? Why exactly do we need to use Neural Networks?

Check all that apply.

 

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     Unlike Classic ML Algos, NN does not require us to do manual feature engineering.

B.     NNs can work with both structured and unstructured data

C.      For a large dataset, classic ML Algos can outperform NNs

D.     NNs are able to work better with sparse data

Ans: A,B,D

Correct Options:-

  • Unlike Classic ML Algos, NN does not require us to do manual feature engineering.
  • NNs can work with both structured and unstructured data
  • NNs are able to work better with sparse data

Explanation:-

  • In order to create a complex decision boundary, classic ML algorithms require us to do heavy feature engineering manually.
    On the other hand, NNs are able to find complex relations between features on their own.
  • NNs are excellent at working with unstructured data (image / text / audio data), which classic ML algos struggle to handle.
  • The performance of NNs might be comparable to that of classic ML algos for small datasets.
    However, given a big enough dataset, NNs will generally give better performance.

 

Q4. Scale drives DL

Refer to the given plot.




Which of the following generally does not hurt an algorithm's performance, and may help significantly?

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     Decreasing the size of a NN

B.     Increasing the size of a NN

C.      Decreasing the training set size

D.     Increasing the training set size

 

Ans:  B, D

Correct Options:-

  • Increasing the size of a NN
  • Increasing the training set size

Explanation:-

  • According to the trends in the given figure, big networks usually perform better than small networks.
  • Also, bringing more data to an NN model is almost always beneficial.

 

Q5. Factors of DL performance

Which of the following factors can help achieve high performance with Deep Learning algorithms?

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     Large amount of data

B.     Smaller models

C.      Better designed features to use

D.     Large models

 

Ans: A, D

Correct Options:-

  • Large amount of data
  • Large models

Explanation:-

  • Over the last 20 years, we have accumulated a lot of data. Traditional algorithms were not able to benefit from this, whereas this large amount of data has been the fundamental reason why DL took off in the past decade.
  • In order to take advantage of this large amount of data available to us, we also need a big enough model.
  • Smaller models will not be able to yield very high performance, as they will not be able to take advantage of the large amount of data.
  • One main difference between classical ML algos and DL algos is that DL models are able to “figure out” the best features using hidden layers.

 

Q6. NN true false

Mark the following statement as true or false:-

"Neural networks are good at figuring out functions, relating an input x to an output y, given enough examples."

 

Choose the correct answer from below:

A.     True

B.     False

Ans:  A

Correct Option: True

Explanation:

  • With NN, we don’t need to design features by ourselves.
  • The NN figures out the necessary relations given enough data.

 

 

 

 

 

 

 
