Machine Learning - Deep Learning

Convolutional Neural Network 2

Q1. Sparse Connection

What does sparsity of connections mean as a benefit of using convolutional layers?

Choose the correct answer from below:

A. Each filter is connected to every channel in the previous layer

B. Each layer in a convolutional network is connected only to two other layers

C. Each activation in the next layer depends on only a small number of activations from the previous layer

D. Regularization causes gradient descent to set many of the parameters to zero

Ans: C

Correct answer: Each activation in the next layer depends on only a small number of activations from the previous layer.

Reason:

In neural network usage, “dense” connections connect all inputs.

By contrast, a CNN is “sparse”: each unit is connected only to a local “patch” of pixels instead of using all pixels as input.

Because each unit sees only a small patch, a convolutional layer needs far fewer weights and computations than a dense layer, which makes CNNs better suited than traditional neural networks to image data.

Because of this, each activation in the next layer depends on only a small number of activations from the previous layer.
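A minimal pure-Python sketch of sparse connectivity, assuming a valid 3 × 3 convolution with stride 1 on a hypothetical 6 × 6 input (the helper name `input_patch` is illustrative, not from any library). It lists the only input positions that a single output activation reads.

```python
def input_patch(i, j, k=3, stride=1):
    """Return the input coordinates read by output unit (i, j) of a valid k x k convolution."""
    top, left = i * stride, j * stride
    return [(top + di, left + dj) for di in range(k) for dj in range(k)]


H = W = 6  # hypothetical 6 x 6 input image
patch = input_patch(1, 2)

# A dense (fully connected) unit would read all H * W inputs; this unit reads only k * k of them.
print(f"conv unit reads {len(patch)} of {H * W} inputs: {patch}")
```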

Q2. Data size

As you train your model, you realize that you do not have enough data. Which of the following data augmentation techniques can be used to overcome the shortage of data?

Choose the correct answer from below, please note that this question may have multiple correct answers

A. Adding Noise

B. Rotation

C. Translation

D. Color Augmentation

Ans: A, B, C, D

The correct answers are:

Adding Noise
Rotation
Translation
Color Augmentation.

Reason:

Image augmentation is a process of creating new training examples from the existing ones.

Some data augmentation techniques that can be used to overcome the shortage of data are Adding Noise, Rotation, Translation, and Color Augmentation.

Adding noise to the data aims to improve the generalization performance.

Random rotation augmentation rotates each image by a randomly chosen angle within a specified range (for example, anywhere from 0 to 360 degrees).

Translation just involves moving the image along the X or Y direction (or both).

Color Augmentation alters the intensities of the RGB channels along the natural variations of the images.
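The four techniques can be sketched in pure Python on a tiny image stored as nested lists. This is an illustrative sketch under simplifying assumptions: rotation is restricted to 90° steps, translation only moves along X, and color augmentation is reduced to scaling each RGB channel by its own factor (a simpler stand-in for shifting intensities along the image's natural color variations). All function names are hypothetical.

```python
import random

# A grayscale image is a list of rows of pixel values in [0, 1].

def add_noise(img, sigma=0.05, seed=0):
    """Add zero-mean Gaussian noise to every pixel (clipped to [0, 1])."""
    rng = random.Random(seed)
    return [[min(1.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row] for row in img]


def rotate90_ccw(img):
    """Rotate the image 90 degrees counter-clockwise (a special case of rotation)."""
    return [list(row) for row in zip(*img)][::-1]


def translate_right(img, shift, fill=0.0):
    """Shift the image `shift` pixels along X (assumes 0 <= shift <= width), filling vacated columns."""
    width = len(img[0])
    return [[fill] * shift + row[: width - shift] for row in img]


def scale_channels(rgb_img, factors):
    """Simple color augmentation: scale each RGB channel by its own factor, clipped to [0, 1]."""
    return [[tuple(min(1.0, c * f) for c, f in zip(px, factors)) for px in row] for row in rgb_img]


img = [[0.0, 0.5], [1.0, 0.25]]
print(rotate90_ccw(img))         # [[0.5, 0.25], [0.0, 1.0]]
print(translate_right(img, 1))   # [[0.0, 0.0], [0.0, 1.0]]
```

Each function returns a new image, so one original can yield several extra training examples.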

Q3. Accuracy After DA

Is it possible for training accuracy to be lower than test accuracy after using data augmentation?

Choose the correct answer from below:

A. True

B. False

Ans: A

Correct answer: True

Reason:

Training accuracy can be lower because the various augmentation techniques make it artificially harder for the network to give the right answers during training; this is also what makes the model more robust.

Test images are usually not augmented, so thanks to this robustness the model can reach higher accuracy on the test data than on the training data.

Q4. fruit augment

We are building a CNN model that classifies 5 different fruits. The number of images per class is as follows:

Banana—20 images

Apple—30 images

Mango—200 images

Watermelon—400 images

Peaches—400 images

Which of the given fruits should undergo augmentation in order to avoid class imbalance in the dataset?

Choose the correct answer from below:

A. Banana, Apple

B. Banana, Apple, Mango

C. Watermelon, Peaches

D. All the Fruits

Ans: B

Correct answer: Banana, Apple, Mango

Image augmentation is a process of creating new training examples from the existing ones.

Imbalanced classification is the problem of classification when there is an unequal distribution of classes in the training dataset.

In the given dataset, Banana, Apple, and Mango have far fewer images than Watermelon and Peaches. Hence, augmentation should be applied to these three classes.
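A short sketch of how many augmented images each class would need, assuming the goal is to bring every class up to the size of the largest one (a common choice, not the only one):

```python
counts = {"Banana": 20, "Apple": 30, "Mango": 200, "Watermelon": 400, "Peaches": 400}

target = max(counts.values())  # bring every class up to the largest class size
to_generate = {fruit: target - n for fruit, n in counts.items() if n < target}

print(to_generate)  # {'Banana': 380, 'Apple': 370, 'Mango': 200}
```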

Q5. CNN Select Again

Which among the following is False:

Choose the correct answer from below:

A. Dilated convolution increases the receptive field size when compared to standard convolution operator

B. Dropout is a regularization technique

C. Batch normalization ensures that the weight of each of the hidden layers of a deep network is normalized

D. Convolution neural networks are translation invariant

Ans: C

Correct answer: Batch normalization ensures that the weight of each of the hidden layers of a deep network is normalized

Reason:

Compared to the standard convolution operator, dilated convolution inserts gaps between the kernel elements, which enlarges the receptive field without increasing the number of parameters in the model.
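A minimal sketch of the receptive-field growth, using the standard relation that a k × k kernel with dilation rate d spans k + (k − 1)(d − 1) pixels per side. The helper names are illustrative.

```python
def effective_kernel_size(k, dilation):
    """Span of a k x k kernel whose taps are spaced `dilation` pixels apart."""
    return k + (k - 1) * (dilation - 1)


def kernel_weights(k, in_channels=1, out_channels=1):
    """Weight count of a k x k convolution; it does not depend on the dilation rate."""
    return k * k * in_channels * out_channels


for d in (1, 2, 4):
    size = effective_kernel_size(3, d)
    print(f"dilation={d}: covers {size}x{size}, weights={kernel_weights(3)}")
```

The receptive field grows from 3 × 3 to 5 × 5 to 9 × 9 while the weight count stays at 9.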

Dropout regularization is a technique to prevent neural networks from overfitting.

Batch normalization is a technique for training very deep neural networks that standardizes the inputs to a layer (its activations) over each mini-batch; it does not normalize the weights. Hence, option C is false.
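A pure-Python sketch of the batch-normalization forward pass on a hypothetical mini-batch. It assumes the usual formulation: each feature is standardized with the mini-batch mean and variance, then scaled by gamma and shifted by beta. Note that only activations go in; no weights are touched.

```python
import math


def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Standardize each feature (column) over the mini-batch, then scale by gamma and shift by beta.

    batch: list of samples, each a list of activations feeding one layer.
    """
    n, num_features = len(batch), len(batch[0])
    out = [[0.0] * num_features for _ in range(n)]
    for f in range(num_features):
        column = [sample[f] for sample in batch]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        for i, v in enumerate(column):
            out[i][f] = gamma * (v - mean) / math.sqrt(var + eps) + beta
    return out


activations = [[1.0, 10.0], [3.0, 20.0], [5.0, 30.0]]  # hypothetical mini-batch of 3 samples
normalized = batch_norm(activations)
```

After normalization, each feature has approximately zero mean and unit variance across the mini-batch.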

CNNs are (approximately) translation invariant: an object does not need to be at a fixed position in the image for the CNN to detect it.

Q6. Reducing Parameters two methods

Which of the following are the methods for tackling overfitting?

Choose the correct answer from below, please note that this question may have multiple correct answers

A. Improving Network Configuration to increase parameters

B. Augmenting Dataset to decrease the number of samples

C. Augmenting Dataset to increase the number of samples

D. Improving Network Configuration to optimise parameters

Ans: C, D

Correct Answers:

Augmenting Dataset to increase the number of samples
Improving Network Configuration to optimise parameters

Explanation:

Over-parameterization makes the network prone to overfitting. One way to tackle this is to remove some layers from the network.
Augmenting the Dataset leads to a greater diversity of data samples being seen by the network, hence decreasing the likelihood of overfitting the model on the training dataset.
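To see how much removing a layer reduces the parameter count, here is a small sketch for fully connected networks. The layer widths are hypothetical, chosen only for illustration.

```python
def mlp_params(layer_sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


deeper = mlp_params([784, 512, 512, 10])   # hypothetical network with two hidden layers
shallower = mlp_params([784, 512, 10])     # same network with one hidden layer removed
print(deeper, shallower)  # 669706 407050
```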

Q7. Underfitting vs Overfitting

The chart below shows training accuracy vs. validation accuracy for a CNN model trained to classify 5 classes.

What is the problem with the model and how to solve the problem, if any?

Choose the correct answer from below:

A. Overfitting, adding More Conv2d layers

B. Underfitting, More epochs

C. Overfitting, Regularization

D. No problem

Ans: B

Correct Answer: Underfitting, More epochs

Explanation:

After looking at the plot, it seems the model was still improving when training stopped, which led to underfitting.
An underfit model has not yet learned the underlying patterns in the dataset, so it scores low on both the training set and the test/validation set.
If we add more epochs, the model will learn more and could solve the underfitting problem.

Q8. Data augmentation effectiveness

Suppose you wish to train a neural network to locate lions anywhere in the images, and you use a training dataset that has images similar to the ones shown above. In this case, if we apply the data augmentation techniques, it will be ______ as there is _______ in the training data.

Choose the correct answer from below:

A. effective, position bias

B. ineffective, angle bias

C. ineffective, position bias

D. effective, size bias

Ans: A

The correct answer is: effective, position bias.

Reason:

In this dataset, we can see position bias: the lion is at the center of every image.
Because every image is similar in this respect, applying data augmentation techniques such as width shift and height shift may improve the performance of the network.

Q9. EarlyStopping

Which of the following statements is the best description of early stopping?

Choose the correct answer from below:

A. Train the network until a local minimum in the error function is reached

B. Simulate the network on a validation dataset after every epoch of training. Stop the training when the generalization error starts to increase.

C. Add a momentum term to the weight update in the Generalized Delta Rule

D. A faster version of backpropagation

Ans: B

Correct Answer: Simulate the network on a validation dataset after every epoch of training. Stop the training when the generalization error starts to increase.

Explanation:

During training, the model is evaluated on a holdout validation dataset after each epoch.
If the performance of the model on the validation dataset starts to degrade (Example: loss begins to increase or accuracy begins to decrease), then the training process is stopped.

Q10. EarlyStopping code

Fill in the code to set up early stopping in TensorFlow that monitors validation accuracy (val_accuracy) and stops training when there is no improvement for 2 epochs.

custom_early_stopping = EarlyStopping(
___________,
____________
)

Choose the correct answer from below:

A. monitoring='val_accuracy', min_delta=2

B. mode='val_accuracy', min_delta=2

C. monitor='val_accuracy', patience=2

D. monitoring='val_accuracy', patience=2

Ans: C

Correct Answer: monitor='val_accuracy', patience=2

Explanation:

monitor='val_accuracy' tracks the validation accuracy at the end of every epoch.
patience=2 means training stops once 2 consecutive epochs pass with no improvement in validation accuracy.
min_delta is the minimum change in validation accuracy that counts as an improvement; an absolute change smaller than min_delta counts as no improvement.
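In Keras, the configured callback would be `EarlyStopping(monitor='val_accuracy', patience=2)`, passed to `model.fit(..., callbacks=[custom_early_stopping])`. The sketch below is a pure-Python re-creation of the patience/min_delta rule described above, written for illustration. It is not Keras's implementation, and the validation history is hypothetical.

```python
def stop_epoch(history, patience=2, min_delta=0.0):
    """Return the 1-based epoch at which training would stop, or None if it never stops early.

    history holds the monitored metric (e.g. val_accuracy) per epoch; higher is better.
    """
    best = float("-inf")
    epochs_without_improvement = 0
    for epoch, value in enumerate(history, start=1):
        if value - best > min_delta:          # counts as an improvement
            best = value
            epochs_without_improvement = 0
        else:                                  # change not larger than min_delta: no improvement
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return None


val_accuracy = [0.70, 0.75, 0.74, 0.76, 0.76, 0.75]  # hypothetical history
print(stop_epoch(val_accuracy, patience=2))                  # 6
print(stop_epoch(val_accuracy, patience=2, min_delta=0.02))  # 4
```

With min_delta=0.02, the small gain from 0.75 to 0.76 no longer counts as an improvement, so training stops two epochs earlier.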

Q11. tf dot image

How will you apply data augmentation to rotate an image 270° counter-clockwise using tf.image?

Choose the correct answer from below:

A. tf.image.rot(image)

B. tf.image.rot270(image)

C. tf.image.rot90(image, k=3)

D. tf.image.rot(image, k=3)

Ans: C

Correct Answer: tf.image.rot90(image, k=3)

Explanation:

To rotate an image or a batch of images counter-clockwise by a multiple of 90 degrees, use tf.image.rot90; for 270°, call tf.image.rot90(image, k=3).
k is the number of 90-degree rotations to apply.
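A pure-Python stand-in that mirrors the counter-clockwise, k-times-90° semantics on a 2-D grid (the real tf.image.rot90 works on image tensors; the helper names here are hypothetical). It shows that three counter-clockwise quarter turns equal one clockwise quarter turn.

```python
def rot90(img, k=1):
    """Rotate a 2-D grid counter-clockwise by k * 90 degrees."""
    for _ in range(k % 4):
        img = [list(row) for row in zip(*img)][::-1]   # one 90-degree counter-clockwise turn
    return img


def rot90_clockwise(img):
    """Rotate a 2-D grid 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


image = [[1, 2, 3],
         [4, 5, 6]]
print(rot90(image, k=3))  # [[4, 1], [5, 2], [6, 3]]
```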

Convolutional Neural Network 1

Q1. CNN features

Why have convolutional neural networks taken off so quickly in recent times?

Choose the correct answer from below:

A.     Access to large amount of digitized data
B.     Integration of feature extraction within the training process
C.      Availability of more computational power
D.     All the above

Ans: D

All the above is the correct answer.
Large amounts of digitized data are now available, and CNNs can be trained on them.
Unlike classical image recognition, where you define the image features yourself, a CNN takes the image’s raw pixel data, trains the model, and extracts the features automatically for better classification.
CNNs significantly reduce the number of training parameters, and thanks to the greater computational power available in recent times, the models take less time to train.

Q2. Recognizing a cat

For an image recognition problem (recognizing a cat in a photo), which of the following neural network architectures would be best suited to solve the problem?

Choose the correct answer from below:

A.     Multi Layer Perceptron
B.     Convolutional Neural Network
C.      Perceptron
D.     Support Vector Machine

Ans: B

The correct answer is Convolutional Neural Network.

The Convolutional Neural Network (CNN or ConvNet) is a subtype of neural network that is mainly used for image and speech recognition. Its built-in convolutional layers reduce the high dimensionality of images without losing important information. That is why CNNs are especially suited to this use case.

Q3. CNN Layers

Which of the following statements is False?

Choose the correct answer from below:

A.     CNNs are prone to overfitting because of less number of parameters
B.     There are no learnable parameters in Pooling layers
C.      In a max-pooling layer, the unit that contributes(maximum entry) in the forward propagation gets all the gradient in the backpropagation
D.     None of the above

Ans: A

Correct option: CNNs are prone to overfitting because of less number of parameters

Explanation :

The statement "CNNs are prone to overfitting because of less number of parameters" is false. CNNs are prone to overfitting when they have a lot of parameters. A neural network with many parameters tends to learn too many details of the training data, including its noise, which results in poor performance on unseen or test data; this is termed overfitting.

There are no trainable parameters in a max-pooling layer. In the forward pass, it passes the maximum value within each pooling window to the next layer. In the backward pass, it routes the error from the next layer back to the position the max value was taken from, because that is where the error comes from.

In a max-pooling layer, the unit that contributed (the maximum entry) in forward propagation gets all the gradient in backpropagation. (This is True.)
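A minimal pure-Python sketch of 2 × 2 max pooling with stride 2 (hypothetical helper names and input values). The forward pass records where each maximum came from, and the backward pass sends each output gradient only to that position. Note that neither function has learnable parameters.

```python
def maxpool2x2_forward(x):
    """2 x 2 max pooling with stride 2; also records where each max came from."""
    pooled, argmax = [], []
    for i in range(0, len(x) - 1, 2):
        pooled_row, arg_row = [], []
        for j in range(0, len(x[0]) - 1, 2):
            window = [(x[i + di][j + dj], (i + di, j + dj)) for di in range(2) for dj in range(2)]
            value, pos = max(window, key=lambda item: item[0])
            pooled_row.append(value)
            arg_row.append(pos)
        pooled.append(pooled_row)
        argmax.append(arg_row)
    return pooled, argmax


def maxpool2x2_backward(grad_out, argmax, height, width):
    """Route each output gradient to the input position that held the max; all others get 0."""
    grad_in = [[0.0] * width for _ in range(height)]
    for g_row, a_row in zip(grad_out, argmax):
        for g, (r, c) in zip(g_row, a_row):
            grad_in[r][c] += g
    return grad_in


x = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 7, 2],
     [3, 2, 1, 1]]
pooled, argmax = maxpool2x2_forward(x)   # pooled == [[4, 5], [3, 7]]
grad_in = maxpool2x2_backward([[1.0, 1.0], [1.0, 1.0]], argmax, 4, 4)
```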

Q4. Max-Pooling necessary

Why do we use Max-pooling in Convolutional Neural Networks?

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     Reduce Resolution
B.     Extract the High intensity features
C.      Extract the low intensity features
D.     Increase Resolution

Ans: A, B

The correct answers are:

Reduce Resolution
Extract the High intensity features

Reason:

Max-pooling helps in extracting high-intensity features, while average pooling produces smoother features.
It also reduces the resolution of the input.
If time is not a constraint, one can skip the pooling layer and use a strided convolutional layer to do the same downsampling.
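A small sketch comparing max and average pooling on a hypothetical 4 × 4 input (2 × 2 windows, stride 2). Both halve the resolution to 2 × 2. Max pooling keeps the strongest value in each window, while average pooling smooths it.

```python
def pool2x2(x, reduce):
    """Apply `reduce` to each non-overlapping 2 x 2 window (stride 2)."""
    return [[reduce([x[i][j], x[i][j + 1], x[i + 1][j], x[i + 1][j + 1]])
             for j in range(0, len(x[0]) - 1, 2)]
            for i in range(0, len(x) - 1, 2)]


def mean(values):
    return sum(values) / len(values)


x = [[1, 3, 2, 0],
     [4, 2, 1, 5],
     [0, 1, 7, 2],
     [3, 2, 1, 1]]
print(pool2x2(x, max))   # [[4, 5], [3, 7]]
print(pool2x2(x, mean))  # [[2.5, 2.0], [1.5, 2.75]]
```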

Q5. Pixel

A pixel (picture element) is the smallest element of an image on a computer display. Given two different 5×5 images (pixel grids whose cells hold the pixel values), identify the types of image1 and image2, respectively.

Choose the correct answer from below:

A.     image1= Black and White, image2= color
B.     image1= color, image2= Black and White
C.      image1= Grayscale, image2= color
D.     image1= Black and White, image2= Grayscale

Ans: D

Correct answer is image1= Black and White, image2= Grayscale
In a binary (Black and White) image, a pixel can take only one of two values: 0 or 255 (sometimes stored as 0 or 1).
In a grayscale image, a pixel can take any value between 0 and 255.
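A toy classifier that encodes the rule above. It assumes single-channel grids use 0/255 for binary images and that color images store an (R, G, B) tuple per pixel. The function name and sample grids are hypothetical.

```python
def image_type(grid):
    """Classify a pixel grid: tuples -> color, only {0, 255} -> Black and White, else Grayscale."""
    if isinstance(grid[0][0], tuple):     # (R, G, B) per pixel
        return "color"
    values = {v for row in grid for v in row}
    return "Black and White" if values <= {0, 255} else "Grayscale"


binary = [[0, 255, 0], [255, 255, 0], [0, 0, 255]]   # hypothetical grids
gray = [[0, 128, 64], [255, 30, 200], [90, 0, 17]]
print(image_type(binary), image_type(gray))
```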

Q6. Translation invariance

Determine whether the given statement is true or false.

When a pooling layer is added to a convolutional neural network, translation invariance is preserved.

Note: Translation invariance means that the system produces the same response regardless of how its input is shifted.

Choose the correct answer from below:

A. True
B. False

Ans: A

The correct answer is True

Reason:

Invariance means that we can recognize an object as an object, even when its appearance varies in some way. This is generally a good thing, because it preserves the object's identity, category, (etc.) across changes in the specifics of the visual input, like relative positions of the viewer/camera and the object.
Pooling helps make the representation approximately invariant to small translations of the input.
• If we translate the input by a small amount, the values of most of the outputs do not change.
• Pooling can be viewed as adding a strong prior that the function the layer learns must be invariant to small translations.
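A 1-D sketch of why the invariance is only approximate, using hypothetical signals and max pooling with window 2 and stride 2. A shift that stays inside a pooling window leaves the output unchanged, while a shift across a window boundary changes it.

```python
def maxpool1d(x, size=2, stride=2):
    """1-D max pooling over windows of `size` elements."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]


signal = [0, 9, 0, 0, 0, 0]
shift_within = [9, 0, 0, 0, 0, 0]   # moved 1 step left: still inside the same window
shift_across = [0, 0, 9, 0, 0, 0]   # moved 1 step right: crosses into the next window

print(maxpool1d(signal), maxpool1d(shift_within), maxpool1d(shift_across))
# [9, 0, 0] [9, 0, 0] [0, 9, 0]
```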

Q7. True About Type of Padding

Which of the following are True about Padding in CNN?

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     We should use valid padding if we know that information at edges is not that much useful.
B.     There is no reduction in dimension when we use zero padding.
C.      In valid padding, we drop the part of the image where the filter does not fit.

Ans: A, B, C

The correct answers are:

We should use valid padding if we know that information at edges is not that much useful.
There is no reduction in dimension when we use zero padding.
In valid padding, we drop the part of the image where the filter does not fit.

Reason:

The output size of a convolutional layer shrinks depending on the input size and kernel size.
In zero padding, we pad zeros around the image's border to preserve the information at the edges, whereas in valid padding, we drop the part of the image that doesn't fit the filter.
With zero padding chosen to match the kernel size ('same' padding) and a stride of 1, there is no reduction in dimension.
To sum up, valid padding means no padding, so the output shrinks with the input and kernel size; zero padding means adding padding around the input.
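A short sketch of the output-size rule ⌊(W − F + 2P) / S⌋ + 1 for valid vs. zero padding, plus a helper that adds the zero border. The 5 × 5 input and 3 × 3 kernel are hypothetical values.

```python
def output_size(w, f, s=1, p=0):
    """Spatial output size of a convolution: floor((W - F + 2P) / S) + 1."""
    return (w - f + 2 * p) // s + 1


def zero_pad(img, p):
    """Surround a 2-D grid with p rings of zeros."""
    padded_width = len(img[0]) + 2 * p
    top = [[0] * padded_width for _ in range(p)]
    middle = [[0] * p + list(row) + [0] * p for row in img]
    bottom = [[0] * padded_width for _ in range(p)]
    return top + middle + bottom


print(output_size(5, 3))        # 3: valid padding, the output shrinks
print(output_size(5, 3, p=1))   # 5: one ring of zeros keeps the size at stride 1
print(zero_pad([[1, 2], [3, 4]], 1))
```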

Q8. CNN with benefits

What are the benefits of using a Convolutional Neural Network (CNN) instead of an Artificial Neural Network (ANN)?

Choose the correct answer from below, please note that this question may have multiple correct answers

A.     Reduce the number of units in the network, which means fewer parameters to learn and decreased computational power is required
B.     Increase the number of units in the network, which means more parameters to learn and increase chance of overfitting.
C.      They consider the context information in the small neighborhoods.
D.     CNN uses weight sharing technique

Ans: A, C, D

Correct options:

Reduce the number of units in the network, which means fewer parameters to learn and decreased computational power is required
They consider the context information in the small neighborhoods
CNN uses weight sharing technique.

Explanation :

CNNs usually have fewer parameters than ANNs, which means fewer parameters to learn and less computational power required.

CNNs consider the context information and pixel dependencies in small neighborhoods, and because of this they achieve better predictions on data like images.

Weight sharing decreases the number of parameters and also makes feature search insensitive to feature location in the image. This results in a more generalized model, so it also acts as a regularization technique.
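A rough parameter count illustrating weight sharing, assuming a hypothetical 28 × 28 grayscale input mapped to a 26 × 26 × 8 output either by a dense layer or by eight 3 × 3 filters:

```python
def dense_params(n_in, n_out):
    """Fully connected layer: one weight per (input, output) pair plus one bias per output."""
    return n_in * n_out + n_out


def conv_params(kernel_size, in_channels, filters):
    """Conv layer: each filter's weights are shared across every spatial position."""
    return kernel_size * kernel_size * in_channels * filters + filters


# Hypothetical layer: 28 x 28 grayscale input mapped to a 26 x 26 x 8 output.
dense = dense_params(28 * 28, 26 * 26 * 8)
conv = conv_params(3, 1, 8)
print(dense, conv)  # 4245280 80
```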

Q9. Applying Max pooling

If we pass a 2×2 max-pooling filter over the given input with a stride of 2, what are the values of W, X, Y, and Z?

Choose the correct answer from below:

A.     W = 8, X = 6, Y= 9, Z=6
B.     W = 9, X = 8, Y= 8, Z=6
C.      W = 6, X = 9, Y= 8, Z=8
D.     W = 9, X = 8, Y= 8, Z=9

Ans: B

The correct answer is W = 9, X = 8, Y= 8, Z=6

The 2 × 2 region highlighted in yellow has a max value of 6.
The 2 × 2 region highlighted in blue has a max value of 9.
We do the same for each of the remaining 2 × 2 sub-matrices, highlighted in different colors.

Q10. Difference in output size

What is the difference between the output sizes of the two given models for an input image of size 100×100? The number of filters, filter size, and stride of each layer are given in the figure. (Take padding = 0.)

Note: The Answer is the difference of final convolution of Model1 and Model2.

Example: Say the final convolution of Model1 is 10 x 10 x 30 = 3000 and Model2 is 20 x 20 x 14 = 5600
Answer = 5600 - 3000 = 2600

Choose the correct answer from below:

A. 1392

B. 1024

C. 6876

D. 500

Ans: B

The correct answer is 1024

The output size after one convolution layer is ⌊(W − F + 2P) / S⌋ + 1, where W is the input size, F the filter size, P the padding, and S the stride.

For model 1,

Step1 - Input = 100 x 100, filter = 15, filter size = 3 x 3, strides = 1

Answer = (100 - 3 + (2x0))/1 + 1 = 98

Step1_output = 98 x 98 x 15

Step2 - Input = 98 x 98, filter = 42, filter size = 6 x 6, strides = 4

Answer = (98 - 6 + (2x0))/4 + 1 = 24

Step2_output = 24 x 24 x 42

Step3 - Input = 24 x 24, filter = 30, filter size = 3 x 3, strides = 3

Answer = (24 - 3 + (2x0))/3 + 1 = 8

Step3_output = 8 x 8 x 30

final_model1_output = 8 x 8 x 30 = 1920

——————————————————————————

For model 2,

Step1 - Input = 100 x 100, filter = 5, filter size = 6 x 6, strides = 1

Answer = (100 - 6 + (2x0))/1 + 1 = 95

Step1_output = 95 x 95 x 5

Step2 - Input = 95 x 95, filter = 11, filter size = 3 x 3, strides = 4

Answer = (95 - 3 + (2x0))/4 + 1 = 24

Step2_output = 24 x 24 x 11

Step3 - Input = 24 x 24, filter = 14, filter size = 3 x 3, strides = 3

Answer = (24 - 3 + (2x0))/3 + 1 = 8

Step3_output = 8 x 8 x 14

final_model2_output = 8 x 8 x 14 = 896

Therefore, difference in output size will be 1920 – 896 = 1024.
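The layer-by-layer arithmetic above can be reproduced with a short script. Each layer is written as (number of filters, filter size, stride), taken from the worked steps, with padding 0.

```python
def conv_output_size(w, f, s, p=0):
    return (w - f + 2 * p) // s + 1


def final_output_volume(size, layers):
    """layers: list of (num_filters, filter_size, stride); returns size * size * channels of the last layer."""
    for _, f, s in layers:
        size = conv_output_size(size, f, s)
    return size * size * layers[-1][0]


model1 = [(15, 3, 1), (42, 6, 4), (30, 3, 3)]
model2 = [(5, 6, 1), (11, 3, 4), (14, 3, 3)]
v1 = final_output_volume(100, model1)   # 8 * 8 * 30
v2 = final_output_volume(100, model2)   # 8 * 8 * 14
print(v1, v2, v1 - v2)  # 1920 896 1024
```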

Q11. Horizontal Edges

Perform default horizontal edge detection on the given image and choose the correct option.

Note : Here Stride = 1, Padding = Valid

Choose the correct answer from below:

A. A

B. B

C. C

D. D

Ans: A

Therefore, the correct option is A.
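The question's image is not reproduced here, so the sketch below makes two assumptions: the "default" filter is the commonly taught horizontal edge kernel [[1, 1, 1], [0, 0, 0], [−1, −1, −1]], and the input is a hypothetical 6 × 6 image with a bright top half and a dark bottom half. Stride is 1 and padding is valid, as in the note. The operation is cross-correlation, which is what CNN layers compute.

```python
HORIZONTAL_KERNEL = [[1, 1, 1],
                     [0, 0, 0],
                     [-1, -1, -1]]


def conv2d_valid(img, kernel):
    """Stride-1, valid-padding 2-D cross-correlation (what CNN layers compute)."""
    k = len(kernel)
    rows, cols = len(img) - k + 1, len(img[0]) - k + 1
    return [[sum(img[i + a][j + b] * kernel[a][b] for a in range(k) for b in range(k))
             for j in range(cols)]
            for i in range(rows)]


# Hypothetical 6 x 6 image: bright top half (10), dark bottom half (0).
image = [[10] * 6 for _ in range(3)] + [[0] * 6 for _ in range(3)]
for row in conv2d_valid(image, HORIZONTAL_KERNEL):
    print(row)
# [0, 0, 0, 0]
# [30, 30, 30, 30]
# [30, 30, 30, 30]
# [0, 0, 0, 0]
```

The large values in the two middle rows mark the horizontal boundary between the bright and dark halves.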

Q12. Dimensionality Reduction

Jay is working on an image resizing algorithm and wants to reduce the dimensions of an image. He takes inspiration from a Data Science course he took on Scaler, where he learned about CNNs. Which of these options might be useful for reducing the dimensions of an image?

Choose the correct answer from below, please note that this question may have multiple correct answers

A. Convolution Layer

B. ReLU Layer

C. Sigmoid

D. Pooling Layer

Ans: A, D

Correct options:

Convolution Layer
Pooling Layer

Explanation :

A convolution layer can reduce dimensions, since it can decrease the size of its input depending on the kernel size, stride, etc.
A pooling layer also decreases size; for example, max pooling keeps only the maximum value within each kernel-sized window.
ReLU and sigmoid are just activations; they don't affect the shape of an image.

Neural Network 3

Q1. Complete the code

For the above code implementation of forward and backward propagation for the sigmoid function, complete the backward pass [????] to compute analytical gradients.

Note: grad in backward is actually the output error gradients.

Choose the correct answer from below:

A.     grad_input = self.sig * (1-self.sig) * grad
B.     grad_input = self.sig / (1-self.sig) * grad
C.      grad_input = self.sig / (1-self.sig) + grad
D.     grad_input = self.sig + (1-self.sig) - grad

Ans: A

Correct Answer : grad_input = self.sig * (1-self.sig) * grad

Explanation: The grad_input is given by

$dZ = dA \cdot \sigma(Z)\,(1 - \sigma(Z))$

where:

dZ = the error gradient with respect to the input Z (grad_input).

dA = the error gradient with respect to the output A (grad).

$\sigma(x)\,(1 - \sigma(x))$ = the derivative of the sigmoid activation function,

and σ(x) represents the sigmoid function.
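The quiz's original class is not shown. Below is a hypothetical pure-Python reconstruction in which only the names `self.sig` and `grad` come from the options. It implements option A and checks it against a central-difference estimate of the gradient.

```python
import math


class Sigmoid:
    """Element-wise sigmoid layer on a list of floats (hypothetical reconstruction)."""

    def forward(self, x):
        self.sig = [1.0 / (1.0 + math.exp(-v)) for v in x]  # cache sigma(x) for the backward pass
        return self.sig

    def backward(self, grad):
        # Chain rule: grad_input = sigma(x) * (1 - sigma(x)) * grad  (option A)
        return [s * (1.0 - s) * g for s, g in zip(self.sig, grad)]


def numerical_grad(x, grad, eps=1e-6):
    """Central-difference estimate of d(sum(grad * sigmoid(x))) / dx, for checking backward()."""
    def loss(values):
        return sum(g / (1.0 + math.exp(-v)) for v, g in zip(values, grad))
    out = []
    for i in range(len(x)):
        plus = x[:i] + [x[i] + eps] + x[i + 1:]
        minus = x[:i] + [x[i] - eps] + x[i + 1:]
        out.append((loss(plus) - loss(minus)) / (2 * eps))
    return out


layer = Sigmoid()
x, grad = [-2.0, 0.0, 1.5], [1.0, 0.5, -2.0]
layer.forward(x)
analytic = layer.backward(grad)
numeric = numerical_grad(x, grad)
```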

Q2. Trained Perceptron

A perceptron was trained to distinguish between two classes, "+1" and "-1". The result is shown in the plot given below. Which of the following might be the reason for poor performance of the trained perceptron?

Choose the correct answer from below:

A.     The perceptron can not separate linearly separated data
B.     The perceptron works only if the two classes are linearly separable which is not the case here.
C.      The smaller learning rate with less number of epochs of perceptron could have restricted it from producing good results.
D.     The "-1" class dominates the dataset, thereby pulling the decision boundary closer to itself.

Ans: C

Correct option: The smaller learning rate with less number of epochs of perceptron could have restricted it from producing good results.

Explanation:

Both classes have enough samples, and the difference between their sizes is not large enough to cause misclassification.
Since the dot product between the weights “w” and the input “x” is linear in x, the perceptron is a linear classifier. It cannot separate classes that are not linearly separable.
Looking at the result, the classes seem to be linearly separable, with a few exceptions. For classes that are linearly separable, the perceptron algorithm is guaranteed to converge to a decision boundary that separates them.
Also, the decision boundary is not pulled towards class -1 by a majority; both classes have a fairly equal number of training samples.
As we can see, the model underfits the data, which means the number of training epochs is too low or the learning rate too small, making the model perform poorly.
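A minimal sketch of the classic perceptron update rule on hypothetical, linearly separable toy data. It keeps the learning rate fixed and varies only the number of epochs. After a single epoch, the boundary still misclassifies half the points. With enough epochs, the algorithm converges.

```python
def train_perceptron(data, epochs, lr=1.0):
    """Classic perceptron rule; weights and bias start at zero. data: list of ((x1, x2), label in {+1, -1})."""
    w1 = w2 = b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in data:
            if y * (w1 * x1 + w2 * x2 + b) <= 0:   # misclassified (or on the boundary)
                w1 += lr * y * x1
                w2 += lr * y * x2
                b += lr * y
    return w1, w2, b


def accuracy(data, w1, w2, b):
    correct = sum(1 for (x1, x2), y in data if (1 if w1 * x1 + w2 * x2 + b > 0 else -1) == y)
    return correct / len(data)


# Hypothetical, linearly separable toy data (separable by the line x2 = 1.5).
data = [((1, 2), 1), ((2, 3), 1), ((1, 0), -1), ((2, 1), -1)]
print(accuracy(data, *train_perceptron(data, epochs=1)))   # 0.5: too few epochs
print(accuracy(data, *train_perceptron(data, epochs=10)))  # 1.0: converged
```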

Q3. Identify the Function

Mark the correct option for the below-mentioned statements:

(a) A perceptron can add up all the weighted inputs it receives and output 1 if the sum exceeds a specific value; otherwise, it outputs 0.

(b) Both artificial and biological neural networks learn from past experiences.

Choose the correct answer from below:

A.     Both the mentioned statements are true.
B.     Both the mentioned statements are false.
C.      Only statement (a) is true.
D.     Only statement (b) is true.

Ans: A

Correct option: Both the statements are true.

Explanation :

The behavior described in statement (a) is a step (threshold) activation function, and a perceptron can implement it.

Both artificial and biological neural networks learn from past experiences.
Artificial networks are trained on data to make predictions; the weights assigned to each neuron change continuously during training to reduce the error.
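Statement (a) can be sketched as a step-activation perceptron. The weights (1, 1) and threshold 1.5 below are hypothetical values chosen so that the unit behaves like a logical AND gate.

```python
def step_perceptron(inputs, weights, threshold):
    """Output 1 if the weighted sum of the inputs exceeds the threshold, else 0 (statement (a))."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return 1 if total > threshold else 0


# Hypothetical weights/threshold that make the perceptron act as a logical AND gate.
weights, threshold = (1.0, 1.0), 1.5
for a in (0, 1):
    for b in (0, 1):
        print(a, b, step_perceptron((a, b), weights, threshold))
```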

Q4. Find the Value of 'a'

Given below is a neural network with one neuron that takes two float numbers as inputs.

If the model uses the sigmoid activation function, what will be the value of 'a' for the given x1 and x2 _____ (rounded off to 2 decimal places)?

Choose the correct answer from below:

A.     0.57
B.     0.22
C.      0.94
D.     0.75

Ans: A

Correct option :

0.57

Explanation :

The value of z will be:

$z = w_1 x_1 + w_2 x_2 + b$

$z = (0.5 \times 0.55) + (-0.35 \times 0.45) + 0.15 = 0.2675$

The value of a will be:

$a = f(z) = \sigma(0.2675) = \frac{1}{1 + e^{-0.2675}} = \frac{1}{1.76529} \approx 0.5665 \approx 0.57$
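A quick numeric check of the calculation. The figure is not shown, so which factor in each product is the weight and which is the input is an assumption (w1 = 0.5, x1 = 0.55, w2 = −0.35, x2 = 0.45); the products, and therefore z and a, are the same either way.

```python
import math

w1, w2, b = 0.5, -0.35, 0.15   # assumed weights and bias from the worked example
x1, x2 = 0.55, 0.45            # assumed inputs from the worked example

z = w1 * x1 + w2 * x2 + b
a = 1.0 / (1.0 + math.exp(-z))  # sigmoid activation
print(round(z, 4), round(a, 4), round(a, 2))  # 0.2675 0.5665 0.57
```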
