Decision Tree MCQs

1.     Decision Trees can be used for

A.    Classification Tasks

B.    Regression Tasks

C.    Multi-output tasks

D.    All of the above

Ans: D

 

2.     The iris dataset has

A.    5 features and 3 classes

B.    4 features and 3 classes

C.    2 features and 3 classes

D.    4 features and 2 classes

Ans: B
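
For reference, a minimal sketch of how to verify this with scikit-learn (the dataset loader is standard; the printed values come from the stock iris data):

```python
from sklearn.datasets import load_iris

iris = load_iris()
print(iris.data.shape)    # (150, 4): 150 instances, 4 features
print(iris.target_names)  # ['setosa' 'versicolor' 'virginica']: 3 classes
```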

 

3.     A node’s value attribute tells you how many training instances of each class this node applies to

  1.  True
  2.  False

Ans: A

4.     A node’s gini attribute measures

  1.  The number of training instances in the node
  2.  The ratio of training instances in the node
  3.  Its impurity
  4. None of these

Ans: C

5.     If all the training instances of a node belong to the same class then the value of the node's Gini attribute will be

  1.  1
  2.  0
  3.  Any integer between 0 and 1
  4.  A negative value

Ans: B

6.     A Gini coefficient of 1 expresses maximal inequality among the training samples

A.      True

B.      False

Ans: A

7.     The Gini index for a node is found by subtracting the sum of the squared ratios of each class in the node from 1

  1.  True
  2.  False

Ans: A
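
As a minimal sketch of the formula in question 7, here is the Gini computation in plain NumPy (the `gini` helper is ours for illustration, not part of scikit-learn):

```python
import numpy as np

def gini(class_counts):
    """Gini impurity: 1 minus the sum of the squared class ratios in a node."""
    counts = np.asarray(class_counts, dtype=float)
    ratios = counts / counts.sum()    # ratio of each class in the node
    return 1.0 - np.sum(ratios ** 2)  # subtract the sum of squares from 1

print(gini([50, 0, 0]))  # 0.0 -> pure node (see question 5)
print(gini([0, 49, 5]))  # ~0.168 -> impure node
```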

8.     A white box model’s decisions are

  1. Usually hard to explain in simple terms
  2. Fairly intuitive and easy to interpret
  3. Both
  4. None

Ans: B

9.     A black box model’s decisions are

  1.  Usually hard to explain in simple terms
  2.  Fairly intuitive and easy to interpret
  3. Both
  4. None

Ans: A

10.  Random Forests and Neural Networks are examples of

  1.  White Box Model
  2.  Black Box Model
  3. Both
  4. None

Ans: B

11.  Decision Trees are examples of

  1. White Box Model
  2. Black Box Model
  3. Both
  4. None

Ans: A

12.  A decision tree estimates the probability that an instance belongs to a particular class k by finding the corresponding leaf node for the instance and then returning the ratio of training instances of class k in that node

  1.  True
  2.  False

Ans: A

13.  The Decision Tree classifier predicts the class which has the highest probability

  1.  True
  2.  False

Ans: A

14.  If the output of predict_proba is array([[ 0. , 0.90740741, 0.09259259]]) then the predicted class will be

  1. Class 0
  2. Class 1
  3. Class 2
  4. None

Ans: B
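
A hedged sketch tying questions 12-14 together; training on petal length and width only with max_depth=2 is our assumption, and the exact probabilities can vary with the library version:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X = iris.data[:, 2:]  # petal length and width (assumed two-feature setup)
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# The leaf's class ratios become the predicted probabilities...
print(tree_clf.predict_proba([[5, 1.5]]))  # e.g. [[0., 0.907..., 0.092...]]
# ...and predict returns the class with the highest probability.
print(tree_clf.predict([[5, 1.5]]))        # [1] -> Class 1
```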

15.  The CART algorithm splits the training set into two subsets

  1. Using all the features and a threshold tk
  2. Using a single feature k and a threshold tk
  3. Using half of the features and a threshold k
  4. None

Ans: B

16.  How does the CART algorithm choose the feature k and the threshold tk for splitting?

  1.  It randomly chooses a feature k
  2.  It chooses the mean of the values of the feature k as threshold
  3.  It chooses the feature k and threshold tk which produces the purest subsets
  4.  It chooses the feature k and threshold tk such that the gini index value of the subsets is 0

Ans: C

17.  The cost function used to choose the feature k and threshold tk takes into consideration

  1.  The Gini index values of the subsets
  2.  The number of instances in the subsets
  3.  The total number of instances in the node that is being split
  4.  All of these

Ans: D
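
For reference, the standard CART classification cost function behind questions 16-17:

```latex
J(k, t_k) = \frac{m_{\text{left}}}{m} G_{\text{left}} + \frac{m_{\text{right}}}{m} G_{\text{right}}
```

Here G_left and G_right measure the impurity of the two subsets, m_left and m_right count their instances, and m is the total number of instances in the node being split, which are exactly the three items listed in question 17.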

18.  Once the CART algorithm has successfully split the training set in two

  1.  It stops splitting further
  2.  It splits the subsets using the same logic, then the sub- subsets and so on, recursively
  3.  It splits only the right subset
  4.  It splits only the left subset

Ans: B

19.  The CART algorithm stops recursion once it reaches the maximum depth (defined by the max_depth hyperparameter), or if it cannot find a split that will reduce impurity

  1.  True
  2.  False

Ans: A

20.  Which of the following are correct for the CART algorithm

  1.  It is a greedy algorithm
  2.  It greedily searches for an optimum split at each level
  3.  It does not check whether or not the split will lead to the lowest possible impurity several levels down
  4.  All of the above are correct

Ans: D

21.  While making a prediction in a Decision Tree, each node only requires checking the value of one feature

  1.  True
  2.  False

Ans: A

22.  If the total number of training instances is m then the overall prediction complexity of a decision tree is

  1.  O(m)
  2.  O(mlog(m))
  3.  O(log(m))
  4.  O(m/log(m))

Ans: C
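
The intuition, sketched briefly: a roughly balanced binary tree built on m instances has a depth of about log2(m), and a prediction walks a single root-to-leaf path checking one feature per node (question 21), so:

```latex
\text{depth} \approx \log_2(m), \qquad \text{e.g. } m = 10^6 \;\Rightarrow\; \text{about } 20 \text{ feature checks per prediction}
```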

23.  The training algorithm of a decision tree compares all features (or fewer if max_features is set) on all samples at each node

  1.  True
  2.  False

Ans: A

24.  If the number of features is n and the number of training instances is m then the training complexity of a decision tree is

  1. O(nmlog(m))
  2. O(mlog(n))
  3. O(nlog(m))
  4. O(mn)

Ans: A
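
The reasoning, sketched under the same balanced-tree assumption: each of the O(log(m)) levels compares up to n features across the m samples reaching that level (question 23), giving

```latex
\underbrace{O(n \cdot m)}_{\text{work per level}} \times \underbrace{O(\log(m))}_{\text{number of levels}} = O\big(n\, m \log(m)\big)
```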

25.  Gini impurity is slightly faster to compute in comparison to entropy

  1.  True
  2.  False

Ans: A

26.  Models like Decision Trees are often called nonparametric models because

  1.  They do not have any parameters
  2.  The number of parameters is not determined prior to training
  3.  They have fewer parameters as compared to other models
  4.  They are easy to interpret and understand

Ans: B

27.  Which of the following is not a regularization parameter for decision tree classifier

  1.  max_depth
  2.  min_samples_leaf
  3.  max_features
  4.  min_leaf_nodes

Ans: D

28.  Increasing min_* hyperparameters or reducing max_* hyperparameters will regularize the model

  1.  True
  2.  False

Ans: A
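
A hedged sketch of questions 27-28; the specific values are arbitrary and for illustration only:

```python
from sklearn.tree import DecisionTreeClassifier

# Reducing max_* or increasing min_* hyperparameters regularizes the tree.
tree_clf = DecisionTreeClassifier(
    max_depth=4,         # max_* reduced -> more regularization
    min_samples_leaf=5,  # min_* increased -> more regularization
    max_leaf_nodes=16,   # note: max_leaf_nodes exists; min_leaf_nodes does not
)
```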

29.  For regression tasks the CART algorithm tries to split the training set in a way that minimizes the MSE

  1.  True
  2.  False

Ans: A
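
A minimal regression sketch for question 29; the synthetic quadratic data is our assumption, and DecisionTreeRegressor's default criterion is squared error:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.rand(200, 1)                                 # synthetic feature
y = 4 * (X[:, 0] - 0.5) ** 2 + 0.1 * rng.randn(200)  # noisy quadratic target

tree_reg = DecisionTreeRegressor(max_depth=2)  # splits chosen to minimize MSE
tree_reg.fit(X, y)
print(tree_reg.predict([[0.6]]))               # the leaf's mean target value
```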

30.  All the splits made by a Decision Tree are

  1.  Never perpendicular to an axis
  2.  Always perpendicular to an axis
  3.  Always at an acute angle to an axis
  4.  Always at an obtuse angle to an axis

Ans: B

Machine Learning - Support Vector Machines (SVM) - MCQs


1. A Support Vector Machine can be used for

A.    Performing linear or nonlinear classification

B.    Performing regression

C.    For outlier detection

D.    All of the above

Ans: D

2. The decision boundary in a Support Vector Machine is fully determined (or “supported”) by the instances located on the edge of the street

  1. True
  2. False

Ans: A

3. Support Vector Machines are not sensitive to feature scaling

  1. True
  2. False

Ans: B

4. If we strictly impose that all instances be off the street and on the right side, this is called

  1.  Soft margin classification
  2.  Hard margin classification
  3.  Strict margin classification
  4.  Loose margin classification

Ans: B

5. The main issues with hard margin classification are

  1. It only works if the data is linearly separable
  2. It is quite sensitive to outliers
  3. It is impossible to find a margin if the data is not linearly separable
  4. All of the above

Ans: D

6. The objectives of Soft Margin Classification are to find a good balance between

  1. Keeping the street as large as possible
  2. Limiting the margin violations
  3. Both of the above
  4. None of the above

Ans: C

7. The balance between keeping the street as large as possible and limiting margin violations is controlled by this hyperparameter

  1. Tol
  2. Loss
  3. Penalty
  4. C

Ans: D

8. A smaller C value leads to a wider street but more margin violations.

  1. True
  2. False

Ans: A
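
A hedged sketch of questions 7-9 with LinearSVC; the binary iris task and the C value are illustrative, and the StandardScaler step reflects question 3 (SVMs are sensitive to feature scaling):

```python
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
y = (y == 2).astype(int)  # binary task: Iris virginica vs. rest (assumption)

# Smaller C -> wider street but more margin violations;
# if the model overfits, try reducing C.
svm_clf = make_pipeline(StandardScaler(), LinearSVC(C=1, random_state=42))
svm_clf.fit(X, y)
```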

9. If your SVM model is overfitting, you can try regularizing it by reducing the value of

  1. Tol
  2. C hyperparameter
  3. intercept_scaling
  4. None of the above

Ans: B

10. Problems with adding polynomial features are

  1. At a low polynomial degree, it cannot deal with very complex datasets
  2. With a high polynomial degree, it creates a huge number of features
  3. Adding high polynomial degree makes the model too slow
  4. All of the above

Ans: D

11. The hyperparameter coef0 of SVC controls how much the model is influenced by high-degree polynomials versus low-degree polynomials

A.     True

B.     False

Ans: A
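
A hedged sketch of questions 10-11: the polynomial kernel gets the effect of high-degree polynomial features without actually creating them, and coef0 shifts the balance between high- and low-degree terms (hyperparameter values are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)  # toy nonlinear data

poly_kernel_svm_clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, coef0=1, C=5),  # coef0 favors higher-degree terms
)
poly_kernel_svm_clf.fit(X, y)
```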

12. A similarity function like Gaussian Radial Basis Function is used to

A.    Measure how many features are related to each other

B.    Find the most important features

C.    Find the relationship between different features

D.    Measure how much each instance resembles a particular landmark

Ans: D
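
For reference, the Gaussian RBF from question 12 in its standard form; it equals 1 at the landmark and decays toward 0 as the instance moves away from it:

```latex
\phi_{\gamma}(\mathbf{x}, \boldsymbol{\ell}) = \exp\left(-\gamma \left\lVert \mathbf{x} - \boldsymbol{\ell} \right\rVert^{2}\right)
```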

13. When adding features computed with a similarity function, creating a landmark at the location of each and every instance in the training set, a training set with m instances and n features gets transformed into (assuming you drop the original features)

  1. A training set with n instances and n features
  2. A training set with m/2 instances and n/2 features
  3. A training set with m instances and m features
  4. A training set with m instances and n features

Ans: C

14. When using SVMs we can apply an almost miraculous mathematical technique for adding polynomial features and similarity features called the

  1. Kernel trick
  2. Shell trick
  3. Mapping and Reducing
  4. None of the above

Ans: A

15. Which is right for the gamma parameter of SVC, which acts as a regularization hyperparameter?

  1. If model is overfitting, increase it, if it is underfitting, reduce it
  2. If model is overfitting, reduce it, if it is underfitting, increase it
  3. If model is overfitting, keep it same
  4. If it is underfitting, keep it same

Ans: B
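
A hedged sketch of questions 14-15: the RBF kernel uses the kernel trick to get the effect of the landmark features from question 13 without materializing them, and gamma then behaves like a regularization hyperparameter (values illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

rbf_svm_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=5, C=0.001))
rbf_svm_clf.fit(X, y)
# Overfitting? Reduce gamma. Underfitting? Increase it (question 15, Ans: B).
```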

16. LinearSVC is much faster than SVC(kernel="linear")

  1. True
  2. False

Ans: A

17. In SVM regression the model tries to

  1. Fit the largest possible street between two classes while limiting margin violations
  2. Fit as many instances as possible on the street while limiting margin violations
  3. Both
  4. None of the above

Ans: B

18. The SVR class is the regression equivalent of the SVC class, and the LinearSVR class is the regression equivalent of the LinearSVC class

  1. True
  2. False

Ans: A
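
Finally, a hedged regression sketch for questions 17-18; the linear toy data and the epsilon value are assumptions:

```python
import numpy as np
from sklearn.svm import LinearSVR

rng = np.random.RandomState(42)
X = 2 * rng.rand(50, 1)
y = 4 + 3 * X[:, 0] + rng.randn(50)  # noisy linear target

# SVM regression tries to fit as many instances as possible *on* the street;
# the epsilon hyperparameter controls the street's width.
svm_reg = LinearSVR(epsilon=1.5, random_state=42)
svm_reg.fit(X, y)
```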