Ensemble Learning MCQs
 
1. A model that consists of a group of predictors is called a(n)
 -  Group
 
 -  Entity
 
 -  Ensemble
 
 -  Set
 
Ans: C
2. A Random Forest is an ensemble of Decision Trees
 -  True
 
 -  False
 
Ans: A
3. The steps involved in deciding the output of a Random Forest are
 - Obtain the predictions of all individual trees
 
 - Predict the class that gets the most votes
 
 - Both of the above
 
 - None
 
Ans: C
4. A hard voting classifier takes into consideration
 -  The probabilities of output from each classifier
 
 -  The majority votes from the classifiers
 
 -  The mean of the output from each classifier
 
 -  The sum of the output from each classifier
 
Ans: B
5. If each classifier is a weak learner, the ensemble can still be a strong learner?
 -  True
 
 -  False
 
Ans: A
6. Ensemble methods work best when the predictors are
 -  Sufficiently diverse
 
 -  As independent from one another as possible
 
 -  Making very different types of errors
 
 -  All of the above
 
Ans: D
7. To get diverse classifiers, we cannot train them using different algorithms
 -  True
 
 -  False
 
Ans: B
8. Training
the classifiers in an ensemble using very different algorithms increases the
chance that they will make very different types of errors, improving the
ensemble’s accuracy
 -  True
 
 -  False
 
Ans: A
9. When we consider only the majority of the outputs from the classifiers, it is called
 - Hard Voting
 
 - Soft Voting
 
 - Both
 
 - None
 
Ans: A
10. Soft voting takes into consideration
 - The majority of votes from the classifiers
 
 - The highest class probability averaged over all the individual classifiers
 
 - Both
 
 - None of the above
 
Ans: B
11. In soft voting, the predicted class is the class
with the highest class probability, averaged over all the individual
classifiers
 -  True
 
 -  False
 
Ans: A
12. Soft voting often achieves higher performance than hard voting because
 -  Majority votes classifications are often wrong
 
 -  It gives more weight to highly confident votes
 
 -  Finding majority is computationally expensive
 
 -  This statement is false
 
Ans: B
13. The parameter which decides the voting method in
a VotingClassifier is
 -  method
 
 -  strategy
 
 -  voting
 
 -  Type
 
Ans: C
14. The parameter which holds the list of classifiers
which are to be used in the voting classifier is
 -  predictors
 
 -  classifiers
 
 -  estimators
 
 -  Models
 
Ans: C
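A minimal sketch tying questions 13 and 14 together (assuming scikit-learn's VotingClassifier; the dataset and base classifiers below are purely illustrative):

```python
# Sketch: `estimators` holds the list of predictors, `voting` picks the method.
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

voting_clf = VotingClassifier(
    estimators=[                         # list of (name, classifier) pairs
        ("lr", LogisticRegression()),
        ("rf", RandomForestClassifier()),
        ("svc", SVC(probability=True)),  # probabilities needed for soft voting
    ],
    voting="soft",                       # "hard" = majority vote, "soft" = averaged probabilities
)
voting_clf.fit(X, y)
```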
15. One way to get a diverse set of classifiers is to
use the same training algorithm for every predictor, but to train them on
different random subsets of the training set
 -  True
 
 -  False
 
Ans: A
16. When sampling is performed with replacement, the method is
 - Bagging
 
 - Pasting
 
 - Both
 
 - None
 
Ans: A
17. When sampling is performed without replacement,
it is called
 - Pasting
 
 - Bagging
 
 - Both
 
 - None
 
Ans: A
18. Both bagging and pasting allow training instances
to be sampled several times across multiple predictors, but only bagging allows
training instances to be sampled several times for the same predictor
 -  True
 
 -  False
 
Ans: A
19. In bagging/pasting, the sampling and training of the predictors can all be done in parallel, via different CPU cores or even different servers
 -  True
 
 -  False
 
Ans: A
20. To use the bagging method, the value of the bootstrap parameter in the BaggingClassifier should be set to
 -  True
 
 -  False
 
Ans: A
21. To use the pasting method, the value of the
bootstrap parameter in the BaggingClassifier should be set to
 -  True
 
 -  False
 
Ans: B
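A short sketch of questions 20 and 21 (assuming scikit-learn's BaggingClassifier; the hyperparameter values are illustrative):

```python
# Sketch: bootstrap=True -> bagging (with replacement),
#         bootstrap=False -> pasting (without replacement).
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=True, n_jobs=-1)    # bagging, trained in parallel

paste_clf = BaggingClassifier(
    DecisionTreeClassifier(), n_estimators=500,
    max_samples=100, bootstrap=False, n_jobs=-1)   # pasting
```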
22. Overall, bagging often results in better models
 -  True
 
 -  False
 
Ans: A
23. If the size of the training set is m, how many training instances does the BaggingClassifier sample with replacement for each predictor?
 -  m/2
 
 -  m/3
 
 -  m
 
 -  m-n where n is the number of features
 
Ans: C
24. With bagging, it is not possible that some
instances are never sampled
 -  True
 
 -  False
 
Ans: B
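A quick numeric sketch of questions 23 and 24: drawing m bootstrap samples from m instances leaves roughly 37% of the instances unsampled (the values below are illustrative):

```python
# Sketch: bootstrap sampling m-out-of-m leaves about 1/e ~ 37% of instances out.
import numpy as np

m = 10_000
rng = np.random.default_rng(42)
sampled = rng.integers(0, m, size=m)            # m draws with replacement
never_sampled = m - np.unique(sampled).size
print(never_sampled / m)                        # roughly 0.37
```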
25. Features can also be sampled in the
BaggingClassifier
 -  True
 
 -  False
 
Ans: A
26. The hyperparameters which control the feature
sampling are
 -  max_samples and bootstrap
 
 -  max_features and bootstrap_features
 
 - Both
 
 - None 
 
Ans: B
27. Sampling both training instances and features is
called the
 - Random Patches method
 
 - Random Subspaces method
 
 - Both
 
 - None
 
Ans: A
28. Keeping all training instances (i.e.,
bootstrap=False and max_samples=1.0) but sampling features (i.e.,
bootstrap_features=True and/or max_features smaller than 1.0) is called the
 - Random Patches method
 
 - Random Subspaces method
 
 - Both
 
 - None
 
Ans: B
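A sketch of questions 26-28 (assuming scikit-learn's BaggingClassifier; the exact hyperparameter values are illustrative):

```python
# Sketch: Random Subspaces = keep all instances, sample only features;
#         Random Patches   = sample both instances and features.
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

subspaces_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    bootstrap=False, max_samples=1.0,             # keep all training instances
    bootstrap_features=True, max_features=0.5)    # sample features

patches_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    bootstrap=True, max_samples=0.7,              # sample instances as well
    bootstrap_features=True, max_features=0.5)
```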
29. A Random Forest is an ensemble of Decision Trees generally trained via ______
 - Bagging
 
 - Pasting
 
 - Both
 
 - None
 
Ans: A
30. We can make the trees of a Random Forest even
more random by using random thresholds for each feature rather than searching
for the best possible thresholds?
 -  No
 
 -  Yes, and such an ensemble is called an Extremely Randomised Trees (Extra-Trees) ensemble
 
Ans: B
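A sketch of questions 29 and 30 (assuming scikit-learn; hyperparameters are illustrative):

```python
# Sketch: a Random Forest (bagged Decision Trees) next to an Extremely
# Randomized Trees ensemble, which uses random split thresholds per feature.
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier

rnd_clf = RandomForestClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
ext_clf = ExtraTreesClassifier(n_estimators=500, max_leaf_nodes=16, n_jobs=-1)
```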
31. If we look at a single Decision Tree, important
features are likely to appear closer to
 - Leaf of the tree
 
 - Middle of the tree
 
 - Root of the tree
 
 - None of these
 
Ans: C
32. Feature importances are available via the feature_importances_ attribute of a RandomForestClassifier object.
 -  True
 
 -  False
 
Ans: A
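A sketch of question 32 (assuming scikit-learn; the iris dataset is used only for illustration):

```python
# Sketch: reading feature importances from a trained RandomForestClassifier.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=500, n_jobs=-1)
rnd_clf.fit(iris["data"], iris["target"])
for name, score in zip(iris["feature_names"], rnd_clf.feature_importances_):
    print(name, score)   # important features get higher scores
```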
33. The general idea of most boosting methods is to
train predictors sequentially, each trying to correct its predecessor.
 -  True
 
 -  False
 
Ans: A
34. One drawback of the AdaBoost classifier is that
 -  It is slow
 
 -  It cannot be parallelized
 
 -  It cannot be performed on larger training sets
 
 -  It requires a lot of memory and processing power
 
Ans: B
35. A Decision Stump is a Decision Tree with
 - More than two leaf nodes
 
 - Max depth of 1, i.e. single decision node with two leaf nodes
 
 - Having more than 2 decision nodes
 
 - None
 
Ans: B
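A sketch of questions 34 and 35 (assuming scikit-learn's AdaBoostClassifier; hyperparameters are illustrative):

```python
# Sketch: AdaBoost over Decision Stumps (max_depth=1 trees); because each
# predictor corrects its predecessor, training is inherently sequential.
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),   # a Decision Stump
    n_estimators=200, learning_rate=0.5)
```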
36. In Gradient Boosting, instead of tweaking the instance weights at every iteration like AdaBoost does, each new predictor is fit to the residual errors made by the previous predictor.
 -  True
 
 -  False
 
Ans: A
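A sketch of the idea in question 36: each new tree is fit to the residual errors of the ensemble built so far (the data here is illustrative, and this is not the library's internal implementation):

```python
# Sketch: manual "gradient boosting" with three regression trees.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-0.5, 0.5, size=(100, 1))
y = 3 * X[:, 0] ** 2 + 0.05 * rng.normal(size=100)

tree1 = DecisionTreeRegressor(max_depth=2).fit(X, y)
y2 = y - tree1.predict(X)                       # residuals of the first tree
tree2 = DecisionTreeRegressor(max_depth=2).fit(X, y2)
y3 = y2 - tree2.predict(X)                      # residuals of the second tree
tree3 = DecisionTreeRegressor(max_depth=2).fit(X, y3)

X_new = np.array([[0.2]])
y_pred = sum(tree.predict(X_new) for tree in (tree1, tree2, tree3))
```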
37. The learning_rate hyperparameter of
GradientBoostingRegressor scales the contribution of each tree?
 -  True
 
 -  False
 
Ans: A
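A sketch of question 37 (assuming scikit-learn's GradientBoostingRegressor; values are illustrative):

```python
# Sketch: learning_rate scales each tree's contribution (shrinkage);
# lower values typically need more trees but can generalize better.
from sklearn.ensemble import GradientBoostingRegressor

gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=120, learning_rate=0.1)
```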
38. The ensemble method in which we train a model to perform the aggregation of outputs from all the predictors is called
 -  Boosting
 
 -  Bagging
 
 -  Stacking
 
 -  Pasting
 
Ans: C
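A sketch of question 38 (assuming scikit-learn's StackingClassifier; the base models and blender are illustrative):

```python
# Sketch: stacking trains a final model (the blender / meta-learner) to
# aggregate the base predictors' outputs instead of using a fixed vote.
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

stack_clf = StackingClassifier(
    estimators=[("rf", RandomForestClassifier()), ("svc", SVC())],
    final_estimator=LogisticRegression())       # trained on base predictions
```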