Machine Learning MCQs - 5 (Ensemble Models)

---------------------------------------------------------------------

1. A model that consists of a group of predictors is called a(n)
  1. Group
  2. Entity
  3. Ensemble
  4. Set
Ans: 3

2. A Random Forest is an ensemble of Decision Trees
  1. True
  2. False
Ans: 1

3. The steps involved in deciding the output of a Random Forest are
  1. Obtain the predictions of all individual trees
  2. Predict the class that gets the most votes
  3. Both of the above
Ans: 3

4. A hard voting classifier takes into consideration
  1. The probabilities of output from each classifier
  2. The majority votes from the classifiers
  3. The mean of the output from each classifier
  4. The sum of the output from each classifier
Ans: 2

5. Even if each classifier is a weak learner, the ensemble can still be a strong learner
  1. True
  2. False
Ans: 1

6. Ensemble methods work best when the predictors are
  1. Sufficiently diverse
  2. As independent from one another as possible
  3. Making very different types of errors
  4. All of the above
Ans: 4

7. To get diverse classifiers, we cannot train them using different algorithms
  1. True
  2. False
Ans: 2

8. Training the classifiers in an ensemble using very different algorithms increases the chance that they will make very different types of errors, improving the ensemble’s accuracy
  1. True
  2. False
Ans: 1

9. When we consider only the majority of the outputs from the classifiers, it is called
  1. Hard Voting
  2. Soft Voting
Ans: 1

10. Soft voting takes into consideration
  1. The majority of votes from the classifiers
  2. The highest class probability averaged over all the individual classifiers
Ans: 2

11. In soft voting, the predicted class is the class with the highest class probability, averaged over all the individual classifiers
  1. True
  2. False
Ans: 1

12. Soft voting achieves higher performance than hard voting because
  1. Majority votes classifications are often wrong
  2. It gives more weight to highly confident votes
  3. Finding majority is computationally expensive
  4. This statement is false
Ans: 2
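
For reference, a minimal sketch of hard vs. soft voting, assuming scikit-learn's VotingClassifier (the dataset and base estimators here are illustrative only):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# voting="hard" counts majority class votes; voting="soft" averages
# predicted class probabilities, so every estimator must expose
# predict_proba (hence probability=True for the SVC).
voting_clf = VotingClassifier(
    estimators=[
        ("lr", LogisticRegression()),
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(probability=True, random_state=42)),
    ],
    voting="soft",  # switch to "hard" for majority voting
)
voting_clf.fit(X, y)
print(voting_clf.predict(X[:5]))
```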

13. When sampling is performed with replacement, the method is
  1. Bagging
  2. Pasting
Ans: 1

14. When sampling is performed without replacement, it is called
  1. Pasting
  2. Bagging
Ans: 1

15. Both bagging and pasting allow training instances to be sampled several times across multiple predictors, but only bagging allows training instances to be sampled several times for the same predictor
  1. True
  2. False
Ans: 1

16. In bagging/pasting, training set sampling and predictor training can all be done in parallel, via different CPU cores or even different servers
  1. True
  2. False
Ans: 1

17. To use the bagging method, the value of the bootstrap parameter in the BaggingClassifier should be set to
  1. True
  2. False
Ans: 1
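
A minimal sketch of questions 13-17, assuming scikit-learn's BaggingClassifier: bootstrap=True gives bagging (sampling with replacement), bootstrap=False gives pasting, and n_jobs=-1 trains the predictors in parallel across all CPU cores:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=500,
    max_samples=100,   # instances drawn per predictor
    bootstrap=True,    # True = bagging; False = pasting
    n_jobs=-1,         # train predictors on all available cores
    random_state=42,
)
bag_clf.fit(X, y)
```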

18. Overall, bagging often results in better models
  1. True
  2. False
Ans: 1

19. With bagging, it is not possible that some instances are never sampled
  1. True
  2. False
Ans: 2

20. Features can also be sampled in the BaggingClassifier
  1. True
  2. False
Ans: 1

21. The hyperparameters which control the feature sampling are
  1. max_samples and bootstrap
  2. max_features and bootstrap_features
Ans: 2
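
A hedged sketch of feature sampling with the two hyperparameters named above (max_features and bootstrap_features); the dataset is illustrative:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=300, n_features=20, random_state=42)

bag_clf = BaggingClassifier(
    DecisionTreeClassifier(),
    n_estimators=100,
    max_features=0.5,         # each predictor sees half of the features
    bootstrap_features=True,  # features sampled with replacement
    random_state=42,
)
bag_clf.fit(X, y)
```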

22. Random forest is an ensemble of Decision Trees generally trained via ______
  1. Bagging
  2. Pasting
Ans: 1

23. We can make the trees of a Random Forest even more random by using random thresholds for each feature rather than searching for the best possible thresholds?
  1. No
  2. Yes, and these are called Extremely Randomised Trees ensemble
Ans: 2
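
A minimal sketch of an Extremely Randomized Trees ensemble, assuming scikit-learn's ExtraTreesClassifier, which uses random split thresholds instead of searching for the best ones:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import ExtraTreesClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

# Drop-in replacement for RandomForestClassifier, but with random
# thresholds per candidate feature, trading more bias for lower variance.
extra_clf = ExtraTreesClassifier(n_estimators=100, random_state=42)
extra_clf.fit(X, y)
```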

24. If we look at a single Decision Tree, important features are likely to appear closer to
  1. Leaf of the tree
  2. Middle of the tree
  3. Root of the tree
Ans: 3
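
Relatedly, a hedged sketch of how a Random Forest exposes feature importances in scikit-learn (the iris dataset is just an example):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

iris = load_iris()
rnd_clf = RandomForestClassifier(n_estimators=100, random_state=42)
rnd_clf.fit(iris.data, iris.target)

# Higher scores correspond to features that tend to appear closer
# to the root of the individual trees.
for name, score in zip(iris.feature_names, rnd_clf.feature_importances_):
    print(name, round(score, 3))
```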

25. One of the drawbacks of the AdaBoost classifier is that
  1. It is slow
  2. It cannot be parallelized
  3. It cannot be performed on larger training sets
  4. It requires a lot of memory and processing power
Ans: 2

26. A Decision Stump is a Decision Tree with
  1. More than two leaf nodes
  2. Max depth of 1, i.e. single decision node with two leaf nodes
  3. Having more than 2 decision nodes
Ans: 2
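
A minimal sketch of AdaBoost built on Decision Stumps, assuming scikit-learn's AdaBoostClassifier. Each stump is trained on instance weights updated from the previous stump's errors, which is also why the training cannot be parallelized (question 25):

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

ada_clf = AdaBoostClassifier(
    DecisionTreeClassifier(max_depth=1),  # a decision stump
    n_estimators=200,
    learning_rate=0.5,
    random_state=42,
)
ada_clf.fit(X, y)
```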

27. Gradient Boosting, instead of tweaking the instance weights at every iteration like AdaBoost does, tries to fit the new predictor to the residual errors made by the previous predictor.
  1. True
  2. False
Ans: 1

28. The learning_rate hyperparameter of GradientBoostingRegressor scales the contribution of each tree?
  1. True
  2. False
Ans: 1
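
A hedged sketch of questions 27-28: fitting each new tree to the residuals of the previous ones by hand, then the equivalent with GradientBoostingRegressor, whose learning_rate scales each tree's contribution (the data is synthetic and illustrative):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(42)
X = rng.rand(100, 1) - 0.5
y = 3 * X[:, 0] ** 2 + 0.05 * rng.randn(100)

# Manual gradient boosting: each tree fits the previous residuals.
tree1 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y)
y2 = y - tree1.predict(X)                  # residuals after tree 1
tree2 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y2)
y3 = y2 - tree2.predict(X)                 # residuals after tree 2
tree3 = DecisionTreeRegressor(max_depth=2, random_state=42).fit(X, y3)

# The ensemble's prediction is the sum of the trees' predictions.
y_pred = sum(t.predict(X) for t in (tree1, tree2, tree3))

# The same idea, with learning_rate scaling each tree's contribution.
gbrt = GradientBoostingRegressor(max_depth=2, n_estimators=3,
                                 learning_rate=1.0, random_state=42)
gbrt.fit(X, y)
```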

29. The ensemble method in which we train a model to perform the aggregation of outputs from all the predictors is called
  1. Boosting
  2. Bagging
  3. Stacking
  4. Pasting
Ans: 3
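
A minimal sketch of stacking, assuming scikit-learn's StackingClassifier: the final_estimator (often called a blender or meta-learner) is trained to aggregate the base predictors' outputs:

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

X, y = make_moons(n_samples=500, noise=0.3, random_state=42)

stack_clf = StackingClassifier(
    estimators=[
        ("rf", RandomForestClassifier(random_state=42)),
        ("svc", SVC(random_state=42)),
    ],
    final_estimator=LogisticRegression(),  # learns to aggregate outputs
)
stack_clf.fit(X, y)
```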

Machine Learning MCQs - 4 (Clustering, Dimensionality Reduction)

---------------------------------------------------------------------

1. Which of the following is finally produced by Hierarchical Clustering?

  1. final estimate of cluster centroids
  2. tree showing how close things are to each other
  3. assignment of each point to clusters
  4. all of the mentioned

Ans: 2

2. Which of the following is required by K-means clustering?

  1. defined distance metric
  2. number of clusters
  3. initial guess as to cluster centroids
  4. all of the mentioned

Ans: 4

3. Point out the wrong statement.

  1. k-means clustering is a method of vector quantization
  2. k-means clustering aims to partition n observations into k clusters
  3. k-nearest neighbor is the same as k-means
  4. none of the mentioned

Ans: 3

4. Which of the following combinations is incorrect?

  1. Continuous – euclidean distance
  2. Continuous – correlation similarity
  3. Binary – manhattan distance
  4. None of the mentioned

Ans: 4

5. Hierarchical clustering should be primarily used for exploration

  1. True
  2. False

Ans: 1

6. Which of the following functions is used for k-means clustering?

  1. k-means
  2. k-mean
  3. heatmap
  4. none of the mentioned

Ans: 1

7. Which of the following clustering methods requires a merging approach?

  1. Partitional
  2. Hierarchical
  3. Naive Bayes
  4. None of the mentioned

Ans: 2

8. K-means is not deterministic and it also consists of a number of iterations.

  1. True
  2. False

Ans: 1

9. Which of the following can act as possible termination conditions in K-Means?

  1. A fixed number of iterations has been completed.
  2. Assignment of observations to clusters does not change between iterations (except for cases with a bad local minimum).
  3. Centroids do not change between successive iterations.
  4. RSS falls below a threshold.

options:
  1. 1, 3 and 4
  2. 1, 2 and 3
  3. 1, 2 and 4
  4. All of the above
Ans: 4
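
A hedged sketch of questions 2 and 9, assuming scikit-learn's KMeans: the number of clusters and centroid initialization are inputs, while max_iter and tol implement the termination conditions listed above (the blob data is illustrative):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

km = KMeans(
    n_clusters=4,        # the number of clusters must be supplied
    init="k-means++",    # initial guess strategy for the centroids
    max_iter=300,        # stop after a fixed number of iterations
    tol=1e-4,            # stop when the centroids barely move
    n_init=10,
    random_state=42,
)
labels = km.fit_predict(X)
print(km.inertia_)  # within-cluster sum of squares (RSS-style criterion)
```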


10. Which of the following clustering algorithms suffers from the problem of convergence at local optima?

  1. K-Means clustering algorithm
  2. Agglomerative clustering algorithm
  3. Expectation-Maximization clustering algorithm
  4. Diverse clustering algorithm

options:
  1. 1 only
  2. 2 and 3
  3. 2 and 4
  4. 1 and 3
Ans: 4

11. What could be the possible reason(s) for producing two different dendrograms using an agglomerative clustering algorithm for the same dataset?
  1. Proximity function used
  2. Number of data points used
  3. Number of variables used
  4. All of the above
Ans: 4


12. In the figure below, if you draw a horizontal line at y = 2, what will be the number of clusters formed?

[Figure not reproduced here.]
  1. 1
  2. 2
  3. 3
  4. 4
Ans: 2

13. In which of the following cases will K-Means clustering fail to give good results?

  1. Data points with outliers
  2. Data points with different densities
  3. Data points with round shapes
  4. Data points with non-convex shapes

options:
  1. 1 and 2
  2. 2 and 3
  3. 2 and 4
  4. 1, 2 and 4
Ans: 4


14. Which of the following metrics do we have for finding dissimilarity between two clusters in hierarchical clustering?

  1. Single-link
  2. Complete-link
  3. Average-link

options:
  1. 1 and 2
  2. 1 and 3
  3. 2 and 3
  4. 1, 2 and 3
Ans: 4
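
A hedged sketch of the three linkage metrics, using SciPy's hierarchical clustering (the blob data is illustrative); the resulting merge tree is what the dendrogram in question 1 visualizes:

```python
from scipy.cluster.hierarchy import dendrogram, linkage
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=50, centers=3, random_state=42)

for method in ("single", "complete", "average"):
    Z = linkage(X, method=method)  # merge history under this linkage
# dendrogram(Z) would plot the tree for the last linkage computed
```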


15. What is true about K-Means Clustering?

  1. K-means is extremely sensitive to cluster center initializations
  2. Bad initialization can lead to poor convergence speed
  3. Bad initialization can lead to bad overall clustering

options:
  1. 1 and 3
  2. 1 and 2
  3. 2 and 3
  4. 1, 2 and 3
Ans: 4


16. Which of the following can be applied to get good results for the K-means algorithm, i.e. results corresponding to global minima?

  1. Try running the algorithm with different centroid initializations
  2. Adjust the number of iterations
  3. Find out the optimal number of clusters

options:
  1. 2 and 3
  2. 1 and 3
  3. 1 and 2
  4. All of the above
Ans: 4
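
A minimal sketch of the mitigation in question 16, assuming scikit-learn's KMeans: n_init reruns the algorithm with different centroid initializations and keeps the solution with the lowest inertia:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=300, centers=4, random_state=42)

# 25 different centroid initializations; the best run (lowest
# inertia) is retained, reducing the risk of a bad local optimum.
km = KMeans(n_clusters=4, n_init=25, random_state=42).fit(X)
print(km.inertia_)
```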

17. Which of the following techniques would perform better for reducing dimensions of a data set?

  1. Removing columns which have too many missing values
  2. Removing columns which have high variance in data
  3. Removing columns with dissimilar data trends
  4. None of these

Ans: 1

18. Dimensionality reduction algorithms are one of the possible ways to reduce the computation time required to build a model

  1.  TRUE
  2.  FALSE

Ans: 1

19. Which of the following algorithms cannot be used for reducing the dimensionality of data?

  1.  t-SNE
  2.  PCA
  3.  LDA
  4.  None of these

Ans: 4

20. PCA can be used for projecting and visualizing data in lower dimensions.

  1.  TRUE
  2.  FALSE

Ans: 1 

21. The most popularly used dimensionality reduction algorithm is Principal Component Analysis (PCA). Which of the following is/are true about PCA?

  1. PCA is an unsupervised method
  2. It searches for the directions in which the data has the largest variance
  3. The maximum number of principal components is <= the number of features
  4. All principal components are orthogonal to each other

options:
  1. 1 and 2
  2. 1 and 3
  3. 2 and 3
  4. All of the above
Ans: 4

22. Suppose we are using dimensionality reduction as a pre-processing technique, i.e., instead of using all the features, we reduce the data to k dimensions with PCA and then use these PCA projections as our features. Which of the following statements is correct?
  1. Higher ‘k’ means more regularization
  2. Higher ‘k’ means less regularization
  3. Can’t Say
Ans: 2
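
A hedged sketch of PCA as a pre-processing step, assuming scikit-learn (the iris data is illustrative): a smaller k keeps fewer orthogonal directions of maximal variance, discarding more information and so acting as stronger regularization, per question 22:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data
pca = PCA(n_components=2)              # k = 2 principal components
X_reduced = pca.fit_transform(X)       # use these as the new features
print(pca.explained_variance_ratio_)   # variance captured per component
```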

23. What will happen when the eigenvalues are roughly equal?
  1. PCA will perform outstandingly
  2. PCA will perform badly
  3. Can’t Say
  4. None of the above
Ans: 2


24. PCA works better if there is

  1. A linear structure in the data
  2. Data lying on a curved surface rather than a flat surface
  3. Variables scaled in the same unit

options:
  1. 1 and 2
  2. 2 and 3
  3. 1 and 3
  4. 1, 2 and 3

Ans: 3

25. What happens when you get features in lower dimensions using PCA?

  1. The features will still have interpretability
  2. The features will lose interpretability
  3. The features must carry all information present in the data
  4. The features may not carry all information present in the data

options:
  1. 1 and 3
  2. 1 and 4
  3. 2 and 3
  4. 2 and 4
Ans: 4


26. Which of the following option(s) is/are true?

  1. You need to initialize parameters in PCA
  2. You don’t need to initialize parameters in PCA
  3. PCA can get trapped in local minima
  4. PCA cannot get trapped in local minima

options:
  1. 1 and 3
  2. 1 and 4
  3. 2 and 3
  4. 2 and 4
Ans: 4


27. Which of the following options are correct when you are applying PCA on an image dataset?

  1. It can be used to effectively detect deformable objects
  2. It is invariant to affine transforms
  3. It can be used for lossy image compression
  4. It is not invariant to shadows

options:
  1. 1 and 2
  2. 2 and 3
  3. 3 and 4
  4. 1 and 4
Ans: 3

28. Which of the following is untrue regarding the Expectation Maximization algorithm?

  1. An initial guess is made as to the location and size of the site of interest in each of the sequences, and these parts of the sequence are aligned
  2. The alignment provides an estimate of the base or amino acid composition of each column in the site
  3. The column-by-column composition of the site already available is used to estimate the probability of finding the site at any position in each of the sequences
  4. The row-by-column composition of the site already available is used to estimate the probability
Ans: 4

29. Of the two repeated steps in the EM algorithm, step 2 is ________
  1. the maximization step
  2. the minimization step
  3. the optimization step
  4. the normalization step
Ans: 1

30. In the intermediate steps of the EM algorithm, the number of each base in each column is determined and then converted to fractions.
  1.    True
  2.    False

Ans: 1







