Machine Learning MCQs - 4 (Clustering, Dimensionality Reduction)

---------------------------------------------------------------------

1. Which of the following is finally produced by Hierarchical Clustering?

  1. final estimate of cluster centroids
  2. tree showing how close things are to each other
  3. assignment of each point to clusters
  4. all of the mentioned

Ans: 2

2. Which of the following is required by K-means clustering?

  1. defined distance metric
  2. number of clusters
  3. initial guess as to cluster centroids
  4. all of the mentioned

Ans: 4
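
For reference, a minimal scikit-learn sketch of these requirements: K-means needs the number of clusters up front, optionally an initial guess for the centroids, and uses the Euclidean distance metric built into the algorithm. The toy array X and the initial centroids below are made up purely for illustration.

    import numpy as np
    from sklearn.cluster import KMeans

    # Hypothetical 2-D points, just to show the mechanics
    X = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0],
                  [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])

    # k (n_clusters) is required; init can be an explicit centroid guess
    initial_centroids = np.array([[1.0, 1.0], [8.0, 9.0]])
    km = KMeans(n_clusters=2, init=initial_centroids, n_init=1, random_state=0)
    labels = km.fit_predict(X)

    print(labels)               # cluster assignment of each point
    print(km.cluster_centers_)  # final centroid estimates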

3. Point out the wrong statement.

  1. k-means clustering is a method of vector quantization
  2. k-means clustering aims to partition n observations into k clusters
  3. k-nearest neighbor is the same as k-means
  4. none of the mentioned

Ans: 3

4. Which of the following combination is incorrect?

  1. Continuous – Euclidean distance
  2. Continuous – correlation similarity
  3. Binary – Manhattan distance
  4. None of the mentioned

Ans: 4

5. Hierarchical clustering should be primarily used for exploration.

  1. True
  2. False

Ans: 1

6. Which of the following functions is used for k-means clustering?

  1. k-means
  2. k-mean
  3. heatmap
  4. none of the mentioned

Ans: 1

7. Which of the following clustering methods requires a merging approach?

  1. Partitional
  2. Hierarchical
  3. Naive Bayes
  4. None of the mentioned

Ans: 2

8. K-means is not deterministic, and it also involves a number of iterations.

  1. True
  2. False

Ans: 1

9. Which of the following can act as possible termination conditions in K-Means?

1. For a fixed number of iterations.
2. Assignment of observations to clusters does not change between iterations (except for cases with a bad local minimum).
3. Centroids do not change between successive iterations.
4. Terminate when RSS falls below a threshold.

options:
  1. 1, 3 and 4
  2. 1, 2 and 3
  3. 1, 2 and 4
  4. All of the above
Ans: 4
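
Two of these conditions map directly onto parameters of scikit-learn's KMeans (max_iter for a fixed iteration budget, tol for centroid movement), while stable assignments are detected internally and inertia_ is the RSS mentioned in condition 4. A small sketch with arbitrary parameter values:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=300, centers=3, random_state=42)

    # max_iter caps the iterations (condition 1); tol stops the run when the
    # centroids barely move (condition 3); Lloyd's algorithm also stops early
    # when cluster assignments no longer change (condition 2).
    km = KMeans(n_clusters=3, max_iter=100, tol=1e-4, n_init=10, random_state=0)
    km.fit(X)
    print(km.n_iter_)   # iterations actually run before termination
    print(km.inertia_)  # within-cluster sum of squares (the RSS of condition 4)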


10. Which of the following clustering algorithms suffers from the problem of convergence at local optima?

1. K-Means clustering algorithm
2. Agglomerative clustering algorithm
3. Expectation-Maximization clustering algorithm
4. Diverse clustering algorithm

options:
  1. 1 only
  2. 2 and 3
  3. 2 and 4
  4. 1 and 3
Ans: 4
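
Because both K-means and EM can converge to a local optimum, a common remedy is to restart from several initializations and keep the best run. A minimal sketch using scikit-learn's GaussianMixture (which is fit with EM); the data and settings are arbitrary:

    from sklearn.datasets import make_blobs
    from sklearn.mixture import GaussianMixture

    X, _ = make_blobs(n_samples=500, centers=3, random_state=7)

    # n_init restarts EM from several initializations and keeps the run with
    # the highest log-likelihood, reducing the risk of a poor local optimum.
    gmm = GaussianMixture(n_components=3, n_init=5, random_state=0).fit(X)
    print(gmm.converged_)
    print(gmm.lower_bound_)  # final log-likelihood bound of the best run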

11. What could be the possible reason(s) for producing two different dendrograms using an agglomerative clustering algorithm for the same dataset?

  1. Proximity function used
  2. Number of data points used
  3. Number of variables used
  4. All of the above
Ans: 4


12. In the figure below, if you draw a horizontal line at y = 2 on the y-axis, what will be the number of clusters formed?

(Dendrogram figure not shown.)

  1. 1
  2. 2
  3. 3
  4. 4
Ans: 2
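
Cutting a dendrogram with a horizontal line at a given height corresponds to fcluster(..., criterion='distance') in SciPy. A sketch with invented 1-D data (so the resulting count will not match the figure above; it only shows the mechanics):

    import numpy as np
    from scipy.cluster.hierarchy import linkage, fcluster

    X = np.array([[1.0], [1.2], [5.0], [5.3], [9.0]])   # made-up observations

    Z = linkage(X, method='single')                      # agglomerative (merging) clustering
    labels = fcluster(Z, t=2.0, criterion='distance')    # cut the tree at height y = 2
    print(labels, len(set(labels)))                      # cluster ids and how many clusters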

13. In which of the following cases will K-Means clustering fail to give good results?

1. Data points with outliers
2. Data points with different densities
3. Data points with round shapes
4. Data points with non-convex shapes

options:
  1. 1 and 2
  2. 2 and 3
  3. 2 and 4
  4. 1, 2 and 4
Ans: 4


14. Which of the following metrics do we have for finding dissimilarity between two clusters in hierarchical clustering?

1. Single-link
2. Complete-link
3. Average-link

options:
  1. 1 and 2
  2. 1 and 3
  3. 2 and 3
  4. 1, 2 and 3

Ans: 4


15. What is true about K-Means clustering?

1. K-means is extremely sensitive to cluster center initialization
2. Bad initialization can lead to poor convergence speed
3. Bad initialization can lead to bad overall clustering

options:
  1. 1 and 3
  2. 1 and 2
  3. 2 and 3
  4. 1, 2 and 3
Ans: 4


16. Which of the following can be applied to get good results for the K-means algorithm corresponding to global minima?

1. Try to run the algorithm for different centroid initializations
2. Adjust the number of iterations
3. Find out the optimal number of clusters

options:
  1. 2 and 3
  2. 1 and 3
  3. 1 and 2
  4. All of the above

Ans: 4
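
Options 1 and 3 have direct scikit-learn counterparts: n_init (or init='k-means++') reruns the algorithm from different centroid initializations, and sweeping k while watching the inertia curve (the elbow method) helps pick the number of clusters. A hedged sketch on synthetic data:

    from sklearn.cluster import KMeans
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=400, centers=4, random_state=1)

    # n_init random restarts keep the lowest-inertia solution; varying k and
    # inspecting inertia is one simple way to look for the optimal cluster count.
    for k in range(2, 7):
        km = KMeans(n_clusters=k, init='k-means++', n_init=10, random_state=0).fit(X)
        print(k, round(km.inertia_, 1))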

17. Which of the following techniques would perform better for reducing dimensions of a data set?

  1. Removing columns which have too many missing values
  2. Removing columns which have high variance in data
  3. Removing columns with dissimilar data trends
  4. None of these

Ans: 1

18. Dimensionality reduction algorithms are one of the possible ways to reduce the computation time required to build a model.

  1.  TRUE
  2.  FALSE

Ans: 1

19. Which of the following algorithms cannot be used for reducing the dimensionality of data?

  1.  t-SNE
  2.  PCA
  3.  LDA
  4.  None of these

Ans: 4

20. PCA can be used for projecting and visualizing data in lower dimensions.

  1.  TRUE
  2.  FALSE

Ans: 1 
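
A minimal sketch of using PCA to project a dataset to two dimensions for visualization (the iris data is used here only as a convenient example):

    import matplotlib.pyplot as plt
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, y = load_iris(return_X_y=True)

    # Project the 4-D iris features onto the first two principal components
    X2 = PCA(n_components=2).fit_transform(X)

    plt.scatter(X2[:, 0], X2[:, 1], c=y)
    plt.xlabel('PC 1')
    plt.ylabel('PC 2')
    plt.show()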

21. The most popularly used dimensionality reduction algorithm is Principal Component Analysis (PCA). Which of the following is/are true about PCA?

1. PCA is an unsupervised method
2. It searches for the directions in which the data have the largest variance
3. Maximum number of principal components <= number of features
4. All principal components are orthogonal to each other

Options:
  1. 1 and 2
  2. 1 and 3
  3. 2 and 3
  4. All of the above
Ans: 4
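
These properties are easy to check numerically: PCA is fit without labels, its components are ordered by explained variance, there are at most as many components as features, and the components are mutually orthogonal. A short verification sketch (iris is just a convenient example):

    import numpy as np
    from sklearn.datasets import load_iris
    from sklearn.decomposition import PCA

    X, _ = load_iris(return_X_y=True)   # labels are never used: unsupervised
    pca = PCA().fit(X)

    print(pca.explained_variance_ratio_)   # largest-variance directions come first
    print(pca.components_.shape)            # at most n_features components (4 here)
    print(np.round(pca.components_ @ pca.components_.T, 6))  # ~identity: orthogonal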

22. Suppose we are using dimensionality reduction as a pre-processing technique, i.e., instead of using all the features, we reduce the data to k dimensions with PCA and then use these PCA projections as our features. Which of the following statements is correct?
  1. Higher ‘k’ means more regularization
  2. Higher ‘k’ means less regularization
  3. Can’t Say

Ans: 2

23. What will happen when eigenvalues are roughly equal?
  1. PCA will perform outstandingly
  2. PCA will perform badly
  3. Can’t Say
  4. None of the above
Ans: 2


24. PCA works better when:

1. There is a linear structure in the data
2. The data lies on a curved surface and not on a flat surface
3. The variables are scaled in the same unit

options:
  1. 1 and 2
  2. 2 and 3
  3. 1 and 3
  4. 1, 2 and 3

Ans: 3
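
Since PCA assumes a roughly linear structure and is sensitive to the scale of the variables, standardizing features before PCA is common practice. A hedged sketch using a scikit-learn pipeline (the wine dataset is just an example with features on very different scales):

    from sklearn.datasets import load_wine
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    X, _ = load_wine(return_X_y=True)

    # Without scaling, variables measured in large units dominate the variance;
    # StandardScaler puts every feature on the same unit before PCA runs.
    pipe = make_pipeline(StandardScaler(), PCA(n_components=2))
    X2 = pipe.fit_transform(X)
    print(X2.shape)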

25. What happens when you get features in lower dimensions using PCA?

1. The features will still have interpretability
2. The features will lose interpretability
3. The features must carry all information present in data
4. The features may not carry all information present in data

options:
  1. 1 and 3
  2. 1 and 4
  3. 2 and 3
  4. 2 and 4
Ans: 4


26. Which of the following option(s) is/are true?

1. You need to initialize parameters in PCA
2. You don't need to initialize parameters in PCA
3. PCA can get trapped in local minima
4. PCA can't get trapped in local minima

Options:
  1. 1 and 3
  2. 1 and 4
  3. 2 and 3
  4. 2 and 4

Ans: 4


27. Which of the following options are correct when you are applying PCA on an image dataset?

1. It can be used to effectively detect deformable objects.
2. It is invariant to affine transforms.
3. It can be used for lossy image compression.
4. It is not invariant to shadows.

Options:
  1. 1 and 2
  2. 2 and 3
  3. 3 and 4
  4. 1 and 4

Ans: 3

28. Which of the following is untrue regarding the Expectation Maximization algorithm?

  1. An initial guess is made as to the location and size of the site of interest in each of the sequences, and these parts of the sequences are aligned
  2. The alignment provides an estimate of the base or amino acid composition of each column in the site
  3. The column-by-column composition of the site already available is used to estimate the probability of finding the site at any position in each of the sequences
  4. The row-by-column composition of the site already available is used to estimate the probability
Ans: 4

29. Out of the two repeated steps in the EM algorithm, step 2 is ________
  1. the maximization step
  2. the minimization step
  3. the optimization step
  4. the normalization step
Ans: 1

30. In the intermediate steps of the EM algorithm, the number of each base in each column is determined and then converted to fractions.
  1.    True
  2.    False

Ans: 1








Machine Learning MCQs-3 (Logistic Regression, KNN, SVM, Decision Tree)

---------------------------------------------------------------------

1. A Support Vector Machine can be used for

  1. Performing linear or nonlinear classification
  2. Performing regression
  3. Outlier detection
  4. All of the above
Ans: 4

2. The decision boundaries in a Support Vector Machine are fully determined (or “supported”) by the instances located on the edge of the street.

  1. True
  2. False
Ans: 1 

3.  Support Vector Machines are not sensitive to feature scaling

  1. True
  2. False

Ans: 2 

4. If we strictly impose that all instances be off the street and on the right side, this is called

  1. Soft margin classification
  2. Hard margin classification
  3. Strict margin classification
  4. Loose margin classification

Ans: 2

5. The main issues with hard margin classification are

  1. It only works if the data is linearly separable
  2. It is quite sensitive to outliers
  3. It is impossible to find a margin if the data is not linearly separable
  4. All of the above

Ans: 4


6. The objectives of Soft Margin Classification are to find a good balance between

  1. Keeping the street as large as possible
  2. Limiting the margin violations
  3. Both of the above

Ans: 3

7. The balance between keeping the street as large as possible and limiting margin violations is controlled by which hyperparameter?

  1. tol
  2. loss
  3. penalty
  4. C

Ans: 4

8. A smaller C value leads to a wider street but more margin violations.

  1. True
  2. False

Ans: 1
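
A hedged sketch comparing a small and a large C with LinearSVC: a smaller C regularizes more, giving a wider street at the cost of more margin violations (the data and values below are only illustrative; features are scaled first, since SVMs are sensitive to feature scaling):

    from sklearn.datasets import load_iris
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    X, y = load_iris(return_X_y=True)
    X, y = X[y != 2, 2:], y[y != 2]     # two classes, two features, for simplicity

    for C in (0.01, 100.0):             # small C: wide street; large C: narrow street
        clf = make_pipeline(StandardScaler(), LinearSVC(C=C))
        clf.fit(X, y)
        print(C, clf.score(X, y))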

9. If your SVM model is overfitting, you can try regularizing it by reducing the value of

  1. tol
  2. C hyperparameter
  3. intercept_scaling
  4. None of the above

Ans: 2

10. A similarity function like Gaussian Radial Basis Function is used to

  1. Measure how many features are related to each other
  2. Find the most important features
  3. Find the relationship between different features
  4. Measure how much each instance resembles a particular landmark

Ans: 4

11. When using SVMs, we can apply an almost miraculous mathematical technique for adding polynomial features and similarity features, called the

  1. Kernel trick
  2. Shell trick
  3. Mapping and Reducing
  4. None of the Above

Ans: 1
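
A minimal sketch of the kernel trick with SVC: the polynomial and RBF kernels give nonlinear decision boundaries as if polynomial or Gaussian similarity features had been added, without actually creating them (the moons dataset and the hyperparameter values are arbitrary choices):

    from sklearn.datasets import make_moons
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

    # kernel='poly' acts as if polynomial features were added;
    # kernel='rbf' acts as if a Gaussian similarity feature were added per landmark.
    for kernel, params in [('poly', dict(degree=3, coef0=1)), ('rbf', dict(gamma=5))]:
        clf = make_pipeline(StandardScaler(), SVC(kernel=kernel, C=1.0, **params))
        print(kernel, clf.fit(X, y).score(X, y))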

12. Which is right for the gamma parameter of SVC, which acts as a regularization hyperparameter?

  1. If model is overfitting, increase it, if it is underfitting, reduce it
  2. If model is overfitting, reduce it, if it is underfitting, increase it
  3. If model is overfitting, keep it same
  4. If it is underfitting, keep it same

Ans: 2

13. LinearSVC is much faster than SVC(kernel="linear")

  1. True
  2. False

Ans: 1

14. In SVM regression, the model tries to

  1. Fit the largest possible street between two classes while limiting margin violations
  2. Fit as many instances as possible on the street while limiting margin violations

Ans: 2
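
A hedged sketch of SVM regression with LinearSVR: epsilon sets the width of the street, and the model tries to fit as many training instances as possible inside it (the noisy linear data below is invented for illustration):

    import numpy as np
    from sklearn.svm import LinearSVR

    rng = np.random.RandomState(0)
    X = 2 * rng.rand(100, 1)
    y = 4 + 3 * X.ravel() + rng.randn(100)   # made-up noisy linear target

    # epsilon controls the street width; instances inside the street
    # do not affect the fitted model.
    svr = LinearSVR(epsilon=1.5, C=1.0, max_iter=10000).fit(X, y)
    print(svr.coef_, svr.intercept_)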

15. Decision Trees can be used for

  1. Classification Tasks
  2. Regression Tasks
  3. Multi-output tasks
  4. All of the above

Ans: 4

16. The iris dataset has

  1. 5 features and 3 classes
  2. 4 features and 3 classes
  3. 2 features and 3 classes
  4. 4 features and 2 classes

Ans: 2

17. A node’s Gini attribute measures

  1. The number of training instances in the node
  2. The ratio of training instances in the node
  3. Its impurity

Ans: 3

18. If all the training instances of a node belong to the same class then the value of the node's Gini attribute will be

  1. 1
  2. 0
  3. Any integer between 0 and 1
  4. A negative value

Ans: 2

19. A Gini coefficient of 1 expresses maximal inequality among the training samples

  1. True
  2. False

Ans: 1

20. The Gini index for a node is found by subtracting the sum of the squared ratios of each class in the node from 1

  1. True
  2. False

Ans: 1
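
A small helper that computes the Gini impurity exactly as described, 1 minus the sum of squared class ratios (the class counts used below are hypothetical):

    def gini(class_counts):
        # Gini impurity: 1 - sum over classes of (ratio of that class)^2
        total = sum(class_counts)
        return 1.0 - sum((count / total) ** 2 for count in class_counts)

    print(gini([50, 0, 0]))   # pure node -> 0.0
    print(gini([0, 49, 5]))   # mixed node -> about 0.168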

21. A decision tree estimates the probability that an instance belongs to a particular class k by finding the corresponding leaf node for the instance and then returning the ratio of training instances of class k in that node

  1. True
  2. False

Ans: 1

22. The Decision Tree classifier predicts the class which has the highest probability

  1. True
  2. False

Ans: 1

23. The CART algorithm splits the training set into two subsets

  1. Using all the features and a threshold tk
  2. Using a single feature k and a threshold tk
  3. Using half of the features and a threshold k

Ans: 2

24. How does the CART algorithm choose the feature k and the threshold tk for splitting?

  1. It randomly chooses a feature k
  2. It chooses the mean of the values of the feature k as threshold
  3. It chooses the feature k and threshold tk which produces the purest subsets
  4. It chooses the feature k and threshold tk such that the gini index value of the subsets is 0

Ans: 3

25. The cost function for finding the value of feature k and threshold tk takes into consideration

  1. The Gini index values of the subsets
  2. The number of instances in the subsets
  3. The total number of instances in the node that is being split
  4. All of these

Ans: 4
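
The cost function can be written as the Gini impurity of each subset weighted by its share of the instances in the node being split. A sketch (the gini helper from the earlier snippet is repeated so the block is self-contained, and the counts are hypothetical):

    def gini(class_counts):
        total = sum(class_counts)
        return 1.0 - sum((c / total) ** 2 for c in class_counts)

    def cart_cost(left_counts, right_counts):
        # J(k, tk) = (m_left / m) * G_left + (m_right / m) * G_right
        m_left, m_right = sum(left_counts), sum(right_counts)
        m = m_left + m_right
        return (m_left / m) * gini(left_counts) + (m_right / m) * gini(right_counts)

    # CART would pick the feature k and threshold tk minimizing this cost.
    print(cart_cost([50, 0, 0], [0, 50, 50]))   # about 0.333 for this made-up split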

26. Once the CART algorithm has successfully split the training set in two

  1. It stops splitting further
  2. It splits the subsets using the same logic, then the sub-subsets and so on, recursively
  3. It splits only the right subset
  4. It splits only the left subset

Ans: 2

27. The CART algorithm stops recursion once it reaches the maximum depth (defined by the max_depth hyperparameter), or if it cannot find a split that will reduce impurity

  1. True
  2. False

Ans: 1

28. Which of the following are correct for the CART algorithm

  1. It is a greedy algorithm
  2. It greedily searches for an optimum split at each level
  3. It does not check whether or not the split will lead to the lowest possible impurity several levels down
  4. All of the above are correct

Ans: 4

29. While making a prediction in Decision Tree, each node only requires checking the value of one feature

  1. True
  2. False

Ans: 1

30. Gini impurity is slightly faster to compute in comparison to entropy

  1. True
  2. False

Ans: 1

31. Decision Tree models are often called nonparametric models because

  1. They do not have any parameters
  2. The number of parameters is not determined prior to training
  3. They have lesser parameters as compared to other models
  4. They are easy to interpret and understand

Ans: 2

