ML: Intro to Machine Learning - MCQs

 



Q1. Applications of Supervised Learning

Select the problem statements where you can apply supervised algorithms.

1.     For an e-commerce website, segmenting the unlabelled customers based on their behaviour from a large dataset.

2.     Given data on crop yields over the last 50 years, trying to predict next year's crop yields.

3.     Based on data samples of webpages, classifying whether the content of a web page should be considered "child friendly" or "adult".

4.     Given a large dataset of medical records from patients suffering from heart disease, try to learn whether there might be different groups of such patients.

 

Ans: Correct Answers:

  2. Given data on crop yields over the last 50 years, trying to predict next year’s crop yields.
  3. Based on data samples of webpages, classifying whether the content of a web page should be considered “child friendly” or “adult”.

Explanation:

·       Supervised learning is defined by its use of labeled datasets to train algorithms that classify data or predict outcomes accurately.

·       Predicting crop yields and classifying a web page both rely on labeled data from which the ML model learns. Predicting crop yields can be done with a regression model, while classifying a webpage is a classification task.

·       Unsupervised learning uses machine learning algorithms to analyze and cluster unlabeled datasets.

·       Segmenting customers based on their behaviour is a clustering task, where the ML model finds patterns and forms clusters from the unlabelled data samples.

·       The same goes for grouping patients based on their medical records of heart disease.

 

Q2. Applications of Unsupervised learning

Which of the following are applications of unsupervised learning?

 

1.     Grouping of images of different types of animals, based on the image features only

2.     Given a dataset of house prices from the previous 5 years, predicting the price of a house.

3.     Grouping together users with similar viewing patterns in order to recommend similar content.

4.     Predicting the class of a fruit, given labelled examples of each type of fruit.

Ans: Correct Answers:

1.     Grouping of images of different types of animals, based on the image features only.

3.     Grouping together users with similar viewing patterns in order to recommend similar content.

Explanation:

·       Predicting house prices and classifying the type of fruit are use cases of supervised learning.

·       Grouping different types of animals and grouping users to recommend similar content are both unsupervised, as no class labels are given.

 

Q3. Person of Interest

Popular online shopping platforms like Myntra, Flipkart, and Amazon are trying to group their clients based on their recent purchases and interests so that they can design a suitable marketing strategy.

Which of the following Machine Learning methods will be appropriate for doing so?

 

1.        Regression

2.        Classification

3.        Clustering Methods

4.        Recommendation

 

 

Ans: Correct Option: 3. Clustering Methods

Explanation:

  • Customers are grouped on the basis of common characteristics so that companies can market to each group effectively and appropriately. This grouping can be done effectively by clustering algorithms.
  • If we were asked to predict the price of a particular product, we would use a regression model.
  • If we were asked to predict whether customers will buy the product or not, we would use a classification model.
  • If we were asked to recommend products to customers based on their recent purchases, we would use a recommendation model.
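
As a rough illustration of the clustering idea above, here is a minimal sketch of k-means on one-dimensional data. The spend figures, the starting centroids, and the function name `kmeans_1d` are assumptions made up for this example; a real customer segmentation would use many features and a library implementation.

```python
def kmeans_1d(values, centroids, iterations=10):
    """Tiny k-means for 1-D data; returns (centroids, labels)."""
    centroids = list(centroids)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: abs(v - centroids[k]))
            for v in values
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return centroids, labels


# Hypothetical monthly spend per customer; note that no labels are given.
monthly_spend = [120, 150, 130, 900, 950, 1000]
centroids, labels = kmeans_1d(monthly_spend, centroids=[100, 1000])
print(centroids)  # one centroid per discovered group
print(labels)     # group index assigned to each customer
```

The algorithm is never told which customers belong together; the two groups emerge from the data alone, which is what makes this an unsupervised task.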

 

Q4. Price is right

A firm is building a shopping mall and has hired you as a Data Scientist to evaluate the prices of shops.

Which of the following Machine Learning tasks will you use?

1.        Classification

2.        Regression

3.        Clustering Methods

4.         Recommendation

Ans: Correct option: 2. Regression

Explanation:

  • The prices of shops are continuous values and hence this is a regression task.
  • If we were asked to decide whether to open a shopping mall or not, it would be a classification task.
  • If we were asked to group shops based on the type of items sold, it would be a clustering task.
  • If we were asked to suggest a location for various types of shops, it would be a recommendation task.
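
To make the regression case concrete, the sketch below fits a straight line by ordinary least squares. The floor areas, prices, and the helper name `fit_line` are hypothetical values chosen for illustration, not data from the question.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalised).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical shops: floor area (sq. m) and price (arbitrary units).
areas = [20, 40, 60, 80]
prices = [50, 90, 130, 170]

slope, intercept = fit_line(areas, prices)
predicted_price = slope * 100 + intercept  # price estimate for a 100 sq. m shop
print(slope, intercept, predicted_price)
```

The output is a continuous number rather than a category, which is the property that separates regression from classification.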

 

Q5. Mr. Robot?

You are an analyst at SBI Bank and recently discovered that some customer accounts have been hacked.

So you want to create software that categorizes each customer's account as hacked or safe.

Which machine learning task will be helpful here?

1.        Classification

2.        Regression

3.        Clustering

4.         Reinforcement learning

Ans: Correct Answer: 1.  Classification

Explanation:

  • Since the accounts will be categorized as hacked or safe, i.e. classified into categories/classes, it is a classification problem.
  • If we were asked to predict the amount lost by each customer, it would have been a regression problem.
  • If we were asked to group customers by income to understand whether there is any pattern behind the hacking, it would have been a clustering problem.
  • Reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones and is hence not applicable.
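
A minimal sketch of a classifier for this kind of problem is shown below, using a nearest-centroid rule. The two features (failed logins per day, new devices seen) and all sample values are assumptions invented for the example; the point is only that the model learns from labelled accounts and outputs a class.

```python
def train_centroids(samples, labels):
    """Average the feature vectors of each class."""
    sums, counts = {}, {}
    for features, label in zip(samples, labels):
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def dist(label):
        return sum((a - b) ** 2 for a, b in zip(features, centroids[label]))
    return min(centroids, key=dist)


# Hypothetical labelled accounts: [failed logins per day, new devices seen].
X = [[0, 1], [1, 1], [9, 4], [12, 5]]
y = ["safe", "safe", "hacked", "hacked"]

model = train_centroids(X, y)
first = predict(model, [10, 3])
second = predict(model, [1, 0])
print(first, second)
```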

 

Q6. Poseidon

Government agencies that predict tsunamis use tsunami-detection buoys to confirm the existence of underwater earthquakes.

These buoys use the height of the waves and the change in water pressure to predict a tsunami.

Which of the following methods might be used for the task mentioned?

 

1.        Regression

2.        Classification

3.        Clustering

4.         Reinforcement learning

Ans: Correct option: 2.  Classification

Explanation:

  • The task of predicting a tsunami yields two classes: Tsunami or No-Tsunami. Since we have to predict a categorical label with 2 classes, it is a classification task.
  • If we were asked to predict the magnitude of an earthquake, it would have been a regression problem.
  • We cannot use clustering because grouping the readings does not by itself predict a tsunami.
  • Reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones and thus would not help in predicting tsunamis.

 

Q7. Cells at work

While conducting a Bone-marrow transplant, doctors check whether the donor’s cells are healthy or not. Several organizations use Machine learning models to determine healthy cells.

Which of the following machine learning methods can be used?

 

1.        Regression

2.        Classification

3.        Reinforcement learning

4.        Clustering

Ans: Correct Answer: 2. Classification

Explanation:

  • A donor's cell can be either 'Healthy' or 'Not Healthy', which makes this a classification task.
  • We are not predicting any continuous numerical value and hence regression is incorrect.
  • Grouping cells would not help in predicting whether a cell is Healthy or Not Healthy and hence clustering is incorrect.
  • Reinforcement learning is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones and is not applicable in this scenario.

 

Q8. ML Paradigm

A computer program is said to learn from experience E with respect to some task T and some performance measure P, if its performance on T, as measured by P, improves with experience E.

Suppose we want to predict the future prices of cryptocurrency based on some data. According to the definition given above, what would be a reasonable choice for P?

1.        The Price prediction task.

2.        Based on the learning over past data, the process of learning to find the patterns within newer data.

3.        The accuracy of the algorithm for correctly predicting the future price of crypto-currencies.

4.         None of these.

Ans: The correct option is:

3.         The accuracy of the algorithm for correctly predicting the future price of crypto-currencies.

Explanation:

·       As per the definition, ‘P’ is a performance measure that should improve with experience. So, accuracy is one of the ways we can measure the performance of predicting future prices.

·       As per the definition, ‘T’ is the task, so the price prediction task is ‘T’ in our case.

·       The process of learning patterns from past data and applying them to newer data refers to the experience ‘E’.
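
For a numeric target such as a price, "accuracy" needs a concrete definition before it can serve as P. The sketch below shows two possible choices: the share of predictions that fall within a tolerance of the true price, and the mean absolute error. The 5% tolerance, the function names, and the price values are assumptions made for illustration only.

```python
def accuracy_within_tolerance(actual, predicted, tolerance=0.05):
    """Share of predictions within `tolerance` (relative) of the true value."""
    hits = sum(abs(p - a) <= tolerance * abs(a) for a, p in zip(actual, predicted))
    return hits / len(actual)


def mean_absolute_error(actual, predicted):
    """Average absolute gap between predicted and true values."""
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical true and predicted prices.
actual = [100.0, 200.0, 400.0, 800.0]
predicted = [102.0, 195.0, 460.0, 810.0]

p_accuracy = accuracy_within_tolerance(actual, predicted)
p_mae = mean_absolute_error(actual, predicted)
print(p_accuracy, p_mae)
```

Either number can play the role of P: the program "learns" if the chosen measure improves as it sees more experience E.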

 

Q9. Unlabelled

What type of machine learning algorithm is suitable for unlabeled data?

1.        Regression algorithms

2.        Clustering algorithms

3.        Classification

4.         All of the above

Ans: Correct answer: 2. Clustering algorithms

Explanation:

  • Both regression and classification require labeled data, i.e. they are supervised learning methods.
  • Clustering, however, doesn't require labeled data.

 

Q10. Penny for suggestion?

What procedure or method powers Spotify's suggestion for a new song?

1.        Recommendation System

2.        Classification

3.        Regression

4.        None of the above

Ans: Correct answer: 1. Recommendation System

Explanation :

  • Classification and regression are supervised learning tasks that predict a class or a value, so they are not the best description of suggesting new songs to a user.
  • The option "None of the above" is incorrect because Recommendation System is the correct answer.

 

Q11. Fly away

Imagine you are working at Vistara. You are given the task of predicting the airline fare based on previous trends.

What kind of machine learning task would this be categorized into?

1.        Classification

2.        Recommendation

3.        Time series forecasting

4.        Reinforcement learning

Ans: Correct Answer: 3. Time series forecasting

Explanation:

  • Whenever we want to predict values based on previous trends, i.e. a time axis is involved, the problem is categorized as "Time series forecasting".
  • Classification is not applicable here because the fare is a numeric value predicted from previous trends, not a category.
  • Recommendation is not correct because we need to estimate the fare, not suggest items to a user.
  • Reinforcement learning is not applicable because it is a machine learning training method based on rewarding desired behaviors and/or punishing undesired ones.
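
One of the simplest forecasting baselines is a moving average over the most recent observations, sketched below. The fare values and the window size of 3 are assumptions for illustration; practical forecasting would also account for trend and seasonality.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    recent = series[-window:]
    return sum(recent) / window


# Hypothetical average fares for six consecutive weeks, oldest first.
fares = [4200, 4350, 4500, 4400, 4600, 4800]

next_fare = moving_average_forecast(fares)
print(next_fare)
```

The order of the observations matters here, which is what distinguishes a time series problem from ordinary regression on unordered rows.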

 

Ensemble Learning MCQs

 

 

1.     The model which consists of a group of predictors is called a

  1.  Group
  2.  Entity
  3.  Ensemble
  4.  Set

Ans: C

2.     A Random forest is an ensemble of Decision Trees

  1.  True
  2.  False

Ans: A

3.     The steps involved in deciding the output of a Random Forest are

  1. Obtain the predictions of all individual trees
  2. Predict the class that gets the most votes
  3. Both of the above
  4. None

Ans: C

4.     A hard voting classifier takes into consideration

  1.  The probabilities of output from each classifier
  2.  The majority votes from the classifiers
  3.  The mean of the output from each classifier
  4.  The sum of the output from each classifier

Ans: B


5. Even if each classifier is a weak learner, the ensemble can still be a strong learner

  1.  True
  2.  False

Ans: A

6. Ensemble methods work best when the predictors are

  1.  Sufficiently diverse
  2.  As independent from one another as possible
  3.  Making very different types of errors
  4.  All of the above

Ans: D
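
The claim that weak learners can combine into a strong learner can be checked with a small calculation. Assuming every classifier is right with the same probability and, crucially, that their errors are fully independent (an idealisation that real ensembles only approximate), the accuracy of a majority vote follows the binomial distribution. The function name and the 0.51 figure are illustrative choices.

```python
from math import comb


def majority_vote_accuracy(n_classifiers, p_correct):
    """P(more than half of n independent classifiers are right)."""
    needed = n_classifiers // 2 + 1
    return sum(
        comb(n_classifiers, k)
        * p_correct**k
        * (1 - p_correct) ** (n_classifiers - k)
        for k in range(needed, n_classifiers + 1)
    )


# Odd ensemble sizes avoid ties in the vote.
for n in (1, 101, 1001):
    print(n, majority_vote_accuracy(n, 0.51))
```

The printed accuracy grows with the ensemble size even though each member is barely better than a coin flip. When the classifiers make correlated errors the gain is smaller, which is why diversity matters.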

7. To get diverse classifiers we cannot train them using different algorithms

  1.  True
  2.  False

Ans: B

8. Training the classifiers in an ensemble using very different algorithms increases the chance that they will make very different types of errors, improving the ensemble’s accuracy

  1.  True
  2.  False

Ans: A

9. When we consider only the majority of the outputs from the classifiers then it is called

  1. Hard Voting
  2. Soft Voting
  3. Both
  4. None

Ans: A

10. Soft voting takes into consideration

  1. The majority of votes from the classifiers
  2. The highest class probability averaged over all the individual classifiers
  3. Both
  4. None of the above

Ans: B

11. In soft voting, the predicted class is the class with the highest class probability, averaged over all the individual classifiers

  1.  True
  2.  False

Ans: A

12. Soft voting often achieves higher performance than hard voting because

  1.  Majority votes classifications are often wrong
  2.  It gives more weight to highly confident votes
  3.  Finding majority is computationally expensive
  4.  This statement is false

Ans: B
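
The difference between the two voting schemes is easiest to see on a case where they disagree. The sketch below is a plain-Python illustration, not the scikit-learn implementation; the class names and probabilities are made up.

```python
from collections import Counter


def hard_vote(probabilities):
    """Each classifier votes for its own most likely class; the majority wins."""
    votes = [max(p, key=p.get) for p in probabilities]
    return Counter(votes).most_common(1)[0][0]


def soft_vote(probabilities):
    """Average the class probabilities, then pick the highest average."""
    classes = probabilities[0].keys()
    averages = {
        c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes
    }
    return max(averages, key=averages.get)


# Hypothetical outputs of three classifiers for a single instance.
outputs = [
    {"cat": 0.55, "dog": 0.45},  # weakly prefers cat
    {"cat": 0.60, "dog": 0.40},  # weakly prefers cat
    {"cat": 0.05, "dog": 0.95},  # very confident it is a dog
]

hard_result = hard_vote(outputs)
soft_result = soft_vote(outputs)
print(hard_result, soft_result)
```

Two lukewarm votes outnumber one confident vote under hard voting, while soft voting lets the confident classifier carry more weight. Soft voting requires that every classifier can output class probabilities.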

13. The parameter which decides the voting method in a VotingClassifier is

  1.  method
  2.  strategy
  3.  voting
  4.  Type

Ans: C

14. The parameter which holds the list of classifiers which are to be used in the voting classifier is

  1.  predictors
  2.  classifiers
  3.  estimators
  4.  Models

Ans: C

15. One way to get a diverse set of classifiers is to use the same training algorithm for every predictor, but to train them on different random subsets of the training set

  1.  True
  2.  False

Ans: A

16. When sampling is performed with replacement, the method is

A.     Bagging

B.     Pasting

C.     Both

D.     None

Ans: A

17. When sampling is performed without replacement, it is called

  1. Pasting
  2. Bagging
  3. Both
  4. None

Ans: A

18. Both bagging and pasting allow training instances to be sampled several times across multiple predictors, but only bagging allows training instances to be sampled several times for the same predictor

  1.  True
  2.  False

Ans: A
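
The two sampling schemes differ only in whether an instance can be drawn again for the same predictor. A minimal sketch using the standard library follows; the helper name `draw_sample`, the seed, and the sample sizes are assumptions for illustration.

```python
import random


def draw_sample(indices, size, bootstrap, rng):
    """Pick training-instance indices for one predictor."""
    if bootstrap:
        # Bagging: sampling with replacement, so duplicates are possible.
        return rng.choices(indices, k=size)
    # Pasting: sampling without replacement, so every index is unique.
    return rng.sample(indices, k=size)


rng = random.Random(42)
indices = list(range(10))

bagging_sample = draw_sample(indices, 10, bootstrap=True, rng=rng)
pasting_sample = draw_sample(indices, 6, bootstrap=False, rng=rng)
print(sorted(bagging_sample))
print(sorted(pasting_sample))
```

Calling `draw_sample` once per predictor gives each one a different view of the training set, and since the draws are independent the predictors can be trained in parallel.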

19. In bagging/pasting training set sampling and training, predictors can all be trained in parallel, via different CPU cores or even different servers

  1.  True
  2.  False

Ans: A

20. To use the bagging method, the value of the bootstrap parameter in the BaggingClassifier should be set to

  1.  True
  2.  False

Ans: A

21. To use the pasting method, the value of the bootstrap parameter in the BaggingClassifier should be set to

A.    True

B.    False

Ans: B

22. Overall, bagging often results in better models

  1.  True
  2.  False

Ans: A

23. By default, how many training instances does the BaggingClassifier sample with replacement for each predictor if the size of the training set is m?

  1.  m/2
  2.  m/3
  3.  m
  4.  m-n where n is the number of features

Ans: C

24. With bagging, it is not possible that some instances are never sampled

  1.  True
  2.  False

Ans: B
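
Why some instances are never sampled can be worked out directly: when m instances are drawn with replacement from a set of size m, a given instance is missed by every draw with probability (1 - 1/m)^m, which approaches 1/e as m grows. The sketch below compares that formula with a quick simulation; the seed and the value of m are arbitrary choices.

```python
import math
import random


def expected_fraction_sampled(m):
    """Expected share of instances drawn at least once in m draws with replacement."""
    return 1 - (1 - 1 / m) ** m


def simulated_fraction_sampled(m, rng):
    """Draw m indices with replacement and measure the share of distinct ones."""
    drawn = rng.choices(range(m), k=m)
    return len(set(drawn)) / m


m = 1000
rng = random.Random(0)

print(expected_fraction_sampled(m))       # formula for this m
print(1 - math.exp(-1))                   # limit for large m
print(simulated_fraction_sampled(m, rng)) # one simulated bootstrap sample
```

The remaining instances, a bit over a third of the training set for each predictor, are the out-of-bag instances.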

25. Features can also be sampled in the BaggingClassifier

  1.  True
  2.  False

Ans: A

26. The hyperparameters which control the feature sampling are

  1.  max_samples and bootstrap
  2.  max_features and bootstrap_features
  3. Both
  4. None

Ans: B

27. Sampling both training instances and features is called the

  1. Random Patches method
  2. Random Subspaces method
  3. Both
  4. None

Ans: A

28. Keeping all training instances (i.e., bootstrap=False and max_samples=1.0) but sampling features (i.e., bootstrap_features=True and/or max_features smaller than 1.0) is called the

  1. Random Patches method
  2. Random Subspaces method
  3. Both
  4. None

Ans: B
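
The two methods can be contrasted with a small index-sampling sketch. The function `sample_rows_and_columns` is hypothetical; its keyword names simply mirror the hyperparameters named in the questions above, and it is not how any library implements the feature.

```python
import random


def sample_rows_and_columns(n_rows, n_cols, rng, max_samples=1.0, bootstrap=True,
                            max_features=1.0, bootstrap_features=False):
    """Return (row indices, column indices) for one predictor."""
    def draw(n, fraction, with_replacement):
        k = int(fraction * n)
        if with_replacement:
            return rng.choices(range(n), k=k)
        return rng.sample(range(n), k=k)

    rows = draw(n_rows, max_samples, bootstrap)
    cols = draw(n_cols, max_features, bootstrap_features)
    return rows, cols


rng = random.Random(7)

# Random Patches: sample both training instances and features.
patch_rows, patch_cols = sample_rows_and_columns(
    10, 8, rng, max_samples=0.5, bootstrap=True, max_features=0.5)

# Random Subspaces: keep every training instance, sample only the features.
sub_rows, sub_cols = sample_rows_and_columns(
    10, 8, rng, max_samples=1.0, bootstrap=False, max_features=0.5)

print(sorted(patch_rows), sorted(patch_cols))
print(sorted(sub_rows), sorted(sub_cols))
```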

29. Random forest is an ensemble of Decision Trees generally trained via ______

  1. Bagging
  2. Pasting
  3. Both
  4. None

Ans: A

30. We can make the trees of a Random Forest even more random by using random thresholds for each feature rather than searching for the best possible thresholds?

  1.  No
  2.  Yes, and these are called Extremely Randomised Trees ensemble

Ans: B

31. If we look at a single Decision Tree, important features are likely to appear closer to

  1. Leaf of the tree
  2. Middle of the tree
  3. Root of the tree
  4. None of these

Ans: C

32. Feature importances are available via the feature_importances_ attribute of a trained RandomForestClassifier object.

  1.  True
  2.  False

Ans: A

33. The general idea of most boosting methods is to train predictors sequentially, each trying to correct its predecessor.

  1.  True
  2.  False

Ans: A

34. One of the drawbacks of AdaBoost classifier is that

  1.  It is slow
  2.  It cannot be parallelized
  3.  It cannot be performed on larger training sets
  4.  It requires a lot of memory and processing power

Ans: B

35. A Decision Stump is a Decision Tree with

  1. More than two leaf nodes
  2. Max depth of 1, i.e. single decision node with two leaf nodes
  3. Having more than 2 decision nodes
  4. None

Ans: B
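
A decision stump is small enough to write out in full. The sketch below fits one on a single numeric feature with binary labels by trying every observed value as a threshold; the data and the function names are assumptions for illustration.

```python
def fit_stump(xs, ys):
    """Find the single threshold that misclassifies the fewest points.

    Returns (threshold, left_label, right_label): predict left_label when
    x <= threshold and right_label otherwise.
    """
    best = None
    for threshold in sorted(set(xs)):
        for left_label, right_label in ((0, 1), (1, 0)):
            errors = sum(
                (left_label if x <= threshold else right_label) != y
                for x, y in zip(xs, ys)
            )
            if best is None or errors < best[0]:
                best = (errors, threshold, left_label, right_label)
    return best[1:]


def predict_stump(stump, x):
    """One decision node, two leaves."""
    threshold, left_label, right_label = stump
    return left_label if x <= threshold else right_label


# Hypothetical 1-D training data with binary labels.
xs = [1, 2, 3, 6, 7, 8]
ys = [0, 0, 0, 1, 1, 1]

stump = fit_stump(xs, ys)
print(stump)
print(predict_stump(stump, 2), predict_stump(stump, 7))
```

Stumps like this are a common choice of weak learner for AdaBoost.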

36. In Gradient Boosting, instead of tweaking the instance weights at every iteration like AdaBoost does, it tries to fit the new predictor to the residual errors made by the previous predictor.

  1.  True
  2.  False

Ans: A

37. The learning_rate hyperparameter of GradientBoostingRegressor scales the contribution of each tree

  1.  True
  2.  False

Ans: A
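
The two ideas above, fitting each new predictor to the residuals and scaling its contribution by a learning rate, can be sketched in plain Python with regression stumps as the base predictors. This is a simplified illustration for squared error on one feature, not the scikit-learn algorithm; the data and all function names are assumptions.

```python
def fit_regression_stump(xs, ys):
    """Best single split on x, predicting the mean of y on each side."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keep the right side non-empty
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def stump_predict(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_estimators=50, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    predictions = [base] * len(xs)
    stumps = []
    for _ in range(n_estimators):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_regression_stump(xs, residuals)
        stumps.append(stump)
        # The learning rate scales how much each new stump contributes.
        predictions = [p + learning_rate * stump_predict(stump, x)
                       for p, x in zip(predictions, xs)]
    return base, learning_rate, stumps


def boosted_predict(model, x):
    base, learning_rate, stumps = model
    return base + learning_rate * sum(stump_predict(s, x) for s in stumps)


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


# Hypothetical 1-D regression data with a jump between x = 4 and x = 5.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]

model = gradient_boost(xs, ys)
mse_before = mse(ys, [sum(ys) / len(ys)] * len(ys))
mse_after = mse(ys, [boosted_predict(model, x) for x in xs])
print(mse_before, mse_after)
```

A smaller learning rate means each stump corrects only part of the remaining error, so more estimators are needed to fit the training set.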

38. The ensemble method in which we train a model to perform the aggregation of outputs from all the predictors is called

  1.  Boosting
  2.  Bagging
  3.  Stacking
  4.  Pasting

Ans: C
