Machine Learning - Deep Learning: Dimensionality Reduction

About Machine Learning 1

Machine Learning

The Machine Learning Landscape
Classification
Support Vector Machines
Decision Trees
Ensemble Learning and Random Forests
Dimensionality Reduction
Clustering

👉YouTube Link: https://www.youtube.com/@drrambabupemula

👉 Machine Learning 1 Syllabus

Unit I:

The Machine Learning Landscape: What Is Machine Learning? Why Use Machine Learning? Types of Machine Learning Systems, Supervised/Unsupervised Learning, Batch and Online Learning, Instance-Based Versus Model-Based Learning, Main Challenges of Machine Learning, Insufficient Quantity of Training Data, Nonrepresentative Training Data, Poor-Quality Data, Irrelevant Features, Overfitting the Training Data, Underfitting the Training Data, Stepping Back, Testing and Validating.

👉UNIT 1(A) NOTEs : The Machine Learning Landscape Notes

👉UNIT 1(A) PPTs: The Machine Learning Landscape

👉UNIT 1(B) NOTEs: The Machine Learning Landscape NOTEs

👉Machine Learning 1 : UNIT 1(B) PPTs: The Machine Learning Landscape PPTs

👉Machine Learning 1: UNIT 1 The Machine Learning Landscape Questions

Unit II:

Classification: Training a Binary Classifier, Performance Measures, Measuring Accuracy Using Cross-Validation, Confusion Matrix, Precision and Recall, Precision/Recall Tradeoff, The ROC Curve, Multiclass Classification, Error Analysis, Multilabel Classification, Multioutput Classification. k-NN Classifier.

👉Machine Learning 1: UNIT 2 NOTEs: Classification Notes

👉Machine Learning 1 : UNIT 2: Classification PPTs

👉Machine Learning 1: UNIT 2: Classification MCQs

👉Machine Learning 1: UNIT 2: Classification Questions

Unit III:

Support Vector Machines: Linear SVM Classification, Soft Margin Classification, Nonlinear SVM Classification, Polynomial Kernel, Adding Similarity Features, Gaussian RBF Kernel, Computational Complexity, SVM Regression, Under the Hood, Decision Function and Predictions, Training Objective, Quadratic Programming, The Dual Problem, Kernelized SVM, Online SVMs.

👉Machine Learning 1: UNIT 3 (A) NOTES: Support Vector Machines NOTEs

👉Machine Learning 1: UNIT 3 (A) PPTs: Support Vector Machines PPTs

👉Machine Learning 1: UNIT 3 (B) NOTEs: Support Vector Machines NOTEs

👉Machine Learning 1: UNIT 3 (B) PPTs: Support Vector Machines PPTs

👉Machine Learning 1: UNIT 3 A & B : Support Vector Machines Questions

👉Machine Learning 1: UNIT 3 : Support Vector Machines MCQs

Unit IV:

Decision Trees: Training and Visualizing a Decision Tree, Making Predictions, Estimating Class Probabilities, The CART Training Algorithm, Computational Complexity, Gini Impurity or Entropy? Regularization Hyperparameters, Regression.

👉Machine Learning 1: UNIT 4 (A) NOTEs: Decision Trees NOTEs

👉Machine Learning 1: UNIT 4 (A) PPTs: Decision Trees PPTs

👉Machine Learning 1: UNIT 4 (A): Decision Trees Questions

👉Machine Learning 1: UNIT 4 (A) : Decision Trees MCQs

Ensemble Learning and Random Forests: Voting Classifiers, Bagging and Pasting, Bagging and Pasting in Scikit-Learn, Out-of-Bag Evaluation, Random Patches and Random Subspaces, Random Forests, Extra-Trees, Feature Importance, Boosting, AdaBoost, Gradient Boosting, Stacking.

👉Machine Learning 1: UNIT 4 (B) NOTES: Ensemble Learning and Random Forests NOTES

👉Machine Learning 1: UNIT 4 (B) PPTs: Ensemble Learning and Random Forests PPTs

👉Machine Learning 1: UNIT 4 (B) : Ensemble Learning and Random Forests Questions

👉Machine Learning 1: UNIT 4 (B) : Ensemble Learning and Random Forests MCQs

Unit V:

Dimensionality Reduction: The Curse of Dimensionality, Main Approaches for Dimensionality Reduction, Projection, PCA.

👉Machine Learning 1: UNIT-5(A) NOTES: Dimensionality Reduction NOTES

👉Machine Learning 1: UNIT-5(A) PPTs: Dimensionality Reduction PPTs

👉Machine Learning 1: UNIT-5(A): Dimensionality Reduction Questions

Clustering: How does clustering work: finding similarities using distances, Euclidean distance and other distance metrics. k-Means Clustering: Plotting customers with their segments, normalizing features, cluster centres and interpreting the Clusters. Hierarchical Clustering.

👉Machine Learning 1: UNIT 5(B) NOTEs: Clustering NOTES

👉Machine Learning 1: UNIT 5 (B) PPTs: Clustering PPTs

👉Machine Learning 1: UNIT 5 (B) : Clustering Questions

Textbooks:

1. Géron, Aurélien. Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow: Concepts, tools, and techniques to build intelligent systems. O'Reilly Media, 2019.

2. Pradhan, Manaranjan, and U. Dinesh Kumar. Machine Learning using Python. Wiley, IIM Bangalore, 2019.

References:

1. Introduction to Machine Learning, Ethem Alpaydin, 2nd Edition, MIT Press, 2010.

2. Machine Learning, Tom M. Mitchell, McGraw Hill, 1997, ISBN: 0-07-042807-7.

About Machine Learning

Machine Learning

👉 About Machine Learning 1

The Machine Learning Landscape
Classification
Support Vector Machines
Decision Trees
Ensemble Learning and Random Forests
Dimensionality Reduction
Clustering
👉 YouTube Link: https://www.youtube.com/@drrambabupemula

👉 About Machine Learning 2

👉 About Machine Learning 3

Introduction
Data Pre-processing
Performance measurement of models
Supervised Learning
Decision Tree Learning
Unsupervised Learning
Ensemble Models

👉 Machine Learning MCQs

👉 Machine Learning Programs

Machine Learning MCQs - 4 (Clustering, Dimensionality Reduction)

1. Which of the following is finally produced by Hierarchical Clustering?

final estimate of cluster centroids
tree showing how close things are to each other
assignment of each point to clusters
all of the mentioned

Ans: 2

2. Which of the following is required by K-means clustering?

defined distance metric
number of clusters
initial guess as to cluster centroids
all of the mentioned

Ans: 4

3. Point out the wrong statement.

k-means clustering is a method of vector quantization
k-means clustering aims to partition n observations into k clusters
k-nearest neighbor is same as k-means
none of the mentioned

Ans: 3

4. Which of the following combinations is incorrect?

Continuous – Euclidean distance
Continuous – correlation similarity
Binary – Manhattan distance
None of the mentioned

Ans: 4

5. Hierarchical clustering should be used primarily for exploration.

True
False

Ans: 1

6. Which of the following functions is used for k-means clustering?

k-means
k-mean
heatmap
none of the mentioned

Ans: 1

7. Which of the following types of clustering uses a merging approach?

Partitional
Hierarchical
Naive Bayes
None of the mentioned

Ans: 2

8. K-means is not deterministic, and it also consists of a number of iterations.

True
False

Ans: 1

9. Which of the following can act as possible termination conditions in K-Means?

1. A fixed number of iterations is reached.
2. The assignment of observations to clusters does not change between iterations, except in cases with a bad local minimum.
3. Centroids do not change between successive iterations.
4. RSS falls below a threshold.

options:

1, 3 and 4
1, 2 and 3
1, 2 and 4
All of the above

Ans: 4
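The four stopping rules in question 9 can be seen side by side in a minimal k-means sketch. This is an illustrative pure-Python version, not a library implementation: the function name `kmeans`, its parameters (`max_iter`, `rss_tol`, `seed`) and the toy points are all assumptions made for this example.

```python
import math
import random

def kmeans(points, k, max_iter=100, rss_tol=None, seed=0):
    """Illustrative k-means on a list of equal-length tuples.

    Stops on whichever happens first: assignments unchanged, centroids
    unchanged, RSS below rss_tol (if given), or max_iter reached.
    Returns (centroids, labels, reason).
    """
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial guess: k of the data points
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            return centroids, labels, "assignments unchanged"
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # empty cluster: keep old
        if new_centroids == centroids:
            return centroids, labels, "centroids unchanged"
        centroids = new_centroids
        # RSS: sum of squared distances from points to their centroids.
        rss = sum(math.dist(p, centroids[lab]) ** 2
                  for p, lab in zip(points, labels))
        if rss_tol is not None and rss < rss_tol:
            return centroids, labels, "RSS below threshold"
    return centroids, labels, "iteration limit reached"

# Hypothetical toy data: two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centroids, labels, reason = kmeans(points, k=2)
print(reason, labels)
```

The returned `reason` string names the rule that actually ended the run, so changing `max_iter` or `rss_tol` shows how the conditions interact.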

10. Which of the following clustering algorithms suffers from the problem of convergence at local optima?

1. K-Means clustering algorithm

2. Agglomerative clustering algorithm

3. Expectation-Maximization clustering algorithm

4. Diverse clustering algorithm

options:

1 only
2 and 3
2 and 4
1 and 3

Ans: 4

11. What could be the possible reason(s) for producing two different dendrograms using an agglomerative clustering algorithm for the same dataset?

Proximity function used
Number of data points used
Number of variables used
All of the above

Ans: 4

12. In the figure below, if you draw a horizontal line at y = 2, how many clusters will be formed?

Ans: 2
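Cutting a dendrogram with a horizontal line is the same as stopping the agglomerative merges once the closest pair of clusters is farther apart than the cut height. The sketch below illustrates this with single linkage on hypothetical 1-D points. These are not the data behind the figure in question 12, and the function name `clusters_at_height` is an assumption for this example.

```python
import math

def clusters_at_height(points, height):
    """Single-linkage agglomerative clustering, stopped at a cut height.

    Repeatedly merges the two closest clusters and stops as soon as the
    closest pair is farther apart than `height`.
    """
    clusters = [[p] for p in points]  # start with one cluster per point
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single link: distance between the closest members.
                d = min(math.dist(p, q)
                        for p in clusters[i] for q in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > height:
            break  # every remaining merge lies above the cut line
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters

# Hypothetical 1-D data, written as 1-tuples so math.dist applies.
points = [(0.0,), (1.0,), (5.0,), (6.5,), (12.0,)]
for h in (0.5, 2.0, 4.5, 6.0):
    print(h, len(clusters_at_height(points, h)))
```

With these toy points, a cut at 2.0 leaves three clusters and a cut at 4.5 leaves two: the higher the line, the fewer branches it crosses.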

13. In which of the following cases will K-Means clustering fail to give good results?

1. Data points with outliers

2. Data points with different densities

3. Data points with round shapes

4. Data points with non-convex shapes

options:

1 and 2
2 and 3
2 and 4
1, 2 and 4

Ans: 4

14. Which of the following metrics can be used to find the dissimilarity between two clusters in hierarchical clustering?

1. Single-link

2. Complete-link

3. Average-link

options:

1 and 2
1 and 3
2 and 3
1, 2 and 3

Ans: 4
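The three linkage criteria in question 14 differ only in how they summarise the pairwise distances between two clusters. A minimal sketch, assuming Euclidean distance and a hypothetical helper named `cluster_distance`:

```python
import math
from itertools import product

def cluster_distance(a, b, linkage="single"):
    """Dissimilarity between clusters a and b (lists of points)."""
    pairwise = [math.dist(p, q) for p, q in product(a, b)]
    if linkage == "single":      # closest pair of members
        return min(pairwise)
    if linkage == "complete":    # farthest pair of members
        return max(pairwise)
    if linkage == "average":     # mean over all pairs
        return sum(pairwise) / len(pairwise)
    raise ValueError(f"unknown linkage: {linkage}")

# Hypothetical clusters on a line; pairwise distances are 3, 5, 2 and 4.
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(3.0, 0.0), (5.0, 0.0)]
for name in ("single", "complete", "average"):
    print(name, cluster_distance(a, b, name))
```

For these two clusters the single-link distance is 2.0, the complete-link distance is 5.0 and the average-link distance is 3.5, which is why the choice of proximity function can change the resulting dendrogram.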

15. What is true about K-Means clustering?

1. K-means is extremely sensitive to cluster center initializations

2. Bad initialization can lead to poor convergence speed

3. Bad initialization can lead to bad overall clustering

options:

1 and 3
1 and 2
2 and 3
1, 2 and 3

Ans: 4

16. Which of the following can be applied to get good results from the K-means algorithm, corresponding to the global minimum?

1. Try running the algorithm with different centroid initializations

2. Adjust the number of iterations

3. Find the optimal number of clusters

options:

2 and 3
1 and 3
1 and 2
All of the above

Ans: 4
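Questions 15 and 16 both turn on initialization. The sketch below runs the same k-means loop from two hand-picked starting points on a hypothetical four-point dataset and keeps the run with the lower RSS. The function name `lloyd` and the data are assumptions for this example; library implementations typically automate the restarts.

```python
import math

def lloyd(points, init, max_iter=50):
    """Plain k-means from explicit initial centroids.

    Returns (labels, rss) once assignments stop changing
    or max_iter is reached.
    """
    centroids = list(init)
    k = len(centroids)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                      for p in points]
        if new_labels == labels:   # converged: assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:            # an empty cluster keeps its old centroid
                centroids[j] = tuple(sum(c) / len(members)
                                     for c in zip(*members))
    rss = sum(math.dist(p, centroids[lab]) ** 2
              for p, lab in zip(points, labels))
    return labels, rss

# Hypothetical data: corners of a 4 x 1 rectangle (two natural clusters).
points = [(0.0, 0.0), (0.0, 1.0), (4.0, 0.0), (4.0, 1.0)]
inits = [
    [(0.0, 0.0), (0.0, 1.0)],   # both starts on the left edge
    [(0.0, 0.0), (4.0, 0.0)],   # one start on each side
]
runs = [lloyd(points, init) for init in inits]
best_labels, best_rss = min(runs, key=lambda run: run[1])
print([rss for _, rss in runs], best_rss)
```

The first start converges to a stable but poor split (top row versus bottom row) with a much higher RSS than the second, so comparing RSS across restarts is what rescues the result.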

17. Which of the following techniques would perform better for reducing dimensions of a data set?

Removing columns which have too many missing values
Removing columns which have high variance in data
Removing columns with dissimilar data trends
None of these

Ans: 1

18. Dimensionality reduction algorithms are one of the possible ways to reduce the computation time required to build a model.

TRUE
FALSE

Ans: 1

19. Which of the following algorithms cannot be used for reducing the dimensionality of data?

t-SNE
PCA
LDA
None of these

Ans: 4

20. PCA can be used for projecting and visualizing data in lower dimensions.

TRUE
FALSE

Ans: 1

21. The most popularly used dimensionality reduction algorithm is Principal Component Analysis (PCA). Which of the following is/are true about PCA?

1. PCA is an unsupervised method

2. It searches for the directions in which the data have the largest variance

3. Maximum number of principal components <= number of features

4. All principal components are orthogonal to each other

Options:

1 and 2
1 and 3
2 and 3
All of the above

Ans: 4
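The four properties in question 21 can be checked numerically on a small example. The sketch below is limited to 2-D data so that the eigendecomposition of the 2x2 covariance matrix can be written in closed form. The function name `pca_2d` and the sample points are assumptions for this illustration; it uses no labels, which reflects PCA being unsupervised.

```python
import math

def pca_2d(points):
    """PCA for 2-D data via the closed-form eigendecomposition of the
    2x2 sample covariance matrix.

    Returns [(variance, unit_direction), ...] sorted by decreasing variance.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    # Eigenvalues of [[sxx, sxy], [sxy, syy]].
    centre = (sxx + syy) / 2
    half_gap = math.hypot((sxx - syy) / 2, sxy)
    l1, l2 = centre + half_gap, centre - half_gap
    # Angle of the direction of largest variance.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    v1 = (math.cos(theta), math.sin(theta))
    v2 = (-math.sin(theta), math.cos(theta))  # v1 rotated by 90 degrees
    return [(l1, v1), (l2, v2)]

# Hypothetical points scattered close to the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
components = pca_2d(points)
(l1, v1), (l2, v2) = components
print("variances:", l1, l2)
print("dot product of components:", v1[0] * v2[0] + v1[1] * v2[1])
```

Two features give at most two components, the first carries the larger variance, and the dot product of the two directions is zero, matching statements 2 to 4.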

22. Suppose we are using dimensionality reduction as a pre-processing technique, i.e., instead of using all the features, we reduce the data to k dimensions with PCA and then use these PCA projections as our features. Which of the following statements is correct?

Higher ‘k’ means more regularization
Higher ‘k’ means less regularization
Can’t Say

Ans: 2

23. What will happen when eigenvalues are roughly equal?

PCA will perform outstandingly
PCA will perform badly
Can’t Say
None of the above

Ans: 2
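One way to see why roughly equal eigenvalues are bad news for PCA is to count how many components are needed to retain a given share of the variance. The sketch below uses two made-up eigenvalue spectra with the same total. The function name `components_needed` and the 95% target are assumptions for this example.

```python
def components_needed(eigenvalues, target=0.95):
    """Smallest number of principal components whose eigenvalues add up
    to at least `target` of the total variance."""
    total = sum(eigenvalues)
    running = 0.0
    for k, lam in enumerate(sorted(eigenvalues, reverse=True), start=1):
        running += lam
        if running / total >= target:
            return k
    return len(eigenvalues)

# Hypothetical spectra, each summing to 10.
skewed = [9.0, 0.6, 0.3, 0.1]   # one dominant direction
flat = [2.6, 2.5, 2.5, 2.4]     # roughly equal eigenvalues

print("skewed:", components_needed(skewed))
print("flat:", components_needed(flat))
```

With the skewed spectrum two components are enough, while the flat spectrum needs all four: no direction stands out, so no dimension can be dropped without losing a comparable share of the variance.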

24. PCA works better if:

1. There is a linear structure in the data

2. The data lies on a curved surface and not on a flat surface

3. The variables are scaled in the same unit

options:

1 and 2
2 and 3
1 and 3
1, 2 and 3

Ans: 3
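Statement 3 in question 24 matters because PCA follows variance, and variance depends on units. The sketch below compares two hypothetical features, height in metres and income in currency units, before and after standardization. The helper name `standardize` and the numbers are assumptions for this example.

```python
import statistics

def standardize(column):
    """Rescale a feature to zero mean and unit sample standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.stdev(column)
    return [(x - mean) / std for x in column]

# Hypothetical features measured in very different units.
height_m = [1.60, 1.70, 1.75, 1.80, 1.90]
income = [30000.0, 42000.0, 50000.0, 61000.0, 75000.0]

# Share of the total variance contributed by income before scaling.
var_h = statistics.variance(height_m)
var_i = statistics.variance(income)
raw_share = var_i / (var_i + var_h)

scaled = [standardize(height_m), standardize(income)]
scaled_vars = [statistics.variance(col) for col in scaled]

print("income share of raw variance:", raw_share)
print("variances after scaling:", scaled_vars)
```

Before scaling, income accounts for almost all of the variance, so the first principal component would simply be the income axis. After scaling, both features have unit variance and PCA responds to their correlation instead of their units.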

25. What happens when you get features in lower dimensions using PCA?

1. The features will still have interpretability

2. The features will lose interpretability

3. The features must carry all information present in data

4. The features may not carry all information present in data

options:

1 and 3
1 and 4
2 and 3
2 and 4

Ans: 4
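The loss of information in question 25 can be measured as a reconstruction error: project the data onto fewer dimensions, map it back, and compare with the original. In the sketch below the projection direction (1, 2) is hand-picked to follow the trend of the hypothetical points rather than computed by PCA, and the function name `project_and_reconstruct` is an assumption for this example.

```python
import math

def project_and_reconstruct(points, direction):
    """Project centred 2-D points onto one direction (2 features -> 1),
    then map the 1-D scores back into 2-D."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    norm = math.hypot(*direction)
    ux, uy = direction[0] / norm, direction[1] / norm  # unit vector
    # Each score is a mix of both original features, which is why the
    # new feature no longer has a direct interpretation.
    scores = [(x - mx) * ux + (y - my) * uy for x, y in points]
    recon = [(mx + s * ux, my + s * uy) for s in scores]
    return scores, recon

# Hypothetical points scattered close to the line y = 2x.
points = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (5.0, 10.1)]
scores, recon = project_and_reconstruct(points, (1.0, 2.0))
error = sum(math.dist(p, r) ** 2 for p, r in zip(points, recon))
print("1-D scores:", scores)
print("squared reconstruction error:", error)
```

The error is small but not zero: the single new feature keeps most of the information, not all of it. The same trade-off is what makes PCA usable for lossy compression.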

26. Which of the following option(s) is/are true?

1. You need to initialize parameters in PCA

2. You don’t need to initialize parameters in PCA

3. PCA can get trapped in local minima

4. PCA can’t get trapped in local minima

Options:

1 and 3
1 and 4
2 and 3
2 and 4

Ans: 4

27. Which of the following options are correct when you are applying PCA to an image dataset?

1. It can be used to effectively detect deformable objects.

2. It is invariant to affine transforms.

3. It can be used for lossy image compression.

4. It is not invariant to shadows.

Options:

1 and 2
2 and 3
3 and 4
1 and 4

Ans: 3

28. Which of the following is untrue regarding the Expectation Maximization algorithm?

An initial guess is made as to the location and size of the site of interest in each of the sequences, and these parts of the sequence are aligned
The alignment provides an estimate of the base or amino acid composition of each column in the site
The column-by-column composition of the site already available is used to estimate the probability of finding the site at any position in each of the sequences
The row-by-column composition of the site already available is used to estimate the probability

Ans: 4

29. Of the two repeated steps in the EM algorithm, step 2 is ________

the maximization step
the minimization step
the optimization step
the normalization step

Ans: 1

30. In the intermediate steps of the EM algorithm, the number of each base in each column is determined and then converted to fractions.

True
False

Ans: 1