
Machine Learning Programs

👉 Data Preprocessing in Machine Learning

👉 Data Preprocessing in Machine Learning (Handling Missing Values)

👉 Linear Regression - ML Program - Weight Prediction

👉 Naïve Bayes Classifier - ML Program

👉 Logistic Regression - ML Program

👉 KNN Machine Learning Program

👉 Support Vector Machine (SVM) - ML Program

👉 Decision Tree Classifier on Iris Dataset

👉 Classification of Iris Flowers using Random Forest

👉 DBSCAN

👉 Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples. Read the training data from a .CSV file.

👉 For a given set of training data examples stored in a .CSV file, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.

👉 Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.

👉 Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.

👉 Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using a standard Heart Disease Data Set.

👉 Write a program to implement the k-Nearest Neighbors algorithm to classify the iris data set. Print both correct and wrong predictions.

👉 Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.

👉 Write a program to implement the SVM algorithm to classify the iris data set. Print both correct and wrong predictions.

👉 Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering.

👉 Write a program using scikit-learn to implement K-means Clustering

👉 Program to calculate the entropy and the information gain

👉 Program to implement a perceptron

Machine Learning MCQs-3 (Logistic Regression, KNN, SVM, Decision Tree)

---------------------------------------------------------------------

1. A Support Vector Machine can be used for

  1. Performing linear or nonlinear classification
  2. Performing regression
  3. For outlier detection
  4. All of the above
Ans: 4

2. The decision boundary in a Support Vector Machine is fully determined (or "supported") by the instances located on the edge of the street.

  1. True
  2. False
Ans: 1 

3. Support Vector Machines are not sensitive to feature scaling

  1. True
  2. False

Ans: 2 

4. If we strictly impose that all instances be off the street and on the right side, this is called

  1. Soft margin classification
  2. Hard margin classification
  3. Strict margin classification
  4. Loose margin classification

Ans: 2

5. The main issues with hard margin classification are

  1. It only works if the data is linearly separable
  2. It is quite sensitive to outliers
  3. It is impossible to find a margin if the data is not linearly separable
  4. All of the above

Ans: 4


6. The objectives of Soft Margin Classification are to find a good balance between

  1. Keeping the street as large as possible
  2. Limiting the margin violations
  3. Both of the above

Ans: 3

7. The balance between keeping the street as large as possible and limiting margin violations is controlled by which hyperparameter?

  1. tol
  2. loss
  3. penalty
  4. C

Ans: 4

8. A smaller C value leads to a wider street but more margin violations.

  1. True
  2. False

Ans: 1

9. If your SVM model is overfitting, you can try regularizing it by reducing the value of

  1. tol
  2. C hyperparameter
  3. intercept_scaling
  4. None of the above

Ans: 2
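
To make the role of C concrete, here is a minimal scikit-learn sketch (the dataset, feature subset and C values are illustrative, not from the original post):

# Minimal sketch: a smaller C widens the margin ("street") but allows more
# margin violations, so reducing C is one way to regularize an overfitting SVM.
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)
X, y = X[y != 2, 2:], y[y != 2]    # keep two classes and two features for simplicity

for C in (0.01, 100):
    clf = make_pipeline(StandardScaler(), LinearSVC(C=C))
    clf.fit(X, y)
    print(f"C={C}: training accuracy = {clf.score(X, y):.3f}")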

10. A similarity function like the Gaussian Radial Basis Function is used to

  1. Measure how many features are related to each other
  2. Find the most important features
  3. Find the relationship between different features
  4. Measure how much each instance resembles a particular landmark

Ans: 4

11. When using SVMs we can apply an almost miraculous mathematical technique for adding polynomial features and similarity features called the

  1. Kernel trick
  2. Shell trick
  3. Mapping and Reducing
  4. None of the Above

Ans: 1

12. Which is correct for the gamma parameter of SVC, which acts as a regularization hyperparameter?

  1. If model is overfitting, increase it, if it is underfitting, reduce it
  2. If model is overfitting, reduce it, if it is underfitting, increase it
  3. If model is overfitting, keep it same
  4. If it is underfitting, keep it same

Ans: 2
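
Questions 11 and 12 can be tied together with a short sketch: the RBF kernel adds similarity features implicitly via the kernel trick, and gamma controls how tightly the model fits the training data (the dataset and values below are illustrative, not from the original post):

# Minimal sketch: SVC with the Gaussian RBF kernel. A larger gamma fits the
# training set more tightly (risking overfitting); a smaller gamma smooths the
# decision boundary, so reduce gamma if the model overfits.
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=42)

for gamma in (0.1, 5):
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma=gamma, C=1))
    model.fit(X, y)
    print(f"gamma={gamma}: training accuracy = {model.score(X, y):.3f}")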

13. LinearSVC is much faster than SVC(kernel="linear")

  1. True
  2. False

Ans: 1

14. In SVM regression the model tries to

  1. Fit the largest possible street between two classes while limiting margin violations
  2. Fit as many instances as possible on the street while limiting margin violations

Ans: 2

15. Decision Trees can be used for

  1. Classification Tasks
  2. Regression Tasks
  3. Multi-output tasks
  4. All of the above

Ans: 4

16. The iris dataset has

  1. 5 features and 3 classes
  2. 4 features and 3 classes
  3. 2 features and 3 classes
  4. 4 features and 2 classes

Ans: 2

17. A node's gini attribute measures

  1. The number of training instances in the node
  2. The ratio of training instances in the node
  3. Its impurity

Ans: 3

18. If all the training instances of a node belong to the same class then the value of the node's Gini attribute will be

  1. 1
  2. 0
  3. Any integer between 0 and 1
  4. A negative value

Ans: 2

19. A Gini coefficient of 1 expresses maximal inequality among the training samples

  1. True
  2. False

Ans: 1

20. The Gini index for a node is found by subtracting the sum of the squared ratios of each class in the node from 1

  1. True
  2. False

Ans: 1
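
As a quick illustration of question 20, the Gini index of a node can be computed directly from its class counts (a small sketch, not part of the original post):

# Gini impurity: 1 minus the sum of squared class ratios in the node.
def gini(class_counts):
    total = sum(class_counts)
    return 1.0 - sum((count / total) ** 2 for count in class_counts)

print(gini([49, 0, 0]))   # 0.0  -> a pure node
print(gini([0, 49, 5]))   # ~0.168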

21. A decision tree estimates the probability that an instance belongs to a particular class k by finding the corresponding leaf node for the instance and then returning the ratio of training instances of class k in that node

  1. True
  2. False

Ans: 1

22. The Decision Tree classifier predicts the class which has the highest probability

  1. True
  2. False

Ans: 1

23. The CART algorithm splits the training set into two subsets

  1. Using all the features and a threshold tk
  2. Using a single feature k and a threshold tk
  3. Using half of the features and a threshold k

Ans: 2

24. How does the CART algorithm choose the feature k and the threshold tk for splitting?

  1. It randomly chooses a feature k
  2. It chooses the mean of the values of the feature k as threshold
  3. It chooses the feature k and threshold tk which produces the purest subsets
  4. It chooses the feature k and threshold tk such that the gini index value of the subsets is 0

Ans: 3

25. The cost function for choosing the feature k and the threshold tk takes into consideration

  1. The Gini index values of the subsets
  2. The number of instances in the subsets
  3. The total number of instances in the node that is being split
  4. All of these

Ans: 4

26. Once the CART algorithm has successfully split the training set in two

  1. It stops splitting further
  2. It splits the subsets using the same logic, then the sub-subsets, and so on, recursively
  3. It splits only the right subset
  4. It splits only the left subset

Ans: 2

27. The CART algorithm stops recursion once it reaches the maximum depth (defined by the max_depth hyperparameter), or if it cannot find a split that will reduce impurity

  1. True
  2. False

Ans: 1

28. Which of the following are correct for the CART algorithm

  1. It is a greedy algorithm
  2. It greedily searches for an optimum split at each level
  3. It does not check whether or not the split will lead to the lowest possible impurity several levels down
  4. All of the above are correct

Ans: 4

29. While making a prediction with a Decision Tree, each node only requires checking the value of one feature

  1. True
  2. False

Ans: 1

30. Gini impurity is slightly faster to compute in comparison to entropy

  1. True
  2. False

Ans: 1

31. Models like Decision Trees are often called nonparametric models because

  1. They do not have any parameters
  2. The number of parameters is not determined prior to training
  3. They have lesser parameters as compared to other models
  4. They are easy to interpret and understand

Ans: 2


KNN Machine Learning Program

KNN Classifier for IRIS Data Set

Steps:

  1. Import the library files
  2. Read the dataset (Iris Dataset) and analyze the data
  3. Preprocessing the data
  4. Divide the data into Training and Testing
  5. Build the model - KNN Classifier
  6. Model Evaluation

1. Import the library files
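
The original post showed this step as a screenshot, so the following is a minimal sketch of the imports the rest of the program relies on (pandas, matplotlib and scikit-learn are assumed to be installed; the exact imports in the original may have differed):

# Library imports used throughout the program.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report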



2. Read the dataset (Iris Dataset) and analyze the data
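
The original loading code is only available as an image, so this is a sketch of a typical load-and-inspect step. A local file named "Iris.csv" and a "Species" column are assumptions; the data can equally be loaded with sklearn.datasets.load_iris():

# Read the Iris data and take a first look at it.
df = pd.read_csv("Iris.csv")

print(df.head())                      # first few rows
print(df.shape)                       # (150, 5) for the standard Iris data
print(df.info())                      # column types and missing values
print(df["Species"].value_counts())   # class balance (column name assumed)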







3. Preprocessing the data
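
A sketch of the preprocessing step, continuing from the data frame loaded above (column names follow the common Iris CSV layout and are assumptions):

# Separate features and labels, encode the species names as integers, and
# scale the features; KNN is distance based, so feature scaling matters.
X = df.drop(columns=["Id", "Species"], errors="ignore")   # drop Id only if present
y = LabelEncoder().fit_transform(df["Species"])

X = StandardScaler().fit_transform(X)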





 







4. Divide the data into Training and Testing
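
A sketch of the split, continuing from the arrays above (the 80/20 ratio and random_state are illustrative choices):

# Hold out 20% of the data for testing; stratify keeps the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(X_train.shape, X_test.shape)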




5. Build the model - KNN Classifier

KNN Classifier

class sklearn.neighbors.KNeighborsClassifier(n_neighbors=5, *, weights='uniform', algorithm='auto', leaf_size=30, p=2, metric='minkowski', metric_params=None, n_jobs=None)

Parameters:

n_neighbors : int, default=5

Number of neighbors to use by default for kneighbors queries.

weights : {'uniform', 'distance'} or callable, default='uniform'

Weight function used in prediction. Possible values:

  • 'uniform' : uniform weights. All points in each neighborhood are weighted equally.

  • 'distance' : weight points by the inverse of their distance. In this case, closer neighbors of a query point will have a greater influence than neighbors which are further away.

  • [callable] : a user-defined function which accepts an array of distances, and returns an array of the same shape containing the weights.

algorithm : {'auto', 'ball_tree', 'kd_tree', 'brute'}, default='auto'

Algorithm used to compute the nearest neighbors:

  • 'ball_tree' will use BallTree

  • 'kd_tree' will use KDTree

  • 'brute' will use a brute-force search.

  • 'auto' will attempt to decide the most appropriate algorithm based on the values passed to the fit method.

Note: fitting on sparse input will override the setting of this parameter, using brute force.

leaf_size : int, default=30

Leaf size passed to BallTree or KDTree. This can affect the speed of the construction and query, as well as the memory required to store the tree. The optimal value depends on the nature of the problem.

p : int, default=2

Power parameter for the Minkowski metric. When p = 1, this is equivalent to using manhattan_distance (l1), and euclidean_distance (l2) for p = 2. For arbitrary p, minkowski_distance (l_p) is used.

metric : str or callable, default='minkowski'

Metric to use for distance computation. Default is "minkowski", which results in the standard Euclidean distance when p = 2. See the documentation of scipy.spatial.distance and the metrics listed in distance_metrics for valid metric values.

If metric is "precomputed", X is assumed to be a distance matrix and must be square during fit. X may be a sparse graph, in which case only "nonzero" elements may be considered neighbors.

If metric is a callable function, it takes two arrays representing 1D vectors as inputs and must return one value indicating the distance between those vectors. This works for SciPy's metrics, but is less efficient than passing the metric name as a string.

metric_params : dict, default=None

Additional keyword arguments for the metric function.

n_jobs : int, default=None

The number of parallel jobs to run for neighbors search. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details. Doesn't affect the fit method.
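
Since the original model-building code is not shown above, here is a minimal sketch that fits a KNeighborsClassifier with the default settings documented above (k=5, uniform weights, Minkowski metric with p=2), continuing from the train/test split:

# Build and train the KNN classifier, then predict on the test set.
knn = KNeighborsClassifier(n_neighbors=5, weights="uniform", metric="minkowski", p=2)
knn.fit(X_train, y_train)

y_pred = knn.predict(X_test)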




6. Model Evaluation
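
A sketch of the evaluation step, continuing from the predictions above; printing the correct and wrong predictions mirrors the usual requirement for this exercise:

# Evaluate the classifier on the held-out test data.
print("Accuracy:", accuracy_score(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))

# Show which test instances were classified correctly and which were not.
for actual, predicted in zip(y_test, y_pred):
    status = "correct" if actual == predicted else "WRONG"
    print(f"actual={actual}  predicted={predicted}  ({status})")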


