Measuring Accuracy Using Cross-Validation

A good way to evaluate a model is to use cross-validation. Let’s use the cross_val_score() function to evaluate our SGDClassifier model, using K-fold cross-validation with three folds. Remember that K-fold cross-validation means splitting the training set into K folds (in this case, three), then making predictions and evaluating them on each fold using a model trained on the remaining folds.




from sklearn.model_selection import cross_val_score

cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")

array([0.96355, 0.93795, 0.95615])
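Note that cross_val_score() returns one score per fold; if you want a single summary number, you can simply average them (reusing the same variables as above):

scores = cross_val_score(sgd_clf, X_train, y_train_5, cv=3, scoring="accuracy")
scores.mean()  # average accuracy across the three folds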

Above 93% accuracy (ratio of correct predictions) on all cross-validation folds? This looks amazing, doesn’t it? Before drawing conclusions, let’s look at a very dumb classifier that just classifies every single image in the “not-5” class:

import numpy as np
from sklearn.base import BaseEstimator

class Never5Classifier(BaseEstimator):
    def fit(self, X, y=None):
        return self  # nothing to learn

    def predict(self, X):
        # always predict "not 5"
        return np.zeros((len(X), 1), dtype=bool)

Can you guess this model’s accuracy? Let’s find out:

never_5_clf = Never5Classifier()

cross_val_score(never_5_clf, X_train, y_train_5, cv=3, scoring="accuracy")

array([0.91125, 0.90855, 0.90915])

It has over 90% accuracy! This is simply because only about 10% of the images are 5s, so if you always guess that an image is not a 5, you will be right about 90% of the time.
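You can check this imbalance directly. Assuming y_train_5 is the NumPy boolean target vector used above, a quick sanity check looks like this:

y_train_5.mean()  # fraction of training images that are 5s, roughly 0.09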

This demonstrates why accuracy is generally not the preferred performance measure for classifiers, especially when you are dealing with skewed datasets (i.e., when some classes are much more frequent than others).

Implementing Cross-Validation

Occasionally you will need more control over the cross-validation process than what Scikit-Learn provides off the shelf. In these cases, you can implement cross-validation yourself.
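As an illustration (not the only way to do it), here is a minimal sketch that replicates what cross_val_score() does, assuming sgd_clf, X_train, and y_train_5 are the NumPy arrays and classifier defined above. It uses StratifiedKFold to produce folds with representative class ratios, and clone() to train a fresh, untrained copy of the classifier on each iteration:

from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone

skfolds = StratifiedKFold(n_splits=3)  # three folds, as before

for train_index, test_index in skfolds.split(X_train, y_train_5):
    clone_clf = clone(sgd_clf)  # fresh copy so folds don't contaminate each other
    X_train_folds = X_train[train_index]
    y_train_folds = y_train_5[train_index]
    X_test_fold = X_train[test_index]
    y_test_fold = y_train_5[test_index]

    clone_clf.fit(X_train_folds, y_train_folds)
    y_pred = clone_clf.predict(X_test_fold)
    n_correct = sum(y_pred == y_test_fold)
    print(n_correct / len(y_pred))  # accuracy on this fold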
