Precision/Recall Trade-off
To understand this trade-off, let's look at how the SGDClassifier makes its classification decisions. For each instance, it computes a score based on a decision function. If that score is greater than a threshold, it assigns the instance to the positive class; otherwise it assigns it to the negative class.
Figure 3. In this precision/recall trade-off, images are ranked by their classifier score, and those above the chosen decision threshold are considered positive; the higher the threshold, the lower the recall, but (in general) the higher the precision.
Figure 3 shows a few digits positioned from the lowest score on the left to the highest score on the right. Suppose the decision threshold is positioned at the central arrow (between the two 5s): you will find 4 true positives (actual 5s) to the right of that threshold, and 1 false positive (actually a 6). Therefore, with that threshold, the precision is 80% (4 out of 5). But out of 6 actual 5s, the classifier only detects 4, so the recall is 67% (4 out of 6). If you raise the threshold (move it to the arrow on the right), the false positive (the 6) becomes a true negative, thereby increasing the precision (up to 100% in this case), but one true positive becomes a false negative, decreasing the recall down to 50%. Conversely, lowering the threshold increases recall and reduces precision.
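To make the arithmetic in this example explicit, here is a tiny sketch; the precision_recall() helper is purely illustrative and not part of Scikit-Learn:
def precision_recall(tp, fp, fn):
    """Return (precision, recall) computed from raw counts."""
    return tp / (tp + fp), tp / (tp + fn)

# Central threshold in Figure 3: 4 true positives, 1 false positive, 2 missed 5s
print(precision_recall(tp=4, fp=1, fn=2))  # (0.8, 0.666...) -> 80% precision, 67% recall

# Threshold moved one digit to the right: 3 TP, 0 FP, 3 missed 5s
print(precision_recall(tp=3, fp=0, fn=3))  # (1.0, 0.5) -> 100% precision, 50% recall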
·
Scikit-Learn does not let you set the threshold
directly,
ü but it
does give you access to the decision scores that it uses to make predictions.
·
Instead of calling the classifier’s predict() method,
ü you
can call its decision_function() method,
·
which returns a score for each
instance, and
·
then use any threshold you
want to make predictions based on those scores:
y_scores = sgd_clf.decision_function([some_digit])
y_scores
array([2412.53175101])
threshold = 0
y_some_digit_pred = (y_scores > threshold)
y_some_digit_pred
array([ True])
The SGDClassifier uses a threshold equal to 0, so the previous code returns the same result as the predict() method (i.e., True). Let's raise the threshold:
threshold = 8000
y_some_digit_pred = (y_scores > threshold)
y_some_digit_pred
array([False])
This confirms that raising the threshold decreases recall. The image actually represents a 5, and the classifier detects it when the threshold is 0, but it misses it when the threshold is increased to 8,000.
How do you decide which threshold to use?
First, use the cross_val_predict() function to get the scores of all instances in the training set, but this time specify that you want it to return decision scores instead of predictions:
y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3,
                             method="decision_function")
With these scores, use the precision_recall_curve() function to compute precision and recall for all possible thresholds:
from sklearn.metrics import precision_recall_curve
precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)
Finally, use Matplotlib to plot precision and recall as functions of the threshold value (Figure 4):
def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall")
    [...]  # highlight the threshold and add the legend, axis label, and grid

plot_precision_recall_vs_threshold(precisions, recalls, thresholds)
plt.show()
Figure 4. Precision and recall versus the decision threshold
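The [...] placeholder in the listing above is left to the reader; a minimal completed version of the helper might look like this (the legend location and axis limits are assumptions, not taken from the source):
import matplotlib.pyplot as plt

def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
    # precision_recall_curve() returns one more precision/recall value than
    # thresholds, so drop the last element to align the arrays
    plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
    plt.plot(thresholds, recalls[:-1], "g-", label="Recall")
    plt.legend(loc="center right")   # legend location is an assumption
    plt.xlabel("Threshold")
    plt.grid(True)
    plt.axis([-50000, 50000, 0, 1])  # assumed axis limits for readability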
You may wonder why the precision curve is bumpier than the recall curve in Figure 4. The reason is that precision may sometimes go down when you raise the threshold. To understand why, look back at Figure 3 and notice what happens when you start from the central threshold and move it just one digit to the right: precision goes from 4/5 (80%) down to 3/4 (75%). On the other hand, recall can only go down when the threshold is increased, which explains why its curve looks smooth.
Another way to select a good precision/recall trade-off is to plot precision directly against recall, as shown in Figure 5 (the same threshold as earlier is highlighted). You can see that precision really starts to fall sharply around 80% recall. You will probably want to select a precision/recall trade-off just before that drop, for example at around 60% recall. But of course, the choice depends on your project.
Figure 5. Precision versus recall
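The code behind Figure 5 is not shown above; a minimal sketch, assuming the precisions and recalls arrays returned by precision_recall_curve() earlier, could be:
import matplotlib.pyplot as plt

# Plot precision directly against recall
plt.plot(recalls, precisions, "b-")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.grid(True)
plt.axis([0, 1, 0, 1])
plt.show()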
Suppose you decide to aim for 90% precision. You look at the first plot (Figure 4) and find that you need to use a threshold of about 8,000. To be more precise, you can search for the lowest threshold that gives you at least 90% precision. np.argmax() will give you the first index of the maximum value, which in this case means the first True value:
threshold_90_precision = thresholds[np.argmax(precisions >= 0.90)]  # ~7816
To make predictions (on the training set for now), instead of calling the classifier's predict() method, you can run this code:
y_train_pred_90 = (y_scores >= threshold_90_precision)
Let's check these predictions' precision and recall:
precision_score(y_train_5, y_train_pred_90)
0.900038008361839
recall_score(y_train_5, y_train_pred_90)
0.4368197749492714
Great, you have a 90% precision classifier! As you can see, it is fairly easy to create a classifier with virtually any precision you want: just set a high enough threshold. But wait, not so fast. A high-precision classifier is not very useful if its recall is too low!
Precision and Recall
Scikit-Learn provides several functions to compute classifier metrics, including precision and recall:
from sklearn.metrics import precision_score, recall_score
precision_score(y_train_5, y_train_pred)  # == 4096 / (4096 + 1522)
0.7290850836596654
recall_score(y_train_5, y_train_pred)  # == 4096 / (4096 + 1325)
0.7555801512636044
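To tie these numbers back to the counts in the inline comments, you can recompute precision and recall by hand from the confusion matrix (a small sketch, assuming the same y_train_pred predictions used above):
from sklearn.metrics import confusion_matrix

# Rows are actual classes (non-5, 5); columns are predicted classes
cm = confusion_matrix(y_train_5, y_train_pred)
tn, fp, fn, tp = cm.ravel()

print(tp / (tp + fp))  # precision = TP / (TP + FP), e.g. 4096 / (4096 + 1522)
print(tp / (tp + fn))  # recall    = TP / (TP + FN), e.g. 4096 / (4096 + 1325)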
Now your 5-detector does not look as shiny as it did when you looked at its accuracy. When it claims an image represents a 5, it is correct only 72.9% of the time. Moreover, it only detects 75.6% of the 5s.
It is often convenient to combine precision and recall into a single metric called the F1 score, in particular if you need a simple way to compare two classifiers. The F1 score is the harmonic mean of precision and recall:
F1 = 2 / (1/precision + 1/recall) = 2 × (precision × recall) / (precision + recall)
Whereas the regular mean treats all values equally, the harmonic mean gives much more weight to low values. As a result, the classifier will only get a high F1 score if both recall and precision are high. To compute the F1 score, simply call the f1_score() function:
from sklearn.metrics import f1_score
f1_score(y_train_5, y_train_pred)
0.7420962043663375
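To see why the harmonic mean penalizes low values much more than the regular mean, here is a small illustrative comparison (the first pair reuses the rounded precision and recall from above; the second pair is made up for contrast):
precision, recall = 0.7291, 0.7556  # rounded values from above

arithmetic_mean = (precision + recall) / 2
harmonic_mean = 2 * precision * recall / (precision + recall)  # this is the F1 score
print(arithmetic_mean, harmonic_mean)  # nearly equal, since precision and recall are similar

# With a very unbalanced pair, the harmonic mean collapses toward the low value
p, r = 0.99, 0.10
print((p + r) / 2)          # 0.545
print(2 * p * r / (p + r))  # ~0.18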
The F1 score favors classifiers that have similar precision and recall. This is not always what you want: in some contexts you mostly care about precision, and in other contexts you really care about recall. For example, if you trained a classifier to detect videos that are safe for kids, you would probably prefer a classifier that rejects many good videos (low recall) but keeps only safe ones (high precision), rather than a classifier that has a much higher recall but lets a few really bad videos show up in your product.
Conversely, suppose you train a classifier to detect shoplifters in surveillance images: it is probably fine if your classifier has only 30% precision as long as it has 99% recall (the security guards will get a few false alerts, but almost all shoplifters will get caught).
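By analogy with the 90% precision threshold search earlier, here is a rough sketch of how you might pick a threshold for a 99% recall target (the names threshold_99_recall and y_train_pred_99 are illustrative; it assumes the recalls, thresholds, and y_scores arrays computed above):
import numpy as np

target_recall = 0.99

# recalls[:-1] lines up with thresholds; recall only goes down as the threshold rises,
# so the highest threshold that still keeps recall >= 99% gives the best precision there
threshold_99_recall = thresholds[recalls[:-1] >= target_recall].max()

y_train_pred_99 = (y_scores >= threshold_99_recall)
# precision_score(y_train_5, y_train_pred_99) then shows the precision you pay for it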
Unfortunately, you can't have it both ways: increasing precision reduces recall, and vice versa. This is called the precision/recall trade-off.