Accuracy, Precision, Sensitivity, Specificity, and F1

Confusion matrices are used to report the performance of a classification model.

Important terminology to remember:

  • False positive: Observation falsely classified as positive. This is akin to receiving a positive lab test for a patient who doesn't actually have the condition being tested for.
  • True positive: Observation correctly classified as positive, similar to a positive lab test for a patient who really is positive.
  • True negative: Observation correctly classified as negative, similar to a correct, negative lab test.
  • False negative: Observation falsely classified as negative, like an incorrect negative lab test.
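
To make these four counts concrete, here is a minimal Python sketch that tallies them from paired label vectors; the y_true/y_pred values are hypothetical examples made up for illustration:

    # Count confusion-matrix cells from paired labels (1 = positive, 0 = negative).
    # y_true/y_pred are hypothetical illustration data, not from this post.
    y_true = [1, 1, 0, 0, 1, 0, 1, 0]
    y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    print(f"TP={tp} FP={fp} TN={tn} FN={fn}")  # TP=3 FP=1 TN=3 FN=1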

A test that catches all the True Positives (having few or no False Negatives) is considered very sensitive. Sensitivity is essential, for example, when every patient with a severe disease must be treated, given the treatment is accessible to all and has no side effects. How can a test catch every True Positive and produce no False Negatives? Simply call every patient positive: this yields zero (0) True Negatives and zero (0) False Negatives, with the maximum possible False Positives and True Positives. This only works if the treatment truly is accessible to all and has no side effects, but it demonstrates the need for balance.
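
This thought experiment can be verified with a short sketch (again with made-up labels): predicting positive for everyone drives sensitivity to 1.0 while specificity collapses to 0.0:

    # "Call every patient positive": perfect sensitivity, zero specificity.
    y_true = [1, 0, 1, 0, 0, 1, 0, 0]   # hypothetical ground truth
    y_pred = [1] * len(y_true)          # the degenerate all-positive test

    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)

    sensitivity = tp / (tp + fn)   # 3/3 = 1.0: every sick patient is caught
    specificity = tn / (tn + fp)   # 0/5 = 0.0: every healthy patient misdiagnosed
    print(sensitivity, specificity)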

Other related terminology in machine learning to help with this understanding:

  • Accuracy: the ratio of the number of correct predictions to the total number of predictions. Intuitively, this is:
    • the number of patients that are correctly diagnosed out of the entire patient population.
  • Precision: (a.k.a. positive predictive value, or PPV) the fraction of relevant instances among the retrieved instances. In other words,
    • the fraction of people diagnosed as sick that are actually sick.
  • Sensitivity: (a.k.a. Recall) the fraction of relevant instances in the population that are "sensed", or "recalled", among the retrieved instances. 100% sensitivity would not misdiagnose any sick patients. That is,
    • the fraction of sick patients correctly diagnosed as sick.
  • Specificity: the fraction of true negatives correctly predicted. 100% specificity means that no healthy patient would be misdiagnosed with the disease. For example,
    • the fraction of healthy people correctly diagnosed as not sick.

Ideally, a model should optimize both Precision and Recall.

Mathematically:

  • F1-score (a.k.a. F-Score/F-Measure): measures the balance between Precision and Recall. Higher is better; this measure is more informative than accuracy when the classes are imbalanced or when false negatives and false positives carry different costs.

    • F1 Score = 2*(Recall * Precision) / (Recall + Precision)
  • Other equations:
    • Accuracy = (TP+TN)/(TP+FP+FN+TN)
    • Precision = TP/(TP+FP)
    • Recall (a.k.a. Sensitivity) = TP/(TP+FN)
    • Specificity = TN/(TN+FP)
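
These formulas translate directly into code. Below is a minimal sketch; the function name and example counts are my own, and zero-division guards are omitted for brevity:

    def classification_metrics(tp, fp, tn, fn):
        """Compute the metrics above from confusion-matrix counts.
        Note: no guards against division by zero, for brevity."""
        accuracy    = (tp + tn) / (tp + fp + fn + tn)
        precision   = tp / (tp + fp)
        recall      = tp / (tp + fn)          # a.k.a. sensitivity
        specificity = tn / (tn + fp)
        f1 = 2 * (recall * precision) / (recall + precision)
        return accuracy, precision, recall, specificity, f1

    # Hypothetical counts for illustration:
    print(classification_metrics(tp=80, fp=10, tn=95, fn=15))
    # -> (0.875, 0.888..., 0.842..., 0.904..., 0.864...)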
