Adaptive Label Thresholding

Arun S V
4 min read · Aug 21, 2023
Photo by Adi Goldstein on Unsplash

In my previous article, I discussed the performance metrics used to evaluate a classifier and explored methods for improving it. If you haven’t read that article, you can find it here for reference. This article explores a potential approach to enhancing classifier performance in multi-label classification tasks.

Introduction

A multi-label classification problem is characterized by the assignment of multiple labels to a single record or sample. The classifier is trained on a dataset of records with multiple true labels, and for each record it predicts a probability value for every label. The labels whose probabilities exceed a specified threshold are then kept, giving the final predictions. This threshold is global, meaning it is the same for all labels. While using a global threshold is not inherently bad, it may not be the optimal choice for every label, just as one size doesn’t fit all.
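As a concrete illustration of global thresholding, here is a minimal sketch (the probability values below are made up purely for the example): the classifier returns one probability per label per record, and a single cut-off is applied to the whole matrix.

```python
import numpy as np

# Hypothetical output of a multi-label classifier:
# 5 records, 4 labels ('A', 'B', 'C', 'D'), one probability per label.
probs = np.array([
    [0.91, 0.12, 0.40, 0.88],
    [0.35, 0.95, 0.10, 0.61],
    [0.77, 0.49, 0.85, 0.20],
    [0.08, 0.66, 0.30, 0.93],
    [0.55, 0.81, 0.72, 0.47],
])

# A single global threshold applied to every label.
global_threshold = 0.5
predictions = (probs >= global_threshold).astype(int)
print(predictions)  # 1 = label assigned, 0 = label not assigned
```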

The goal in any classification problem is to achieve high precision and recall scores, which are commonly combined into the F1 score. For example, if a certain label (let’s say ‘A’) has a high precision score but a low recall, decreasing the threshold might improve the recall without significantly impacting precision. Conversely, if another label (let’s say ‘B’) has a low precision but a high recall, increasing the threshold might improve the precision without significantly impacting recall. Utilizing a different threshold for each label could be beneficial in this scenario.

Adaptive Label Thresholding

Selecting the optimal threshold for each individual label and using it to filter predicted probability values to obtain the final results is known as adaptive label thresholding.
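Compared to the global case, the only change in code is that the single threshold becomes a vector with one entry per label. A small sketch, again with made-up numbers rather than tuned values:

```python
import numpy as np

# Same hypothetical probability matrix as before (records x labels).
probs = np.array([
    [0.91, 0.12, 0.40, 0.88],
    [0.35, 0.95, 0.10, 0.61],
    [0.77, 0.49, 0.85, 0.20],
    [0.08, 0.66, 0.30, 0.93],
    [0.55, 0.81, 0.72, 0.47],
])

# One threshold per label instead of one global value
# (placeholder values for 'A', 'B', 'C', 'D').
label_thresholds = np.array([0.60, 0.90, 0.45, 0.70])

# Broadcasting compares each column against its own threshold.
predictions = (probs >= label_thresholds).astype(int)
print(predictions)
```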

To understand the concept of adaptive thresholding, let’s first take a sample multi-label classification problem and the results generated by some classifier. Consider a multi-label classification dataset with target labels ‘A’, ‘B’, ‘C’ and ‘D’.

Suppose the dataset comprises a total of 1,000 records. The distribution of records for each class is as follows:

  1. ‘A’ → 473 records
  2. ‘B’ → 628 records
  3. ‘C’ → 122 records
  4. ‘D’ → 759 records

For example, let’s take the label ‘A’, which corresponds to a total of 473 records. The classifier outputs a predicted probability for label ‘A’ for every record. Next, we apply a filtering process using a threshold, such as 0.87. This entails assigning label ‘A’ to records whose predicted probability for ‘A’ is greater than 0.87, while records with lower probabilities do not receive that label.

We calculate the precision and recall values for label ‘A’, resulting in a high precision of 0.9 but a significantly low recall of 0.4. Upon analyzing the predictions of the classifier, we discover that many positive records were classified with a confidence of around 0.8. However, since our threshold of 0.87 is higher than this value, our recall is very low.

We do the same for label ‘B’, which corresponds to a total of 628 records, and we arrive at a low precision of 0.38 and a high recall of 0.83 for the same threshold. Again, upon analyzing the predictions of the classifier, we discover that almost all positive records are classified with a confidence of around 0.92. However, since our threshold sits a little below this value, many negative records also end up being assigned the label ‘B’.
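To run this kind of per-label diagnosis on your own model, you can compute precision and recall for each label at the current global threshold. Here is a minimal sketch using scikit-learn, assuming y_true is your ground-truth label matrix and y_probs the matrix of predicted probabilities (both of shape records × labels); the helper name is just for this sketch.

```python
import numpy as np
from sklearn.metrics import precision_score, recall_score

def per_label_report(y_true, y_probs, threshold, label_names):
    """Print precision and recall for each label at one global threshold."""
    y_pred = (y_probs >= threshold).astype(int)
    for i, name in enumerate(label_names):
        p = precision_score(y_true[:, i], y_pred[:, i], zero_division=0)
        r = recall_score(y_true[:, i], y_pred[:, i], zero_division=0)
        print(f"Label {name}: precision={p:.2f}, recall={r:.2f}")

# Example usage:
# per_label_report(y_true, y_probs, threshold=0.87, label_names=['A', 'B', 'C', 'D'])
```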

Now that we know the root of the problem is using a single global threshold to filter the predictions, we can utilize the adaptive thresholding technique to solve it. So, how do we do it?

Finding Optimal Thresholds

Before we delve into discovering the optimal thresholds, let us visualize how the predicted probabilities for a specific label are distributed across its positive and negative records. Below, you’ll find a simple illustration in the form of a typical histogram.

Probability Distribution plot for +ve & -ve records of a label

Here, just by looking at the plot, we can intuitively say that the optimal threshold for this particular label lies at approximately 0.5. Using this threshold to filter the predictions, most of the positive and negative records will be classified correctly.
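If you’d like to draw a similar plot for one of your labels, here is a quick sketch with matplotlib; pos_probs and neg_probs stand in for the predicted probabilities of the truly positive and truly negative records, and are generated synthetically here just for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

# Hypothetical probabilities for one label: positives cluster high, negatives low.
rng = np.random.default_rng(42)
pos_probs = np.clip(rng.normal(0.75, 0.12, 300), 0, 1)
neg_probs = np.clip(rng.normal(0.25, 0.12, 700), 0, 1)

plt.hist(neg_probs, bins=30, alpha=0.6, label="negative records")
plt.hist(pos_probs, bins=30, alpha=0.6, label="positive records")
plt.xlabel("Predicted probability")
plt.ylabel("Number of records")
plt.legend()
plt.show()
```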

Great, we now know what an optimal threshold is and how to find it manually. But imagine having a dataset with a large number of target labels. It becomes much harder to visualize the predictions and find the optimal threshold for each label by hand. But I’ve got you covered.
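Here is a minimal sketch of such a helper. It assumes the predicted probabilities for a label have already been split into its positive and negative records, sweeps candidate thresholds, and keeps the one that maximizes the F1 score (any other metric could be plugged in instead).

```python
import numpy as np

def find_optimal_threshold(pos_probs, neg_probs, step=0.01):
    """Return the threshold that maximizes F1 for one label.

    pos_probs: predicted probabilities of records that truly have the label.
    neg_probs: predicted probabilities of records that do not have the label.
    """
    best_threshold, best_f1 = 0.5, -1.0
    for threshold in np.arange(step, 1.0, step):
        tp = np.sum(pos_probs >= threshold)   # positives correctly kept
        fn = np.sum(pos_probs < threshold)    # positives wrongly dropped
        fp = np.sum(neg_probs >= threshold)   # negatives wrongly kept
        precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
        recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) > 0 else 0.0)
        if f1 > best_f1:
            best_threshold, best_f1 = threshold, f1
    return best_threshold

# Example: compute one threshold per label from its positive/negative probabilities.
# thresholds = {label: find_optimal_threshold(pos, neg)
#               for label, (pos, neg) in label_splits.items()}
```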

Above is a function to find the optimum threshold given the positive and negative predictions of a label. Now we have a way to find the right threshold for every label without any hassle, and thus we can improve our model’s performance significantly.
