
Metrics and Terminology

Simone Maurizio La Cava edited this page Feb 1, 2022 · 9 revisions

Performance analysis

The evaluation of a biometric matcher and a PAD (Presentation Attack Detector), as well as their fusion, is based on a series of metrics that quantify their effectiveness from different points of view. In particular, the simulator considers their Receiver Operating Characteristic (ROC) curves, and the metrics behind them, in order to evaluate them in a quantitative way. A ROC curve is a graphical plot that illustrates the discriminative capability of a classification system at various threshold settings. The simulator considers the ROC curves relating the True Positive Rate (TPR) values to the False Positive Rate (FPR) values computed at the same decision thresholds.

The FPR goes by different names depending on the considered system (matcher, PAD, or the integration between them), but always represents the proportion of erroneous attempts (malicious or not) incorrectly accepted by the system: for example, how many times fingerprints that do not belong to the claimed identity (i.e., impostor attempts) are attributed to that ID in a fingerprint verification system, or how many times a PAD classifies fake fingerprints as live ones.

Similarly, the TPR represents the proportion of genuine (i.e., correct) attempts that are accepted by the system with respect to the total number of genuine attempts.

To ease comprehension, let's analyze the following example. Here, we evaluate the FPR and the TPR at different thresholds (i.e., 0, 0.1, 0.5, and 0.9), keeping in mind that the score estimated for a single sample must be strictly higher than the threshold for that sample to be considered positive (predicted ID=1). We then plot these values on a Cartesian plane, where the x-axis represents the FPR values and the y-axis the TPR ones, in order to show the resulting ROC curve.
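The computation described above can be sketched in a few lines of Python; the scores and ground-truth labels below are purely illustrative, not taken from the simulator:

```python
# Hypothetical scores and ground-truth labels (1 = genuine, 0 = impostor).
scores = [0.05, 0.2, 0.4, 0.6, 0.8, 0.95]
labels = [0,    0,   1,   0,   1,   1]

def roc_point(scores, labels, thr):
    """Return (FPR, TPR) at a threshold; a sample is predicted
    positive only if its score is strictly higher than thr."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > thr)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > thr)
    positives = sum(labels)
    negatives = len(labels) - positives
    return fp / negatives, tp / positives

# Operating points at the example thresholds.
for thr in (0, 0.1, 0.5, 0.9):
    fpr, tpr = roc_point(scores, labels, thr)
    print(f"thr={thr}: FPR={fpr:.2f}, TPR={tpr:.2f}")
```

Plotting the resulting (FPR, TPR) pairs, sorted by FPR, traces the ROC curve.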

From the FPR and the TPR it is possible to compute two other metrics, namely the True Negative Rate (TNR) and the False Negative Rate (FNR), which assess the proportion of negative (impostor) samples correctly classified as negative and the proportion of positive samples wrongly classified as negative, respectively: TNR = 1 − FPR and FNR = 1 − TPR.

Another metric that characterizes a system is the Equal Error Rate (EER), representing the point where the FPR is equal (or closest) to the FNR. When a threshold value exists for which FPR = FNR, the EER is equal to that common value; otherwise, it can be estimated by considering the threshold (thr) at which the two rates are most similar.

Recalling the relationship FNR = 1 - TPR, it is straightforward to compute this value in the previous example.
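As a sketch, the EER can be estimated from a handful of (threshold, FPR, TPR) operating points; the values below are illustrative, and averaging the two rates at the closest point is one common convention, not necessarily the simulator's exact rule:

```python
# Hypothetical (threshold, FPR, TPR) operating points; FNR = 1 - TPR.
points = [
    (0.0, 1.00, 1.00),
    (0.1, 0.67, 1.00),
    (0.5, 0.33, 0.67),
    (0.9, 0.00, 0.33),
]

def equal_error_rate(points):
    """Return (EER, threshold) at the operating point where FPR and FNR
    are closest; when they are not exactly equal, average the two rates."""
    thr, fpr, tpr = min(points, key=lambda p: abs(p[1] - (1 - p[2])))
    fnr = 1 - tpr
    return (fpr + fnr) / 2, thr

eer, thr = equal_error_rate(points)
```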

Usually, the system is also evaluated in terms of a Confusion Matrix, which highlights the performance of the system at a fixed threshold either in terms of correctly and wrongly classified samples of the various classes or in terms of the proportions representing them. The webapp provides the Confusion Matrix obtained at the threshold at which the EER is computed, following the latter representation.
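A minimal sketch of such a proportion-based Confusion Matrix at a fixed threshold, using hypothetical scores and labels:

```python
# Hypothetical scores and ground-truth labels (1 = genuine, 0 = impostor).
scores = [0.05, 0.2, 0.4, 0.6, 0.8, 0.95]
labels = [0,    0,   1,   0,   1,   1]

def confusion_matrix(scores, labels, thr):
    """Rows index the true class (0, 1), columns the predicted class (0, 1);
    entries are proportions within each true class, as in the webapp's
    representation. A sample is predicted positive only if score > thr."""
    counts = [[0, 0], [0, 0]]
    for s, y in zip(scores, labels):
        counts[y][1 if s > thr else 0] += 1
    return [[c / sum(row) for c in row] for row in counts]

cm = confusion_matrix(scores, labels, 0.5)
# cm[0] = [TNR, FPR] and cm[1] = [FNR, TPR]
```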

Once the ROC curve is obtained, the system can also be characterized by the area under that curve (AUC), which ranges between 0 and 1: the higher its value, the better the system tends to be. However, note that a system with a lower AUC value than another can still be preferable, because it may respond better in the desired operating scenario.
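The AUC can be approximated with the trapezoidal rule over the ROC operating points sorted by FPR; the points below are illustrative:

```python
# Hypothetical ROC points as (FPR, TPR), sorted by FPR, including the origin.
roc = [(0.0, 0.0), (0.0, 0.33), (0.33, 0.67), (0.67, 1.0), (1.0, 1.0)]

def auc(roc):
    """Area under the ROC curve via the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(roc, roc[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

area = auc(roc)
```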

These metrics take different names based on the considered system (matcher, PAD, or the fusion between them) and on the property being analyzed. Therefore, in the following paragraphs, we introduce them following the ISO terminology.

Matcher-related terminology

Also known as False Acceptance Rate (FAR), the FPR takes the name of False Match Rate (FMR) when we consider the biometric matcher (e.g., a verification system). Therefore, this metric represents the proportion of impostor attempts that are falsely declared to match an identity that is not the correct one.

Also known as False Rejection Rate (FRR), the FNR takes the name of False Non-Match Rate (FNMR) when we consider the biometric matcher. Therefore, this metric represents the proportion of genuine attempts that are falsely declared not to match their own identity.

The TPR instead takes the name of Genuine Acceptance Rate (GAR). This metric keeps the same relationship with the FNMR that the TPR has with the FNR: GAR = 1 − FNMR.

When considering the performance of a matcher with respect to presentation attacks, it is also useful to consider the Impostor Attack Presentation Match Rate (IAPMR), also known as Spoof-False Acceptance Rate (SFAR). This metric represents the proportion of impostor presentation attacks using the same PAI (Presentation Attack Instrument) species in which the target reference is matched.

PAD-related terminology

A PAD considers live samples, or bona fide presentations, as positive (labelled as 1 in the webapp), and fake samples as negative (labelled as 0).

Therefore, the FPR takes the name of Attack Presentation Classification Error Rate (APCER), representing the proportion of attack presentations using the same PAI species incorrectly classified as bona fide presentations in a specific scenario.

The FNR takes the name of Bona fide Presentation Classification Error Rate (BPCER) instead, representing the proportion of bona fide presentations incorrectly classified as attack presentations in a specific scenario.
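Both PAD error rates can be sketched in a few lines; the scores, labels, and threshold below are illustrative:

```python
# Hypothetical PAD scores and labels (1 = bona fide/live, 0 = attack/fake).
pad_scores = [0.1, 0.3, 0.55, 0.7, 0.9]
pad_labels = [0,   0,   1,    0,   1]

def apcer_bpcer(scores, labels, thr):
    """Return (APCER, BPCER) at a threshold: APCER is the proportion of
    attacks accepted as bona fide (score > thr), BPCER the proportion of
    bona fide presentations rejected as attacks (score <= thr)."""
    attacks = [s for s, y in zip(scores, labels) if y == 0]
    bona_fide = [s for s, y in zip(scores, labels) if y == 1]
    apcer = sum(1 for s in attacks if s > thr) / len(attacks)
    bpcer = sum(1 for s in bona_fide if s <= thr) / len(bona_fide)
    return apcer, bpcer

apcer, bpcer = apcer_bpcer(pad_scores, pad_labels, 0.5)
```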

Integration performance

The system composed of the biometric matcher with embedded PAD can be characterized by the same metrics. In particular, its acceptance rates can be highlighted:

  • that of genuine users (GAR)
  • that of zero-effort attacks (FMR)
  • that of presentation attacks (IAPMR)

As explained in detail in the Rationale section of the wiki, M is a boolean event which defines whether access is granted to a certain user, i.e., whether the matching score between the input image and the user’s claimed identity templates is over a given acceptance threshold, while F is another boolean event which defines whether the liveness detector classifies a certain input sample as alive, i.e., whether the liveness score, obtained by analyzing the feature set extracted from the input image, is over a certain liveness threshold. Therefore, the M event is related to the fixed matching threshold (e.g., GAR(M) is the Genuine Acceptance Rate at such threshold value), while the F event is related to the scenario chosen when using the simulator, in terms of the PAD's fixed operational point (APCER ≤ p% or BPCER ≤ p%). Note that the choice of the PAD's operational point always constrains both APCER and BPCER since, once one of the two scenarios is set, the other parameter is computed accordingly (e.g., if APCER ≤ p% is chosen, then BPCER is the value obtained at the same threshold that yields APCER ≤ p%), as explained in the Webapp section of the wiki.
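The conjunction of the two boolean events can be sketched as follows; the function name and threshold values are illustrative:

```python
def integrated_accept(match_score, liveness_score, thr_match, thr_live):
    """Access is granted only when the matcher event M (match_score above
    the acceptance threshold) and the PAD event F (liveness_score above
    the liveness threshold) both hold."""
    M = match_score > thr_match
    F = liveness_score > thr_live
    return M and F
```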

Moreover, the simulator introduces the Global False Match Rate (GFMR), representing the final acceptance rate of the integrated system, which takes into account the probability of a presentation attack (w in our webapp, denoting the relative cost of presentation attacks with respect to zero-effort impostors).
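One plausible way to blend the two acceptance rates through w is a convex combination of the FMR and the IAPMR; this is an assumption for illustration only, not necessarily the simulator's exact GFMR definition, which is given in the Rationale section:

```python
def gfmr(fmr, iapmr, w):
    """ASSUMED illustrative combination: weigh the presentation-attack
    acceptance rate (IAPMR) against the zero-effort one (FMR) through w."""
    return (1 - w) * fmr + w * iapmr
```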

Go to the Wiki for further information about the simulator and the webapp.
