For the classification metrics chapter (#83), we should update the calibration discussion so that it no longer compares unpaired intervals.
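To illustrate what a paired comparison could look like, here is a minimal sketch. It assumes the discussion compares two models scored on the same test set, uses binary expected calibration error (ECE) with equal-width bins as the calibration metric, and uses a percentile bootstrap. All of these are assumptions, not necessarily what the chapter uses. The function names `expected_calibration_error` and `paired_bootstrap_ece_diff` are hypothetical. The key point is that each bootstrap resample draws one set of example indices and scores both models on it, so the interval is computed for the difference ECE(A) - ECE(B) rather than by checking whether two separate per-model intervals overlap.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Binary ECE with equal-width bins on the positive-class probability (assumed metric)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=float)
    # Bin index 0..n_bins-1; a probability of exactly 1.0 goes into the last bin.
    bins = np.minimum((probs * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            # Weight each bin's |mean confidence - observed frequency| by its share of examples.
            gap = abs(probs[mask].mean() - labels[mask].mean())
            ece += mask.mean() * gap
    return ece

def paired_bootstrap_ece_diff(probs_a, probs_b, labels, n_boot=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap CI for ECE(A) - ECE(B), resampling the SAME indices for both models."""
    rng = np.random.default_rng(seed)
    probs_a = np.asarray(probs_a, dtype=float)
    probs_b = np.asarray(probs_b, dtype=float)
    labels = np.asarray(labels, dtype=float)
    n = len(labels)
    diffs = np.empty(n_boot)
    for i in range(n_boot):
        idx = rng.integers(0, n, size=n)  # one resample shared by both models (the pairing)
        diffs[i] = (expected_calibration_error(probs_a[idx], labels[idx])
                    - expected_calibration_error(probs_b[idx], labels[idx]))
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    point = (expected_calibration_error(probs_a, labels)
             - expected_calibration_error(probs_b, labels))
    return point, (lo, hi)
```

A model comparison would then ask whether the interval for the difference excludes zero. Because both models are evaluated on identical resamples, the correlation between their errors on shared examples is preserved, which the overlap of two independently computed intervals ignores.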