Unexpected results in subset-based alignment and alignment against reference #795

jorainer · 2025-05-14T06:05:23Z

For some data sets we see a strange artifact when using subset-based alignment:

i.e., instead of the expected continuous, smooth line for each sample we see such random scattering for the retention time differences (rtadj - rtraw) for the samples that get aligned against the subset (smooth solid lines in the plot).

This seems to be specific to certain data sets as we don't see this e.g. on the data set used in the package vignette.

In subset-based alignment we use the internal .applyRtAdjustment() function that aligns a vector of values against reference values:

.applyRtAdjustment <- function(x, rtraw, rtadj) {
    ## re-order everything if rtraw is not sorted; issue #146
    if (is.unsorted(rtraw)) {
        idx <- order(rtraw)
        rtraw <- rtraw[idx]
        rtadj <- rtadj[idx]
    }
    adjFun <- stepfun(rtraw[-1] - diff(rtraw) / 2, rtadj)
    res <- adjFun(x)
    ## Fix margins.
    idx_low <- which(x < rtraw[1])
    if (length(idx_low)) {
        first_adj <- idx_low[length(idx_low)] + 1
        res[idx_low] <- x[idx_low] + res[first_adj] - x[first_adj]
    }
    idx_high <- which(x > rtraw[length(rtraw)])
    if (length(idx_high)) {
        last_adj <- idx_high[1] - 1
        res[idx_high] <- x[idx_high] + res[last_adj] - x[last_adj]
    }
    if (is.null(dim(res)))
        names(res) <- names(x)
    res
}

My suspicion is the use of stepfun() here. I will check if we could not maybe use approxfun() with linear interpolation instead.

The text was updated successfully, but these errors were encountered:

jorainer · 2025-05-14T06:10:52Z

I implemented an alternative version that allows to switch the interpolation method:

.applyRtAdjustment <- function(x, rtraw, rtadj,
                               method = c("stepfun", "approxfun")) {
    method <- match.arg(method)
    ## re-order everything if rtraw is not sorted; issue #146
    if (is.unsorted(rtraw)) {
        idx <- order(rtraw)
        rtraw <- rtraw[idx]
        rtadj <- rtadj[idx]
    }
    if (method == "approxfun")
        adjFun <- approxfun(rtraw, rtadj)
    else
        adjFun <- stepfun(rtraw[-1] - diff(rtraw) / 2, rtadj)
    res <- adjFun(x)
    ## Fix margins.
    idx_low <- which(x < rtraw[1])
    if (length(idx_low)) {
        first_adj <- idx_low[length(idx_low)] + 1
        res[idx_low] <- x[idx_low] + res[first_adj] - x[first_adj]
    }
    idx_high <- which(x > rtraw[length(rtraw)])
    if (length(idx_high)) {
        last_adj <- idx_high[1] - 1
        res[idx_high] <- x[idx_high] + res[last_adj] - x[last_adj]
    }
    if (is.null(dim(res)))
        names(res) <- names(x)
    res
}

jorainer · 2025-05-14T06:38:15Z

I could not reproduce the issue above using artificial data - but it seems to me the stepfun()-based approach has some issues:

Simulating just a linear shift of values by a constant:

a_raw <- seq(1, 100, by = 0.3)
a_adj <- a_raw + 0.2

Defining a vector that should be aligned based on the above alignment results. Note that the increment is the same (0.5), but the starting values and all values are not the same (i.e. none of the values in x is part of a_raw).

x <- seq(1.4, 20, by = 0.3)

Running the original method:

res <- .applyRtAdjustment(x, a_raw, a_adj, method = "stepfun")
head(x)
[1] 1.4 1.7 2.0 2.3 2.6 2.9
head(res)
[1] 1.5 1.8 2.1 2.4 2.7 3.0

The difference between the raw and adjusted values is 0.1, although we would expect 0.2 (as we shifted the values by that constant).

Using approxfun() instead:

res <- .applyRtAdjustment(x, a_raw, a_adj, method = "approxfun")
head(res)
[1] 1.6 1.9 2.2 2.5 2.8 3.1

And also expect_equal(res, x + 0.2) does not throw an error. So, approxfun() correctly estimates the drift and adjusts the second vector accordingly.

A second example adding also noise - but still using a constant shift by 0.2:

a_raw <- seq(0.1, 100, by = 0.03)
a_adj <- sort(a_raw + 0.2)
set.seed(123)
x <- seq(2.02, 90.02, by = 0.03)
x <- sort(x + rnorm(length(x), 0, 0.001))

Running stepfun():

res <- .applyRtAdjustment(x, a_raw, a_adj, method = "stepfun")
expect_equal(res, x + 0.2)
Error: `res` not equal to x + 0.2.
2934/2934 mismatches (average diff: 0.00079)
[1] 2.22 - 2.22 ==  5.60e-04
[2] 2.25 - 2.25 ==  2.30e-04
[3] 2.28 - 2.28 == -1.56e-03
[4] 2.31 - 2.31 == -7.05e-05
[5] 2.34 - 2.34 == -1.29e-04
[6] 2.37 - 2.37 == -1.72e-03
[7] 2.40 - 2.40 == -4.61e-04
[8] 2.43 - 2.43 ==  1.27e-03
[9] 2.46 - 2.46 ==  6.87e-04
...

And plotting the data:

plot(res, res - x, type = "l")

So, seems we partially validate the observed scattering in the initial example.

While when using approxfun():

res <- .applyRtAdjustment(x, a_raw, a_adj, method = "approxfun")
expect_equal(res, x + 0.2)
head(x)
[1] 2.019440 2.049770 2.081559 2.110071 2.140129 2.171715
head(res)
[1] 2.219440 2.249770 2.281559 2.310071 2.340129 2.371715

For these example it seems the approxfun()-based approach better estimates and adjusts the difference.

- This fixes issue #795 by using `approxfun()` instead of `stepfun()` for the subset-based alignment (and only there!). For adjustment of chrom peaks' retention times we still use the `stepfun()` that will ensure the RT will coincide with the RT of an actual spectrum (not interpolated).

jorainer · 2025-05-14T11:04:43Z

The fix is in the above commit. It only affects subset-based adjustment. For any other retention time mapping it still uses the original stepfun()-based approach. Thus, the fix is expected to not introduce any additional/new features/issues.

Before making a PR I will cross-check the results on various data sets.

jorainer self-assigned this May 14, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Unexpected results in subset-based alignment and alignment against reference #795

Unexpected results in subset-based alignment and alignment against reference #795

jorainer commented May 14, 2025

jorainer commented May 14, 2025

Uh oh!

jorainer commented May 14, 2025

Uh oh!

jorainer commented May 14, 2025

Uh oh!

Unexpected results in subset-based alignment and alignment against reference #795

Unexpected results in subset-based alignment and alignment against reference #795

Comments

jorainer commented May 14, 2025

jorainer commented May 14, 2025

Uh oh!

jorainer commented May 14, 2025

Uh oh!

jorainer commented May 14, 2025

Uh oh!