arc_summary shows negative value for MFU data target #17210

Closed
shodanshok opened this issue Apr 3, 2025 · 4 comments · Fixed by #17255
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@shodanshok
Contributor

shodanshok commented Apr 3, 2025

System information

Type | Version/Name
--- | ---
Distribution Name | Rocky Linux
Distribution Version | 9.5
Kernel Version | 5.14.0-503.16.1.el9_5.x86_64
Architecture | x86_64
OpenZFS Version | 2.2.7-1

Describe the problem you're observing

arc_summary shows negative value for MFU data target:

ARC size (current):                                    99.5 %   15.3 GiB
        Target size (adaptive):                       100.0 %   15.4 GiB
        Min size (hard limit):                          6.2 %  984.0 MiB
        Max size (high water):                           16:1   15.4 GiB
        Anonymous data size:                            0.0 %    0 Bytes
        Anonymous metadata size:                      < 0.1 %  768.0 KiB
        MFU data target:                     -77.6 %  -10898862366 Bytes
        MFU data size:                                  0.2 %   30.5 MiB

Poking inside arc_summary itself, I can see the following lines:

zfs/cmd/arc_summary

Lines 626 to 627 in 7be9fa2

s = 4294967296
v = (s-int(pd))*(s-int(meta))/s

Since pd is currently equal to 20883106811 on this machine, which is larger than s, the factor (s-int(pd)) is negative and so v becomes negative.
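To make the sign flip concrete, here is a minimal sketch that evaluates the same expression as the two quoted arc_summary lines. The function name `mfu_data_fraction` is hypothetical, and the `meta` value is an assumed placeholder, since the issue does not report it; only `pd` comes from the report.

```python
S = 1 << 32  # 4294967296, the fixed-point scale "s" in the quoted lines


def mfu_data_fraction(pd: int, meta: int, s: int = S) -> float:
    """Evaluate the expression from the quoted arc_summary lines."""
    return (s - pd) * (s - meta) / s


pd = 20883106811   # value reported in this issue
meta = S // 4      # ASSUMPTION: placeholder, the issue does not give meta

v = mfu_data_fraction(pd, meta)
print("pd exceeds scale:", pd > S)
print("v is negative:   ", v < 0)
```

Any `meta` below `s` gives the same sign, because the `(s - pd)` factor alone is negative once `pd` exceeds the scale.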

Describe how to reproduce the problem

Not sure. It seems related to prefetch, but what does s mean in this context? I assume it relates to the ARC state balance and its fixed-point arithmetic. From my understanding, pd or pm should never be bigger than 4294967296 (2^32).

Include any warning/errors/backtraces from the system logs

None.

@shodanshok
Contributor Author

It seems the issue is not only cosmetic. From what I see inside the arc_evict_adj function in arc.c, s is capped at 32 bits, but the returned frac value is not limited to the same maximum:

zfs/module/zfs/arc.c

Lines 4224 to 4256 in ba03054

static uint64_t
arc_evict_adj(uint64_t frac, uint64_t total, uint64_t up, uint64_t down,
    uint_t balance)
{
	if (total < 8 || up + down == 0)
		return (frac);

	/*
	 * We should not have more ghost hits than ghost size, but they
	 * may get close.  Restrict maximum adjustment in that case.
	 */
	if (up + down >= total / 4) {
		uint64_t scale = (up + down) / (total / 8);
		up /= scale;
		down /= scale;
	}

	/* Get maximal dynamic range by choosing optimal shifts. */
	int s = highbit64(total);
	s = MIN(64 - s, 32);

	uint64_t ofrac = (1ULL << 32) - frac;

	if (frac >= 4 * ofrac)
		up /= frac / (2 * ofrac + 1);
	up = (up << s) / (total >> (32 - s));
	if (ofrac >= 4 * frac)
		down /= ofrac / (2 * frac + 1);
	down = (down << s) / (total >> (32 - s));
	down = down * 100 / balance;

	return (frac + up - down);
}

@amotin can this skew ARC MFU/MRU adjustments?

@amotin
Member

amotin commented Apr 18, 2025

@shodanshok I don't understand your question. s is there just to extend the dynamic range of the division. If we had 128-bit integer types to safely compute (up << 32) / total without overflow, we would not need it. It should have no effect outside of those two division lines.

But it is true that frac and arc_evict_adj() return should never end up outside of [0, 1<<32] range. And after staring at that code for an hour I think I see how it can happen in some very specific circumstances. A patch is coming.
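The role of s described in this comment can be illustrated with a small sketch. It emulates 64-bit wraparound in Python with a mask and compares a naive `(up << 32) / total` against the split-shift form used in the quoted function. The helper names are hypothetical, and Python's unbounded integers stand in for the 128-bit type mentioned in the comment.

```python
MASK64 = (1 << 64) - 1  # emulate uint64_t wraparound


def ratio_naive_u64(up: int, total: int) -> int:
    """(up << 32) / total with a 64-bit intermediate that may overflow."""
    return ((up << 32) & MASK64) // total


def ratio_split_shift_u64(up: int, total: int) -> int:
    """Split the 32-bit shift between numerator and denominator,
    as the quoted arc_evict_adj() does with s."""
    s = min(64 - total.bit_length(), 32)  # bit_length() plays highbit64()
    return ((up << s) & MASK64) // (total >> (32 - s))


def ratio_exact(up: int, total: int) -> int:
    """Reference result using unbounded integers."""
    return (up << 32) // total


total = 1 << 40  # ASSUMPTION: illustrative ghost list size (1 TiB)
up = 1 << 38     # ASSUMPTION: illustrative ghost hits, a quarter of total

print(ratio_exact(up, total) == 1 << 30)            # a quarter in 32-bit fixed point
print(ratio_split_shift_u64(up, total) == 1 << 30)  # matches the exact result
print(ratio_naive_u64(up, total))                   # high bits lost to overflow
```

When `total` is not a power of two, the split-shift form drops some low bits of `total`, so it is an approximation of the exact ratio rather than an identical value.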

amotin added a commit to amotin/zfs that referenced this issue Apr 18, 2025
With certain combinations of target ARC states balance and ghost
hit rates it was possible to get the fractions outside of allowed
range.  This patch limits maximum balance adjustment speed, which
should make it impossible, and also asserts it.

Fixes openzfs#17210
@shodanshok
Contributor Author

> @shodanshok I don't understand your question.

I wonder if this overflow can negatively alter MRU / MFU balancing.

> But it is true that frac and arc_evict_adj() return should never end up outside of [0, 1<<32] range. And after staring at that code for an hour I think I see how it can happen in some very specific circumstances.

Can you elaborate on which circumstances are needed? I only noticed it on a backup box running ZFS 2.2.7 (with 2.1.x it never happened).

> A patch is coming.

Thanks.

@amotin
Member

amotin commented Apr 18, 2025

> I wonder if this overflow can negatively alter MRU / MFU balancing.

Likely so. Up to total eviction of one of them.

> Can you elaborate on which circumstances are needed? I only noticed it on a backup box running ZFS 2.2.7 (with 2.1.x it never happened).

The affected fraction must be between 1/5 and 1/4, or between 3/4 and 4/5, depending on the direction, and the number of ghost hits in that direction must be less than 1/4 of total but larger than what is left before the fraction overflows in that direction (down to 1/5).
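These conditions can be checked against a direct Python port of the arc_evict_adj() code quoted earlier in the thread. This is only a sketch: the port emulates uint64_t wraparound with a mask, and the input values (`total`, `frac`, `up`, `down`, `balance`) are assumptions chosen to satisfy the conditions above, not values taken from the reporter's system.

```python
MASK64 = (1 << 64) - 1  # emulate uint64_t wraparound
ONE = 1 << 32           # fixed-point 1.0, the intended upper bound of frac


def arc_evict_adj(frac, total, up, down, balance=100):
    """Python port of the pre-fix function quoted in this thread."""
    if total < 8 or up + down == 0:
        return frac
    # Restrict the maximum adjustment when ghost hits approach ghost size.
    if up + down >= total // 4:
        scale = (up + down) // (total // 8)
        up //= scale
        down //= scale
    s = min(64 - total.bit_length(), 32)  # bit_length() plays highbit64()
    ofrac = (ONE - frac) & MASK64
    if frac >= 4 * ofrac:
        up //= frac // (2 * ofrac + 1)
    up = ((up << s) & MASK64) // (total >> (32 - s))
    if ofrac >= 4 * frac:
        down //= ofrac // (2 * frac + 1)
    down = ((down << s) & MASK64) // (total >> (32 - s))
    down = down * 100 // balance
    return (frac + up - down) & MASK64


# ASSUMED inputs: frac at 78% (between 3/4 and 4/5), ghost hits at 24% of
# total (below 1/4, but above the 22% of headroom left before 1.0).
total = 1000
frac = ONE * 78 // 100
result = arc_evict_adj(frac, total, up=240, down=0)

print("returned fraction exceeds 1<<32:", result > ONE)
```

With these inputs neither the `total / 4` scaling nor the `frac >= 4 * ofrac` damping branch is taken, so the full 24% step is added to a fraction that only had 22% of headroom.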

amotin added a commit to amotin/zfs that referenced this issue Apr 23, 2025
With certain combinations of target ARC states balance and ghost
hit rates it was possible to get the fractions outside of allowed
range.  This patch limits maximum balance adjustment speed, which
should make it impossible, and also asserts it.

Fixes openzfs#17210
Signed-off-by:	Alexander Motin <mav@FreeBSD.org>
Sponsored by:	iXsystems, Inc.