Consumers tab shows wrong lag value when messages are produced transactionally #1039


Open
4 tasks done
krumft opened this issue Apr 24, 2025 · 10 comments
Labels
area/consumers area/messages status/triage/completed Automatic triage completed status/triage/manual Manual triage in progress type/bug Something isn't working


@krumft

krumft commented Apr 24, 2025

Issue submitter TODO list

  • I've looked up my issue in FAQ
  • I've searched for already existing issues here
  • I've tried running main-labeled docker image and the issue still persists there
  • I'm running a supported version of the application which is listed here

Describe the bug (actual behavior)

Hi Team,

Congrats on the great tool!

I am not sure whether what I am about to describe should be considered a feature request or a bug. I am happy to have this moved to a feature request.

We see incorrect values in the Consumer Lag column of the Consumers tab when messages are produced transactionally.

When a transaction ends, Kafka writes a special end-of-transaction marker (a control record) to each partition involved, and this marker occupies an offset of its own. The marker is not visible in the messages list and is not delivered to consumers, but it is counted in the lag.

This incorrect lag persists indefinitely, or until a producer writes a message without using a transaction.

All my best,
Krum.
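
A minimal sketch of the mechanism described above, using a plain Python list as a stand-in for a single partition log. The `Record` class, the `append` helper, and the assumption that the application commits "last processed offset + 1" are illustrative only; this is not Kafka UI or Kafka client code.

```python
from dataclasses import dataclass


@dataclass
class Record:
    offset: int
    is_control: bool  # True for a transaction commit/abort marker


def append(log, is_control=False):
    """Append a record at the next free offset, as a partition log would."""
    log.append(Record(offset=len(log), is_control=is_control))


log = []
append(log)                   # offset 0: data record written inside a transaction
append(log)                   # offset 1: data record written inside a transaction
append(log, is_control=True)  # offset 2: end-of-transaction marker, hidden from the application

log_end_offset = len(log)                               # next offset to be written
visible = [r.offset for r in log if not r.is_control]   # offsets the application can see
committed = visible[-1] + 1                             # assumed commit style: last processed + 1
lag = log_end_offset - committed

print(visible)  # [0, 1]
print(lag)      # 1
```

The consumer has processed every record it can see, yet the difference between the log end offset and the committed offset is still 1 because of the marker.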

Expected behavior

The Consumer Lag column in the Consumers tab shows 0 if the consumer has processed all messages on its topic of interest.

Your installation details

We are using app version v1.2.0, deployed from its Docker image and running as a container on AWS ECS.

Steps to reproduce

Point Kafka UI at a Kafka cluster whose version supports transactional messages.

Create a Kafka topic with X partitions.

Have producers write messages to that topic transactionally, and make sure that consumers consume these messages without issues.

Observe that the Consumer Lag column in the Consumers tab repeatedly shows values greater than 0. The value can be as high as X, and it can also drop to 0.
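
To illustrate why the total can land anywhere between 0 and X, here is a small simulation under simplifying assumptions: each partition holds a fixed number of data records, the consumer has processed all of them, and only a trailing end-of-transaction marker contributes to the lag. The function names and the `touched` set are hypothetical and exist only for this sketch.

```python
import random


def partition_lag(ends_with_marker, data_records):
    """Lag reported for one partition whose consumer has processed every data record."""
    log_end_offset = data_records + (1 if ends_with_marker else 0)
    committed = data_records  # assumed commit style: last processed offset + 1
    return log_end_offset - committed


def total_lag(num_partitions, touched):
    """Sum per-partition lag; `touched` holds the partitions whose last write was a transaction."""
    return sum(partition_lag(p in touched, data_records=10) for p in range(num_partitions))


X = 6
rng = random.Random(0)
for _ in range(3):
    # Pick a random subset of partitions that the latest transactions wrote to.
    touched = {p for p in range(X) if rng.random() < 0.5}
    print(len(touched), total_lag(X, touched))
```

In this model the total lag always equals the number of partitions that currently end with a marker, which matches the observed range of 0 to X.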

Screenshots

No response

Logs

No response

Additional context

No response

@krumft krumft added status/triage Issues pending maintainers triage type/bug Something isn't working labels Apr 24, 2025
@kapybro kapybro bot added area/consumers area/messages status/triage/manual Manual triage in progress status/triage/completed Automatic triage completed and removed status/triage Issues pending maintainers triage labels Apr 24, 2025

Hi krumft! 👋

Welcome, and thank you for opening your first issue in the repo!

Please wait for triaging by our maintainers.

As development is carried out in our spare time, you can support us by sponsoring our activities or even funding the development of specific issues.

If you plan to raise a PR for this issue, please take a look at our contributing guide.

@Masqueey

I am not entirely certain that this is a bug; it looks more like Kafka working as intended. Although the consumer ignores the messages because it respects the transactional status, the messages are actually there and the consumer has not committed having read them, so there is consumer lag.

Upon reread, is it just the transaction closing message that is adding to the lag? What happens when you start a new transaction? Will it step over the "missed" message?

@krumft
Author

krumft commented Apr 24, 2025

> I am not entirely certain that this is a bug; it looks more like Kafka working as intended. Although the consumer ignores the messages because it respects the transactional status, the messages are actually there and the consumer has not committed having read them, so there is consumer lag.

Yes, I agree with you.

> Upon reread, is it just the transaction closing message that is adding to the lag?

Yes, the marker message is the culprit, leading to a lag of 1. That happens on each partition that takes part in a transaction.

> What happens when you start a new transaction? Will it step over the "missed" message?

Yes, kind of :)

It is fine for you to ignore this one; it is hardly a real bug. It looks more like a challenge of how to present these little transactional quirks in a meaningful way in the UX/UI. Maybe the best solution is the simplest: do nothing.

@krumft
Author

krumft commented Apr 24, 2025

On the other hand, if the tool behaved consistently, it should be possible to see the marker messages in the Messages tab. This is currently not possible. For example, here is what I see in the lag details: the consumer lags on partition 95 of the topic, having read and committed message 6198 while the latest offset is 6199.

At the same time, in the Messages tab the latest offset I can see is 6197.

Image Image
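
The numbers reported above can be checked with a few lines of arithmetic. This sketch assumes that 6198 is the committed offset (the next offset the group would read) and that 6199 is the log end offset (the next offset to be written); both readings are an interpretation of the lag details, not something the thread states explicitly.

```python
committed_offset = 6198     # from the lag details for partition 95 (assumed: next offset to read)
log_end_offset = 6199       # "latest offset" in the lag details (assumed: next offset to write)
last_visible_offset = 6197  # newest message shown in the Messages tab

reported_lag = log_end_offset - committed_offset

# Offsets after the last visible message and before the log end hold no visible message.
hidden_offsets = list(range(last_visible_offset + 1, log_end_offset))

print(reported_lag)    # 1
print(hidden_offsets)  # [6198]
```

Under these assumptions the single unit of lag corresponds to offset 6198, which is consistent with one end-of-transaction marker sitting right after the last visible message.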

@Masqueey

Yeah, I see what you mean. We have a similar issue with a producer always opening a transaction "just in case" it wants to produce at some point and then that one meta-message showing as lag in our metrics/the messages tab.

But I can see how this might be a good feature request. Maybe something like "transaction opening marker" and "closing marker". (Perhaps something you could toggle on or off.)

@krumft
Author

krumft commented Apr 24, 2025

Yeah, as transactions are implemented as some protocols/abstractions on top of messages, offsets, etc., it becomes a challenge for the tool to decide whether to show the raw artifacts, or to respect the higher-level abstractions.

@germanosin
Member

@krumft, thanks for raising the issue. @Masqueey, appreciate the clarification. Yes, you're absolutely right: there are "system" messages within the topic that are hidden in the UI because the consumer doesn’t return them. Additionally, the consumer only commits offsets for regular messages. This could definitely be confusing for users who aren't aware of this internal behavior.

From a UI perspective, what do you think would be the best approach to address or clarify this?

@krumft
Author

krumft commented Apr 25, 2025

This is a hard question :)

My personal choice would probably be the approach suggested by @Masqueey: just as we already have a Show Internal Topics toggle, we could have a similar toggle for showing or hiding transactional messages. The challenge is that this new toggle affects the UI of at least two different pages: Topics (the messages on a given topic) and Consumers (the lag for a given consumer group). So it must be visible and accessible from both of these pages.

It would also be nice to check how the competition is approaching this challenge.

Thanks so much for the discussion.

@germanosin
Member

@krumft From what I've seen, competitors haven't addressed this issue either, so we might be the first to solve it!

Here's what's on my mind:

In the Consumer Lag tab, we could add a checkbox to exclude transactional messages. If selected, we’d need to filter message types accordingly during the offset request.

For the Messages tab, I suggest we leave it as is, since there’s no API available to fetch those messages via the Consumer API.
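
One way to picture the proposed checkbox for the lag view is sketched below. It assumes the set of control-record offsets for a partition is already known; how a tool would obtain that set is exactly the open question, since the Consumer API does not return those records. The function names and the `control_offsets` parameter are hypothetical and are not part of Kafka UI.

```python
def raw_lag(log_end_offset, committed):
    """Lag as usually reported: log end offset minus committed offset."""
    return max(log_end_offset - committed, 0)


def lag_excluding_control(control_offsets, log_end_offset, committed):
    """Count only the pending offsets that are not known control records."""
    pending = range(committed, log_end_offset)
    return sum(1 for offset in pending if offset not in control_offsets)


control_offsets = {6198}  # assumed: offset of the end-of-transaction marker

# Consumer fully caught up: only the marker is pending.
print(raw_lag(6199, 6198))                                 # 1
print(lag_excluding_control(control_offsets, 6199, 6198))  # 0

# Consumer genuinely behind: eight data records plus the marker are pending.
print(raw_lag(6199, 6190))                                 # 9
print(lag_excluding_control(control_offsets, 6199, 6190))  # 8
```

With the checkbox off the tool would show the raw value, and with it on it would show the filtered one, so real backlog remains visible in both modes.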

@krumft
Author

krumft commented Apr 25, 2025

> From what I've seen, competitors haven't addressed this issue either, so we might be the first to solve it!

Great, thanks for checking.

> In the Consumer Lag tab, we could add a checkbox to exclude transactional messages. If selected, we’d need to filter message types accordingly during the offset request.

Makes sense to me.

> For the Messages tab, I suggest we leave it as is, since there’s no API available to fetch those messages via the Consumer API.

Oh, I see. So we are quite restricted in what we can do here. This is a strange asymmetry in Kafka's own APIs: you cannot consume control messages, yet they are part of the reported lag. Would it make sense to ping the Kafka team about that?

To apply some common sense and critical thinking: by implementing this feature, are we sure we are not coupling the tool too tightly to internal Kafka decisions that might change in a later version? Implementing transactions with control messages is an internal design decision of Kafka, and it could change at some point.

3 participants