
Conversation

@juliangruber (Member) commented Oct 31, 2024

No description provided.

@juliangruber requested a review from bajtos on October 31, 2024, 16:12
|> range(start: 0)
|> filter(fn: (r) => r["_measurement"] == "retrieval_stats_honest")
|> filter(fn: (r) => strings.hasPrefix(v: r._field, prefix: "result_rate_"))
|> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
@bajtos (Member) commented:

I think this will likely produce slightly incorrect data.

My understanding:

  1. spark-evaluate creates a new set of result_rate_* datapoints every round. Each result_rate_* value is the fraction of measurements that reported that result code.
  2. In your query, you are aggregating these per-round rates into per-day rates by calculating the mean average of the fractions.

Consider the following case (simplified):

  1. On round 1, we collect 400k measurements, of which 40k report OK. result_rate_OK is 10%.
  2. On round 2, we collect 600k measurements, of which 50k report OK. result_rate_OK is 8.33%.

Mean average: (10+8.33)/2 = 9.165%
Precise value: (40k+50k)/(400k+600k) = 9%
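
To make the gap concrete, here is a small TypeScript sketch of both calculations, using the numbers from the simplified case above (the data shape is illustrative, not how spark-evaluate actually stores these values):

```ts
// Per-round snapshots from the simplified example above (illustrative shape).
const rounds = [
  { okCount: 40_000, totalMeasurements: 400_000 }, // round 1: 10%
  { okCount: 50_000, totalMeasurements: 600_000 }  // round 2: ~8.33%
]

// Mean of the per-round rates -- what aggregateWindow(every: 1d, fn: mean) computes.
const meanOfRates =
  rounds.reduce((sum, r) => sum + r.okCount / r.totalMeasurements, 0) / rounds.length

// Weighted rate -- total OK measurements divided by total measurements.
const totalOk = rounds.reduce((sum, r) => sum + r.okCount, 0)
const totalMeasurements = rounds.reduce((sum, r) => sum + r.totalMeasurements, 0)
const weightedRate = totalOk / totalMeasurements

console.log((meanOfRates * 100).toFixed(3))  // 9.167 (the 9.165 above comes from rounding 8.33)
console.log((weightedRate * 100).toFixed(3)) // 9.000
```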

@bajtos (Member) commented:

What we are doing in other places:

  • In spark-evaluate, we keep track of both per-result count and total count
  • In spark-stats, we calculate the % rate from those numbers

For example: per-miner OK results:
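
Roughly, that pattern looks like this (a hedged TypeScript sketch with hypothetical names; the actual spark-evaluate schema and spark-stats query differ):

```ts
// Hypothetical shape of what spark-evaluate records per miner and day:
// a total measurement count plus a count for each result code.
interface DailyMinerStats {
  day: string
  minerId: string
  total: number      // all measurements for this miner on this day
  successful: number // measurements that reported OK
}

// spark-stats then derives the rate from the counts instead of averaging rates.
function okRate (row: DailyMinerStats): number | null {
  return row.total > 0 ? row.successful / row.total : null
}
```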

@juliangruber (Member, Author) commented:

Good point, the mean over relative values will be off when the measurement counts differ between rounds. Since the measurement count is constant-ish now, I think this doesn't matter a lot. But it should still be improved.

Do you have a suggestion for how to solve this with Influx queries? I assume we'd need to join over the measurement count.

I have added a comment in 818285d. Are you ok shipping like this?

@bajtos (Member) commented:

> Do you have a suggestion for how to solve this with Influx queries? I assume we'd need to join over the measurement count.

I don't think you can solve this at the query level. Maybe you could write a query that multiplies each result_rate_XXX value by measurements (the total count) to reconstruct the number of measurements that reported the result code XXX.
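
For illustration, once such joined rows are in hand, the reconstruction could look roughly like this (a TypeScript sketch; the row shape and field names are assumptions, not the actual query output):

```ts
// One row per round and result code, as it could come back from joining the
// result_rate_* series with the total measurement count (shape is assumed).
interface RoundRow {
  day: string          // round timestamp truncated to 1d
  resultCode: string   // e.g. 'OK', 'TIMEOUT'
  rate: number         // result_rate_<code> for this round, 0..1
  measurements: number // total measurements in this round
}

// Reconstruct per-round counts from the rates, then compute a per-day
// weighted rate instead of a mean of per-round rates.
function dailyRates (rows: RoundRow[]): Map<string, number> {
  const acc = new Map<string, { count: number, total: number }>()
  for (const row of rows) {
    const key = `${row.day}/${row.resultCode}`
    const entry = acc.get(key) ?? { count: 0, total: 0 }
    entry.count += row.rate * row.measurements // reconstructed count for this code
    entry.total += row.measurements
    acc.set(key, entry)
  }
  return new Map([...acc].map(([key, { count, total }]) => [key, count / total]))
}
```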

IMO, result_rate_XXX points are not the right data source for this feature. We can keep pushing in this direction to see if we can find a fix. It may be better to step back and add a new stats writer to spark-evaluate, as we already do for other stats.

> I have added a comment in 818285d. Are you ok shipping like this?

What are the ramifications of shipping like this:

  • The chart will show incorrect data. (Hopefully, the inaccuracy is too small for anybody to notice. But we don't know how bad it will be, do we?)
  • We are introducing technical debt. In the worst-case scenario, we will need to remove the InfluxDB observer and implement something else. Since we are talking about ~200 LOC, I think this is not a big deal.

While I am not excited about shipping it like this, I think it's an acceptable trade-off to make. 👍🏻

@juliangruber (Member, Author) commented:

> I don't think you can solve this at the query level. Maybe you could write a query that multiplies each result_rate_XXX value by measurements (the total count) to reconstruct the number of measurements that reported the result code XXX.

Yes, this is what I was thinking about. But this join is going to be expensive.

> IMO, result_rate_XXX points are not the right data source for this feature. We can keep pushing in this direction to see if we can find a fix. It may be better to step back and add a new stats writer to spark-evaluate, as we already do for other stats.

Yeah, post-LabWeek we can work on the proper solution for these.

> What are the ramifications of shipping like this:

> • The chart will show incorrect data. (Hopefully, the inaccuracy is too small for anybody to notice. But we don't know how bad it will be, do we?)

I propose we first see how this looks in the dashboard and then decide if it's too wrong to ship.

> • We are introducing technical debt. In the worst-case scenario, we will need to remove the InfluxDB observer and implement something else. Since we are talking about ~200 LOC, I think this is not a big deal.

The worst case is one revert away, so I think this is ok. Also, from a business perspective, if we're asked why we didn't add this chart, which was requested last LabWeek, technical debt is not a good answer. Do you have a realistic alternative proposal?

> While I am not excited about shipping it like this, I think it's an acceptable trade-off to make. 👍🏻

I'm not operating at the level of excitement at the moment :D I just want to ship (obviously without shipping something too bad) so we have more useful things to show.

@juliangruber requested a review from bajtos on November 1, 2024, 10:23
@juliangruber changed the title from "Add daily retrieval result status" to "Add daily retrieval result codes" on Nov 1, 2024
@juliangruber merged commit 1624932 into main on Nov 1, 2024
10 checks passed
@juliangruber deleted the add/retrieval-result-status branch on November 1, 2024, 11:20