Add daily retrieval result codes #244
New migration:

```sql
CREATE TABLE daily_retrieval_result_codes (
  day DATE NOT NULL,
  code TEXT NOT NULL,
  rate NUMERIC NOT NULL,
  PRIMARY KEY (day, code)
);

CREATE INDEX daily_retrieval_result_codes_to_day ON daily_retrieval_result_codes (day);
```
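Consumers of this table (for example a dashboard endpoint) will read one row per `(day, code)` pair. As a hypothetical sketch, not part of this PR, such rows could be pivoted into one series per result code for charting; the sample data and names below are made up:

```javascript
// Hypothetical: pivot (day, code, rate) rows, as stored by this migration,
// into one time series per result code. Sample data is illustrative only.
const rows = [
  { day: '2024-10-30', code: 'OK', rate: 0.62 },
  { day: '2024-10-30', code: 'TIMEOUT', rate: 0.38 },
  { day: '2024-10-31', code: 'OK', rate: 0.7 },
  { day: '2024-10-31', code: 'TIMEOUT', rate: 0.3 }
]

// Group rows by result code, preserving day order within each series
const series = new Map()
for (const { day, code, rate } of rows) {
  if (!series.has(code)) series.set(code, [])
  series.get(code).push({ day, rate })
}

console.log(series.get('OK').map(p => p.rate)) // [ 0.62, 0.7 ]
```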
@@ -96,3 +96,34 @@ export const observeScheduledRewards = async (pgPools, ieContract, fetch = globa

```js
function isEventLog (logOrEventLog) {
  return 'args' in logOrEventLog
}
```
```js
export const observeRetrievalResultCodes = async (pgPoolStats, influxQueryApi) => {
  // TODO: The `mean` aggregation will produce slightly wrong numbers, since
  // the query is aggregating over relative numbers - with varying measurement
  // counts, the relative numbers should be weighted differently. Since the
  // measurement count per round should be relatively stable, this should be
  // good enough for now. Please pick up and improve this.
  // Ref: https://github.com/filecoin-station/spark-stats/pull/244#discussion_r1824808007
  const rows = await influxQueryApi.collectRows(`
    import "strings"

    from(bucket: "spark-evaluate")
      |> range(start: 0)
      |> filter(fn: (r) => r["_measurement"] == "retrieval_stats_honest")
      |> filter(fn: (r) => strings.hasPrefix(v: r._field, prefix: "result_rate_"))
      |> aggregateWindow(every: 1d, fn: mean, createEmpty: false)
```
Review thread on the `aggregateWindow` line:

**Reviewer:** I think this will likely produce slightly incorrect data. My understanding: `aggregateWindow(fn: mean)` averages the per-round rates without weighting them by the number of measurements in each round. Consider the following case (simplified): […] Mean average: (10 + 8.33) / 2 = 9.165%

**Reviewer:** What we are doing in other places, for example per-miner OK results: […]

**Author:** Good point, the mean over relative values will be off with different measurement counts. Since the measurement count is constant-ish now, I think this doesn't matter a lot. But it should still be improved. Do you have a suggestion how to solve this with Influx queries? I assume we'd need to join over the measurement count. I have added a comment in 818285d, are you ok shipping like this?

**Reviewer:** I don't think you can solve this at the query level. Maybe you could write a query that multiplies each […] IMO, the ramifications of shipping like this are: […] While I am not excited about shipping it like this, I think it's an acceptable trade-off to make. 👍🏻

**Author:** Yes, this is what I was thinking about. But this join is going to be expensive. Post-LabWeek we can work on the proper solution for these. I propose let's first see how this looks in the dashboard and then decide if it's too wrong to ship. The worst case is one revert away, so I think this is ok. Also, from a business perspective, if we're asked why we didn't add this chart that was requested last LabWeek, technical debt is not a good answer. Do you have a realistic alternative proposal? I'm at the moment not operating at the level of excitement :D I just want to ship (obviously without shipping something too bad) so we have more useful things to show.
```js
      |> keep(columns: ["_value", "_time", "_field"])
      |> map(fn: (r) => ({ r with _field: strings.replace(v: r._field, t: "result_rate_", u: "", i: 1) }))
  `)

  for (const row of rows) {
    await pgPoolStats.query(`
      INSERT INTO daily_retrieval_result_codes
      (day, code, rate)
      VALUES ($1, $2, $3)
      ON CONFLICT (day, code) DO UPDATE SET rate = EXCLUDED.rate
    `, [
      row._time,
      row._field,
      row._value
    ])
  }
}
```
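The concern raised in the review thread, taking a plain mean of per-round rates versus a measurement-weighted mean, can be illustrated with a small sketch. The round counts below are hypothetical, chosen to reproduce the 10% and 8.33% rates from the discussion:

```javascript
// Illustration of why averaging per-round rates differs from a
// measurement-weighted mean. All counts here are made-up numbers.
const rounds = [
  { ok: 10, total: 100 },  // 10% OK rate
  { ok: 100, total: 1200 } // ~8.33% OK rate
]

// What `aggregateWindow(fn: mean)` effectively computes: the plain mean
// of per-round rates, ignoring how many measurements backed each round.
const simpleMean = rounds
  .map(r => r.ok / r.total)
  .reduce((a, b) => a + b, 0) / rounds.length

// The weighted alternative: total OK results over total measurements.
const weightedMean =
  rounds.reduce((sum, r) => sum + r.ok, 0) /
  rounds.reduce((sum, r) => sum + r.total, 0)

console.log((simpleMean * 100).toFixed(2) + '%')   // 9.17%
console.log((weightedMean * 100).toFixed(2) + '%') // 8.46%
```

With stable measurement counts per round the two values converge, which is the basis of the "good enough for now" trade-off recorded in the TODO comment.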