-
Notifications
You must be signed in to change notification settings - Fork 1
feat: skip ads where entries cannot be retrieved #23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
While running the indexer in production, I noticed that many index providers return advertisemnt but then they return "404 cid not found" when we ask for the (payload) entries. Let's recover from that error by ignoring the advertisement and continuing the walk. Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
indexer/lib/advertisement-walker.js
Outdated
/** @type {any} */(err).serverMessage ?? '<not found>' | ||
) | ||
return { | ||
entriesFetchError: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not the best property name, can you suggest a better one? Or may I should set the value to a string instead of an error?
entriesFetchError: true, | |
entriesFetchError: String(err.statusCode), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just looking up this property, as was assuming it's one of these
- an
Error
- an array of entries for which there was a fetch error
- the count of entries for which there was a fetch error
What about entriesFetchError: err
? Or otherwise hasEntriesFetchError: true
indexer/lib/advertisement-walker.js
Outdated
/** @type {any} */(err).serverMessage ?? '<not found>' | ||
) | ||
return { | ||
entriesFetchError: true, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was just looking up this property, as was assuming it's one of these
- an
Error
- an array of entries for which there was a fetch error
- the count of entries for which there was a fetch error
What about entriesFetchError: err
? Or otherwise hasEntriesFetchError: true
indexer/lib/advertisement-walker.js
Outdated
|
||
try { | ||
const { previousAdvertisementCid, ...entry } = await fetchAdvertisedPayload( | ||
const { previousAdvertisementCid, entriesFetchError, ...entry } = await fetchAdvertisedPayload( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why is entriesFetchError
plural and entry
singular?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me explain how this works:
- First, we fetch the advertisement object. This object contains a CID linking to another object containing (payload block) entries.
- Second, we fetch the object with the entries.
- Finally, we take the first entry and discard the rest.
So, the error happens when we are fetching entries, but then we return only a single entry.
indexer/lib/advertisement-walker.js
Outdated
err.statusCode, | ||
/** @type {any} */(err).serverMessage ?? '<not found>' | ||
) | ||
return { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are we returning an object with { error }
instead of throwing it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this can be confusing, and I am thinking about redesigning my implementation.
What I wanted to accomplish in this PR:
- If the request for entries returns 404, we want to skip the advertisement and walk to the next advertisement in the chain. This is handled by returning
{ error }
- If the request for entries fails with a different error, we want to retry it. This is handled by (re)throwing the error.
After thinking about this some more, I am leaning towards the following design:
- If the request for entries returns 404, we skip the advertisement. (No change from the current proposal.)
- If the request for entries fails differently, we retry it several times. If all attempts fail, then we skip the advertisement, too.
That way, we optimise for finishing the walk of all advertisements so that we can process new advertisements and avoid repeating the same failing request forever.
Co-authored-by: Julian Gruber <julian@juliangruber.com>
Co-authored-by: Julian Gruber <julian@juliangruber.com>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
@juliangruber I reworked my PR as follows: Inside
Inside |
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some suggestions but nothing blocking the PR from being merged
indexer/lib/advertisement-walker.js
Outdated
const indexEntry = entry.pieceCid ? entry : undefined | ||
if (error === 'CANNOT_FETCH_ENTRIES') { | ||
state.entriesNotRetrievable = (state.entriesNotRetrievable ?? 0) + 1 | ||
} else if (error === 'CANNOT_DETERMINE_PIECE_CID') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
} else if (error === 'CANNOT_DETERMINE_PIECE_CID') { | |
} else if (error === 'MISSING_PIECE_CID') { |
To match the state
property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't know what state
property do you refer to. Do you mean the text in the debug
logs?
I don't have a strong opinion about the error codes, I will apply your suggestions.
indexer/lib/advertisement-walker.js
Outdated
} | ||
|
||
const indexEntry = entry.pieceCid ? entry : undefined | ||
if (error === 'CANNOT_FETCH_ENTRIES') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (error === 'CANNOT_FETCH_ENTRIES') { | |
if (error === 'ENTRY_NOT_RETRIEVABLE') { |
To match the state
property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll use ENTRIES_NOT_RETRIEVABLE
. We cannot retrieve the list of entries. We don't try to retrieve the entry itself.
}} AdvertisedPayload */ | ||
/** | ||
* @param {unknown} err | ||
* @param {string} providerAddress |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
* @param {string} providerAddress | |
* @param {string} providerAddress | |
* @returns {string} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indexer/lib/advertisement-walker.js
Outdated
if (!pieceCid) { | ||
debug('advertisement %s has no PieceCID in metadata: %j', advertisementCid, meta.deal) | ||
return { | ||
error: /** @type {const} */('CANNOT_DETERMINE_PIECE_CID'), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is this type cast for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Without this cast, the error
would have the type string
and auto-completion in VS Code won't work.
With the cast, the error
has an enum type 'CANNOT_DETERMINE_PIECE_CID' | 'CANNOT_FETCH_ENTRIES'
.
Then, when you start writing a condition like if (error === '
, VSCode can auto-complete the valid values.
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
While running the indexer in production, I noticed that many index providers return advertisemnt but then they return "404 cid not found" when we ask for the (payload) entries.
Let's recover from that error by ignoring the advertisement and continuing the walk.
Links: