-
Notifications
You must be signed in to change notification settings - Fork 46
feat: sign record's public url #358
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
return `https://api.apify.com/v2/key-value-stores/${this.id}/records/${key}`; | ||
} | ||
|
||
return `https://api.apify.com/v2/key-value-stores/${this.id}/records/${key}?signature=${createHmacSignature(this.storageObject.urlSigningSecretKea as string, key)}`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we pls compose the URL with URL.searchParams
and similar? Maybe concatting the path with path.posix.join()
too?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aside from ensuring the correct URL syntax, this procedural approach might also allow us to only specify the pathname once (and just conditionally add the query parameter if the secret exists).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, makes sense. Refactored it 🙃
@@ -19,7 +20,11 @@ export class KeyValueStore extends CoreKeyValueStore { | |||
return getPublicUrl.call(this, key); | |||
} | |||
|
|||
return `https://api.apify.com/v2/key-value-stores/${this.id}/records/${key}`; | |||
if (!this.storageObject?.urlSigningSecretKea) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
urlSigningSecretKea
instead of ...key
might be a typo?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks!
(as long as we wait for the new stable Crawlee version, but I see you mentioned that in the PR description already 👍🏽 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looking good, except the red CI which is not related to your changes
This PR is part of [Issue #19363](apify/apify-core#19363), which updates `KeyValueStore.getPublicUrl(recordKey)` to generate signed links using HMAC. **This PR**: - creates signature and appends it to the public URL of the record in KV store P.S. Before merging, we need to wait for the release of Crawlee, so we can update version of Crawlee. P.P.S It's hard to test the changes, since master branch of SDK and master branch of Crawlee are out of sync. Crawlee has some breaking changes in version 6.0, which are not yet addressed in master branch of SDK. Previous PR that adds storageObject to Crawlee Python is [here](apify/crawlee-python#993) Same PR in SDK JS is [here](apify/apify-sdk-js#358)
@@ -19,7 +20,13 @@ export class KeyValueStore extends CoreKeyValueStore { | |||
return getPublicUrl.call(this, key); | |||
} | |||
|
|||
return `https://api.apify.com/v2/key-value-stores/${this.id}/records/${key}`; | |||
const publicUrl = new URL(`https://api.apify.com/v2/key-value-stores/${this.id}/records/${key}`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(nit) Isn't the API base URL available in the env variables?
That way the Actor will create valid urls in all environments. You could leverage that to create an actual e2e test that not only creates & runs an Actor that produces the signed URL, but actually tests the URL against Apify API.
(The current test below basically just copies the implementation... you are calling the same code twice and checking that it produces the same value 😅 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think there was a reason for this. if you ask what reason, i have no idea, but there is something at the back of my mind telling me there was one 🙃
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@B4nan I think it should be safe to use config.get('apiPublicBaseUrl')
variable as it has fallback to https://api.apify.com
?
package.json
Outdated
@@ -67,7 +67,7 @@ | |||
"@typescript-eslint/eslint-plugin": "^7.0.0", | |||
"@typescript-eslint/parser": "^7.0.0", | |||
"commitlint": "^19.3.0", | |||
"crawlee": "^3.11.5", | |||
"crawlee": "^3.13", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please keep the full version definition here
packages/apify/package.json
Outdated
"@crawlee/core": "^3.12.3-beta.12", | ||
"@crawlee/types": "^3.12.3-beta.12", | ||
"@crawlee/utils": "^3.12.3-beta.12", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
and don't forget to bump this
packages/apify/package.json
Outdated
"@crawlee/core": "^3.12.3-beta.12", | ||
"@crawlee/types": "^3.12.3-beta.12", | ||
"@crawlee/utils": "^3.12.3-beta.12", | ||
"apify-client": "^2.11.1", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why the downgrade here? 👀
return getPublicUrl.call(this, key); | ||
} | ||
|
||
return `https://api.apify.com/v2/key-value-stores/${this.id}/records/${key}`; | ||
const publicUrl = new URL(`${config.get('apiPublicBaseUrl', 'https://api.apify.com')}/v2/key-value-stores/${this.id}/records/${key}`); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for the explicit default here, its the exact same value as you define on the config level
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same as @B4nan wondering why apiPublicBaseUrl
when we already have apiBaseUrl
.
As in, I know that we're injecting as env variable APIFY_API_PUBLIC_BASE_URL
... but I had no idea we have apparently also APIFY_API_BASE_URL
😅
test/e2e/sdk/publicUrl/src/main.mjs
Outdated
import { Actor, log } from 'apify'; | ||
|
||
// Also needs to be changed in test.mjs | ||
const PUBLIC_RECORD_KEY = 'public-record-key'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(opt) How about you pass this as input to the Actor? All three values...
- data to upload
- key where to upload the data
- key where to store the public URL
Then you don't need to define anything in two places and the test will be more readable (not that it is super complicated or anything 😄 )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point, thank you 🙏 Refactored it
Hmm, maybe it was about that actually. |
I checked the code, and that exactly seems to be the case 👍 I think the traffic from Actors to Apify API does not go through the Internet. It's routed internally through AWS, presumably as it's cheaper. So |
This is right, that's why I used |
@@ -16,6 +16,7 @@ export interface ConfigurationOptions extends CoreConfigurationOptions { | |||
actorRunId?: string; | |||
actorTaskId?: string; | |||
apiBaseUrl?: string; | |||
apiPublicBaseUrl?: string; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets add a comment here about the difference between apiBaseUrl
and apiPublicBaseUrl
test/e2e/sdk/publicUrl/test.mjs
Outdated
|
||
const actor = client.actor(process.argv[2]); | ||
|
||
const run = await actor.call({ data: PUBLIC_DATA, recordKey: 'public-record-key' }, { waitSecs: 15 }); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(opt) public-record-key
is actually not needed to be passed as input, it's completely internal 🙈 You will get it as part of the public URL, but you don't care what's in it, you only care that the URL works.
What you might want to pass in is urlToPublicData
. But you could also just upload the URL to the dataset and fetch it from there.
Anyway, sorry for all the back and forths on this 😅
This PR is part of Issue #19363, which updates
KeyValueStore.getPublicUrl(recordKey)
to generate signed links using HMAC.This PR:
P.S. Before merging, we need to wait for the release of Crawlee, so we can update version of Crawlee.
Previous PR that adds storageObject to Crawlee is here