Skip to content

feat: sign record's public url #358

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 14 commits into from
Apr 1, 2025
Merged

feat: sign record's public url #358

merged 14 commits into from
Apr 1, 2025

Conversation

danpoletaev
Copy link
Contributor

This PR is part of Issue #19363, which updates KeyValueStore.getPublicUrl(recordKey) to generate signed links using HMAC.

This PR:

  • creates signature and appends it to the public URL of the record in KV store

P.S. Before merging, we need to wait for the release of Crawlee, so we can update version of Crawlee.

Previous PR that adds storageObject to Crawlee is here

@danpoletaev danpoletaev requested review from B4nan and barjin February 17, 2025 14:10
@github-actions github-actions bot added this to the 109th sprint - Platform team milestone Feb 17, 2025
@github-actions github-actions bot added the t-platform Issues with this label are in the ownership of the platform team. label Feb 17, 2025
return `https://api.apify.com/v2/key-value-stores/${this.id}/records/${key}`;
}

return `https://api.apify.com/v2/key-value-stores/${this.id}/records/${key}?signature=${createHmacSignature(this.storageObject.urlSigningSecretKea as string, key)}`;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we pls compose the URL with URL.searchParams and similar? Maybe concatting the path with path.posix.join() too?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aside from ensuring the correct URL syntax, this procedural approach might also allow us to only specify the pathname once (and just conditionally add the query parameter if the secret exists).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, makes sense. Refactored it 🙃

@@ -19,7 +20,11 @@ export class KeyValueStore extends CoreKeyValueStore {
return getPublicUrl.call(this, key);
}

return `https://api.apify.com/v2/key-value-stores/${this.id}/records/${key}`;
if (!this.storageObject?.urlSigningSecretKea) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

urlSigningSecretKea instead of ...key might be a typo?

@danpoletaev danpoletaev requested a review from barjin February 17, 2025 14:28
Copy link
Contributor

@barjin barjin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

(as long as we wait for the new stable Crawlee version, but I see you mentioned that in the PR description already 👍🏽 )

Copy link
Member

@B4nan B4nan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking good, except the red CI which is not related to your changes

@github-actions github-actions bot added the tested Temporary label used only programatically for some analytics. label Mar 5, 2025
vdusek pushed a commit to apify/apify-sdk-python that referenced this pull request Mar 7, 2025
This PR is part of [Issue
#19363](apify/apify-core#19363), which updates
`KeyValueStore.getPublicUrl(recordKey)` to generate signed links using
HMAC.

**This PR**:
- creates signature and appends it to the public URL of the record in KV
store

P.S. Before merging, we need to wait for the release of Crawlee, so we
can update version of Crawlee.

P.P.S It's hard to test the changes, since master branch of SDK and
master branch of Crawlee are out of sync. Crawlee has some breaking
changes in version 6.0, which are not yet addressed in master branch of
SDK.

Previous PR that adds storageObject to Crawlee Python is
[here](apify/crawlee-python#993)

Same PR in SDK JS is
[here](apify/apify-sdk-js#358)
@danpoletaev danpoletaev requested a review from tobice March 10, 2025 09:36
@@ -19,7 +20,13 @@ export class KeyValueStore extends CoreKeyValueStore {
return getPublicUrl.call(this, key);
}

return `https://api.apify.com/v2/key-value-stores/${this.id}/records/${key}`;
const publicUrl = new URL(`https://api.apify.com/v2/key-value-stores/${this.id}/records/${key}`);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(nit) Isn't the API base URL available in the env variables?

That way the Actor will create valid urls in all environments. You could leverage that to create an actual e2e test that not only creates & runs an Actor that produces the signed URL, but actually tests the URL against Apify API.

(The current test below basically just copies the implementation... you are calling the same code twice and checking that it produces the same value 😅 )

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think there was a reason for this. if you ask what reason, i have no idea, but there is something at the back of my mind telling me there was one 🙃

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@B4nan I think it should be safe to use config.get('apiPublicBaseUrl') variable as it has fallback to https://api.apify.com?

@danpoletaev danpoletaev requested a review from tobice March 14, 2025 15:09
package.json Outdated
@@ -67,7 +67,7 @@
"@typescript-eslint/eslint-plugin": "^7.0.0",
"@typescript-eslint/parser": "^7.0.0",
"commitlint": "^19.3.0",
"crawlee": "^3.11.5",
"crawlee": "^3.13",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please keep the full version definition here

Comment on lines 62 to 64
"@crawlee/core": "^3.12.3-beta.12",
"@crawlee/types": "^3.12.3-beta.12",
"@crawlee/utils": "^3.12.3-beta.12",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and don't forget to bump this

"@crawlee/core": "^3.12.3-beta.12",
"@crawlee/types": "^3.12.3-beta.12",
"@crawlee/utils": "^3.12.3-beta.12",
"apify-client": "^2.11.1",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why the downgrade here? 👀

return getPublicUrl.call(this, key);
}

return `https://api.apify.com/v2/key-value-stores/${this.id}/records/${key}`;
const publicUrl = new URL(`${config.get('apiPublicBaseUrl', 'https://api.apify.com')}/v2/key-value-stores/${this.id}/records/${key}`);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for the explicit default here, its the exact same value as you define on the config level

Copy link

@tobice tobice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as @B4nan wondering why apiPublicBaseUrl when we already have apiBaseUrl.

As in, I know that we're injecting as env variable APIFY_API_PUBLIC_BASE_URL... but I had no idea we have apparently also APIFY_API_BASE_URL 😅

import { Actor, log } from 'apify';

// Also needs to be changed in test.mjs
const PUBLIC_RECORD_KEY = 'public-record-key';
Copy link

@tobice tobice Mar 17, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(opt) How about you pass this as input to the Actor? All three values...

  • data to upload
  • key where to upload the data
  • key where to store the public URL

Then you don't need to define anything in two places and the test will be more readable (not that it is super complicated or anything 😄 )

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point, thank you 🙏 Refactored it

@B4nan
Copy link
Member

B4nan commented Mar 17, 2025

As in, I know that we're injecting as env variable APIFY_API_PUBLIC_BASE_URL... but I had no idea we have apparently also APIFY_API_BASE_URL 😅

Hmm, maybe it was about that actually. APIFY_API_BASE_URL is some local URL that won't work from outside of the platform/worker, right?

@tobice
Copy link

tobice commented Mar 17, 2025

Hmm, maybe it was about that actually. APIFY_API_BASE_URL is some local URL that won't work from outside of the platform/worker, right?

I checked the code, and that exactly seems to be the case 👍 I think the traffic from Actors to Apify API does not go through the Internet. It's routed internally through AWS, presumably as it's cheaper.

So APIFY_API_PUBLIC_BASE_URL is indeed the correct value to use in this particular case.

@danpoletaev
Copy link
Contributor Author

This is right, that's why I used APIFY_API_PUBLIC_BASE_URL, because apiBaseUrl was an IP address

@danpoletaev danpoletaev requested review from B4nan and tobice March 17, 2025 13:54
@@ -16,6 +16,7 @@ export interface ConfigurationOptions extends CoreConfigurationOptions {
actorRunId?: string;
actorTaskId?: string;
apiBaseUrl?: string;
apiPublicBaseUrl?: string;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lets add a comment here about the difference between apiBaseUrl and apiPublicBaseUrl


const actor = client.actor(process.argv[2]);

const run = await actor.call({ data: PUBLIC_DATA, recordKey: 'public-record-key' }, { waitSecs: 15 });
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(opt) public-record-key is actually not needed to be passed as input, it's completely internal 🙈 You will get it as part of the public URL, but you don't care what's in it, you only care that the URL works.

What you might want to pass in is urlToPublicData. But you could also just upload the URL to the dataset and fetch it from there.

Anyway, sorry for all the back and forths on this 😅

@danpoletaev danpoletaev merged commit 6274cc0 into master Apr 1, 2025
8 checks passed
@fnesveda fnesveda added the validated Issues that are resolved and their solutions fulfill the acceptance criteria. label Apr 7, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
t-platform Issues with this label are in the ownership of the platform team. tested Temporary label used only programatically for some analytics. validated Issues that are resolved and their solutions fulfill the acceptance criteria.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants