
Conversation


@bajtos bajtos commented Aug 6, 2024

Links:

Out of the scope of this pull request:

  • querying deals by allocator id

Example request:

GET /miner/f0814049/deals/eligible/summary

Example response:

```json
{
  "minerId": "f0814049",
  "dealCount": 10006,
  "clients": [
    { "clientId": "f02516933", "dealCount": 6880 },
    { "clientId": "f02833886", "dealCount": 3126 }
  ]
}
```

Example request:

GET /clients/f0215074/deals/eligible/summary

Example response:

```json
{
  "clientId": "f0215074",
  "dealCount": 5350,
  "providers": [
    { "minerId": "f0406478", "dealCount": 4592 },
    { "minerId": "f0814049", "dealCount": 758 }
  ]
}
```
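For illustration, the two endpoints above could be called like this. This is a minimal client sketch, not part of the PR; the base URL and helper names are assumptions.

```javascript
// Hypothetical helpers for the new eligible-deals summary endpoints.
// The base URL is an assumption based on the stats.filspark.com deployment.
const STATS_API = 'https://stats.filspark.com'

// Note the path asymmetry in this PR: `/miner/` (singular) vs `/clients/` (plural).
const minerSummaryUrl = (minerId) =>
  `${STATS_API}/miner/${encodeURIComponent(minerId)}/deals/eligible/summary`

const clientSummaryUrl = (clientId) =>
  `${STATS_API}/clients/${encodeURIComponent(clientId)}/deals/eligible/summary`

// Fetch the per-miner summary, e.g. { minerId, dealCount, clients: [...] }
const fetchMinerSummary = async (minerId) => {
  const res = await fetch(minerSummaryUrl(minerId))
  if (!res.ok) throw new Error(`HTTP ${res.status}`)
  return res.json()
}
```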

Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
bajtos added 6 commits August 6, 2024 10:40
Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Git-based dependency fails in `NPM_CONFIG_WORKSPACE=stats npm ci`:

```
npm error code 1
npm error git dep preparation failed
npm error command /opt/hostedtoolcache/node/20.16.0/x64/bin/node /opt/hostedtoolcache/node/20.16.0/x64/lib/node_modules/npm/bin/npm-cli.js install --force --cache=/home/runner/.npm --prefer-offline=false --prefer-online=false --offline=false --no-progress --no-save --no-audit --include=dev --include=peer --include=optional --no-package-lock-only --no-dry-run
npm error npm warn using --force Recommended protections disabled.
npm error npm error No workspaces found:
npm error npm error   --workspace=stats
```

Signed-off-by: Miroslav Bajtoš <oss@bajtos.net>
Comment on lines +17 to 18
"spark-api": "https://github.com/filecoin-station/spark-api/archive/3de5c3325ef6cc40df02769b96957d33279d38c1.tar.gz",
"spark-evaluate": "filecoin-station/spark-evaluate#main"
Member Author

FYI: I cannot use the git/GitHub shorthand for spark-api because spark-api has workspaces, and npm refuses to install the entire project when running npm ci limited to a single spark-stats workspace. I think it's a bug in npm, but 🤷🏻

```
$ NPM_CONFIG_WORKSPACE=stats npm ci
(...)

npm error code 1
npm error git dep preparation failed
npm error command /opt/hostedtoolcache/node/20.16.0/x64/bin/node /opt/hostedtoolcache/node/20.16.0/x64/lib/node_modules/npm/bin/npm-cli.js install --force --cache=/home/runner/.npm --prefer-offline=false --prefer-online=false --offline=false --no-progress --no-save --no-audit --include=dev --include=peer --include=optional --no-package-lock-only --no-dry-run
npm error npm warn using --force Recommended protections disabled.
npm error npm error No workspaces found:
npm error npm error   --workspace=stats
```

@bajtos bajtos enabled auto-merge (squash) August 7, 2024 06:09
@bajtos bajtos disabled auto-merge August 7, 2024 06:09
@bajtos bajtos merged commit c117b69 into main Aug 7, 2024
@bajtos bajtos deleted the feat-deals-tracked branch August 7, 2024 06:09

bajtos commented Aug 7, 2024

@juliangruber hindsight please 🙏🏻


sentry-io bot commented Aug 7, 2024

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ Error: connect ECONNREFUSED 127.0.0.1:5432 getApiPgPool(index) View Issue


Member

@juliangruber juliangruber left a comment


I have concerns with this PR


- `GET /miners/retrieval-success-rate/summary?from=<day>&to=<day>`

http://stats.filspark.com/miners/retrieval-success-rate/summary
Member


why was this removed?

Member Author


Hmm, this looks like a mistake; thanks for flagging it!

Member Author


Found it. This was a duplicate entry; the endpoint is already documented on lines 19-21 above.

[Screenshot: 2024-09-02 at 12:12:33]

env:
DATABASE_URL: postgres://postgres:postgres@localhost:5432/spark_stats
EVALUATE_DB_URL: postgres://postgres:postgres@localhost:5432/spark_evaluate
API_DB_URL: postgres://postgres:postgres@localhost:5432/spark
Member


I'm strongly against this design. I thought we had settled on an architecture where services don't directly query databases they don't own. Now we're introducing one more instance of what we agreed is an antipattern.

I think the way to go instead is for spark-api to expose the data that spark-stats needs via HTTP.

Member Author


I see your concern! I actually started to implement the spark-api HTTP endpoint here: CheckerNetwork/spark-api#388, but then I realised we want to put this behind Cloudflare and protect it using the same access token that we will use for other spark-stats endpoints.

@juliangruber
How would you propose implementing your suggestion? Should spark-stats simply proxy requests for the new endpoints to spark-api?

```js
const getRetrievableDealsForMiner = async (_req, res, minerId) => {
  const apiRes = await fetch(`${sparkApiUrl}/miner/${minerId}/deals/eligible/summary`)
  assert(apiRes.ok, 502)
  json(res, await apiRes.json())
}
```

I'd like to prevent people from calling spark-api's endpoint directly, so I would need to implement some form of authentication. I can implement a fixed access token configured via Fly secrets in both spark-api and spark-stats, let spark-stats send that token in the request, and let spark-api reject requests with a missing or incorrect token.
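A minimal sketch of that shared-token check, assuming a Fly secret named `SPARK_STATS_API_TOKEN` and a standard `Authorization` header (both names are assumptions, not part of this PR):

```javascript
// Hypothetical helper for spark-api: accept the request only when it carries
// the shared token that spark-stats sends. Header and secret names are assumed.
const isAuthorized = (req, expectedToken) => {
  const header = req.headers['authorization'] ?? ''
  return Boolean(expectedToken) && header === `Bearer ${expectedToken}`
}

// In spark-api's request handler (sketch):
// if (!isAuthorized(req, process.env.SPARK_STATS_API_TOKEN)) {
//   res.statusCode = 403
//   res.end('Forbidden')
//   return
// }
```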

Member Author

@bajtos bajtos Aug 29, 2024


I can also YOLO it and create a tech debt that we will need to pay once we start working on the Spark Data API product:

  • Option A: implement these spark-stats endpoints as HTTP redirects to spark-api
  • Option B: implement these endpoints as a proxy, but don't authenticate them; allow anybody to request the data from spark-api directly if they know about this option.

Option A:

```js
const getRetrievableDealsForMiner = async (_req, res, minerId) => {
  res.setHeader(
    'location',
    `${sparkApiUrl}/miner/${minerId}/deals/eligible/summary`
  )
  res.statusCode = 302
  res.end()
}
```

Member Author


Depending on what direction we choose, I'll need to rework the API for deals tracked per allocator (#196) as well.

Member


These are the options I see and think could work:

  1. spark-api exposes an HTTP API and spark-stats proxies to it. For authentication, we can use a hard-coded access token or the keccak256 method used in spark-rewards.
  2. spark-api publishes this data to spark-stats (via HTTP), which persists it in its own database
  3. spark-stats periodically pulls this data from spark-api (via HTTP) and persists it in its own database

I think 1. is the easiest to implement. 2. and 3. are the safest, as spark-api won't be affected by spark-stats load.
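Option 3 could look roughly like this. A sketch under assumed names: `fetchFn` and `saveSummary` are hypothetical injection points, and the endpoint path is illustrative.

```javascript
// Hypothetical periodic pull: fetch the eligible-deals summary from spark-api
// and persist it via a caller-supplied function (e.g. an INSERT into
// spark-stats' own database). Dependencies are injected to keep this testable.
const pullEligibleDeals = async ({ fetchFn, sparkApiUrl, saveSummary }) => {
  const res = await fetchFn(`${sparkApiUrl}/deals/eligible/summary`)
  if (!res.ok) throw new Error(`spark-api responded with HTTP ${res.status}`)
  await saveSummary(await res.json())
}

// Run it on a timer, e.g. once per hour:
// setInterval(
//   () => pullEligibleDeals({ fetchFn: fetch, sparkApiUrl, saveSummary })
//     .catch(console.error),
//   60 * 60 * 1000
// )
```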

Member


I was thinking about this a bit more.

To track DDO deals, I will need to build a new service that listens to Filecoin actor events and maintains its own deal database. Then both spark-api and spark-stats will need to get data about eligible deals from that service, most likely over HTTP.

We may want to make that data about deals public to allow 3rd parties to verify that Spark is sampling the deals fairly.

👍

In that light, here is my plan:

  • Copy the implementation of the APIs introduced here to #196.
  • Rework spark-stats endpoints to redirect to spark-api, drop the dev-dependency on the spark-api repository.

Rationale:

  • spark-api is already behind Cloudflare and it accepts only requests made via Cloudflare. If spark-stats implements a reverse proxy, we will create unnecessary network traffic we need to pay for (Cloudflare -> spark-stats -> Cloudflare -> spark-api).
  • Since we don't know yet whether we want/need to make the data about eligible deals private, let's keep it public for now as it requires less work.

Redirect also works for me, if it's a temporary one (302). We don't want anyone to remember spark-api's URL if it's not an API we want to commit to hosting.

Member Author


> Copy the implementation of the APIs introduced here to #196
> I don't understand this, how are you going to copy APIs to a closed PR?

Sorry, I should have re-read the comment before submitting.

I want to copy the API endpoints from this PR and #196 to spark-api.

Member Author


> Redirect also works for me, if it's a temporary one (302). We don't want anyone to remember spark-api's URL if it's not an API we want to commit to hosting

💯

Member


Sounds good!


