Container Pre-Initialisation Thinks Proxies Exist After Node Restart #550

Open
nik-humphries opened this issue Feb 20, 2025 · 1 comment

nik-humphries commented Feb 20, 2025

Hi.

ShinyProxy 3.1.1

On our Kubernetes cluster, updates happen semi-regularly to keep up with patches and the like, which means services are restarted fairly often (up to once a week). It appears that after a restart, ShinyProxy believes that an app we have pre-initialisation configured for, with a minimum number of seats available, still has a proxy available, even though it doesn't. We are using Redis Sentinel to manage the sessions.

Upon a restart, the key shinyproxy_shinyproxy-play-a3s-play__delegate_proxies_bto-game is still present and contains one hash value, even though there are no pods running on the cluster.
The key shinyproxy_shinyproxy-play-a3s-play__seats_bto-game shows 5 open seats (it is set to 5).
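
Inspecting the stale state from redis-cli looks roughly like this (a sketch: the Sentinel master name "shinyproxy" comes from the config below, while the data port 6379 and the key types are assumptions, hence the TYPE checks first):

# ask Sentinel which node is currently the Redis master
redis-cli -h redis-node-0.redis-headless.a3s-core -p 26379 -a "$REDIS_PASSWORD" \
  SENTINEL get-master-addr-by-name shinyproxy

# connect to the reported master (data port 6379 assumed) and check the keys
redis-cli -h <master-host> -p 6379 -a "$REDIS_PASSWORD"
TYPE shinyproxy_shinyproxy-play-a3s-play__delegate_proxies_bto-game
HGETALL shinyproxy_shinyproxy-play-a3s-play__delegate_proxies_bto-game    # the stale delegate proxy entry
TYPE shinyproxy_shinyproxy-play-a3s-play__seats_bto-game
SMEMBERS shinyproxy_shinyproxy-play-a3s-play__seats_bto-game              # the 5 "open" seats, if stored as a set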

When trying to open the app, we just get stuck on a white screen for a while as the following happens:

2025-02-20T15:50:42.077Z  INFO 1 --- [ProxyService-16] e.o.containerproxy.service.ProxyService  : [user=user94 proxyId=e9c2260f-4266-4d7b-a0b2-087b08de1347 specId=bto-game] Starting proxy
2025-02-20T15:50:42.081Z  INFO 1 --- [ProxyService-16] e.o.c.b.d.p.ProxySharingDispatcher       : [user=user94 proxyId=e9c2260f-4266-4d7b-a0b2-087b08de1347 specId=bto-game delegateProxyId=50f0f68e-7216-48b0-870f-c50455485acd seatId=977ae2ff-44ee-4f16-8313-78b6596c0fce] Seat claimed
2025-02-20T15:50:42.089Z  INFO 1 --- [ProxyService-16] e.o.containerproxy.service.ProxyService  : [user=user94 proxyId=e9c2260f-4266-4d7b-a0b2-087b08de1347 specId=bto-game] Proxy activated
2025-02-20T15:57:14.114Z  INFO 1 --- [   XNIO-1 I/O-1] e.o.c.util.ProxyMappingManager           : [user=user94 proxyId=e9c2260f-4266-4d7b-a0b2-087b08de1347 specId=bto-game] Proxy unreachable/crashed, stopping it now, failed request: GET https://xxx/proxy_endpoint/e9c2260f-4266-4d7b-a0b2-087b08de1347/ was proxied to: http://10.121.32.246:3838/, status: 503
2025-02-20T15:57:14.119Z  INFO 1 --- [ProxyService-16] e.o.c.b.d.p.ProxySharingDispatcher       : [user=user94 proxyId=e9c2260f-4266-4d7b-a0b2-087b08de1347 specId=bto-game delegateProxyId=50f0f68e-7216-48b0-870f-c50455485acd seatId=977ae2ff-44ee-4f16-8313-78b6596c0fce] Seat released
2025-02-20T15:57:14.122Z  INFO 1 --- [ProxyService-16] e.o.containerproxy.service.ProxyService  : [user=user94 proxyId=e9c2260f-4266-4d7b-a0b2-087b08de1347 specId=bto-game] Proxy released
2025-02-20T15:57:14.122Z  INFO 1 --- [GlobalEventLoop] e.o.c.b.d.p.ProxySharingScaler           : [specId=bto-game delegateProxyId=50f0f68e-7216-48b0-870f-c50455485acd] DelegateProxy crashed, marking for removal
2025-02-20T15:57:14.129Z  INFO 1 --- [GlobalEventLoop] e.o.c.b.d.p.ProxySharingScaler           : [specId=bto-game delegateProxyId=50f0f68e-7216-48b0-870f-c50455485acd seatId=977ae2ff-44ee-4f16-8313-78b6596c0fce] Removed seat

It takes at least 7 minutes for ShinyProxy to realise that the delegate proxy is no longer available, remove the stale reference and create a new one.

Is there a way to speed this process up, or to avoid requiring a user to try to open the application and then wait for such a long time?

The app spec is as follows.

- op: add
  path: /spec/proxy/specs/-
  value:
    id: bto-game
    display-name: xxx
    description: ""
    container-cmd: ["R", "-e", "options('shiny.port'=3838,shiny.host='0.0.0.0');shiny::runApp('/root/Shiny')"]
    container-memory-request: "400Mi"
    container-memory-limit: "400Mi"
    minimum-seats-available: 2
    seats-per-container: 5
    container-env:
      IS_AZURE: true
      USE_CREDENTIALS_DOT_R: false
      PG_PORT: 5432
      PG_DBNAME: optimisationgame
      APP_TIMEOUT: 1000
      PG_HOST: xxx
      PG_USER: xxx
      POSTGRESQL_USER_APPEND: FALSE
      AZ_MSI_CLIENT_ID: ${apps_pod_identity_client_id}
    logo-url: xxx
    access-groups: [admins, gen, bto]
    template-group: bto
    labels:
      aadpodidbinding: a3splay-pod-identity
    kubernetes-pod-patches: |
      - op: add
        path: /spec/tolerations
        value:
          - effect: NoSchedule
            key: workload
            operator: Equal
            value: a3s

Redis setup

  spring:
    session:
      store-type: redis
    data:
      redis:
        password: $${REDIS_PASSWORD}
        sentinel:
          master: shinyproxy
          password: $${REDIS_PASSWORD}
          nodes: redis-node-0.redis-headless.a3s-core:26379, redis-node-1.redis-headless.a3s-core:26379, redis-node-2.redis-headless.a3s-core:26379

LEDfan commented Mar 7, 2025

Hi

ShinyProxy currently doesn't actively check whether the pods still exist. We do want to improve this in the future.
However, as you noticed, ShinyProxy will remove the instance once it notices that requests to it are failing. In my experience this doesn't take 7 minutes, so I'll have a look at why that is the case for you.

I think the best workaround here is to use the API endpoint that allows stopping/restarting the delegate proxies: https://shinyproxy.io/downloads/swagger/?urls.primaryName=ShinyProxy%203.1.1#/ShinyProxy/stopDelegateProxies
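
For illustration only (the exact path, HTTP method, parameters and authentication must be taken from the Swagger page above; everything in angle brackets is a placeholder):

# placeholder call -- substitute the real stopDelegateProxies path/method from Swagger
# and authenticate as an admin user (session cookie or other credentials, depending on your auth setup)
curl -X POST "https://<your-shinyproxy-host>/<stop-delegate-proxies-path>" \
  -H "Cookie: <admin-session-cookie>" \
  -d "specId=bto-game"    # shown only as an example of scoping the call to one spec; check Swagger for the real parameter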

The idea for zero-downtime updates would then be as follows (a rough kubectl sketch of these steps is included after the list):

  1. cordon the nodes that will be updated
  2. call the above API endpoint
  3. ShinyProxy will create new pods; any pod that is being used by a user will keep running until no users are using it
  4. as soon as the node has no running pods, you can patch it
  5. uncordon the node
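
A rough kubectl sketch of those steps (the node name is a placeholder for one of your worker nodes):

kubectl cordon <node-name>                                            # 1. stop new pods being scheduled on the node
# 2. call the stopDelegateProxies endpoint, as sketched above
kubectl get pods -A --field-selector spec.nodeName=<node-name> -w     # 3./4. wait until no app pods remain on the node
# ... patch/upgrade the node once it is empty ...
kubectl uncordon <node-name>                                          # 5. make the node schedulable again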

The endpoint can be used by admin users. We have plans to create a system that allows calling the API without using admin accounts, see #545
