### Description

Within the next week (from 12/03/2022), we will begin implementing a backlog system in the bot's grid server arbiter. What this entails is the following.

# Reasoning

The main reasoning behind this is hardware availability: the more grid servers that are opened, the sooner the machine runs out of memory or exhausts the paging file. This causes major drops in usage and major QoS issues where the bot will simply crash; we want to avoid this by limiting the number of instances open at once.

On start we normally set up an instance pool for better HA (High Availability) capabilities; with this pool, an instance can be cherry-picked at random and used. After an instance is done executing (script executions only), it is locked to prevent its reuse by other users, or even by the original user. Another instance is also opened, regardless of whether the pool is already full, after any of the leases expire.

# How this affects you as a user

This will affect users in the following ways:

1. You will not be able to spam script executions (if we decide on backlogging).
2. There may be instability issues with shared instances (if something is already happening on one, it could throw).
3. There may be long wait times before instances start executing during peak times.

# Implementation

We will try to implement one of the following solutions. This pull request just generalizes the idea; it won't necessarily be just a backlogging system.

1. Arbiter queue backlogging.
2. Shared usage of instances.
3. Reuse of the same instance, renewing leases.

# 1. Arbiter Queue Backlogging
Arbiter queue backlogging involves creating a separate queue of instances waiting either to be opened or to do work. The queue only backlogs when a specific condition is met, such as the total memory usage of all currently open instances passing a threshold, or the number of open instances reaching a limit. The plan is to cap the number of open instances with a threshold, while also having a percentage invoker determine whether a random request will open a new instance or be queued. This improves HA (High Availability) capabilities, because a request will either use an already opened instance or the new one; there would also have to be checks that determine whether opening a new instance is actually worth it. This option seems slightly preferable because it limits memory usage on the machine, but it comes with the downside of potentially long wait times before script executions begin.

# 2. Shared usage of instances

Right now, every script execution command tries to get an instance that hasn't been used at all, so you get a fresh instance, and crashing it or causing timeouts doesn't affect other users. Sharing instances can bring down the instance count, because there is less reason to open more instances if users can bounce between already pooled instances; paired with backlogging, this can steer more users onto less used instances with some math tricks and A/B experiments. The downsides are, obviously, crash and timeout exploits: there is no way to recover an instance if it crashes or a user times it out, which may affect others who have data in that instance's output. Another downside is output flooding: because instances are shared, people can log to and flood the output of other people's instances. While allocation is random, this can affect users who executed something and want to check the output of their code.

# 3. Reuse of the same instance, renewing leases
The final solution could be to reuse instances and lock them to specific users. This also brings down the instance count and lets users reuse the output of previously executed commands, and it improves QoS because random people cannot trigger crash exploits against you. We could also randomly allocate another instance and have the user bounce between their instances. If a user gets blacklisted, the bot will attempt to purge all of the data related to their instances. The downsides are that more instances will be open at once, and more code and factories will need to be implemented to track instance ownership, which means a larger code base.
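To make option 1 concrete, here is a minimal sketch of how the arbiter's open-or-queue decision could look. Everything here is an assumption for illustration: the class name `ArbiterBacklog`, the thresholds, and the memory estimates are all hypothetical, not the bot's actual code.

```python
import random
from collections import deque


class ArbiterBacklog:
    """Hypothetical sketch of arbiter queue backlogging: requests are
    queued instead of opening a new instance once the machine passes an
    instance-count or total-memory threshold."""

    def __init__(self, max_instances=10, max_memory_mb=8192, reuse_chance=0.5):
        self.max_instances = max_instances
        self.max_memory_mb = max_memory_mb
        self.reuse_chance = reuse_chance   # the "percentage invoker"
        self.open_instances = []           # list of (instance_id, memory_mb)
        self.backlog = deque()             # requests waiting for capacity

    def total_memory(self):
        return sum(mem for _, mem in self.open_instances)

    def request_instance(self, request_id, est_memory_mb=256):
        # Percentage invoker: sometimes reuse an already-open instance
        # instead of growing the pool, which helps HA without more memory.
        if self.open_instances and random.random() < self.reuse_chance:
            return ("reused", self.open_instances[0][0])
        # Past either threshold: backlog the request instead of opening.
        if (len(self.open_instances) >= self.max_instances
                or self.total_memory() + est_memory_mb > self.max_memory_mb):
            self.backlog.append(request_id)
            return ("queued", request_id)
        self.open_instances.append((request_id, est_memory_mb))
        return ("opened", request_id)

    def release(self, instance_id):
        # When a lease expires, free the slot and drain one queued request.
        # (A real arbiter would remember each request's memory estimate;
        # the default is reused here to keep the sketch short.)
        self.open_instances = [(i, m) for i, m in self.open_instances
                               if i != instance_id]
        if self.backlog:
            return self.request_instance(self.backlog.popleft())
        return None
```

With `reuse_chance=0.0` the behavior is deterministic: once the instance cap is hit, new requests queue up and are drained one at a time as leases are released.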
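Option 3 (user-locked instances with renewable leases) can be sketched the same way. Again, the names (`UserInstanceLeases`, `acquire`, `blacklist`) and the lease duration are hypothetical illustrations of the idea, not the real implementation; the injectable clock exists only to make the sketch testable.

```python
import time


class UserInstanceLeases:
    """Hypothetical sketch of option 3: each instance is locked to one
    user; reusing it renews the lease, and blacklisting purges the
    user's instance data."""

    def __init__(self, lease_seconds=300, clock=time.monotonic):
        self.lease_seconds = lease_seconds
        self.clock = clock            # injectable clock, for testing
        self.leases = {}              # user_id -> (instance_id, expires_at)
        self.next_instance_id = 0

    def acquire(self, user_id):
        now = self.clock()
        lease = self.leases.get(user_id)
        if lease is not None and lease[1] > now:
            # Reuse the user's own locked instance, so previous output
            # stays available to them and nobody else can crash it.
            instance_id = lease[0]
        else:
            # Lease expired (or first use): allocate a fresh instance.
            instance_id = self.next_instance_id
            self.next_instance_id += 1
        # Every acquisition renews the lease.
        self.leases[user_id] = (instance_id, now + self.lease_seconds)
        return instance_id

    def blacklist(self, user_id):
        # Drop the lease; a real arbiter would also purge the
        # instance's data and shut it down.
        self.leases.pop(user_id, None)
```

The ownership tracking the section mentions is exactly the `leases` map here: it is the extra bookkeeping (and, in a real code base, the extra factories) that this option costs in exchange for stable per-user instances.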