Skip to content

Update mass deployment script #3838

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 9 commits into
base: development
Choose a base branch
from

Conversation

maayarosama
Copy link
Contributor

Description

Rewrote handle method inside script file not to rollback and to output each batch nodes that successfully and failed to deploy on

Related Issues

Tested Scenarios

  • Run the script

To consider

Preliminary Checks:

  • Preliminary Checks
    • Does it completely address the issue linked?
    • What about edge cases?
    • Does it meet the specified acceptance criteria?
    • Are there any unintended side effects?
    • Does the PR adhere to the team's coding conventions, style guides, and best practices?
    • Does it integrate well with existing features?
    • Does it impact the overall performance of the application?
    • Are there any bottlenecks or slowdowns?
    • Has it been optimized for efficiency?
    • Has it been adequately tested with unit, integration, and end-to-end tests?
    • Are there any known defects or issues?
    • Is the code well-documented?
    • Are changes to documentation reflected in the code?

UI Checks:

  • UI Checks
    • If a UI design is provided/ does it follow it?
    • Does every button work?
    • Is the data displayed logical? Is it what you expected?
    • Does every validation work?
    • Does this interface feel intuitive?
    • What about slow network? Offline?
    • What if the gridproxy/graphql/chain is failing?
    • What would a first time user have a hard time navigating here?

Code Quality Checks:

  • Code Quality Checks
    • Code formatted/linted? Are there unused imports? .. etc
    • Is there redundant/repeated code?
    • Are there conditionals that are always true or always false?
    • Can we write this more concisely?
    • Can we reuse this code? If yes, where?
    • Will the changes be easy to maintain and update in the future?
    • Can this code become too complex to understand for other devs?
    • Can this code cause future integration problems?

Testing Checklist

  • Does it Meet the specified acceptance criteria.
  • Test if changes can affect any other functionality.
  • Tested with unit, integration, and end-to-end tests.
  • Tested the un-happy path and rollback scenarios.
  • Documentation updated to meet the latest changes.
  • Check automated tests working successfully with the changes.
  • Can be covered by automated tests.

General Checklist

  • Tests included
  • Build pass
  • Documentation
  • Code format and docstring
  • Screenshots/Video attached (needed for UI changes)

…ach batch nodes that successfully and failed to deploy on
}

const res = await Promise.allSettled(deploymentPromises);
console.log("Deployment Results in handle:", res);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think there is no need to log this

@maayarosama maayarosama requested a review from 0oM4R January 28, 2025 12:51
Copy link
Contributor

@0oM4R 0oM4R left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried to run the script with 1500 vm with batch size 100
it took more than 3 hours and the summary shows that we have only about 180 while I have 2100 contracts and 3 summaries of batches have successful deployment more than the final summary
script output
https://raw.githubusercontent.com/threefoldtech/tfgrid-sdk-ts/refs/heads/development_mass_deployement_test/packages/grid_client/output.txt

Comment on lines 96 to 97
await grid3.machines.twinDeploymentHandler.sendToNode(twinDeployment);
successfulNodesSet.push(twinDeployment.nodeId);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

after sending the deployment to the node, we should wait and check the deployment result

} catch (error) {
log(`Error creating deployment for batch ${batch + 1}: ${error}`);
return { twinDeployments: null, batchIndex: index };
return [];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The failed deployments here are not counted

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maayarosama please explain how this was resolved

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

failed deployments and successfull deployments are counted in the handle method after calling waitForDeployments

@maayarosama maayarosama marked this pull request as draft February 4, 2025 11:28
@maayarosama maayarosama marked this pull request as ready for review February 9, 2025 19:07
Copy link
Contributor

@amiraabouhadid amiraabouhadid left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is batch 1 summary empty?
image
also, I don't get the single apostrophe instead of a zero?
image

@maayarosama maayarosama marked this pull request as draft February 10, 2025 11:49
@maayarosama maayarosama marked this pull request as ready for review February 10, 2025 14:00
Comment on lines 333 to 338
log(`Batch ${batch + 1} Summary:`);
log(`- Successful Deployments on Nodes: ${Array.from(batchSuccessfulNodes).join(", ")}`);
log(`- Failed Deployments on Nodes: ${Array.from(batchFailedNodes).join(", ")}`);
log(`- Successful Deployments: ${batchSuccessfulNodes.size}`);
log(`- Failed Deployments: ${batchFailedNodes.size}`);
log("---------------------------------------------");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image
great, batch 1 summary is no longer empty. But can we please put a zero/ or a dash instead of empty size/nodes values?

Suggested change
log(`Batch ${batch + 1} Summary:`);
log(`- Successful Deployments on Nodes: ${Array.from(batchSuccessfulNodes).join(", ")}`);
log(`- Failed Deployments on Nodes: ${Array.from(batchFailedNodes).join(", ")}`);
log(`- Successful Deployments: ${batchSuccessfulNodes.size}`);
log(`- Failed Deployments: ${batchFailedNodes.size}`);
log("---------------------------------------------");
log(`Batch ${batch + 1} Summary:`);
log(
`- Successful Deployments on Nodes: ${
Array.from(batchSuccessfulNodes).length ? Array.from(batchSuccessfulNodes).join(", ") : "-"
}`,
);
log(
`- Failed Deployments on Nodes: ${
Array.from(batchFailedNodes).length ? Array.from(batchFailedNodes).join(", ") : "-"
}`,
);
log(`- Successful Deployments: ${batchSuccessfulNodes.size ?? 0}`);
log(`- Failed Deployments: ${batchFailedNodes.size ?? 0}`);
log("---------------------------------------------");

Comment on lines 346 to 351
log("Final Summary:");
log(`- Total Successful Deployments: ${allSuccessfulNodes.length}`);
log(`- Total Failed Deployments: ${totalVMs - allSuccessfulNodes.length}`);
log(`- Offline Nodes: ${offlineNodes.join(", ")}`);
log(`- All Successful Deployments on Nodes: ${Array.from(allSuccessfulNodes).join(", ")}`);
log(`- All Failed Deployments on Nodes: ${Array.from(allFailedNodes).join(", ")}`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

apply here too

await grid3.machines.twinDeploymentHandler.checkNodesCapacity(twinDeployments);
await grid3.machines.twinDeploymentHandler.checkFarmIps(twinDeployments);
} catch (error) {
events.emit("logs", "Error Validating workloads:" + error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

then what should it continue or mark the deployments as failed?

results.forEach((result, index) => {
const twinDeployment = twinDeployments[index];
if (result.status === "fulfilled") {
passedDeployments.push(twinDeployment);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

in case if the operation is not update or deploy
the result will be { status: "fulfilled", value: null }
so we are pussing null to the passedDeployments

Comment on lines +27 to +35
const results = await Promise.allSettled(
twinDeployments.map(t => {
if ([Operations.deploy, Operations.update].includes(t.operation)) {
events.emit("logs", `Waiting for deployment with contract_id: ${t.deployment.contract_id} to be ready`);
return grid3.machines.twinDeploymentHandler.waitForDeployment(t, timeout);
}

return null;
}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we need to filter the array to remove null values or handle the null values later

import {
Deployment,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it is used anywhere if so please remove it

Comment on lines +100 to +102
const extrinsicResults: Contract[] = await grid3.machines.twinDeploymentHandler.tfclient.applyAllExtrinsics<Contract>(
[...nodeExtrinsics, ...nameExtrinsics],
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this should be in try catch block

Comment on lines +113 to +128
if (twinDeployment.operation === Operations.deploy) {
events.emit("logs", `Sending deployment to node_id: ${twinDeployment.nodeId}`);
for (const contract of extrinsicResults) {
if (twinDeployment.deployment.challenge_hash() === contract.contractType.nodeContract.deploymentHash) {
twinDeployment.deployment.contract_id = contract.contractId;
if (
twinDeployment.returnNetworkContracts ||
!(
twinDeployment.deployment.workloads.length === 1 &&
twinDeployment.deployment.workloads[0].type === WorkloadTypes.network
)
)
resultContracts.created.push(contract);
break;
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you explain what are we doing here, i can see you are updating the resultContracts but its not clear the meaning of inner conditionse ?

events.emit("logs", `Deployment has been updated with contract_id: ${twinDeployment.deployment.contract_id}`);
}
} catch (e) {
events.emit("logs", `Deployment failed on node_id: ${twinDeployment.nodeId} with error: ${e}`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we are missing the add to the failed deployment step.

}
}

const deletedResult = await grid3.machines.twinDeploymentHandler.tfclient.applyAllExtrinsics<number>(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add it in a try block please to avoid panic

@@ -80,39 +240,29 @@ async function main() {
availableFor: await grid3.twins.get_my_twin_id(),
farmIds: farmIds,
randomize: true,
features: [Features.yggdrasil],
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i suggest to remove ygg and add myceluim to include zos3light machines

} catch (error) {
log(`Error creating deployment for batch ${batch + 1}: ${error}`);
return { twinDeployments: null, batchIndex: index };
return [];
}
});
console.time("Preparing Batch " + (batch + 1));
const deploymentResults = await Promise.allSettled(deploymentPromises).then(results =>
results.flatMap(r => (r.status === "fulfilled" ? r.value : [])),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this means the rejected deployments will be ignored


log(`Successfully handled deployments for Batch ${batch + 1}`);
} else {
errors.push(`Error handling deployments for Batch ${batch + 1}: ${result.reason}`);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we handle the rejected one to? i mean by pusshing it to the failed deployements

@0oM4R
Copy link
Contributor

0oM4R commented Feb 13, 2025

i think we still have issue with the summary numbers
image

but it appears once in the batch summary

image created contarcts 1460 image

full script output:https://gist.github.com/0oM4R/461e3ecb266aa782da912c9822258842

@0oM4R 0oM4R closed this Feb 13, 2025
@0oM4R 0oM4R reopened this Feb 13, 2025
@0oM4R
Copy link
Contributor

0oM4R commented Feb 13, 2025

sorry misclick

@samaradel
Copy link
Contributor

image

I'm not sure if you should handle this or not, otherwise, it deployed 2 batches successfully

image

@zaelgohary zaelgohary marked this pull request as draft April 2, 2025 09:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants