
Commit 8f1a0e1

Update experiment_issues.md
1 parent 8ad0706 commit 8f1a0e1


docs/e2e_journal_experiment/experiment_issues.md

Lines changed: 1 addition & 1 deletion
Experiments on different [variations of fnn](https://docs.google.com/spreadsheet
**training phase only**
- We initially ran 3 models on cuda:1 and 3 models on cuda:3, resulting in a training time of ~3 hours per epoch.
- Running 6 models in parallel led to CPU usage [exceeding 97%](https://github.com/mahdis-saeedi/OpeNTF/blob/main/docs/e2e_journal_experiment/cpu%26gpu_usage/cpu_6models_2gpus.txt) across all 224 cores.
- We terminated the 3 runs on cuda:3, which reduced CPU usage to [~6%](https://github.com/mahdis-saeedi/OpeNTF/blob/main/docs/e2e_journal_experiment/cpu%26gpu_usage/cpu_3models_1gpu_not_stable.txt) per core. However, usage was unstable and climbed back above 95%.
- We then launched a new run on cuda:3, increasing the batch size from 1,000 to 10,000, and started monitoring GPU and CPU utilization.
As a result:
GPU memory usage increased 5×.
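The notes do not state a memory model, so the following is only one hedged way to read the 5× figure. Assume GPU memory has two parts. The first is a batch-independent part $F$, covering weights, optimizer state, and the CUDA context. The second is a per-example part $a$ that scales with batch size $B$. The symbols $F$, $a$, and $M(B)$ are hypothetical. Measured usage can also deviate from this linear model, for example because of allocator caching.

```math
\begin{aligned}
M(B) &= F + aB \\
\frac{M(10000)}{M(1000)} &= \frac{F + 10000\,a}{F + 1000\,a} = 5
  \;\Longrightarrow\; 5000\,a = 4F
  \;\Longrightarrow\; F = 1250\,a \\
\frac{F}{M(1000)} &= \frac{1250\,a}{2250\,a} = \frac{5}{9},
  \qquad \frac{F}{M(10000)} = \frac{1250\,a}{11250\,a} = \frac{1}{9}
\end{aligned}
```

Under this assumption, the fixed part would be about 5/9 (≈56%) of GPU memory at batch size 1,000. At batch size 10,000 it would be only about 1/9 (≈11%). That would explain why a 10× larger batch cost roughly 5× rather than 10× the memory.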
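The notes do not say which tool was used to monitor GPU and CPU utilization. As a hypothetical sketch, per-core CPU usage and per-GPU utilization could be sampled in Python with two tools:

- the third-party `psutil` package, and
- `nvidia-smi` in its CSV query mode.

The helper names (`parse_nvidia_smi_csv`, `summarize_cpu`, `sample_once`, `monitor`) are illustrative. The 95% threshold only mirrors the figure quoted above.

```python
"""Hypothetical helpers for sampling CPU and GPU utilization during training."""
import shutil
import subprocess
import time
from dataclasses import dataclass
from statistics import mean


@dataclass
class GpuSample:
    index: int
    util_percent: float
    mem_used_mib: float


def _to_float(field: str) -> float:
    """nvidia-smi prints '[N/A]' for fields a GPU does not report; map those to NaN."""
    field = field.strip()
    return float("nan") if "N/A" in field else float(field)


def parse_nvidia_smi_csv(text: str) -> list[GpuSample]:
    """Parse the output of
    nvidia-smi --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits
    which prints one 'index, util, mem' line per GPU (memory in MiB)."""
    samples = []
    for line in text.strip().splitlines():
        idx, util, mem = line.split(",")
        samples.append(GpuSample(int(idx.strip()), _to_float(util), _to_float(mem)))
    return samples


def summarize_cpu(per_core: list[float], threshold: float = 95.0) -> dict:
    """Return the core count, mean per-core usage, and cores above `threshold` percent."""
    return {
        "cores": len(per_core),
        "mean_percent": mean(per_core),
        "cores_above_threshold": sum(1 for p in per_core if p > threshold),
    }


def sample_once() -> tuple[dict, list[GpuSample]]:
    """Take one CPU sample (blocking for 1 s) and, if available, one GPU sample."""
    import psutil  # third-party; imported lazily so the parsers above work without it

    per_core = psutil.cpu_percent(interval=1.0, percpu=True)
    gpus: list[GpuSample] = []
    if shutil.which("nvidia-smi"):
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=index,utilization.gpu,memory.used",
             "--format=csv,noheader,nounits"],
            capture_output=True, text=True, check=True,
        ).stdout
        gpus = parse_nvidia_smi_csv(out)
    return summarize_cpu(per_core), gpus


def monitor(n_samples: int = 10, period_s: float = 60.0) -> None:
    """Print a timestamped CPU summary and GPU samples every `period_s` seconds."""
    for _ in range(n_samples):
        cpu, gpus = sample_once()
        print(time.strftime("%H:%M:%S"), cpu, gpus)
        time.sleep(period_s)
```

For example, `monitor(n_samples=10, period_s=60)` would print one line per minute. The CPU usage logs linked above may have been produced with a different tool and format.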
