Commit 8ad0706 (parent 7ffb801): Update experiment_issues.md

1 file changed (1 addition, 1 deletion)

docs/e2e_journal_experiment/experiment_issues.md

Experiments on different [variations of FNN](https://docs.google.com/spreadsheets/d/1jt4Pvdz58qs0LyAnSjYr0MTpzy5ZfNonw5bBtfbIvKY/edit?gid=212563191#gid=212563191) with a multi-hot (mhot) vector as input:
**Training phase only**
- We initially ran 3 models on cuda:1 and 3 models on cuda:3, resulting in a training time of ~3 hours per epoch.
- Running 6 models in parallel led to CPU usage [exceeding 97%](https://github.com/mahdis-saeedi/OpeNTF/blob/main/docs/e2e_journal_experiment/cpu%26gpu_usage/cpu_6models_2gpus.txt) across all 224 cores.
- We terminated the 3 runs on cuda:3, which reduced CPU usage to ~7% per core; however, this was not stable, and usage rose above 95% again.
- We then launched a new run on cuda:3, increasing the batch size from 1,000 to 10,000, and started monitoring GPU and CPU utilization.
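The monitoring tool behind the linked CPU log isn't stated here. As one possible way to collect per-core figures like those above on a Linux host, the sketch below reads the cumulative per-core time counters in `/proc/stat` twice and reports the share of each interval that a core spent busy. It assumes the standard `cpuN user nice system idle iowait irq softirq steal guest guest_nice` line layout, counts `iowait` as idle, and uses hypothetical helper names (`parse_proc_stat`, `per_core_utilization`, `sample_utilization`); it is an illustration, not the tooling used for these runs.

```python
import time
from typing import Dict, Tuple


def parse_proc_stat(text: str) -> Dict[str, Tuple[int, int]]:
    """Map each per-core label ('cpu0', 'cpu1', ...) to cumulative (idle, total) jiffies."""
    cores: Dict[str, Tuple[int, int]] = {}
    for line in text.splitlines():
        fields = line.split()
        # Keep only per-core lines such as 'cpu0 ...'; skip the aggregate 'cpu' line
        # and unrelated lines like 'intr' or 'ctxt'.
        if not fields or not fields[0].startswith("cpu") or fields[0] == "cpu":
            continue
        # user, nice, system, idle, iowait, irq, softirq, steal. The trailing guest
        # fields are already included in user/nice, so they are left out of the total.
        values = [int(v) for v in fields[1:9]]
        idle = values[3] + values[4]  # count iowait as idle time
        cores[fields[0]] = (idle, sum(values))
    return cores


def per_core_utilization(before: str, after: str) -> Dict[str, float]:
    """Percent of each core's time spent busy between two /proc/stat snapshots."""
    start, end = parse_proc_stat(before), parse_proc_stat(after)
    usage: Dict[str, float] = {}
    for core, (idle1, total1) in end.items():
        if core not in start:  # core came online between snapshots; skip it
            continue
        idle0, total0 = start[core]
        d_total = total1 - total0
        d_idle = idle1 - idle0
        usage[core] = 0.0 if d_total == 0 else 100.0 * (d_total - d_idle) / d_total
    return usage


def sample_utilization(interval: float = 1.0) -> Dict[str, float]:
    """Linux only: read /proc/stat twice, `interval` seconds apart."""
    with open("/proc/stat") as f:
        first = f.read()
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = f.read()
    return per_core_utilization(first, second)
```

For example, `sum(u > 95.0 for u in sample_utilization().values())` counts how many cores were more than 95% busy during the sampling window, which is the kind of figure reported above for the 224-core host.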
As a result: