Add head_dim = 64 in B200 Attention. (#4935) #16749
| Job | Run time |
|---|---|
| | 6s |
| | 8s |
| | 2h 30m 32s |
| | 31m 6s |
| | 2h 7m 48s |
| | 3h 0m 41s |
| | 32m 22s |
| | 2h 5m 32s |
| | 3h 1m 35s |
| | 2h 31m 29s |
| | 2h 10m 3s |
| | 2h 28m 11s |
| | 2h 6m 0s |
| | 2h 1m 15s |
| | 2h 20m 55s |
| | 3h 0m 33s |
| | 2h 28m 44s |
| | 29m 11s |
| | 25m 58s |
| | 3h 0m 37s |
| | 25m 56s |
| | 29m 34s |
| | 2h 24m 41s |
| | 2h 4m 0s |
| | 2h 18m 35s |
| | 2h 7m 9s |
| | 2h 2m 11s |
| | 3h 0m 33s |
| | 2h 25m 42s |
| | 3h 0m 16s |
| | 11s |
| | 43s |
| | 23s |
| | 33s |
| | 15s |
| | 30s |
| | 25s |
| | 13s |
| | 21s |
| | 22s |
| | 24s |
| | 25s |
| | 38s |
| | 19s |
| | 12s |
| | 40s |
| | 27s |
| | 55s |
| | 31s |
| | 22s |
| | 18s |
| | 18s |
| | 32s |
| | 17s |
| | 1m 6s |
| | 16s |
| | 23s |
| | 24s |
| Total | 2d 9h 23m 46s |