Commit a5c2229

fix ZeroDivisionError while only one train data (#810)
1 parent 192022f commit a5c2229

1 file changed: +4 -1 lines changed

megatron/data/data_utils.py

Lines changed: 4 additions & 1 deletion
@@ -249,7 +249,7 @@ def build_weighted_datasets(
     return train_datasets, valid_datasets, test_datasets
 
 
-def weights_by_num_docs(l, alpha=0.3):
+def weights_by_num_docs(l: list, alpha=0.3):
     """
     Builds weights from a multinomial distribution over groups of data according to the number of
     samples in each group.
@@ -263,6 +263,9 @@ def weights_by_num_docs(l, alpha=0.3):
 
     See https://arxiv.org/abs/1911.02116 for more details
     """
+    if len(l) == 1:
+        return [1.0]
+
     total_n_docs = sum(l)
     unbiased_sample_probs = [i / total_n_docs for i in l]
 
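
The diff shows only the first two statements of weights_by_num_docs, so the following is a minimal sketch of why a single dataset could plausibly trigger the ZeroDivisionError named in the commit title. It assumes the rest of the function smooths the probabilities with alpha, scales each one by (1 - p), and normalizes; that body is a hypothetical reconstruction for illustration, not code copied from the repository. The names weights_without_guard and weights_with_guard are likewise made up for this example.

```python
def weights_without_guard(num_docs, alpha=0.3):
    """Assumed pre-fix logic (hypothetical reconstruction, not the repo's code)."""
    total_n_docs = sum(num_docs)
    unbiased_sample_probs = [n / total_n_docs for n in num_docs]

    # Smooth the distribution with exponent alpha, then renormalize.
    probs = [p**alpha for p in unbiased_sample_probs]
    total = sum(probs)
    probs = [p / total for p in probs]

    # Assumed step: scale each group by (1 - p). With a single group,
    # p == 1.0, so its weight becomes exactly 0.0.
    weights = [p * (1 - q) for p, q in zip(probs, unbiased_sample_probs)]

    # Final normalization divides by 0.0 when there is only one group.
    total = sum(weights)
    return [w / total for w in weights]


def weights_with_guard(num_docs, alpha=0.3):
    """Same assumed logic, preceded by the early return added in this commit."""
    if len(num_docs) == 1:
        return [1.0]
    return weights_without_guard(num_docs, alpha)


if __name__ == "__main__":
    try:
        weights_without_guard([1000])
    except ZeroDivisionError as exc:
        print("without guard:", type(exc).__name__)

    print("with guard:", weights_with_guard([1000]))
    print("two datasets:", weights_with_guard([100, 900]))
```

Under these assumptions the early return is the natural fix: with one dataset there is nothing to balance, so the only sensible weight is 1.0, and the multi-dataset path is left unchanged.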