@@ -69,7 +69,7 @@ For language modeling, we have these data-sets in T2T:
 * LM1B (a billion-word corpus): `--problems=languagemodel_lm1b32k` for
   subword-level modeling and `--problems=languagemodel_lm1b_characters`
   for character-level modeling.
-
+
 We suggest starting with `--model=transformer` on this task and using
 `--hparams_set=transformer_small` for PTB and
 `--hparams_set=transformer_base` for LM1B.
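As a concrete illustration, a training run for LM1B with the flags named in this hunk might look like the sketch below; `t2t-trainer` is the standard T2T training binary, and the directory paths are placeholder choices, not values from this change.

```sh
# Minimal sketch, assuming the LM1B data has already been generated into
# $DATA_DIR (e.g. via t2t-datagen). Paths below are illustrative placeholders.
DATA_DIR=$HOME/t2t_data
TRAIN_DIR=$HOME/t2t_train/lm1b

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=languagemodel_lm1b32k \
  --model=transformer \
  --hparams_set=transformer_base \
  --output_dir=$TRAIN_DIR
```

For PTB, swap in the PTB problem name listed earlier in the section and `--hparams_set=transformer_small`.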
@@ -95,7 +95,7 @@ For speech-to-text, we have these data-sets in T2T:
 For summarizing longer text into a shorter one, we have these data-sets:
 * CNN/DailyMail articles summarized into a few sentences:
   `--problems=summarize_cnn_dailymail32k`
-
+
 We suggest using `--model=transformer` and
 `--hparams_set=transformer_prepend` for this task.
 This yields good ROUGE scores.
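Analogously, a summarization run with the flags from this hunk could be sketched as follows; the paths are again placeholders, and the data is assumed to have been generated beforehand.

```sh
# Minimal sketch for CNN/DailyMail summarization; directory paths are placeholders.
DATA_DIR=$HOME/t2t_data
TRAIN_DIR=$HOME/t2t_train/cnn_dailymail

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=summarize_cnn_dailymail32k \
  --model=transformer \
  --hparams_set=transformer_prepend \
  --output_dir=$TRAIN_DIR
```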
@@ -118,5 +118,5 @@ For all translation problems, we suggest to try the Transformer model:
 this should reach a BLEU score of about 28 on the English-German data-set,
 which is close to state-of-the-art. If training on a single GPU, try the
 `--hparams_set=transformer_base_single_gpu` setting. For very good results
-or larger data-sets (e.g., for English-French)m , try the big model
+or larger data-sets (e.g., for English-French), try the big model
 with `--hparams_set=transformer_big`.
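For the single-GPU translation setting mentioned in this hunk, a sketch follows; the English-German problem name is defined earlier in the README's translation section (not in this hunk), so it stays a placeholder here, and the paths are likewise illustrative.

```sh
# Minimal sketch for single-GPU translation training. $PROBLEM is a placeholder:
# set it to the English-German problem named in the translation section above.
# Directory paths are also placeholders.
PROBLEM="<english_german_problem_from_the_list_above>"
DATA_DIR=$HOME/t2t_data
TRAIN_DIR=$HOME/t2t_train/ende

t2t-trainer \
  --data_dir=$DATA_DIR \
  --problems=$PROBLEM \
  --model=transformer \
  --hparams_set=transformer_base_single_gpu \
  --output_dir=$TRAIN_DIR
```

For very good results or larger data-sets such as English-French, the same command applies with `--hparams_set=transformer_big`.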