
Commit bac1321

Author: Ryan Sepassi
Commit message: v1.4, rm unused code and codepaths
PiperOrigin-RevId: 179822701
Parent: 4354f3b

27 files changed, +723 -1898 lines

.travis.yml

Lines changed: 2 additions & 2 deletions
@@ -14,9 +14,9 @@ env:
 - T2T_DATA_DIR=/tmp/t2t-data
 - T2T_TRAIN_DIR=/tmp/t2t-train
 script:
-- pytest --ignore=tensor2tensor/utils/registry_test.py --ignore=tensor2tensor/utils/trainer_utils_test.py --ignore=tensor2tensor/problems_test.py --ignore=tensor2tensor/tpu/tpu_trainer_lib_test.py
+- pytest --ignore=tensor2tensor/utils/registry_test.py --ignore=tensor2tensor/problems_test.py --ignore=tensor2tensor/tpu/tpu_trainer_lib_test.py
 - pytest tensor2tensor/utils/registry_test.py
-- pytest tensor2tensor/utils/trainer_utils_test.py
+- pytest tensor2tensor/tpu/tpu_trainer_lib_test.py
 - t2t-datagen 2>&1 | grep translate && echo passed
 - python -c "from tensor2tensor.models import transformer; print(transformer.Transformer.__name__)"
 - t2t-trainer --registry_help

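For reference, the updated CI checks can be reproduced locally. This is a sketch, not part of the commit; it assumes you are at the repository root with `pytest` and tensor2tensor installed:

```
# Mirror the updated .travis.yml script section locally.
pytest --ignore=tensor2tensor/utils/registry_test.py \
       --ignore=tensor2tensor/problems_test.py \
       --ignore=tensor2tensor/tpu/tpu_trainer_lib_test.py
pytest tensor2tensor/utils/registry_test.py
pytest tensor2tensor/tpu/tpu_trainer_lib_test.py
```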
docs/cloud_tpu.md

Lines changed: 28 additions & 20 deletions
@@ -3,15 +3,19 @@
 Tensor2Tensor supports running on Google Cloud Platform's TPUs, chips specialized
 for ML training.
 
-Not all models are supported but we've tested so far with Transformer (sequence
-model) as well as Xception (image model).
+Models and hparams that are known to work on TPU:
+* `transformer` with `transformer_tpu`
+* `xception` with `xception_base`
+* `resnet50` with `resnet_base`
 
 To run on TPUs, you need to be part of the alpha program; if you're not, these
 commands won't work for you currently, but access will expand soon, so get
 excited for your future ML supercomputers in the cloud.
 
 ## Tutorial: Transformer En-De translation on TPU
 
+Update `gcloud`: `gcloud components update`
+
 Set your default zone to a TPU-enabled zone. TPU machines are only available in
 certain zones for now.
 ```
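The hunk ends just as the zone-setting code block opens, so the actual command is not shown in this diff. A minimal sketch of that step, assuming `gcloud` is installed and authenticated; the zone name below is only a hypothetical example of a TPU-enabled zone:

```
# Set the default Compute Engine zone; us-central1-f is a hypothetical example.
gcloud config set compute/zone us-central1-f
```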
@@ -40,29 +44,32 @@ gcloud alpha compute tpus create \
 To see all TPU instances running: `gcloud alpha compute tpus list`. The
 `TPU_IP` should be unique amongst the list and follow the format `10.240.i.2`.
 
-Generate data to GCS
-If you already have the data locally, use `gsutil cp` to cp to GCS.
+SSH in with port forwarding for TensorBoard
 ```
-DATA_DIR=gs://my-bucket/t2t/data/
-t2t-datagen --problem=translate_ende_wmt8k --data_dir=$DATA_DIR
+gcloud compute ssh $USER-vm -- -L 6006:localhost:6006
 ```
 
-SSH in with port forwarding for TensorBoard
+Now that you're on the cloud instance, install T2T:
 ```
-gcloud compute ssh $USER-vm -L 6006:localhost:6006
+pip install tensor2tensor --user
+# If your python bin dir isn't already in your path
+export PATH=$HOME/.local/bin:$PATH
 ```
 
-Now that you're on the cloud instance, install T2T:
+Generate data to GCS
+If you already have the data, use `gsutil cp` to copy to GCS.
 ```
-pip install tensor2tensor
+GCS_BUCKET=gs://my-bucket
+DATA_DIR=$GCS_BUCKET/t2t/data/
+t2t-datagen --problem=translate_ende_wmt8k --data_dir=$DATA_DIR
 ```
 
 Setup some vars used below. `TPU_IP` and `DATA_DIR` should be the same as what
 was used above. Note that the `DATA_DIR` and `OUT_DIR` must be GCS buckets.
 ```
 TPU_IP=<IP of TPU machine>
-DATA_DIR=gs://my-bucket/t2t/data/
-OUT_DIR=gs://my-bucket/t2t/training/
+DATA_DIR=$GCS_BUCKET/t2t/data/
+OUT_DIR=$GCS_BUCKET/t2t/training/
 TPU_MASTER=grpc://$TPU_IP:8470
 ```

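The tutorial mentions using `gsutil cp` to copy pre-existing data to GCS without spelling out the command. A minimal sketch, assuming the data was already generated locally; the local source path is an assumption, not one given in the docs:

```
# Copy locally generated TFRecords and vocab files into the GCS data dir.
# -m parallelizes the copy; /tmp/t2t-data is a hypothetical local path.
GCS_BUCKET=gs://my-bucket
gsutil -m cp -r /tmp/t2t-data/* $GCS_BUCKET/t2t/data/
```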
@@ -73,25 +80,26 @@ tensorboard --logdir=$OUT_DIR > /tmp/tensorboard_logs.txt 2>&1 &
 
 Train and evaluate.
 ```
-t2t-tpu-trainer \
-  --master=$TPU_MASTER \
-  --data_dir=$DATA_DIR \
-  --output_dir=$OUT_DIR \
-  --problems=translate_ende_wmt8k \
+t2t-trainer \
   --model=transformer \
-  --hparams_set=transformer_tiny_tpu \
+  --hparams_set=transformer_tpu \
+  --problems=translate_ende_wmt8k \
   --train_steps=10 \
   --eval_steps=10 \
   --local_eval_frequency=10 \
-  --iterations_per_loop=10
+  --iterations_per_loop=10 \
+  --master=$TPU_MASTER \
+  --use_tpu=True \
+  --data_dir=$DATA_DIR \
+  --output_dir=$OUT_DIR
 ```
 
 The above command will train for 10 steps, then evaluate for 10 steps. You can
 (and should) increase the number of total training steps with the
 `--train_steps` flag. Evaluation will happen every `--local_eval_frequency`
 steps, each time for `--eval_steps`. When you increase the number of training
 steps, also increase `--iterations_per_loop`, which controls how frequently the
-TPU machine returns control to the Python code (1000 seems like a fine number).
+TPU machine returns control to the host machine (1000 seems like a fine number).
 
 Back on your local machine, open your browser and navigate to `localhost:6006`
 for TensorBoard.
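The prose in the hunk above recommends increasing `--train_steps` and `--iterations_per_loop` for a real run. A hedged sketch of such a longer run, using the same flags as the command in this diff; the step counts below are illustrative, not values prescribed by the docs:

```
# Longer run: --train_steps and --local_eval_frequency values are illustrative.
# --iterations_per_loop=1000 follows the "1000 seems like a fine number" note above.
t2t-trainer \
  --model=transformer \
  --hparams_set=transformer_tpu \
  --problems=translate_ende_wmt8k \
  --train_steps=250000 \
  --eval_steps=10 \
  --local_eval_frequency=1000 \
  --iterations_per_loop=1000 \
  --master=$TPU_MASTER \
  --use_tpu=True \
  --data_dir=$DATA_DIR \
  --output_dir=$OUT_DIR
```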

docs/example_life.md

Lines changed: 0 additions & 197 deletions
This file was deleted.

docs/index.md

Lines changed: 1 addition & 1 deletion
@@ -24,6 +24,6 @@ documentation, from basic tutorials to full code documentation.
 
 ## Deep Dive
 
-* [Life of an Example](example_life.md): how all parts of T2T are connected and
+* [System Overview](overview.md): how all parts of T2T are connected and
   work together
 * [Distributed Training](distributed_training.md)
