Labels: bug (Something isn't working), data handling (Generic data-related topic)
Bug description
This warning shows up when running a LitData StreamingDataset with the Lightning Trainer:
/home/zeus/miniconda3/envs/cloudspace/lib/python3.10/site-packages/lightning/pytorch/utilities/data.py:122: Your `IterableDataset` has `__len__` defined. In combination with multi-process data loading (when num_workers > 1), `__len__` could be inaccurate if each worker is not configured independently to avoid having duplicate data.
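For context, the check in lightning/pytorch/utilities/data.py fires whenever an iterable-style dataset also defines `__len__` — which litdata's StreamingDataset does. A minimal sketch of that condition (the toy dataset below is hypothetical, not litdata's implementation):

```python
# Minimal sketch of the condition the warning is about (assumption: any
# torch IterableDataset that also defines __len__ triggers it; this toy
# dataset is hypothetical and stands in for litdata's StreamingDataset).
from torch.utils.data import DataLoader, IterableDataset


class SizedIterable(IterableDataset):
    """An iterable-style dataset that, like StreamingDataset, knows its length."""

    def __init__(self, n: int):
        self.n = n

    def __iter__(self):
        return iter(range(self.n))

    def __len__(self):
        return self.n


dataset = SizedIterable(10)
# Both of these succeed, which is exactly the combination Lightning flags:
print(hasattr(dataset, "__len__"))                        # True
print(sum(1 for _ in DataLoader(dataset, batch_size=None)))  # 10
```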
What version are you seeing the problem on?
v2.4, master
How to reproduce the bug
import torch
import litgpt
from litgpt import GPT
from litgpt.pretrain import initialize_weights
from litdata.streaming import StreamingDataLoader, StreamingDataset, TokensLoader
import lightning as L


class LitLLM(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.model = GPT.from_name(name="micro-llama-300M")

    def on_train_start(self):
        initialize_weights(self.trainer, self.model, n_layer=self.model.config.n_layer, n_embd=self.model.config.n_embd)

    def training_step(self, batch):
        input_ids = batch.long()
        logits = self.model(input_ids)
        loss = litgpt.utils.chunked_cross_entropy(logits[..., :-1, :], input_ids[..., 1:])
        self.log("train_loss", loss, prog_bar=True)
        return loss

    def configure_optimizers(self):
        warmup_steps = 500
        optimizer = torch.optim.AdamW(self.model.parameters(), lr=4e-4, weight_decay=0.1, betas=(0.9, 0.95))
        scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda step: min(step / warmup_steps, 1.0))
        return {"optimizer": optimizer, "lr_scheduler": {"scheduler": scheduler, "interval": "step"}}


if __name__ == "__main__":
    train_dataset = StreamingDataset("s3://tinyllama-template/slimpajama/train", item_loader=TokensLoader(block_size=128))
    train_dataloader = StreamingDataLoader(train_dataset, shuffle=True, batch_size=12, num_workers=1)

    trainer = L.Trainer(
        max_epochs=1,
        accumulate_grad_batches=4,
        precision="bf16-mixed",
    )
    with trainer.init_module(empty_init=True):
        model = LitLLM()
    trainer.fit(model, train_dataloader)
Error messages and logs
None
Environment
Current environment
#- PyTorch Lightning Version (e.g., 2.4.0):
#- PyTorch Version (e.g., 2.4):
#- Python version (e.g., 3.12):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning (`conda`, `pip`, source):
More info
No response