Description
Hello,
More than a feature request, this is a request for advice.
I have to train a tabular model on a huge dataset (more than 10 million rows), and I am not able to fit it entirely into memory as a DataFrame.
I would like to use the entire dataset for the train/test/val splits rather than a subset, and I wanted to know how you would suggest proceeding in this case.
An alternative I've considered is a custom dataloader that loads into memory only the requested batch, given a list of ids, but I don't know where to start or what I should actually modify or implement. A rough sketch of what I mean is below.
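This is only a minimal sketch of the idea, not something specific to this library: it assumes the features and targets have already been exported to `.npy` files on disk (the file names `train_x.npy` / `train_y.npy` and the `LazyTabularDataset` class are hypothetical), and it uses a plain PyTorch `Dataset`/`DataLoader` so that only the rows of the current batch are materialized in memory.

```python
import numpy as np
import torch
from torch.utils.data import Dataset, DataLoader

class LazyTabularDataset(Dataset):
    """Map-style dataset that reads rows lazily from memory-mapped .npy files."""

    def __init__(self, features_path, targets_path):
        # mmap_mode="r" keeps the arrays on disk; only the pages touched
        # by each __getitem__ call are actually read into memory.
        self.x = np.load(features_path, mmap_mode="r")
        self.y = np.load(targets_path, mmap_mode="r")

    def __len__(self):
        return len(self.x)

    def __getitem__(self, idx):
        # np.array forces a copy of the single requested row into RAM
        # before converting it to a tensor.
        x = torch.as_tensor(np.array(self.x[idx]), dtype=torch.float32)
        y = torch.as_tensor(np.array(self.y[idx]), dtype=torch.float32)
        return x, y

# The DataLoader assembles batches from individual rows, so only
# roughly batch_size rows are resident in memory at a time.
train_ds = LazyTabularDataset("train_x.npy", "train_y.npy")
train_dl = DataLoader(train_ds, batch_size=1024, shuffle=True, num_workers=4)
```

Is something along these lines the recommended approach here, or is there a built-in way to plug a dataloader like this into the training loop?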
Some help would really be appreciated, thank you!