Update items in the dataset without map #7520

Open
mashdragon opened this issue Apr 15, 2025 · 1 comment
Labels
enhancement New feature or request

Comments

@mashdragon

Feature request

I would like to be able to update items in my dataset without touching every row. Even a range option would help: I could process those items, save the dataset, and then continue.

If I am supposed to split the dataset first, that is not clear: the docs suggest that each of those functions returns a new object, so I don't think I can do that.

Motivation

I am applying an extremely time-consuming function to each item in my Dataset. Unfortunately, datasets only supports updating values via map, so if my computer dies in the middle of this long-running process, I lose all progress. This is far from ideal. I would like to use datasets throughout this processing, but this limitation is forcing me to write my own dataset format just for this intermediate step.

It would be less intuitive, but I suppose I could split the dataset and then concatenate the pieces before saving? That feels very inefficient, though.

Your contribution

I can test the feature.

mashdragon added the enhancement (New feature or request) label Apr 15, 2025
@Dref360
Contributor

Dref360 commented Apr 19, 2025

Hello!

Have you looked at Dataset.shard? Docs

Using this method, you could split your dataset into N shards, apply map to each shard, and concatenate the results back together.
