-
Notifications
You must be signed in to change notification settings - Fork 6.3k
[Data] support num_cpus, memory, concurrency, batch_size for preprocess #52574
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Data] support num_cpus, memory, concurrency, batch_size for preprocess #52574
Conversation
cc @richardliaw to review this change set. Please note that I only update the Tokenizer with above arguments, if we are good with this solution, I will update other subclass of preprocess. |
OK, thinking through this a little more. I think maybe better if we just expose:
And avoid exposing This will be simpler to maintain and will allow it to be easily serializable |
Make sense, will update code accordingly. thanks! |
Another thought -
1. This should only be for transform(), not init
2. for fit_transform() we can add “transform_num_cpus” or something
…On Fri, Apr 25, 2025 at 8:13 PM Xingyu Long ***@***.***> wrote:
*xingyu-long* left a comment (ray-project/ray#52574)
<#52574 (comment)>
OK, thinking through this a little more. I think maybe better if we just
expose:
- num_cpus
- memory
- batch_size
- concurrency
And avoid exposing ray_remote_args and ray_remote_args_fn generally.
This will be simpler to maintain and will allow it to be easily
serializable
Make sense, will update code accordingly. thanks!
—
Reply to this email directly, view it on GitHub
<#52574 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABCRZZMEUURY3MWILDRJG3323L2VFAVCNFSM6AAAAAB3X7PFZWVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDQMZRG44DSNRVGI>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
sure, we can do this too. just updated it. also I was wondering if we should remove function |
9900995
to
7caf5af
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice work, left some comments
Thanks for reviewing it, @richardliaw, I just updated the changes. |
857f6a1
to
f0dcb1d
Compare
with this change (f52f057), fixed the test_chain test and also tested it all preprocessor tests. $ pytest -v -s python/ray/data/tests/preprocessors/
Results (25.86s):
119 passed |
Signed-off-by: Xingyu Long <xingyulong97@gmail.com>
Signed-off-by: Xingyu Long <xingyulong97@gmail.com>
Signed-off-by: Xingyu Long <xingyulong97@gmail.com>
Signed-off-by: Xingyu Long <xingyulong97@gmail.com>
Signed-off-by: Xingyu Long <xingyulong97@gmail.com>
Signed-off-by: Xingyu Long <xingyulong97@gmail.com>
f52f057
to
831466f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
1 small nit
oh i just realized, this looks good - you're planning to extend this to all preprocessors right? |
That was the idea (with constructor approach I tried), however, since we modified common functions in parent class, so we don't have to touch them any more. I've ran all tests under preprocess test dir. let me know if I am missing any other tests. btw, I will fix the above comments later today. Thanks! |
Signed-off-by: Xingyu Long <xingyulong97@gmail.com>
…ss (ray-project#52574) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? As title, we'd like to support above parameters for preprocess <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number Close ray-project#52448 <!-- For example: "Closes ray-project#1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Xingyu Long <xingyulong97@gmail.com>
…ss (#52574) <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? As title, we'd like to support above parameters for preprocess <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number Close #52448 <!-- For example: "Closes #1234" --> ## Checks - [x] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [x] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [x] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: Xingyu Long <xingyulong97@gmail.com>
Why are these changes needed?
As title, we'd like to support above parameters for preprocess
Related issue number
Close #52448
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/
under thecorresponding
.rst
file.