-
Notifications
You must be signed in to change notification settings - Fork 1
Added rayon feature #2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -11,6 +11,15 @@ keywords = ["string", "slice", "substring", "unicode", "utf-8"] | |
categories = ["no-std", "algorithms", "parsing", "rust-patterns", "text-processing"] | ||
|
||
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html | ||
[dependencies] | ||
rayon_ = { package = "rayon", version = "1.10.0", optional = true } | ||
|
||
[features] | ||
std = [] | ||
rayon = ["std"] | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I know As far as I am aware we should be able to keep There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I added the std feature for two reasons
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, sure, makes sense I guess. In that case should we also make There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't know, that's on you! There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think adding If we need to add it in future I can do that in a |
||
|
||
[package.metadata.docs.rs] | ||
features = ["std", "rayon"] | ||
|
||
[dev-dependencies.cargo-husky] | ||
version = "1" | ||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -60,6 +60,19 @@ use stringslice::StringSlice; | |
assert_eq!("string".try_slice(4..2), None); | ||
``` | ||
|
||
## Run in parallel | ||
|
||
You can have access to parallelized methods by enabling the `rayon` feature. Thanks to the [rayon](https://github.com/rayon-rs/rayon) crate, the string slicing will execute through many threads. | ||
|
||
**Par**allel methods: | ||
- `par_slice` | ||
- `par_try_slice` | ||
- `par_substring` | ||
- `par_try_substring` | ||
|
||
> [!WARNING] | ||
> Using the **par**allel methods on bigger strings is recommended. Parallelism scales greatly on bigger sizes. | ||
|
||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Do you have any benchmark info you could add here? I think it would be useful to know how big a string needs to be for the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'll try to provide some, yes! |
||
## Licence | ||
|
||
Licensed under either of | ||
|
Original file line number | Diff line number | Diff line change | ||||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
@@ -42,9 +42,18 @@ | |||||||||||||
//! | ||||||||||||||
|
||||||||||||||
#![no_std] | ||||||||||||||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. If the
Suggested change
|
||||||||||||||
|
||||||||||||||
#![cfg(not(feature = "std"))] | ||||||||||||||
use core::ops::{Bound, RangeBounds}; | ||||||||||||||
|
||||||||||||||
#[cfg(feature = "std")] | ||||||||||||||
use std::ops::{Bound, RangeBounds}; | ||||||||||||||
Comment on lines
+45
to
+49
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
You should be able to use There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The rust docs say that it's better to use std instead of core, but it doesn't elaborate on that There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As a compromise you should be able to do something like this to avoid duplicate imports:
|
||||||||||||||
|
||||||||||||||
#[cfg(feature = "std")] | ||||||||||||||
extern crate std; | ||||||||||||||
Comment on lines
+50
to
+52
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Suggested change
This shouldn't be necessary There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. First I did like #2 (comment) But then I encountered some problems with tests, so that's how I figured it out. Maybe there's a better solution There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ah, yeah, for tests you need You should do something like |
||||||||||||||
|
||||||||||||||
#[cfg(feature = "rayon")] | ||||||||||||||
use rayon_::prelude::*; | ||||||||||||||
|
||||||||||||||
#[inline] | ||||||||||||||
fn range_to_begin_end(range: impl RangeBounds<usize>) -> (usize, usize) { | ||||||||||||||
let begin = match range.start_bound() { | ||||||||||||||
|
@@ -57,6 +66,7 @@ fn range_to_begin_end(range: impl RangeBounds<usize>) -> (usize, usize) { | |||||||||||||
Bound::Included(&b) => b + 1, | ||||||||||||||
Bound::Excluded(&b) => b, | ||||||||||||||
// Note: using core::usize::MAX rather than usize::MAX for compatibility with Rust < 1.43 | ||||||||||||||
#[allow(clippy::legacy_numeric_constants)] | ||||||||||||||
Bound::Unbounded => core::usize::MAX, | ||||||||||||||
}; | ||||||||||||||
|
||||||||||||||
|
@@ -125,6 +135,74 @@ pub trait StringSlice { | |||||||||||||
/// [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html | ||||||||||||||
/// [`None`]: https://doc.rust-lang.org/std/option/enum.Option.html#variant.None | ||||||||||||||
fn try_substring(&self, begin: usize, end: usize) -> Option<&str>; | ||||||||||||||
|
||||||||||||||
#[cfg(feature = "rayon")] | ||||||||||||||
/// Returns a string slice for the given range of characters | ||||||||||||||
/// | ||||||||||||||
/// This method will panic if the range is invalid, | ||||||||||||||
/// for example if the beginning is greater than the end. | ||||||||||||||
/// | ||||||||||||||
/// Runs in parallel | ||||||||||||||
Julien-cpsn marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
/// | ||||||||||||||
/// # Examples | ||||||||||||||
/// ``` | ||||||||||||||
/// use stringslice::StringSlice; | ||||||||||||||
/// | ||||||||||||||
/// assert_eq!("Ùníc😎de".slice(4..5), "😎"); | ||||||||||||||
/// ``` | ||||||||||||||
fn par_slice(&self, range: impl RangeBounds<usize>) -> &str; | ||||||||||||||
|
||||||||||||||
#[cfg(feature = "rayon")] | ||||||||||||||
/// Returns an [`Option`] containing string slice for the given range of characters | ||||||||||||||
/// | ||||||||||||||
/// This method will return [`None`] if the range is invalid, | ||||||||||||||
/// for example if the beginning is greater than the end. | ||||||||||||||
/// | ||||||||||||||
/// Runs in parallel | ||||||||||||||
Julien-cpsn marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
/// | ||||||||||||||
/// # Examples | ||||||||||||||
/// ``` | ||||||||||||||
/// use stringslice::StringSlice; | ||||||||||||||
/// | ||||||||||||||
/// assert_eq!("Ùníc😎de".try_slice(4..5), Some("😎")); | ||||||||||||||
/// ``` | ||||||||||||||
/// [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html | ||||||||||||||
/// [`None`]: https://doc.rust-lang.org/std/option/enum.Option.html#variant.None | ||||||||||||||
fn par_try_slice(&self, range: impl RangeBounds<usize>) -> Option<&str>; | ||||||||||||||
|
||||||||||||||
#[cfg(feature = "rayon")] | ||||||||||||||
/// Returns a string slice between the given beginning and end characters | ||||||||||||||
/// | ||||||||||||||
/// This method will panic if the parameters are invalid, | ||||||||||||||
/// for example if the beginning is greater than the end. | ||||||||||||||
/// | ||||||||||||||
/// Runs in parallel | ||||||||||||||
Julien-cpsn marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
/// | ||||||||||||||
/// # Examples | ||||||||||||||
/// ``` | ||||||||||||||
/// use stringslice::StringSlice; | ||||||||||||||
/// | ||||||||||||||
/// assert_eq!("Ùníc😎de".substring(4, 5), "😎"); | ||||||||||||||
/// ``` | ||||||||||||||
fn par_substring(&self, begin: usize, end: usize) -> &str; | ||||||||||||||
|
||||||||||||||
#[cfg(feature = "rayon")] | ||||||||||||||
/// Returns an [`Option`] containing string slice between the given beginning and end characters | ||||||||||||||
/// | ||||||||||||||
/// This method will return [`None`] if the parameters are invalid, | ||||||||||||||
/// for example if the beginning is greater than the end. | ||||||||||||||
/// | ||||||||||||||
/// Runs in parallel | ||||||||||||||
Julien-cpsn marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
/// | ||||||||||||||
/// # Examples | ||||||||||||||
/// ``` | ||||||||||||||
/// use stringslice::StringSlice; | ||||||||||||||
/// | ||||||||||||||
/// assert_eq!("Ùníc😎de".try_substring(4, 5), Some("😎")); | ||||||||||||||
/// ``` | ||||||||||||||
/// [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html | ||||||||||||||
/// [`None`]: https://doc.rust-lang.org/std/option/enum.Option.html#variant.None | ||||||||||||||
fn par_try_substring(&self, begin: usize, end: usize) -> Option<&str>; | ||||||||||||||
Comment on lines
+139
to
+205
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Rather than adding these methods to the existing There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. That could be great yes |
||||||||||||||
} | ||||||||||||||
|
||||||||||||||
impl StringSlice for str { | ||||||||||||||
|
@@ -160,6 +238,49 @@ impl StringSlice for str { | |||||||||||||
begin_ch | ||||||||||||||
}; | ||||||||||||||
|
||||||||||||||
// Note (unsafe): Since we iterate character indices we can be sure that `begin_ch` and | ||||||||||||||
// `end_ch` are on UTF-8 boundaries. For performance, we use get_unchecked rather than | ||||||||||||||
// simply indexing. | ||||||||||||||
unsafe { Some(self.get_unchecked(begin_ch..end_ch)) } | ||||||||||||||
} | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
#[cfg(feature = "rayon")] | ||||||||||||||
#[inline] | ||||||||||||||
fn par_slice(&self, range: impl RangeBounds<usize>) -> &str { | ||||||||||||||
let (begin, end) = range_to_begin_end(range); | ||||||||||||||
self.par_substring(begin, end) | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
#[cfg(feature = "rayon")] | ||||||||||||||
#[inline] | ||||||||||||||
fn par_try_slice(&self, range: impl RangeBounds<usize>) -> Option<&str> { | ||||||||||||||
let (begin, end) = range_to_begin_end(range); | ||||||||||||||
self.par_try_substring(begin, end) | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
#[cfg(feature = "rayon")] | ||||||||||||||
#[inline] | ||||||||||||||
fn par_substring(&self, begin: usize, end: usize) -> &str { | ||||||||||||||
self.par_try_substring(begin, end) | ||||||||||||||
.expect("begin < end when slicing string") | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
#[cfg(feature = "rayon")] | ||||||||||||||
fn par_try_substring(&self, begin: usize, end: usize) -> Option<&str> { | ||||||||||||||
if begin > end { | ||||||||||||||
None | ||||||||||||||
} else { | ||||||||||||||
let mut ch_idx = self.par_char_indices().map(|(i, _c)| i); | ||||||||||||||
|
||||||||||||||
let len = self.len(); | ||||||||||||||
let begin_ch = ch_idx.nth(begin).unwrap_or(len); | ||||||||||||||
let end_ch = if end > begin { | ||||||||||||||
ch_idx.nth(end - begin - 1).unwrap_or(len) | ||||||||||||||
} else { | ||||||||||||||
begin_ch | ||||||||||||||
}; | ||||||||||||||
|
||||||||||||||
// Note (unsafe): Since we iterate character indices we can be sure that `begin_ch` and | ||||||||||||||
// `end_ch` are on UTF-8 boundaries. For performance we use get_unchecked rather than | ||||||||||||||
// simply indexing. | ||||||||||||||
|
@@ -170,8 +291,12 @@ impl StringSlice for str { | |||||||||||||
|
||||||||||||||
#[cfg(test)] | ||||||||||||||
mod tests { | ||||||||||||||
#[cfg(not(feature = "std"))] | ||||||||||||||
use core::ops::Bound; | ||||||||||||||
|
||||||||||||||
#[cfg(feature = "std")] | ||||||||||||||
use std::ops::Bound; | ||||||||||||||
staticintlucas marked this conversation as resolved.
Show resolved
Hide resolved
|
||||||||||||||
|
||||||||||||||
use super::StringSlice; | ||||||||||||||
|
||||||||||||||
#[test] | ||||||||||||||
|
@@ -218,4 +343,13 @@ mod tests { | |||||||||||||
"str" | ||||||||||||||
); | ||||||||||||||
} | ||||||||||||||
|
||||||||||||||
#[cfg(feature = "rayon")] | ||||||||||||||
#[test] | ||||||||||||||
fn par_test_utf8() { | ||||||||||||||
let str = "🗻∈🌏"; | ||||||||||||||
assert_eq!("🗻", str.par_slice(0..1)); | ||||||||||||||
assert_eq!("∈", str.par_slice(1..2)); | ||||||||||||||
assert_eq!("🌏", str.par_slice(2..3)); | ||||||||||||||
} | ||||||||||||||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can I ask why you're using
rayon_
rather thanrayon
here?Uh oh!
There was an error while loading. Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because dependencies and features can conflict, so you can't name both rayon.
I personally think it's preferable for the user to use
rayon
and notrayon_
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be possible to do something like this since Rust creates an implicit feature for optional dependencies.
In Rust 1.60+ you can also use the
dep:
syntax, but that requires bumping our MSRV