Skip to content

Added rayon feature #2

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
9 changes: 9 additions & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,15 @@ keywords = ["string", "slice", "substring", "unicode", "utf-8"]
categories = ["no-std", "algorithms", "parsing", "rust-patterns", "text-processing"]

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
rayon_ = { package = "rayon", version = "1.10.0", optional = true }
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can I ask why you're using rayon_ rather than rayon here?

Copy link
Author

@Julien-cpsn Julien-cpsn Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because dependencies and features can conflict, so you can't name both rayon.

I personally think it's preferable for the user to use rayon and not rayon_

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[dependencies]
rayon = { ..., optional = true }

[features]
std = []
rayon = ["std"]

It should be possible to do something like this since Rust creates an implicit feature for optional dependencies.

In Rust 1.60+ you can also use the dep: syntax, but that requires bumping our MSRV


[features]
std = []
rayon = ["std"]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know rayon requires std, but do we also need a std feature in this stringslice?

As far as I am aware we should be able to keep stringslice as no_std even if rayon uses std.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added the std feature for two reasons

  • so that the user is aware that adding rayon also adds std
  • it's already present if needed in the future

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, sure, makes sense I guess.

In that case should we also make std a default feature?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know, that's on you!

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adding std to default would be a breaking change, so let's not do that now.

If we need to add it in future I can do that in a 2.0 release.


[package.metadata.docs.rs]
features = ["std", "rayon"]

[dev-dependencies.cargo-husky]
version = "1"
Expand Down
13 changes: 13 additions & 0 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -60,6 +60,19 @@ use stringslice::StringSlice;
assert_eq!("string".try_slice(4..2), None);
```

## Run in parallel

You can have access to parallelized methods by enabling the `rayon` feature. Thanks to the [rayon](https://github.com/rayon-rs/rayon) crate, the string slicing will execute through many threads.

**Par**allel methods:
- `par_slice`
- `par_try_slice`
- `par_substring`
- `par_try_substring`

> [!WARNING]
> Using the **par**allel methods on bigger strings is recommended. Parallelism scales greatly on bigger sizes.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you have any benchmark info you could add here? I think it would be useful to know how big a string needs to be for the par_* functions to be faster.

Copy link
Author

@Julien-cpsn Julien-cpsn Aug 31, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to provide some, yes!

## Licence

Licensed under either of
Expand Down
136 changes: 135 additions & 1 deletion src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -42,9 +42,18 @@
//!

#![no_std]
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the std feature is necessary you should use

Suggested change
#![no_std]
#![cfg_attr(not(feature = "std"), no_std)]


#![cfg(not(feature = "std"))]
use core::ops::{Bound, RangeBounds};

#[cfg(feature = "std")]
use std::ops::{Bound, RangeBounds};
Comment on lines +45 to +49
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#![cfg(not(feature = "std"))]
use core::ops::{Bound, RangeBounds};
#[cfg(feature = "std")]
use std::ops::{Bound, RangeBounds};
use core::ops::{Bound, RangeBounds};

You should be able to use core::* even with std

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The rust docs say that it's better to use std instead of core, but it doesn't elaborate on that

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a compromise you should be able to do something like this to avoid duplicate imports:

#[cfg(not(feature = "std"))
use core as std;

use std::whatever;


#[cfg(feature = "std")]
extern crate std;
Comment on lines +50 to +52
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
#[cfg(feature = "std")]
extern crate std;

This shouldn't be necessary

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

First I did like #2 (comment)

But then I encountered some problems with tests, so that's how I figured it out. Maybe there's a better solution

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, yeah, for tests you need std.

You should do something like #[cfg_attr(not(any(feature = "std", test)), no_std)] to enable std for tests


#[cfg(feature = "rayon")]
use rayon_::prelude::*;

#[inline]
fn range_to_begin_end(range: impl RangeBounds<usize>) -> (usize, usize) {
let begin = match range.start_bound() {
Expand All @@ -57,6 +66,7 @@ fn range_to_begin_end(range: impl RangeBounds<usize>) -> (usize, usize) {
Bound::Included(&b) => b + 1,
Bound::Excluded(&b) => b,
// Note: using core::usize::MAX rather than usize::MAX for compatibility with Rust < 1.43
#[allow(clippy::legacy_numeric_constants)]
Bound::Unbounded => core::usize::MAX,
};

Expand Down Expand Up @@ -125,6 +135,74 @@ pub trait StringSlice {
/// [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html
/// [`None`]: https://doc.rust-lang.org/std/option/enum.Option.html#variant.None
fn try_substring(&self, begin: usize, end: usize) -> Option<&str>;

#[cfg(feature = "rayon")]
/// Returns a string slice for the given range of characters
///
/// This method will panic if the range is invalid,
/// for example if the beginning is greater than the end.
///
/// Runs in parallel
///
/// # Examples
/// ```
/// use stringslice::StringSlice;
///
/// assert_eq!("Ùníc😎de".slice(4..5), "😎");
/// ```
fn par_slice(&self, range: impl RangeBounds<usize>) -> &str;

#[cfg(feature = "rayon")]
/// Returns an [`Option`] containing string slice for the given range of characters
///
/// This method will return [`None`] if the range is invalid,
/// for example if the beginning is greater than the end.
///
/// Runs in parallel
///
/// # Examples
/// ```
/// use stringslice::StringSlice;
///
/// assert_eq!("Ùníc😎de".try_slice(4..5), Some("😎"));
/// ```
/// [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html
/// [`None`]: https://doc.rust-lang.org/std/option/enum.Option.html#variant.None
fn par_try_slice(&self, range: impl RangeBounds<usize>) -> Option<&str>;

#[cfg(feature = "rayon")]
/// Returns a string slice between the given beginning and end characters
///
/// This method will panic if the parameters are invalid,
/// for example if the beginning is greater than the end.
///
/// Runs in parallel
///
/// # Examples
/// ```
/// use stringslice::StringSlice;
///
/// assert_eq!("Ùníc😎de".substring(4, 5), "😎");
/// ```
fn par_substring(&self, begin: usize, end: usize) -> &str;

#[cfg(feature = "rayon")]
/// Returns an [`Option`] containing string slice between the given beginning and end characters
///
/// This method will return [`None`] if the parameters are invalid,
/// for example if the beginning is greater than the end.
///
/// Runs in parallel
///
/// # Examples
/// ```
/// use stringslice::StringSlice;
///
/// assert_eq!("Ùníc😎de".try_substring(4, 5), Some("😎"));
/// ```
/// [`Option`]: https://doc.rust-lang.org/std/option/enum.Option.html
/// [`None`]: https://doc.rust-lang.org/std/option/enum.Option.html#variant.None
fn par_try_substring(&self, begin: usize, end: usize) -> Option<&str>;
Comment on lines +139 to +205
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than adding these methods to the existing StringSlice trait, do you think it would be cleaner to create a new ParStringSlice trait for these?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That could be great yes

}

impl StringSlice for str {
Expand Down Expand Up @@ -160,6 +238,49 @@ impl StringSlice for str {
begin_ch
};

// Note (unsafe): Since we iterate character indices we can be sure that `begin_ch` and
// `end_ch` are on UTF-8 boundaries. For performance, we use get_unchecked rather than
// simply indexing.
unsafe { Some(self.get_unchecked(begin_ch..end_ch)) }
}
}

#[cfg(feature = "rayon")]
#[inline]
fn par_slice(&self, range: impl RangeBounds<usize>) -> &str {
let (begin, end) = range_to_begin_end(range);
self.par_substring(begin, end)
}

#[cfg(feature = "rayon")]
#[inline]
fn par_try_slice(&self, range: impl RangeBounds<usize>) -> Option<&str> {
let (begin, end) = range_to_begin_end(range);
self.par_try_substring(begin, end)
}

#[cfg(feature = "rayon")]
#[inline]
fn par_substring(&self, begin: usize, end: usize) -> &str {
self.par_try_substring(begin, end)
.expect("begin < end when slicing string")
}

#[cfg(feature = "rayon")]
fn par_try_substring(&self, begin: usize, end: usize) -> Option<&str> {
if begin > end {
None
} else {
let mut ch_idx = self.par_char_indices().map(|(i, _c)| i);

let len = self.len();
let begin_ch = ch_idx.nth(begin).unwrap_or(len);
let end_ch = if end > begin {
ch_idx.nth(end - begin - 1).unwrap_or(len)
} else {
begin_ch
};

// Note (unsafe): Since we iterate character indices we can be sure that `begin_ch` and
// `end_ch` are on UTF-8 boundaries. For performance we use get_unchecked rather than
// simply indexing.
Expand All @@ -170,8 +291,12 @@ impl StringSlice for str {

#[cfg(test)]
mod tests {
#[cfg(not(feature = "std"))]
use core::ops::Bound;

#[cfg(feature = "std")]
use std::ops::Bound;

use super::StringSlice;

#[test]
Expand Down Expand Up @@ -218,4 +343,13 @@ mod tests {
"str"
);
}

#[cfg(feature = "rayon")]
#[test]
fn par_test_utf8() {
let str = "🗻∈🌏";
assert_eq!("🗻", str.par_slice(0..1));
assert_eq!("∈", str.par_slice(1..2));
assert_eq!("🌏", str.par_slice(2..3));
}
}