Lightweight UTF-8-based string library with some modern improvements
The class utf::string describes dynamic, contiguous storage for a sequence of UTF-8-encoded characters.
- Dynamic length;
- Method chaining;
- Fixed "unsigned size_type problem": utf::string::size_type is ptrdiff_t, unlike the STL's size_t;
- Non-owning inner type for viewing and iteration: utf::string::view (also utf::string_view)...
- ...and the rights to view and to change are completely divided between strings and views by design!
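The "unsigned size_type problem" mentioned above can be illustrated with the standard library alone. The sketch below does not use utf::string; the helper names (unsigned_last_index, signed_last_index, reversed) are hypothetical and exist only for this illustration. It shows how size() - 1 wraps around for an empty std::string, while the same arithmetic in ptrdiff_t yields -1 and lets a backward loop terminate naturally.

```cpp
#include <cstddef>
#include <string>

// Index of the last character computed with the unsigned std::size_t.
// For an empty string, size() - 1 wraps around to the largest size_t value.
std::size_t unsigned_last_index(const std::string& s)
{
    return s.size() - 1;
}

// The same computation in a signed type gives -1 for an empty string.
std::ptrdiff_t signed_last_index(const std::string& s)
{
    return static_cast<std::ptrdiff_t>(s.size()) - 1;
}

// A backward loop with a signed index: the condition i >= 0 can become false,
// which is never the case for an unsigned index.
std::string reversed(const std::string& s)
{
    std::string out;
    for (std::ptrdiff_t i = signed_last_index(s); i >= 0; --i)
    {
        out += s[static_cast<std::size_t>(i)];
    }
    return out;
}
```

With an unsigned index, the loop in reversed would need a different shape (for example, counting down from size() and indexing with i - 1), which is the kind of workaround a signed size_type is meant to avoid.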
- Download the library source;
- #include the utf8string.hpp file in your C++ project;
- Enjoy!
⚠️ Note that the library requires C++17 support
See more examples in the sample.cpp source file.
- Creating the string:
// Constructing via const char*
utf::string MyString1 { "Amazing chest ahead" };
// Using std::initializer_list with integral code points
auto MyString2 = utf::string::from_unicode({ 'L',0xf6,'w','e', 'L',0xe9,'o','p','a','r','d' });
// From a vector of bytes (already encoded in UTF-8)
auto MyString3 = utf::string::from_bytes({ 'B','y','t','e','s' });
// As multiple copies of the character
utf::string MyString4 { 0xA2, 10 }; // == "¢¢¢¢¢¢¢¢¢¢"
// From an std::string
auto MyString5 = utf::string::from_std_string("Evil is evil");
- Iterating over the characters:
utf::string Line { "Il buono, il brutto, il cattivo" };
// Using a C++20 range-based for loop with an init-statement
for (auto view = Line.chars(); auto ch : view)
{
std::cout << ch << std::endl; // prints chars' code points
}
- Chaining:
utf::string Line { "Mr Dursley was the director of a firm called Grunnings" };
// Remove all spaces
Line.clone().remove(' ');
/* or */
Line.clone().remove_if(utf::is_space /* handles over 20 different Unicode spaces */ );
// Cut the last word off.
// chars() only creates a view, so nothing is cloned there;
// the copy is actually made by to_string()
std::cout <<
Line.first(Line.chars().reverse().find_if(utf::is_space).as_forward_index()).to_string();
- Multi-pattern operations:
utf::string Line { "Stumbling everywhere" };
// Searches for all occurrences of every pattern in the parameter pack.
// For the substring-matching version, the type of all_matches is std::vector<view>
auto all_matches = Line.chars().matches("every", "everywhere", "around");
for (auto& vi : all_matches)
{
std::cout << vi << std::endl; // prints "every", "everywhere"
}
// Removes the longest found substrings
std::cout << Line.remove("every", "everywhere"); // prints "Stumbling ", not "Stumbling where"

- Access to the:
  - Single character:
    - front(), back(): constant / O(1)
    - N-th (get(N)): linear / O(N)
    - Back character with removal (pop()): constant / O(1)
  - Substring's view (by chars(...), first(...), last(...)): linear / O(N)
  - Entire string's view (chars()): constant / O(1)
- Insertion: linear / O(N); requires extra memory reallocation
- Search (find*(...), contains*(...), count*(...)) / erasure (erase(...), remove*(...)): linear / O(N)
- Length calculation: linear / O(N), as it requires iteration over every character in the string
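To see why length calculation and access to the N-th character are linear for UTF-8 storage, here is a minimal sketch over a raw byte sequence. It is not the library's implementation. The function names (is_continuation, count_code_points, byte_offset_of) are hypothetical, and the input is assumed to be valid UTF-8. Because a character occupies one to four bytes, the only way to count characters or locate the N-th one is to walk the bytes and skip continuation bytes.

```cpp
#include <cstddef>
#include <string_view>

// A byte is a UTF-8 continuation byte when its top two bits are 10.
constexpr bool is_continuation(unsigned char byte)
{
    return (byte & 0xC0) == 0x80;
}

// Counts code points by visiting every byte once: O(N) in the byte length.
// Assumes the input is valid UTF-8.
std::ptrdiff_t count_code_points(std::string_view bytes)
{
    std::ptrdiff_t count = 0;
    for (char raw : bytes)
    {
        if (!is_continuation(static_cast<unsigned char>(raw))) ++count;
    }
    return count;
}

// Returns the byte offset of the n-th code point (0-based),
// or -1 if the sequence has n or fewer code points (or n is negative).
std::ptrdiff_t byte_offset_of(std::string_view bytes, std::ptrdiff_t n)
{
    std::ptrdiff_t seen = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i)
    {
        if (is_continuation(static_cast<unsigned char>(bytes[i]))) continue;
        if (seen == n) return static_cast<std::ptrdiff_t>(i);
        ++seen;
    }
    return -1;
}
```

For the bytes of "a¢b" (where ¢ takes two bytes), count_code_points returns 3 although the sequence is 4 bytes long, and the third character starts at byte offset 3.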
Note that a replacement (replace*(...)) is more complicated. It behaves like an insertion if the new substring is longer (by its size()) than the one being replaced. Otherwise, the operation does not require extra memory and behaves like an erasure. Both cases have linear / O(N) time complexity.
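The two replacement cases can be sketched on a plain std::string byte buffer. This is an illustration of the idea only, not the library's code: replace_bytes is a hypothetical helper that works on byte positions rather than characters, and it reports whether the buffer had to grow.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: replaces bytes [pos, pos + len) of buf with repl.
// Returns true if the buffer had to grow (the insertion-like case).
bool replace_bytes(std::string& buf, std::size_t pos, std::size_t len, std::string_view repl)
{
    assert(pos <= buf.size() && len <= buf.size() - pos);

    if (repl.size() > len)
    {
        // Longer replacement: the buffer grows and may be reallocated,
        // and the tail is shifted right, as with an insertion.
        buf.replace(pos, len, repl);
        return true;
    }

    // Shorter or equal replacement: overwrite in place, then close the gap
    // by shifting the tail left, as an erasure would. No extra memory is needed.
    std::copy(repl.begin(), repl.end(), buf.begin() + static_cast<std::ptrdiff_t>(pos));
    buf.erase(pos + repl.size(), len - repl.size());
    return false;
}
```

In both branches the tail of the buffer is moved at most once, which is where the linear time bound comes from.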
See the LICENSE file for license rights and limitations (MIT).
