Commit bd28f2d

Add extractEntities, smartSplit, and humanizeList text processing utilities (#35)
* new text processing functions
* update version
1 parent 0b83567 commit bd28f2d

18 files changed: +1341 -185 lines changed

CHANGELOG.md

Lines changed: 15 additions & 0 deletions
@@ -7,6 +7,20 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ## [Unreleased]
 
+## [0.16.0] - 2025-09-22
+
+### Added
+
+- **New Functions** - Three powerful text processing utilities
+  - `extractEntities()` - Extract emails, URLs, mentions, hashtags, phones, dates, and prices from text
+  - `smartSplit()` - Intelligently split text into sentences with abbreviation handling
+  - `humanizeList()` - Convert arrays to grammatically correct human-readable lists with Oxford comma support
+
+### Changed
+
+- **Bundle Size** - Increased size limit from 6.5KB to 7.5KB to accommodate new features
+- **Documentation** - Updated function count from 41 to 44 functions
+
 ## [0.15.0] - 2025-09-21
 
 ### Added
@@ -40,6 +54,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Improved
 
 - **CLI Tool** - Enhanced with multi-runtime support
+
   - Automatic runtime detection (Node.js, Deno, or Bun)
   - Unified stdin handling across all runtimes
   - Cross-runtime path resolution and dynamic imports

README.md

Lines changed: 121 additions & 8 deletions
@@ -79,7 +79,7 @@ bun add nano-string-utils
 <script src="https://unpkg.com/nano-string-utils/dist/index.iife.js"></script>
 
 <!-- Or specific version -->
-<script src="https://cdn.jsdelivr.net/npm/nano-string-utils@0.15.0/dist/index.iife.js"></script>
+<script src="https://cdn.jsdelivr.net/npm/nano-string-utils@0.16.0/dist/index.iife.js"></script>
 
 <script>
   // All functions available on global nanoStringUtils object
@@ -92,7 +92,10 @@ For modern browsers with ES modules:
 
 ```html
 <script type="module">
-  import { slugify, camelCase } from 'https://unpkg.com/nano-string-utils/dist/index.js';
+  import {
+    slugify,
+    camelCase,
+  } from "https://unpkg.com/nano-string-utils/dist/index.js";
 
   console.log(slugify("Hello World")); // 'hello-world'
   console.log(camelCase("hello-world")); // 'helloWorld'
@@ -174,7 +177,7 @@ fuzzyMatch("usrctrl", "userController.js"); // { matched: true, score: 0.444 }
 fuzzyMatch("of", "openFile"); // { matched: true, score: 0.75 }
 ```
 
-> 📖 **See all 40+ functions in the API Reference below**
+> 📖 **See all 44 functions in the API Reference below**
 
 ## CLI
 
@@ -252,6 +255,10 @@ nano-string padStart "hi" --length 5 --char "*" # ***hi
 
 # Generate random strings
 nano-string randomString --length 10 # Generates 10-character string
+
+# Text processing
+nano-string smartSplit "Dr. Smith went to the store. He bought milk." # ['Dr. Smith went to the store.', 'He bought milk.']
+nano-string humanizeList "apple,banana,orange" --conjunction "or" # apple, banana, or orange
 ```
 
 #### Validation functions
@@ -286,7 +293,7 @@ nano-string slugify --help
 
 ## API Reference
 
-The library provides 40+ string utility functions organized by category. Click on any category to explore the available functions.
+The library provides 44 string utility functions organized by category. Click on any category to explore the available functions.
 
 <details>
 <summary><b>🔤 Case Conversion Functions (10 functions)</b></summary>
@@ -533,11 +540,11 @@ hashString("world"); // 113318802
 </details>
 
 <details>
-<summary><b>📝 Text Processing (8 functions)</b></summary>
+<summary><b>📝 Text Processing (11 functions)</b></summary>
 
 ### Text Processing
 
-Advanced text processing utilities for handling HTML, whitespace, and special characters.
+Advanced text processing utilities for handling HTML, whitespace, special characters, and entity extraction.
 
 #### `stripHtml(str: string): string`
 
@@ -655,6 +662,109 @@ singularize("Boxes"); // 'Box'
 singularize("PEOPLE"); // 'PERSON'
 ```
 
+#### `extractEntities(text: string): ExtractedEntities`
+
+Extracts various entities from text including emails, URLs, mentions, hashtags, phones, dates, and prices.
+
+```javascript
+// Extract from mixed content
+const text =
+  "Contact @john at john@example.com or call (555) 123-4567. Check #updates at https://example.com. Price: $99.99";
+const entities = extractEntities(text);
+// Returns:
+// {
+// emails: ['john@example.com'],
+// urls: ['https://example.com'],
+// mentions: ['@john'],
+// hashtags: ['#updates'],
+// phones: ['(555) 123-4567'],
+// dates: [],
+// prices: ['$99.99']
+// }
+
+// Extract from social media content
+const tweet =
+  "Hey @alice and @bob! Check out #javascript #typescript at https://github.com/example";
+const social = extractEntities(tweet);
+// social.mentions: ['@alice', '@bob']
+// social.hashtags: ['#javascript', '#typescript']
+// social.urls: ['https://github.com/example']
+
+// Extract contact information
+const contact = "Email: support@company.com, Phone: +1-800-555-0100";
+const info = extractEntities(contact);
+// info.emails: ['support@company.com']
+// info.phones: ['+1-800-555-0100']
+
+// Extract dates and prices
+const invoice = "Invoice Date: 2024-01-15, Due: 01/30/2024, Amount: $1,234.56";
+const billing = extractEntities(invoice);
+// billing.dates: ['2024-01-15', '01/30/2024']
+// billing.prices: ['$1,234.56']
+```
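
The diff only shows the README for `extractEntities`, not its source. As a rough illustration of how this kind of extraction could work, here is a minimal regex-based sketch. The names `PATTERNS` and `extractEntitiesSketch` are hypothetical, the patterns are deliberately simple, and phone and date matching are left out; none of this is taken from the library's actual implementation.

```javascript
// Hypothetical sketch, not the library's implementation.
// One global regex per entity type; phones and dates are omitted for brevity.
const PATTERNS = {
  emails: /[\w.+-]+@[\w-]+(?:\.[\w-]+)+/g,
  // Stop at whitespace and drop one trailing punctuation mark such as "."
  urls: /https?:\/\/[^\s]*[^\s.,;:!?)]/g,
  // "@" must not follow a word character, so addresses are not mentions
  mentions: /(?<![\w@])@\w+/g,
  hashtags: /(?<!\w)#\w+/g,
  // Dollar amounts with optional thousands separators and decimals
  prices: /\$\d+(?:,\d{3})*(?:\.\d+)?/g,
};

function extractEntitiesSketch(text) {
  const result = {};
  for (const [name, pattern] of Object.entries(PATTERNS)) {
    // String.prototype.match returns null when nothing matches
    result[name] = text.match(pattern) ?? [];
  }
  return result;
}

const sample = extractEntitiesSketch(
  "Contact @john at john@example.com. Check #updates at https://example.com. Price: $99.99"
);
console.log(sample);
```

Keeping one pattern per key makes it easy to add or tighten an entity type without touching the loop, at the cost of scanning the text once per pattern.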
+
+#### `smartSplit(text: string): string[]`
+
+Intelligently splits text into sentences while properly handling abbreviations, ellipses, and decimal numbers.
+
+```javascript
+// Basic sentence splitting
+smartSplit("Hello world. How are you? I'm fine!");
+// ['Hello world.', 'How are you?', "I'm fine!"]
+
+// Handles abbreviations correctly
+smartSplit("Dr. Smith went to the store. He bought milk.");
+// ['Dr. Smith went to the store.', 'He bought milk.']
+
+// Preserves decimal numbers
+smartSplit("The price is $10.50. That's expensive!");
+// ['The price is $10.50.', "That's expensive!"]
+
+// Handles multiple abbreviations
+smartSplit(
+  "Mr. and Mrs. Johnson live on St. Paul Ave. They moved from the U.S.A. last year."
+);
+// ['Mr. and Mrs. Johnson live on St. Paul Ave.', 'They moved from the U.S.A. last year.']
+
+// Handles ellipses
+smartSplit("I was thinking... Maybe we should go. What do you think?");
+// ['I was thinking...', 'Maybe we should go.', 'What do you think?']
+```
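
The source of `smartSplit` is not part of the visible diff. One plausible way to get the simpler behaviours documented above is to treat terminal punctuation followed by whitespace as a boundary and skip boundaries that follow a known title. The sketch below assumes exactly that; `TITLES` and `smartSplitSketch` are hypothetical names, and it does not attempt multi-period abbreviations such as "U.S.A." or street suffixes such as "Ave.".

```javascript
// Hypothetical sketch, not the library's implementation.
// Titles after which a single period should not end a sentence.
const TITLES = new Set(["dr", "mr", "mrs", "ms", "prof", "st"]);

function smartSplitSketch(text) {
  const sentences = [];
  let start = 0;
  // Terminal punctuation counts only before whitespace or end of input,
  // so the period inside "$10.50" is never a candidate boundary.
  const boundary = /[.!?]+(?=\s|$)/g;
  let match;
  while ((match = boundary.exec(text)) !== null) {
    const lastWord = text
      .slice(start, match.index)
      .split(/\s+/)
      .pop()
      .toLowerCase();
    // A lone period after a title ("Dr.") is not a sentence end
    if (match[0] === "." && TITLES.has(lastWord)) continue;
    const end = match.index + match[0].length;
    sentences.push(text.slice(start, end).trim());
    start = end;
  }
  // Keep any trailing text that lacks terminal punctuation
  const rest = text.slice(start).trim();
  if (rest) sentences.push(rest);
  return sentences;
}

console.log(smartSplitSketch("Dr. Smith went to the store. He bought milk."));
```

Because runs of punctuation are matched as one token, an ellipsis is kept attached to the sentence it ends.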
+
+#### `humanizeList(items: unknown[], options?: HumanizeListOptions): string`
+
+Converts an array into a grammatically correct, human-readable list with proper conjunctions and optional Oxford comma.
+
+```javascript
+// Basic usage
+humanizeList(["apple", "banana", "orange"]);
+// 'apple, banana, and orange'
+
+// Two items
+humanizeList(["yes", "no"]);
+// 'yes and no'
+
+// With custom conjunction
+humanizeList(["red", "green", "blue"], { conjunction: "or" });
+// 'red, green, or blue'
+
+// Without Oxford comma
+humanizeList(["a", "b", "c"], { oxford: false });
+// 'a, b and c'
+
+// With quotes
+humanizeList(["run", "jump", "swim"], { quotes: true });
+// '"run", "jump", and "swim"'
+
+// Handles mixed types and nulls
+humanizeList([1, null, "text", undefined, true]);
+// '1, text, and true'
+
+// Empty arrays
+humanizeList([]);
+// ''
+```
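
The documented outputs above are enough to reconstruct the joining rules: drop `null` and `undefined`, join two items with the conjunction alone, and for three or more items place the conjunction before the last item, with a comma when `oxford` is on. The following is a minimal sketch under those assumptions. `humanizeListSketch` is a hypothetical name, the defaults are inferred from the examples, and only the three options shown in the README (`conjunction`, `oxford`, `quotes`) are covered.

```javascript
// Hypothetical sketch reconstructed from the README examples,
// not the library's implementation.
function humanizeListSketch(items, options = {}) {
  const { conjunction = "and", oxford = true, quotes = false } = options;
  const parts = items
    // null and undefined entries are skipped entirely
    .filter((item) => item !== null && item !== undefined)
    .map((item) => (quotes ? `"${item}"` : String(item)));

  if (parts.length === 0) return "";
  if (parts.length === 1) return parts[0];
  if (parts.length === 2) return `${parts[0]} ${conjunction} ${parts[1]}`;

  const head = parts.slice(0, -1).join(", ");
  const last = parts[parts.length - 1];
  // The Oxford comma is the comma right before the conjunction
  return `${head}${oxford ? "," : ""} ${conjunction} ${last}`;
}

console.log(humanizeListSketch(["apple", "banana", "orange"]));
// 'apple, banana, and orange'
console.log(humanizeListSketch(["a", "b", "c"], { oxford: false }));
// 'a, b and c'
```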
+
 </details>
 
 <details>
@@ -1278,9 +1388,12 @@ Each utility is optimized to be as small as possible:
 | fuzzyMatch | ~500 bytes |
 | pluralize | ~350 bytes |
 | singularize | ~320 bytes |
+| smartSplit | ~1.1KB |
+| humanizeList | ~850 bytes |
 | memoize | ~400 bytes |
+| extractEntities | ~1.1KB |
 
-Total package size: **< 6.5KB** minified + gzipped
+Total package size: **< 7.5KB** minified + gzipped
 
 ## Requirements
 
@@ -1384,7 +1497,7 @@ npm run bench:size
 
 | Library | Bundle Size | Dependencies | Tree-shakeable | TypeScript |
 | ----------------- | ----------- | ------------ | --------------------- | ---------- |
-| nano-string-utils | < 6.5KB | 0 |||
+| nano-string-utils | < 7.5KB | 0 |||
 | lodash | ~70KB | 0 | ⚠️ Requires lodash-es ||
 | underscore.string | ~20KB | 0 |||
 | voca | ~30KB | 0 |||

benchmarks/all-functions.json

Lines changed: 3 additions & 0 deletions
@@ -8,10 +8,12 @@
   "dotCase",
   "escapeHtml",
   "excerpt",
+  "extractEntities",
   "fuzzyMatch",
   "graphemes",
   "hashString",
   "highlight",
+  "humanizeList",
   "isASCII",
   "isEmail",
   "isUrl",
@@ -32,6 +34,7 @@
   "sentenceCase",
   "singularize",
   "slugify",
+  "smartSplit",
   "snakeCase",
   "stripHtml",
   "template",

benchmarks/bundle-size-results.md

Lines changed: 21 additions & 18 deletions
@@ -2,54 +2,57 @@
 
 ## Overview
 
-- **Total Functions**: 41
-- **Nano Wins**: 34/41
-- **Average Size Reduction**: -52%
+- **Total Functions**: 44
+- **Nano Wins**: 37/44
+- **Average Size Reduction**: -49%
 
 ## Detailed Comparison
 
 Sizes shown are minified (gzipped). For nano-string-utils, tree-shaken size is shown when different from bundled.
 
 | Function | nano-string-utils | lodash | es-toolkit | Winner | Savings |
 | --------------------- | ----------------- | ------------- | ----------- | ---------- | ------- |
-| camelCase | 1.6KB (826B) | 8.3KB (3.4KB) | 367B (273B) | es-toolkit | -203% |
-| capitalize | 1.4KB (697B) | 3.7KB (1.7KB) | 97B (107B) | es-toolkit | -551% |
+| camelCase | 1.6KB (827B) | 8.3KB (3.4KB) | 367B (273B) | es-toolkit | -203% |
+| capitalize | 1.4KB (696B) | 3.7KB (1.7KB) | 97B (107B) | es-toolkit | -550% |
 | codePoints | 1.4KB (728B) | - | - | nano 🏆 | - |
 | constantCase | 1.6KB (805B) | - | - | nano 🏆 | - |
 | deburr | 1.6KB (879B) | 4.6KB (1.8KB) | 544B (332B) | es-toolkit | -165% |
 | diff | 1.8KB (863B) | - | - | nano 🏆 | - |
 | dotCase | 1.6KB (785B) | - | - | nano 🏆 | - |
 | escapeHtml | 1.4KB (741B) | - | - | nano 🏆 | - |
-| excerpt | 1.6KB (840B) | - | - | nano 🏆 | - |
+| excerpt | 1.6KB (841B) | - | - | nano 🏆 | - |
+| extractEntities | 2.3KB (1.1KB) | - | - | nano 🏆 | - |
 | fuzzyMatch | 2.4KB (1.2KB) | - | - | nano 🏆 | - |
-| graphemes | 1.5KB (759B) | - | - | nano 🏆 | - |
-| hashString | 1.5KB (761B) | - | - | nano 🏆 | - |
+| graphemes | 1.5KB (758B) | - | - | nano 🏆 | - |
+| hashString | 1.5KB (760B) | - | - | nano 🏆 | - |
 | highlight | 1.9KB (1.0KB) | - | - | nano 🏆 | - |
-| isASCII | 1.4KB (721B) | - | - | nano 🏆 | - |
-| isEmail | 1.3KB (666B) | - | - | nano 🏆 | - |
-| isUrl | 1.3KB (666B) | - | - | nano 🏆 | - |
+| humanizeList | 1.6KB (856B) | - | - | nano 🏆 | - |
+| isASCII | 1.4KB (720B) | - | - | nano 🏆 | - |
+| isEmail | 1.3KB (665B) | - | - | nano 🏆 | - |
+| isUrl | 1.3KB (665B) | - | - | nano 🏆 | - |
 | kebabCase | 1.6KB (792B) | 6.7KB (2.8KB) | 238B (197B) | es-toolkit | -302% |
 | levenshtein | 2.0KB (1.0KB) | - | - | nano 🏆 | - |
 | levenshteinNormalized | 2.2KB (1.1KB) | - | - | nano 🏆 | - |
-| memoize | 1.8KB (928B) | - | - | nano 🏆 | - |
-| normalizeWhitespace | 1.8KB (859B) | - | - | nano 🏆 | - |
-| pad | 1.7KB (896B) | 5.8KB (2.6KB) | 109B (118B) | es-toolkit | -659% |
+| memoize | 1.8KB (929B) | - | - | nano 🏆 | - |
+| normalizeWhitespace | 1.8KB (860B) | - | - | nano 🏆 | - |
+| pad | 1.7KB (895B) | 5.8KB (2.6KB) | 109B (118B) | es-toolkit | -658% |
 | padEnd | 1.5KB (788B) | 5.7KB (2.5KB) | - | nano 🏆 | 70% |
 | padStart | 1.5KB (784B) | 5.7KB (2.5KB) | - | nano 🏆 | 70% |
-| pascalCase | 1.6KB (820B) | - | 299B (231B) | es-toolkit | -255% |
+| pascalCase | 1.6KB (821B) | - | 299B (231B) | es-toolkit | -255% |
 | pathCase | 1.6KB (785B) | - | - | nano 🏆 | - |
 | pluralize | 2.2KB (1.0KB) | - | - | nano 🏆 | - |
 | randomString | 1.5KB (821B) | - | - | nano 🏆 | - |
-| removeNonPrintable | 1.7KB (919B) | - | - | nano 🏆 | - |
-| reverse | 1.4KB (688B) | - | - | nano 🏆 | - |
+| removeNonPrintable | 1.7KB (918B) | - | - | nano 🏆 | - |
+| reverse | 1.4KB (687B) | - | - | nano 🏆 | - |
 | sentenceCase | 2.1KB (991B) | - | - | nano 🏆 | - |
 | singularize | 2.6KB (1.1KB) | - | - | nano 🏆 | - |
 | slugify | 1.3KB (667B) | - | - | nano 🏆 | - |
+| smartSplit | 2.3KB (1.1KB) | - | - | nano 🏆 | - |
 | snakeCase | 1.6KB (793B) | 6.7KB (2.8KB) | 238B (197B) | es-toolkit | -303% |
 | stripHtml | 1.4KB (689B) | - | - | nano 🏆 | - |
 | template | 1.7KB (887B) | 13KB (5.7KB) | - | nano 🏆 | 85% |
 | templateSafe | 2.1KB (1.0KB) | - | - | nano 🏆 | - |
 | titleCase | 2.3KB (1.1KB) | - | - | nano 🏆 | - |
 | toASCII | 4.6KB (1.8KB) | - | - | nano 🏆 | - |
-| truncate | 1.5KB (780B) | 6.4KB (2.9KB) | - | nano 🏆 | 73% |
+| truncate | 1.5KB (781B) | 6.4KB (2.9KB) | - | nano 🏆 | 73% |
 | wordCount | 1.4KB (713B) | - | - | nano 🏆 | - |
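
The one-byte size changes in this table explain why a few Savings values moved (for example capitalize from -551% to -550%). The benchmark script itself is not in the visible diff, but the negative rows are consistent with comparing gzipped sizes and rounding the relative difference to a whole percent. The sketch below assumes that formula; `savingsPercent` is a hypothetical name.

```javascript
// Assumed formula, inferred from the table rows rather than from the
// benchmark script: relative difference of gzipped sizes, rounded.
function savingsPercent(nanoBytes, competitorBytes) {
  return Math.round(((competitorBytes - nanoBytes) / competitorBytes) * 100);
}

// Rows where es-toolkit is smaller, using the gzipped byte counts shown
console.log(savingsPercent(827, 273)); // camelCase: -203
console.log(savingsPercent(696, 107)); // capitalize: -550
console.log(savingsPercent(895, 118)); // pad: -658
```

The positive rows (such as padEnd at 70%) cannot be reproduced exactly from the table, because the lodash sizes are shown rounded to a tenth of a kilobyte.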
