Commit d691f1d

Update documentation
1 parent f02c48e commit d691f1d

1 file changed: +15 -1 lines changed

docs/EnglishTokenizer.md

Lines changed: 15 additions & 1 deletion
@@ -32,4 +32,18 @@ Output:
 [('Emory', (0, 5)), ('NLP', (6, 9)), ('is', (10, 12)), ('a', (13, 14)), ('research', (15, 23)), ('lab', (24, 27)), ('in', (28, 30)), ('Atlanta', (31, 38)), (',', (38, 39)), ('GA', (40, 42)), ('.', (42, 43))]
 [('It', (44, 46)), ('is', (47, 49)), ('founded', (50, 57)), ('by', (58, 60)), ('Jinho', (61, 66)), ('D.', (67, 69)), ('Choi', (70, 74)), ('in', (75, 77)), ('2014', (78, 82)), ('.', (82, 83))]
 [('Dr.', (84, 87)), ('Choi', (88, 92)), ('is', (93, 95)), ('a', (96, 97)), ('professor', (98, 107)), ('at', (108, 110)), ('Emory', (111, 116)), ('University', (117, 127)), ('.', (127, 128))]
-```
+```
+
+## Key Features
+
+| Feature | Input Text | Tokens |
+|---------|------------|--------|
+| Email addresses | `Email (support@elit.cloud)` | [`Email`, `(`, `support@elit.cloud`, `)`] |
+| Hyperlinks | `URL: https://elit.cloud` | [`URL`, `:`, `https://elit.cloud`] |
+| Emoticons | `I love ELIT :-)!?.` | [`I`, `love`, `ELIT`, `:-)`, `!?.`] |
+| Hashtags | `ELIT is the #1 platform #elit2018.` | [`ELIT`, `is`, `the`, `#`, `1`, `platform`, `#elit2018`, `.`] |
+| HTML entities | `A←B` | [`A`, `←`, `B`] |
+| Hyphens | `(123) 456-7890, 123-456-7890, 2014-2018` | [`(123)`, `456-7890`, `,`, `123-456-7890`, `,`, `2014`, `-`, `2018`] |
+| List items | `(A)First (A.1)Second [2a]Third [Forth]` | [`(A)`, `First`, `(A.1)`, `Second`, `[2a]`, `Third`, `[`, `Forth`, `]`] |
+| Units | `$1,000 20mg 100cm 11:00a.m. 10:30PM` | [`$`, `1,000`, `20`, `mg`, `100`, `cm`, `11:00`, `a.m.`, `10:30`, `PM`] |
+| Acronyms | `I'm gonna miss Dr. Choi 'cause he isn't here.` | [`I`, `'m`, `gon`, `na`, `miss`, `Dr.`, `Choi`, `'cause`, `he`, `is`, `n't`, `here`, `.`] |
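
The output lines in the hunk above pair each token with a `(begin, end)` character offset into the original text. As a minimal sketch of how such offsets can be sanity-checked, the snippet below uses the third output line verbatim. The helpers `check_offsets` and `rebuild` are hypothetical names introduced here for illustration and are not part of the tokenizer's API. The sketch assumes offsets are end-exclusive and that the gaps between tokens were single spaces, which the data is consistent with but does not prove.

```python
from typing import List, Tuple

# (form, (begin, end)) pairs, as printed in the documented output.
Token = Tuple[str, Tuple[int, int]]

# Third sentence from the output shown in the diff above.
tokens: List[Token] = [
    ('Dr.', (84, 87)), ('Choi', (88, 92)), ('is', (93, 95)), ('a', (96, 97)),
    ('professor', (98, 107)), ('at', (108, 110)), ('Emory', (111, 116)),
    ('University', (117, 127)), ('.', (127, 128)),
]


def check_offsets(tokens: List[Token]) -> bool:
    """Return True if every span matches its token length and spans never overlap."""
    prev_end = None
    for form, (begin, end) in tokens:
        if end - begin != len(form):
            return False  # span length disagrees with the token text
        if prev_end is not None and begin < prev_end:
            return False  # span starts before the previous one ended
        prev_end = end
    return True


def rebuild(tokens: List[Token], fill: str = " ") -> str:
    """Lay tokens out at their offsets, padding gaps with `fill` (assumed to be a space)."""
    if not tokens:
        return ""
    start = tokens[0][1][0]
    end = tokens[-1][1][1]
    chars = [fill] * (end - start)
    for form, (b, e) in tokens:
        chars[b - start:e - start] = form  # same-length slice assignment
    return "".join(chars)


print(check_offsets(tokens))
print(rebuild(tokens))
```

Under these assumptions the rebuilt string reads as the original sentence, and adjacent spans such as `('University', (117, 127))` and `('.', (127, 128))` show that punctuation is split off without any gap in the offsets.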
