Commit 6ecceaa

Spec-compliant HTML5 decode (#7)
* Spec-compliant HTML5 decode
* Changes for Linux
* Update Package-Builder
1 parent 00182c0 commit 6ecceaa

7 files changed: +1406 -371 lines changed
Package-Builder

README.md

Lines changed: 34 additions & 18 deletions
@@ -6,41 +6,57 @@
 ![Apache 2](https://img.shields.io/badge/license-Apache2-blue.svg?style=flat)
 
 ## Summary
-Pure Swift HTML character escape utility tool for Swift 3.0.
+Pure Swift HTML encode/decode utility tool for Swift 3.0.
 
-Currently includes support for HTML4 named character references. You can find the list of all 252 HTML4 named character references [here](https://www.w3.org/TR/html4/sgml/entities.html).
+Now includes support for HTML5 named character references. You can find the list of all 2231 HTML5 named character references [here](https://www.w3.org/TR/html5/syntax.html#named-character-references).
 
-`HTMLEntities` escapes ALL non-ASCII characters, as well as the characters `<`, `>`, `&`, `"`, `'` as these five characters are part of the HTML tag and HTML attribute syntaxes.
+`HTMLEntities` can escape ALL non-ASCII characters and ASCII non-print character (i.e. NUL, ESC, DEL), as well as the characters `<`, `>`, `&`, `"`, `'` as these five characters are part of the HTML tag and HTML attribute syntaxes.
 
-In addition, `HTMLEntities` can unescape encoded HTML text that contains decimal, hexadecimal, or HTML4 named character reference escapes.
+In addition, `HTMLEntities` can unescape encoded HTML text that contains decimal, hexadecimal, or HTML5 named character references.
 
 ## Features
 
-* Supports HTML4 named character references (`nbsp`, `cent`, etc.)
+* Supports HTML5 named character references (`NegativeMediumSpace;` etc.)
+* HTML5 spec-compliant; strict parse mode recognizes [parse errors](https://www.w3.org/TR/html5/syntax.html#tokenizing-character-references)
 * Supports decimal and hexadecimal escapes for non-named characters
 * Simple to use as functions are added by way of extending the default `String` class
 * Minimal dependencies; implementation is completely self-contained
 
-## Swift Version
+## Version Info
 
-HTMLEntities 1.0 runs on Swift 3.0, on both macOS and Ubuntu Linux.
+HTMLEntities 2.0 runs on Swift 3.0, on both macOS and Ubuntu Linux.
 
 ## Usage
 
+### Install via Swift Package Manager (SPM)
+
+```swift
+import PackageDescription
+
+let package = Package(
+    name: "package-name",
+    dependencies: [
+        .Package(url: "https://github.com/IBM-Swift/swift-html-entities.git", majorVersion: 2, minor: 0)
+    ]
+)
+```
+
+### In code
+
 ```swift
 import HTMLEntities
 
 // encode example
 let html = "<script>alert(\"abc\")</script>"
 
 print(html.htmlEscape())
-// Prints &lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;"
+// Prints "&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;"
 
 // decode example
 let htmlencoded = "&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;"
 
 print(htmlencoded.htmlUnescape())
-// Prints <script>alert(\"abc\")</script>"
+// Prints "<script>alert(\"abc\")</script>"
 ```
 
 ## Advanced Options
## Advanced Options
@@ -56,18 +72,18 @@ Defaults to `false`. Specifies if decimal character escapes should be used inste
 ```swift
 import HTMLEntities
 
-let text = "한, 한, é, é, 🇺🇸"
+let text = "한, 한, ế, ế, 🇺🇸"
 
 print(text.htmlEscape())
-// Prints &#x1112;&#x1161;&#x11AB;, &#xD55C;, e&#x301;, &eacute;, &#x1F1FA;&#x1F1F8;
+// Prints "&#x1112;&#x1161;&#x11AB;, &#xD55C;, &#x1EBF;, e&#x302;&#x301;, &#x1F1FA;&#x1F1F8;"
 
 print(text.htmlEscape(decimal: true))
-// Prints &#4370;&#4449;&#4523;, &#54620;, e&#769;, &eacute;, &#127482;&#127480;
+// Prints "&#4370;&#4449;&#4523;, &#54620;, &#7871;, e&#770;&#769;, &#127482;&#127480;"
 ```
 
 #### `useNamedReferences`
 
-Defaults to `true`. Specifies if named character references should be used whenever possible. Set to `false` to always use numeric character escape, i.e., for compatibility with older browsers that do not recognize named character references.
+Defaults to `true`. Specifies if named character references should be used whenever possible. Set to `false` to always use numeric character references, i.e., for compatibility with older browsers that do not recognize named character references.
 
 ```swift
 import HTMLEntities
@@ -77,15 +93,15 @@ let html = "<script>alert(\"abc\")</script>"
 print(html.htmlEscape())
 // Prints “&lt;script&gt;alert(&quot;abc&quot;)&lt;/script&gt;”
 
-print(html.htmlEscape(userNamedReferences: false))
+print(html.htmlEscape(useNamedReferences: false))
 // Prints “&#x3C;script&#x3E;alert(&#x22;abc&#x22;)&#x3C;/script&#x3E;”
 ```
 
 ### Unescape Options
 
 #### `strict`
 
-Defaults to `true`. Specifies if HTML numeric character escapes MUST always end with `;`. Some browsers allow numeric character escapes (i.e., decimal and hexadecimal types) to end without `;`. Always ending character escapes with `;` is recommended; however, for compatibility reasons, `HTMLEntities` allows non-strict ending option for situations that require it.
+Defaults to `false`. Specifies if HTML5 parse errors should be thrown or simply passed over. **NOTE**: `htmlUnescape()` is a throwing function if `strict` is used in call argument (no matter if it is set to `true` or `false`); `htmlUnescape()` is NOT a throwing function if no argument is provided.
 
 ```swift
 import HTMLEntities
@@ -95,10 +111,10 @@ let text = "&#4370&#4449&#4523"
 print(text.htmlUnescape())
 // Prints “&#4370&#4449&#4523”
 
-print(text.htmlUnescape(strict: false))
-// Prints “한”
+print(try text.htmlUnescape(strict: true))
+// Throws a `ParseError.MissingSemicolon` instance
 ```
 
 ## License
 
-Apache 2.0
+Apache 2.0
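
The README examples in this diff show the same visible text escaping differently depending on whether it is stored precomposed (`&#xD55C;`) or as separate jamo (`&#x1112;&#x1161;&#x11AB;`), and in hexadecimal or decimal form. The following is a minimal sketch of that behavior, not the library's implementation: `numericEscape` is a hypothetical helper that only handles non-ASCII scalars (no named references, no `<`, `>`, `&`, and no non-print ASCII), and it is written against current Swift rather than Swift 3.0.

```swift
// Hypothetical helper, NOT part of HTMLEntities: replaces each non-ASCII
// Unicode scalar with a numeric character reference.
func numericEscape(_ text: String, decimal: Bool = false) -> String {
    var result = ""
    // Iterate over Unicode scalars, not Characters, so a decomposed
    // grapheme yields one reference per scalar.
    for scalar in text.unicodeScalars {
        if scalar.isASCII {
            result.unicodeScalars.append(scalar)
        } else if decimal {
            result += "&#\(scalar.value);"
        } else {
            result += "&#x" + String(scalar.value, radix: 16, uppercase: true) + ";"
        }
    }
    return result
}

// Decomposed jamo followed by the precomposed syllable.
let sample = "\u{1112}\u{1161}\u{11AB}, \u{D55C}"

print(numericEscape(sample))
// Prints "&#x1112;&#x1161;&#x11AB;, &#xD55C;"

print(numericEscape(sample, decimal: true))
// Prints "&#4370;&#4449;&#4523;, &#54620;"
```

Walking `unicodeScalars` instead of `Character` values is what makes the two spellings of the same syllable produce different output, which matches the hexadecimal and decimal strings shown in the README examples.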
