
Commit 8989207

Abduqodiri Qurbonzoda (qurbonzoda) authored and committed
HexFormat proposal
1 parent 310b1f7 commit 8989207

1 file changed: +320 -0 lines changed

proposals/stdlib/hex-format.md

Lines changed: 320 additions & 0 deletions
@@ -0,0 +1,320 @@
# HexFormat

* **Type**: Standard Library API proposal
* **Author**: Abduqodiri Qurbonzoda
* **Status**: Implemented in Kotlin 1.9.0
* **Prototype**: Implemented
* **Target issue**: [KT-57762](https://youtrack.jetbrains.com/issue/KT-57762/)
* **Discussion**: TBD

## Summary

A convenient API for formatting binary data as hexadecimal strings and for parsing such strings back.

## Motivation

Our research has shown that hexadecimal is the most widely used numeric representation after decimal.
There are some fundamental reasons for its popularity:
* Hexadecimal representation is more human-readable and understandable when it comes to bits.
  Each hex digit represents exactly four bits of data,
  making the mapping of a hex digit to its corresponding nibble straightforward.
* Hex representation is more compact than the decimal format and consumes a predictable number of characters.
* The implementation of a hex encoder/decoder is relatively simple and fast.

By providing a convenient API for the common use cases described below, we aim to make coding in Kotlin easier and more enjoyable.

## Use cases

### Logging and debugging

The readability of the format makes it very appealing for logging and debugging.
A value that is converted to hex for logging is usually less informative as a number than as a sequence of bits,
e.g., when the value has some particular bit pattern. Another popular use case is printing bytes in some
[hex dump](https://en.wikipedia.org/wiki/Hex_dump) format, split into lines and groups.

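As an illustration of the hex dump use case, here is a minimal sketch that uses the options proposed later in this document. The `hexDump` and `hexDumpDemo` names and the `" | "` group separator are choices made for this example only, and the opt-in reflects the experimental status the API had in Kotlin 1.9.

```kotlin
// Hypothetical helper for this example; not part of the proposal.
// HexFormat was experimental in Kotlin 1.9, hence the opt-in.
@OptIn(ExperimentalStdlibApi::class)
fun hexDump(data: ByteArray): String {
    val dumpFormat = HexFormat {
        bytes {
            bytesPerLine = 8       // start a new line after every 8 bytes
            bytesPerGroup = 4      // split each line into groups of 4 bytes
            groupSeparator = " | " // placed between groups within a line
            byteSeparator = " "    // placed between bytes within a group
        }
    }
    return data.toHexString(dumpFormat)
}

// Dumps the 20 bytes 0x00..0x13:
// 00 01 02 03 | 04 05 06 07
// 08 09 0a 0b | 0c 0d 0e 0f
// 10 11 12 13
fun hexDumpDemo(): String = hexDump(ByteArray(20) { it.toByte() })
```
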
### Storing or transmitting binary data in text-only formats

Sometimes binary data needs to be embedded into text-only formats such as URL, XML, or JSON.
Our research indicates that in this use case, hex encoding is among the most frequently used encodings,
especially when encoding primitive values such as `Int` and `Long`.

### Protocol requirements

The following popular protocols and formats require hex representation:
* When generating or parsing HTML code, one might need to work with the hex representation of RGB color codes,
  e.g., `<div style="background-color:#ff6347;">...</div>`
* Unicode code points in HTML or XML are expressed in hex,
  e.g., `<message>It's &#x1F327; outside, be sure to grab &#x2602;</message>`
* The framework used in your project might require specifying IP or MAC addresses in a certain hex format,
  e.g., `"00:1b:63:84:45:e6"` or `"001B.6384.45E6"`

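The three cases above could be covered with the proposed API roughly as follows. This is a sketch, not a recommended implementation: the helper names (`cssColor`, `charReference`, `colonMacToDotted`) are invented for this example, and trimming the color to six digits with `takeLast` is a workaround chosen here because the proposal has no option for limiting the number of digits.

```kotlin
// Hypothetical helpers for this example; none of them is part of the proposal.

// An Int is formatted as 8 hex digits, so the 6 RGB digits are taken from the end.
@OptIn(ExperimentalStdlibApi::class)
fun cssColor(rgb: Int): String = "#" + rgb.toHexString().takeLast(6)

// Format for numeric character references such as "&#x1F327;".
@OptIn(ExperimentalStdlibApi::class)
val charRefFormat: HexFormat = HexFormat {
    upperCase = true
    number {
        prefix = "&#x"
        suffix = ";"
        removeLeadingZeros = true
    }
}

@OptIn(ExperimentalStdlibApi::class)
fun charReference(codePoint: Int): String = codePoint.toHexString(charRefFormat)

// Converts "00:1b:63:84:45:e6" to "001B.6384.45E6".
@OptIn(ExperimentalStdlibApi::class)
fun colonMacToDotted(mac: String): String {
    val colonSeparated = HexFormat { bytes.byteSeparator = ":" }
    val dottedGroups = HexFormat {
        upperCase = true
        bytes.bytesPerGroup = 2
        bytes.groupSeparator = "."
    }
    return mac.hexToByteArray(colonSeparated).toHexString(dottedGroups)
}
```
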
## Similar API review

* Java [`HexFormat`](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/HexFormat.html) class.
* Python [binascii](https://docs.python.org/3/library/binascii.html) module. Also,
  `hex` and `fromhex` functions on [bytes objects](https://docs.python.org/3/library/stdtypes.html#bytes-objects).

## Proposal

Considering the use cases mentioned above, it is proposed to have the following format options.

For formatting a numeric value:
* Whether upper case or lower case hexadecimal digits should be used
* The prefix of the hex representation
* The suffix of the hex representation
* Whether to remove leading zeros in the hex representation

For formatting `ByteArray`:
* Whether upper case or lower case hexadecimal digits should be used
* The number of bytes per line
* The number of bytes per group
* The string used to separate groups in a line
* The string used to separate bytes in a group
* The prefix of a byte hex representation
* The suffix of a byte hex representation

### Creating a format

It is proposed to introduce an immutable `HexFormat` class that holds the options.
A `Builder` is used to configure a format. Each option in the builder has a default value that can be customized.
All related types are nested inside `HexFormat` to reduce the top-level surface area of the API:
```
public class HexFormat internal constructor(
    val upperCase: Boolean,
    val bytes: BytesHexFormat,
    val number: NumberHexFormat
) {

    public class Builder internal constructor() {
        var upperCase: Boolean = false
        val bytes: BytesHexFormat.Builder = BytesHexFormat.Builder()
        val number: NumberHexFormat.Builder = NumberHexFormat.Builder()

        inline fun bytes(builderAction: BytesHexFormat.Builder.() -> Unit)
        inline fun number(builderAction: NumberHexFormat.Builder.() -> Unit)
    }

    public class BytesHexFormat internal constructor(
        val bytesPerLine: Int,
        val bytesPerGroup: Int,
        val groupSeparator: String,
        val byteSeparator: String,
        val bytePrefix: String,
        val byteSuffix: String
    ) {

        public class Builder internal constructor() {
            var bytesPerLine: Int = Int.MAX_VALUE
            var bytesPerGroup: Int = Int.MAX_VALUE
            var groupSeparator: String = " "
            var byteSeparator: String = ""
            var bytePrefix: String = ""
            var byteSuffix: String = ""
        }
    }

    public class NumberHexFormat internal constructor(
        val prefix: String,
        val suffix: String,
        val removeLeadingZeros: Boolean
    ) {

        public class Builder internal constructor() {
            var prefix: String = ""
            var suffix: String = ""
            var removeLeadingZeros: Boolean = false
        }
    }
}
```

`BytesHexFormat` and `NumberHexFormat` classes hold format options for `ByteArray` and numeric values, respectively.
The `upperCase` option, which is common to both `ByteArray` and numeric values, is stored in `HexFormat`.

It's not possible to instantiate a `HexFormat` or its builder directly. The following function is provided instead:
```
public inline fun HexFormat(builderAction: HexFormat.Builder.() -> Unit): HexFormat
```

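To make the builder concrete, the sketch below creates a format through the `HexFormat { ... }` function and reads the resulting immutable options back. The `macFormat` and `describe` names are made up for this example. Options that are not set keep the builder defaults listed above.

```kotlin
// Hypothetical names for this example; only HexFormat and its options come from the proposal.
// HexFormat was experimental in Kotlin 1.9, hence the opt-in.
@OptIn(ExperimentalStdlibApi::class)
val macFormat: HexFormat = HexFormat {
    upperCase = true
    bytes.byteSeparator = ":"   // property-style access to the nested builder
    number {                    // lambda-style access to the nested builder
        prefix = "0x"
    }
}

// Reads the options back from the immutable format object.
@OptIn(ExperimentalStdlibApi::class)
fun describe(format: HexFormat): String =
    "upperCase=${format.upperCase}, " +
        "byteSeparator='${format.bytes.byteSeparator}', " +
        "bytesPerLine=${format.bytes.bytesPerLine}, " +
        "prefix='${format.number.prefix}'"
```

Here `describe(macFormat)` reports `bytesPerLine=2147483647`, i.e. the `Int.MAX_VALUE` default, because that option was left untouched.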
### Formatting

For formatting, the following extension functions are proposed:
```
// Formats the byte array using HexFormat.upperCase and HexFormat.bytes
public fun ByteArray.toHexString(format: HexFormat = HexFormat.Default): String

public fun ByteArray.toHexString(
    startIndex: Int = 0,
    endIndex: Int = size,
    format: HexFormat = HexFormat.Default
): String

// Formats the numeric value using HexFormat.upperCase and HexFormat.number
// N is Byte, Short, Int, Long, and their unsigned counterparts
public fun N.toHexString(format: HexFormat = HexFormat.Default): String
```

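A few calls to the formatting functions, collected into a hypothetical `formattingExamples` function so that the results can be inspected together. This is only a sketch. The comments show the output expected from the signatures and defaults above.

```kotlin
// Hypothetical wrapper for this example; the toHexString calls are the proposed API.
@OptIn(ExperimentalStdlibApi::class)
fun formattingExamples(): List<String> {
    val data = byteArrayOf(0x00, 0x1b, 0x63, 0x84.toByte(), 0x45, 0xe6.toByte())
    val prefixed = HexFormat { number.prefix = "0x" }
    return listOf(
        data.toHexString(),                             // "001b638445e6"
        data.toHexString(startIndex = 1, endIndex = 3), // "1b63", endIndex is exclusive
        58.toHexString(),                               // "0000003a", all 8 digits of an Int
        58.toHexString(prefixed),                       // "0x0000003a"
        (-1).toByte().toHexString(),                    // "ff", the bits rather than the value
        0xABCu.toHexString(),                           // "00000abc", unsigned types work too
    )
}
```
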
### Parsing

It is critical to be able to parse the results of the formatting functions above.
For parsing, the following extension functions are proposed:
```
// Parses a byte array
public fun String.hexToByteArray(format: HexFormat = HexFormat.Default): ByteArray

// Parses a numeric value
// N is Byte, Short, Int, Long, and their unsigned counterparts
public fun String.hexToN(format: HexFormat = HexFormat.Default): N
```

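The round-trip requirement stated above can be expressed as a small check: format a value, parse the result with the same format, and compare. The function names below are invented for this sketch.

```kotlin
// Hypothetical helpers for this example; they only combine the proposed functions.
@OptIn(ExperimentalStdlibApi::class)
fun roundTripsLong(value: Long, format: HexFormat = HexFormat.Default): Boolean =
    value.toHexString(format).hexToLong(format) == value

@OptIn(ExperimentalStdlibApi::class)
fun roundTripsBytes(data: ByteArray, format: HexFormat = HexFormat.Default): Boolean =
    data.toHexString(format).hexToByteArray(format).contentEquals(data)

@OptIn(ExperimentalStdlibApi::class)
fun parsingExamples(): List<String> = listOf(
    "3A".hexToInt().toString(),       // "58"
    "ff".hexToByte().toString(),      // "-1", the bit pattern 0xFF read as a signed Byte
    "ff".hexToUByte().toString(),     // "255"
    "ffffffff".hexToInt().toString(), // "-1"
)
```
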
## Contracts

* When formatting a `ByteArray`, the LF character is used to separate lines.
* When parsing a `ByteArray`, any of the char sequences CRLF (`"\r\n"`), LF (`"\n"`), and CR (`"\r"`) is considered a valid line separator.
* Parsing is performed in a case-insensitive manner.
* `NumberHexFormat.removeLeadingZeros` is ignored when parsing.
* Assigning a non-positive value to `BytesHexFormat.Builder.bytesPerLine/bytesPerGroup` is prohibited.
  In this case `IllegalArgumentException` is thrown.
* Assigning a string containing LF or CR character to `BytesHexFormat.Builder.byteSeparator/bytePrefix/byteSuffix`
  and `NumberHexFormat.Builder.prefix/suffix` is prohibited. In this case `IllegalArgumentException` is thrown.

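The contracts above can be read as executable checks. The sketch below pairs each contract with one expression that is expected to be `true`. `contractChecks` is a name chosen for this example.

```kotlin
// Hypothetical function for this example; each entry exercises one contract from the list above.
@OptIn(ExperimentalStdlibApi::class)
fun contractChecks(): Map<String, Boolean> {
    val twoPerLine = HexFormat { bytes.bytesPerLine = 2 }
    val expected = byteArrayOf(1, 2, 3, 4)
    return mapOf(
        "LF separates lines when formatting" to
            (expected.toHexString(twoPerLine) == "0102\n0304"),
        "CRLF, LF and CR are accepted when parsing" to
            listOf("0102\r\n0304", "0102\n0304", "0102\r0304")
                .all { it.hexToByteArray(twoPerLine).contentEquals(expected) },
        "parsing ignores letter case" to
            ("3a".hexToInt() == "3A".hexToInt()),
        "removeLeadingZeros is ignored when parsing" to
            ("0000003a".hexToInt(HexFormat { number.removeLeadingZeros = true }) == 58),
        "non-positive bytesPerLine is rejected" to
            (runCatching { HexFormat { bytes.bytesPerLine = 0 } }
                .exceptionOrNull() is IllegalArgumentException),
        "LF in byteSeparator is rejected" to
            (runCatching { HexFormat { bytes.byteSeparator = "\n" } }
                .exceptionOrNull() is IllegalArgumentException),
    )
}
```
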
### Examples

```
// Parsing an Int
"3A".hexToInt() // 58
// Formatting an Int
93.toHexString() // "0000005d"

// Parsing a ByteArray
val macAddress = "001b638445e6".hexToByteArray()

// Formatting a ByteArray
macAddress.toHexString(HexFormat { bytes.byteSeparator = ":" }) // "00:1b:63:84:45:e6"

// Defining a format and assigning it to a variable
val threeGroupFormat = HexFormat { upperCase = true; bytes.bytesPerGroup = 2; bytes.groupSeparator = "." }
// Formatting a ByteArray using a previously defined format
macAddress.toHexString(threeGroupFormat) // "001B.6384.45E6"
```

## Alternatives

### For numeric values

The Kotlin standard library provides `Primitive.toString(radix = 16)` for converting primitive values
to their hex representation. However, this function focuses on converting the values, not bits. As a result:
* Negative values are formatted with a minus sign.
  One needs to convert values of signed types to the corresponding unsigned types before converting to hex representation.
* Leading zero nibbles are ignored. To get the full length, one must additionally `padStart` the result with `'0'`.
    * Related complaint: [KT-60782](https://youtrack.jetbrains.com/issue/KT-60782)

There is also `String.toPrimitive(radix = 16)` for parsing back a primitive value.
But this function throws if the primitive type can't have the resulting value, even if the bits fit,
e.g., `"FF".toByte(radix = 16)` fails. To prevent this, the string must first be parsed as the corresponding unsigned type
and then converted to the signed one.

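The differences described above can be seen side by side in the following sketch. The function names are chosen for this example. All calls are existing standard library functions or the functions proposed here.

```kotlin
// Hypothetical wrappers for this example.
@OptIn(ExperimentalStdlibApi::class)
fun numericComparison(): List<String> = listOf(
    (-1).toString(radix = 16),                // "-1", the value with a minus sign
    (-1).toUInt().toString(radix = 16),       // "ffffffff", after the unsigned conversion
    93.toString(radix = 16),                  // "5d", leading zero nibbles dropped
    93.toString(radix = 16).padStart(8, '0'), // "0000005d", padded by hand
    (-1).toHexString(),                       // "ffffffff"
    93.toHexString(),                         // "0000005d"
)

// Parsing "FF" as a Byte: the radix-based function rejects it because 255 is out of range.
fun parseWithRadix(text: String): Byte? = text.toByteOrNull(radix = 16)

// The workaround: parse as the unsigned type, then reinterpret the bits.
fun parseViaUnsigned(text: String): Byte = text.toUByte(radix = 16).toByte()

@OptIn(ExperimentalStdlibApi::class)
fun parseWithHexFormat(text: String): Byte = text.hexToByte()
```
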
### For `ByteArray`

`ByteArray.joinToString(separator) { byte -> byte.toString(radix = 16) }` can be used to format a `ByteArray`.
Downsides are:
* Not possible to separate bytes into groups and lines
* Challenges with formatting `Byte` to hex described above

There is currently no API for parsing a hex string into a `ByteArray`.

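A short sketch of the `joinToString` alternative and its pitfalls, next to the proposed function. The three function names are invented for this example.

```kotlin
// Hypothetical wrappers for this example.

// Naive approach: negative bytes get a minus sign and single-digit bytes lose their leading zero.
fun joinNaive(data: ByteArray): String =
    data.joinToString(":") { byte -> byte.toString(radix = 16) }

// Manual fix: mask to the unsigned value and pad every byte to two digits.
fun joinMasked(data: ByteArray): String =
    data.joinToString(":") { byte -> (byte.toInt() and 0xFF).toString(radix = 16).padStart(2, '0') }

// Proposed API: the same result without the manual masking and padding.
@OptIn(ExperimentalStdlibApi::class)
fun joinWithHexFormat(data: ByteArray): String =
    data.toHexString(HexFormat { bytes.byteSeparator = ":" })
```

For the input `byteArrayOf(0x0a, -1, 0x7f)`, `joinNaive` returns `"a:-1:7f"`, while the other two functions return `"0a:ff:7f"`.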
## Naming

### Existing functions for converting to String

For `ByteArray`:
* `contentToString`
* `encodeToByteArray`/`decodeToString`
* `joinToString`

For primitive types:
* `toString(radix)`
* `Char.digitToInt()`
* `Int.digitToChar()`

### Naming options

As listed above, existing functions with a similar purpose use the `toString` suffix when converting to `String`,
and `toType` when converting from `String` to another type. Thus, options with similar naming schemes were considered:
* **Proposed:** `toHexString` and `hexToType` for formatting and parsing, respectively
    * "hex" used as an adjective
* `hexToString` or `hexifyToString` for formatting
    * "hex" used as a verb
    * A similar verb is needed to describe the parsing of a hex-formatted string
* Use `format` and `parse` verbs, e.g., `formatToHexString` and `parseHexToByteArray`
    * `To` already indicates that the function converts the receiver

## API design approaches

* **Proposed:** Provide formatting and parsing functions as extensions on the type to be converted
    * Pro: Discoverable
        * Users already know and use the `toString` family of extension functions.
          When typing "toString", code completion displays the hex conversion functions as well.
          This can also prompt users to wonder how `toString(radix = 16)` differs from `toHexString()`,
          and help them choose the proper one.
        * Typing ".hex" is enough for code completion to display the hex conversion function for the receiver.
          No need to remember the exact function name.
    * Pro: Allows chaining with other calls
    * Con: May pollute code completion for `String` receiver
* Provide all formatting and parsing functions on `HexFormat`, similar to Java `HexFormat` and Kotlin `Base64`
    * Pro: Gathers all related functions under a single type
    * Con: Less discoverable than the proposed approach. Users need to remember that there is a `HexFormat` class.
    * Con: Requires `let` or `run` [scope function](https://kotlinlang.org/docs/scope-functions.html) for chaining with other calls
* Have `BytesHexFormat` and `NumberHexFormat` as top-level classes, each with its own `upperCase` property.
  No need for `HexFormat` class. Functions for formatting/parsing `ByteArray` take `BytesHexFormat`,
  while functions for numeric types take `NumberHexFormat`. e.g.,
  ```
  byteArray.toHexString(
      BytesHexFormat { byteSeparator = " "; bytesPerLine = 16 }
  )
  ```
    * Pro: Eliminates possible confusion about what options affect formatting
    * Con: Two variables are needed to store preferred format options
* `Builder` overrides a provided format,
  e.g., `HexFormat(MY_HEX_FORMAT) { bytes.byteSeparator = ":" }`
    * Not so many use cases for altering an existing format
    * Can be added as an overload of `fun HexFormat()`
* Pass options to formatting and parsing functions directly, without introducing `HexFormat`
    * Not convenient in cases when a format is defined once and used in multiple places
    * Adding new options in the future is problematic
    * There is no way in Kotlin to require calling a function with named arguments.
      Passing multiple arguments without specifying names damages code readability,
      e.g., `bitMask.toHexString(true, "0x", false)`

## Dependencies

Only a subset of the Kotlin Standard Library available on all supported platforms is required.

## Placement

* Standard Library
* `kotlin.text` package

## Reference implementation

* HexFormat class: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/src/kotlin/text/HexFormat.kt
* Extensions for formatting and parsing: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/src/kotlin/text/HexExtensions.kt
* Test cases for formatting and parsing `ByteArray`: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/test/text/BytesHexFormatTest.kt
* Test cases for formatting and parsing numeric values: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/test/text/NumberHexFormatTest.kt

## Future advancements

* Adding the ability to limit the number of hex digits when formatting numeric values
    * `NumberHexFormat.maxLength` could be introduced
    * When formatting an `Int`, the combination of `maxLength = 6` and `removeLeadingZeros = false` results in exactly 6 least-significant hex digits
    * The combination of `maxLength = 6` and `removeLeadingZeros = true` returns at most 6 least-significant hex digits without leading zeros
    * Related request: [KT-60787](https://youtrack.jetbrains.com/issue/KT-60787)
* Overloads for parsing a substring: [KT-58277](https://youtrack.jetbrains.com/issue/KT-58277)
* Overloads for appending format result to an `Appendable`
    * `toHexString` might need to be renamed to `hexToString/Appendable` or `hexifyToString/Appendable`, because
      it isn't intuitive to infer from `Int.toHexString(stringBuilder)` that the result is appended to the provided `StringBuilder`
* Formatting and parsing I/O streams in Kotlin/JVM
    * Similar to [`InputStream.decodingWith(Base64)`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.io.encoding/java.io.-input-stream/decoding-with.html)
      and [`OutputStream.encodingWith(Base64)`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.io.encoding/java.io.-output-stream/encoding-with.html)
* Formatting and parsing a `Char`
    * Although `Char` is not a numeric type, it has a `Char.code` associated with it.
      With the proposed API, formatting a `Char` won't be an easy task: `Char.code.toShort().toHexString()`
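
Until such additions exist, two of the items above can be approximated on top of the proposed API. The sketch below is only one possible reading of the intended semantics. `toHexStringSketch` and `toHexStringLimited` are hypothetical names, and the `maxLength` behaviour is emulated with plain string operations rather than a real `NumberHexFormat.maxLength` option.

```kotlin
// Hypothetical extensions for this example; neither is part of the proposal.

// Formats a Char through its 16-bit code, as described in the last item above.
@OptIn(ExperimentalStdlibApi::class)
fun Char.toHexStringSketch(format: HexFormat = HexFormat.Default): String =
    code.toShort().toHexString(format)

// Emulates the suggested maxLength option for Int by keeping the least-significant digits.
@OptIn(ExperimentalStdlibApi::class)
fun Int.toHexStringLimited(maxLength: Int, removeLeadingZeros: Boolean): String {
    require(maxLength in 1..8) { "An Int has at most 8 hex digits" }
    val digits = toHexString().takeLast(maxLength)
    // Keep a single "0" when every remaining digit is zero.
    return if (removeLeadingZeros) digits.trimStart('0').ifEmpty { "0" } else digits
}
```

For example, `'A'.toHexStringSketch()` gives `"0041"`, and `0x00ff00.toHexStringLimited(6, removeLeadingZeros = false)` gives `"00ff00"`.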
