| 1 | +# HexFormat |
| 2 | + |
| 3 | +* **Type**: Standard Library API proposal |
| 4 | +* **Author**: Abduqodiri Qurbonzoda |
| 5 | +* **Status**: Implemented in Kotlin 1.9.0 |
| 6 | +* **Prototype**: Implemented |
| 7 | +* **Target issue**: [KT-57762](https://youtrack.jetbrains.com/issue/KT-57762/) |
| 8 | +* **Discussion**: TBD |
| 9 | + |
| 10 | +## Summary |
| 11 | + |
| 12 | +Convenient API for formatting binary data into hexadecimal string form and parsing back. |
| 13 | + |
| 14 | +## Motivation |
| 15 | + |
| 16 | +Our research has shown that hexadecimal is the most widely used numeric representation |
| 17 | +after decimal. There are some fundamental reasons for its popularity: |
| 18 | +* Hexadecimal representation is more human-readable and understandable when it comes to bits. |
| 19 | + Each digit in the hex system represents exactly four bits of data, |
| 20 | + making the mapping of a hex digit to its corresponding nibble straightforward. |
| 21 | +* Hex representation is more compact than the decimal format and consumes a predictable number of characters. |
| 22 | +* The implementation of a hex encoder/decoder is relatively simple and fast. |
| 23 | + |
| 24 | +By providing a convenient API for common use cases described below, we aim to make coding in Kotlin easier and more enjoyable. |
| 25 | + |
| 26 | +## Use cases |
| 27 | + |
| 28 | +### Logging and debugging |
| 29 | + |
| 30 | +The readability of the format makes it very appealing for logging and debugging. |
| 31 | +A value logged in hex is often less informative as a number than as a sequence of bits, |
| 32 | +e.g., when the value has a particular bit pattern. Another popular use case is printing bytes in some |
| 33 | +[hex dump](https://en.wikipedia.org/wiki/Hex_dump) format, split into lines and groups. |
| 34 | + |
| 35 | +### Storing or transmitting binary data in text-only formats |
| 36 | + |
| 37 | +Sometimes binary data needs to be embedded into text-only formats such as URL, XML, or JSON. |
| 38 | +Our research indicates that in this use case, hex encoding is among the most frequently used encodings, |
| 39 | +especially when encoding primitive values such as `Int` and `Long`. |
| 40 | + |
| 41 | +### Protocol requirements |
| 42 | + |
| 43 | +The following popular formats and protocols require hex representation: |
| 44 | +* When generating or parsing HTML code, one might need to work with the hex representation of RGB color codes. |
| 45 | + e.g., `<div style="background-color:#ff6347;">...</div>` |
| 46 | +* To express Unicode code points in HTML or XML. |
| 47 | + e.g., `<message>It's &#x1F327; outside, be sure to grab &#x2602;</message>` |
| 48 | +* The framework used in your project might require specifying IP or MAC addresses in a certain hex format. |
| 49 | + e.g., `"00:1b:63:84:45:e6"` or `"001B.6384.45E6"` |
| 50 | + |
| 51 | +## Similar API review |
| 52 | + |
| 53 | +* Java [`HexFormat`](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/util/HexFormat.html) class. |
| 54 | +* Python [binascii](https://docs.python.org/3/library/binascii.html) module. Also, |
| 55 | + `hex` and `fromhex` functions on [bytes objects](https://docs.python.org/3/library/stdtypes.html#bytes-objects). |
| 56 | + |
| 57 | +## Proposal |
| 58 | + |
| 59 | +Considering the use cases mentioned above, it is proposed to have the following format options. |
| 60 | + |
| 61 | +For formatting a numeric value: |
| 62 | +* Whether upper case or lower case hexadecimal digits should be used |
| 63 | +* The prefix of the hex representation |
| 64 | +* The suffix of the hex representation |
| 65 | +* Whether to remove leading zeros in the hex representation |
| 66 | + |
| 67 | +For formatting `ByteArray`: |
| 68 | +* Whether upper case or lower case hexadecimal digits should be used |
| 69 | +* The number of bytes per line |
| 70 | +* The number of bytes per group |
| 71 | +* The string used to separate groups in a line |
| 72 | +* The string used to separate bytes in a group |
| 73 | +* The prefix of a byte hex representation |
| 74 | +* The suffix of a byte hex representation |
| 75 | + |
| 76 | +### Creating a format |
| 77 | + |
| 78 | +It is proposed to introduce an immutable `HexFormat` class that holds the options. |
| 79 | +`Builder` is used to configure a format. Each option in the builder has a default value that can be customized. |
| 80 | +All related types are nested inside `HexFormat` to reduce the top-level surface area of the API: |
| 81 | +``` |
| 82 | +public class HexFormat internal constructor( |
| 83 | + val upperCase: Boolean, |
| 84 | + val bytes: BytesHexFormat, |
| 85 | + val number: NumberHexFormat |
| 86 | +) { |
| 87 | +
|
| 88 | + public class Builder internal constructor() { |
| 89 | + var upperCase: Boolean = false |
| 90 | + val bytes: BytesHexFormat.Builder = BytesHexFormat.Builder() |
| 91 | + val number: NumberHexFormat.Builder = NumberHexFormat.Builder() |
| 92 | + |
| 93 | + inline fun bytes(builderAction: BytesHexFormat.Builder.() -> Unit) |
| 94 | + inline fun number(builderAction: NumberHexFormat.Builder.() -> Unit) |
| 95 | + } |
| 96 | + |
| 97 | + public class BytesHexFormat internal constructor( |
| 98 | + val bytesPerLine: Int, |
| 99 | + val bytesPerGroup: Int, |
| 100 | + val groupSeparator: String, |
| 101 | + val byteSeparator: String, |
| 102 | + val bytePrefix: String, |
| 103 | + val byteSuffix: String |
| 104 | + ) { |
| 105 | +
|
| 106 | + public class Builder internal constructor() { |
| 107 | + var bytesPerLine: Int = Int.MAX_VALUE |
| 108 | + var bytesPerGroup: Int = Int.MAX_VALUE |
| 109 | + var groupSeparator: String = " " |
| 110 | + var byteSeparator: String = "" |
| 111 | + var bytePrefix: String = "" |
| 112 | + var byteSuffix: String = "" |
| 113 | + } |
| 114 | + } |
| 115 | +
|
| 116 | + public class NumberHexFormat internal constructor( |
| 117 | + val prefix: String, |
| 118 | + val suffix: String, |
| 119 | + val removeLeadingZeros: Boolean |
| 120 | + ) { |
| 121 | +
|
| 122 | + public class Builder internal constructor() { |
| 123 | + var prefix: String = "" |
| 124 | + var suffix: String = "" |
| 125 | + var removeLeadingZeros: Boolean = false |
| 126 | + } |
| 127 | + } |
| 128 | +} |
| 129 | +``` |
| 130 | + |
| 131 | +The `BytesHexFormat` and `NumberHexFormat` classes hold format options for `ByteArray` and numeric values, respectively. |
| 132 | +The `upperCase` option, which is common to both `ByteArray` and numeric values, is stored in `HexFormat`. |
| 133 | + |
| 134 | +It's not possible to instantiate a `HexFormat` or its builder directly. The following function is provided instead: |
| 135 | +``` |
| 136 | +public inline fun HexFormat(builderAction: HexFormat.Builder.() -> Unit): HexFormat |
| 137 | +``` |
| 138 | + |
| 139 | +### Formatting |
| 140 | + |
| 141 | +For formatting, the following extension functions are proposed: |
| 142 | +``` |
| 143 | +// Formats the byte array using HexFormat.upperCase and HexFormat.bytes |
| 144 | +public fun ByteArray.toHexString(format: HexFormat = HexFormat.Default): String |
| 145 | +
|
| 146 | +public fun ByteArray.toHexString( |
| 147 | + startIndex: Int = 0, |
| 148 | + endIndex: Int = size, |
| 149 | + format: HexFormat = HexFormat.Default |
| 150 | +): String |
| 151 | +
|
| 152 | +// Formats the numeric value using HexFormat.upperCase and HexFormat.number |
| 153 | +// N is Byte, Short, Int, Long, and their unsigned counterparts |
| 154 | +public fun N.toHexString(format: HexFormat = HexFormat.Default): String |
| 155 | +``` |
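The numeric options can be combined in one format. The following is a minimal sketch, not part of the proposal text: `formatMask` is a hypothetical helper, and the opt-in annotation assumes a toolchain such as Kotlin 1.9 where the API is still experimental.

```kotlin
// Sketch only: `formatMask` is a hypothetical helper name used for illustration.
// Assumption: the API is experimental in the Kotlin version in use, hence the opt-in.
@OptIn(ExperimentalStdlibApi::class)
fun formatMask(mask: Int): String {
    val cStyle = HexFormat {
        upperCase = true              // applies to the hex digits, not to the prefix
        number {
            prefix = "0x"
            removeLeadingZeros = true
        }
    }
    return mask.toHexString(cStyle)
}

fun main() {
    println(formatMask(58))   // 0x3A
    println(formatMask(-1))   // 0xFFFFFFFF (the bits of the Int, no minus sign)
}
```

Because the format is an immutable value, it can be created once and reused across calls instead of being rebuilt inside the helper.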
| 156 | + |
| 157 | +### Parsing |
| 158 | + |
| 159 | +It is critical to be able to parse the results of the formatting functions above. |
| 160 | +For parsing, the following extension functions are proposed: |
| 161 | +``` |
| 162 | +// Parses a byte array |
| 163 | +public fun String.hexToByteArray(format: HexFormat = HexFormat.Default): ByteArray |
| 164 | +
|
| 165 | +// Parses a numeric value |
| 166 | +// N is Byte, Short, Int, Long, and their unsigned counterparts |
| 167 | +public fun String.hexToN(format: HexFormat = HexFormat.Default): N |
| 168 | +``` |
| 169 | + |
| 170 | +## Contracts |
| 171 | + |
| 172 | +* When formatting a `ByteArray`, the LF character is used to separate lines. |
| 173 | +* When parsing a `ByteArray`, each of the char sequences CRLF (`"\r\n"`), LF (`"\n"`) and CR (`"\r"`) is considered a valid line separator. |
| 174 | +* Parsing is performed in a case-insensitive manner. |
| 175 | +* `NumberHexFormat.removeLeadingZeros` is ignored when parsing. |
| 176 | +* Assigning a non-positive value to `BytesHexFormat.Builder.bytesPerLine/bytesPerGroup` is prohibited. |
| 177 | + In this case `IllegalArgumentException` is thrown. |
| 178 | +* Assigning a string containing LF or CR character to `BytesHexFormat.Builder.byteSeparator/bytePrefix/byteSuffix` |
| 179 | + and `NumberHexFormat.Builder.prefix/suffix` is prohibited. In this case `IllegalArgumentException` is thrown. |
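The contracts above can be exercised with a short sketch. The helper names `twoBytesPerLine` and `rejectsZeroBytesPerLine` are hypothetical, the opt-in assumes the API is experimental in the Kotlin version in use, and the printed values are what the listed contracts imply.

```kotlin
// Sketch only: helper names are hypothetical and exist purely for illustration.
@OptIn(ExperimentalStdlibApi::class)
fun twoBytesPerLine(): HexFormat = HexFormat { bytes.bytesPerLine = 2 }

// Returns true if assigning a non-positive bytesPerLine is rejected.
@OptIn(ExperimentalStdlibApi::class)
fun rejectsZeroBytesPerLine(): Boolean =
    try {
        HexFormat { bytes.bytesPerLine = 0 }
        false
    } catch (e: IllegalArgumentException) {
        true
    }

@OptIn(ExperimentalStdlibApi::class)
fun main() {
    val mac = byteArrayOf(0x00, 0x1b, 0x63, 0x84.toByte(), 0x45, 0xe6.toByte())
    val format = twoBytesPerLine()

    // Formatting separates lines with LF.
    println(mac.toHexString(format) == "001b\n6384\n45e6")   // true

    // Parsing accepts CRLF, LF and CR, and ignores the case of hex digits.
    val parsed = "001B\r\n6384\r45E6".hexToByteArray(format)
    println(parsed.contentEquals(mac))                        // true

    // removeLeadingZeros is ignored when parsing.
    println("0000003A".hexToInt(HexFormat { number.removeLeadingZeros = true }))   // 58

    println(rejectsZeroBytesPerLine())                        // true
}
```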
| 180 | + |
| 181 | +### Examples |
| 182 | + |
| 183 | +``` |
| 184 | +// Parsing an Int |
| 185 | +"3A".hexToInt() // 58 |
| 186 | +// Formatting an Int |
| 187 | +93.toHexString() // "0000005d" |
| 188 | +
|
| 189 | +// Parsing a ByteArray |
| 190 | +val macAddress = "001b638445e6".hexToByteArray() |
| 191 | +
|
| 192 | +// Formatting a ByteArray |
| 193 | +macAddress.toHexString(HexFormat { bytes.byteSeparator = ":" }) // "00:1b:63:84:45:e6" |
| 194 | +
|
| 195 | +// Defining a format and assigning it to a variable |
| 196 | +val threeGroupFormat = HexFormat { upperCase = true; bytes.bytesPerGroup = 2; bytes.groupSeparator = "." } |
| 197 | +// Formatting a ByteArray using a previously defined format |
| 198 | +macAddress.toHexString(threeGroupFormat) // "001B.6384.45E6" |
| 199 | +``` |
| 200 | + |
| 201 | +## Alternatives |
| 202 | + |
| 203 | +### For numeric values |
| 204 | + |
| 205 | +The Kotlin standard library provides `Primitive.toString(radix = 16)` for converting primitive values |
| 206 | +to their hex representation. However, this function focuses on converting the values, not bits. As a result: |
| 207 | +* Negative values are formatted with a minus sign. |
| 208 | + One needs to convert values of signed types to corresponding unsigned types before converting to hex representation. |
| 209 | +* Leading zero nibbles are ignored. To get the full length one must additionally `padStart` the result with `'0'`. |
| 210 | +* Related complaint: [KT-60782](https://youtrack.jetbrains.com/issue/KT-60782) |
| 211 | + |
| 212 | +There is also `String.toPrimitive(radix = 16)` for parsing back a primitive value. |
| 213 | +But this function throws if the primitive type can't represent the resulting value, even if the bits fit. |
| 214 | +e.g., `"FF".toByte(radix = 16)` fails. To prevent this, the string must first be parsed as the corresponding unsigned type and then converted. |
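The workarounds described in this section look roughly as follows. This is an illustrative sketch that uses only the existing radix-based functions; the helper names `bitsToHex` and `hexToByteBits` are hypothetical.

```kotlin
// Sketch only: hypothetical helpers built on the existing radix-based functions.

// Formats all 32 bits of an Int: go through UInt to avoid the minus sign,
// then pad to restore the leading zero nibbles.
fun bitsToHex(value: Int): String =
    value.toUInt().toString(radix = 16).padStart(8, '0')

// Parses two hex digits into a Byte by going through UByte first.
fun hexToByteBits(text: String): Byte =
    text.toUByte(radix = 16).toByte()

fun main() {
    println((-93).toString(radix = 16))      // -5d
    println(bitsToHex(-93))                  // ffffffa3
    println(bitsToHex(93))                   // 0000005d
    println("FF".toByteOrNull(radix = 16))   // null, because 255 is out of the Byte range
    println(hexToByteBits("FF"))             // -1
}
```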
| 215 | + |
| 216 | +### For `ByteArray` |
| 217 | + |
| 218 | +`ByteArray.joinToString(separator) { byte -> byte.toString(radix = 16) }` can be used to format a `ByteArray`. |
| 219 | +Downsides are: |
| 220 | +* Not possible to separate bytes into groups and lines |
| 221 | +* Challenges with formatting `Byte` to hex described above |
| 222 | + |
| 223 | +There is no API for parsing `ByteArray` currently. |
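For comparison, a hand-written version of the `ByteArray` alternative might look like the sketch below. The function names are hypothetical, and only a single byte separator is supported, which illustrates why groups and lines are hard to add on top of `joinToString`.

```kotlin
// Sketch only: hypothetical helpers, limited to a single ":" separator.

// Naive formatting: negative bytes get a minus sign and zero nibbles are dropped.
fun naiveHex(bytes: ByteArray): String =
    bytes.joinToString(":") { it.toString(radix = 16) }

// Corrected formatting: convert to UByte and pad each byte to two digits.
fun paddedHex(bytes: ByteArray): String =
    bytes.joinToString(":") { it.toUByte().toString(radix = 16).padStart(2, '0') }

// Manual parsing, since no dedicated API exists.
fun parseHex(text: String): ByteArray =
    text.split(":").map { it.toUByte(radix = 16).toByte() }.toByteArray()

fun main() {
    val data = byteArrayOf(0x00, 0x1b, 0xe6.toByte())
    println(naiveHex(data))                                   // 0:1b:-1a
    println(paddedHex(data))                                  // 00:1b:e6
    println(parseHex("00:1b:e6").contentEquals(data))         // true
}
```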
| 224 | + |
| 225 | +## Naming |
| 226 | + |
| 227 | +### Existing functions for converting to String |
| 228 | + |
| 229 | +For ByteArray: |
| 230 | +* `contentToString` |
| 231 | +* `encodeToByteArray`/`decodeToString` |
| 232 | +* `joinToString` |
| 233 | + |
| 234 | +For primitive types: |
| 235 | +* `toString(radix)` |
| 236 | +* `Char.digitToInt()` |
| 237 | +* `Int.digitToChar()` |
| 238 | + |
| 239 | +### Naming options |
| 240 | + |
| 241 | +As listed above, existing functions with a similar purpose use the `toString` suffix when converting to `String`, |
| 242 | +and `toType` when converting from `String` to another type. Thus, options with similar naming schemes were considered: |
| 243 | +* **Proposed:** `toHexString` and `hexToType` for formatting and parsing, respectively |
| 244 | + * "hex" used as an adjective |
| 245 | +* `hexToString` or `hexifyToString` for formatting |
| 246 | + * "hex" used as a verb |
| 247 | + * A similar verb is needed to describe the parsing of a hex-formatted string |
| 248 | +* Use `format` and `parse` verbs, e.g., `formatToHexString` and `parseHexToByteArray` |
| 249 | + * `To` already indicates that the function converts the receiver |
| 250 | + |
| 251 | +## API design approaches |
| 252 | + |
| 253 | +* **Proposed:** Provide formatting and parsing functions as extensions on the type to be converted |
| 254 | + * Pro: Discoverable |
| 255 | + * Users already know and use the `toString` family of extension functions. |
| 256 | + When typing "toString", code completion displays the hex conversion functions as well. |
| 257 | + This can also prompt users to wonder how `toString(radix = 16)` differs from `toHexString()`, |
| 258 | + and help to choose the proper one. |
| 259 | + * Typing ".hex" is enough for code completion to display the hex conversion function for the receiver. |
| 260 | + No need to remember the exact function name. |
| 261 | + * Pro: Allows chaining with other calls |
| 262 | + * Con: May pollute code completion for `String` receiver |
| 263 | +* Provide all formatting and parsing functions on `HexFormat`, similar to Java `HexFormat` and Kotlin `Base64` |
| 264 | + * Pro: Gathers all related functions under a single type |
| 265 | + * Con: Less discoverable than the proposed approach. Users need to remember that there is `HexFormat` class. |
| 266 | + * Con: Requires `let` or `run` [scope function](https://kotlinlang.org/docs/scope-functions.html) for chaining with other calls |
| 267 | +* Have `BytesHexFormat` and `NumberHexFormat` as top-level classes, each with its own `upperCase` property. |
| 268 | + No need for `HexFormat` class. Functions for formatting/parsing `ByteArray` take `BytesHexFormat`, |
| 269 | + while functions for numeric types take `NumberHexFormat`. e.g., |
| 270 | + ``` |
| 271 | + byteArray.toHexString( |
| 272 | + BytesHexFormat { byteSeparator = " "; bytesPerLine = 16 } |
| 273 | + ) |
| 274 | + ``` |
| 275 | + * Pro: Eliminates possible confusion about what options affect formatting |
| 276 | + * Con: Two variables are needed to store preferred format options |
| 277 | +* Let `Builder` start from a provided format and override its options, |
| 278 | + e.g., `HexFormat(MY_HEX_FORMAT) { bytes.byteSeparator = ":" }` |
| 279 | + * There are few use cases for altering an existing format |
| 280 | + * Can be added as an overload of `fun HexFormat()` |
| 281 | +* Pass options to formatting and parsing functions directly, without introducing `HexFormat` |
| 282 | + * Not convenient in cases when a format is defined once and used in multiple places |
| 283 | + * Adding new options in the future is problematic |
| 284 | + * There is no way in Kotlin to require calling a function with named arguments. |
| 285 | + Passing multiple arguments without specifying names damages code readability, |
| 286 | + e.g., `bitMask.toHexString(true, "0x", false)` |
| 287 | + |
| 288 | +## Dependencies |
| 289 | + |
| 290 | +Only a subset of the Kotlin Standard Library available on all supported platforms is required. |
| 291 | + |
| 292 | +## Placement |
| 293 | + |
| 294 | +* Standard Library |
| 295 | +* `kotlin.text` package |
| 296 | + |
| 297 | +## Reference implementation |
| 298 | + |
| 299 | +* HexFormat class: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/src/kotlin/text/HexFormat.kt |
| 300 | +* Extensions for formatting and parsing: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/src/kotlin/text/HexExtensions.kt |
| 301 | +* Test cases for formatting and parsing `ByteArray`: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/test/text/BytesHexFormatTest.kt |
| 302 | +* Test cases for formatting and parsing numeric values: https://github.com/JetBrains/kotlin/blob/master/libraries/stdlib/test/text/NumberHexFormatTest.kt |
| 303 | + |
| 304 | +## Future advancements |
| 305 | + |
| 306 | +* Adding the ability to limit the number of hex digits when formatting numeric values |
| 307 | + * `NumberHexFormat.maxLength` could be introduced |
| 308 | + * When formatting an `Int`, the combination of `maxLength = 6` and `removeLeadingZeros = false` results in exactly 6 least-significant hex digits |
| 309 | + * The combination of `maxLength = 6` and `removeLeadingZeros = true` results in at most 6 least-significant hex digits without leading zeros |
| 310 | + * Related request: [KT-60787](https://youtrack.jetbrains.com/issue/KT-60787) |
| 311 | +* Overloads for parsing a substring: [KT-58277](https://youtrack.jetbrains.com/issue/KT-58277) |
| 312 | +* Overloads for appending format result to an `Appendable` |
| 313 | + * `toHexString` might need to be renamed to `hexToString/Appendable` or `hexifyToString/Appendable`, because |
| 314 | + it isn't obvious from `Int.toHexString(stringBuilder)` that the result is appended to the provided `StringBuilder` |
| 315 | +* Formatting and parsing I/O streams in Kotlin/JVM |
| 316 | + * Similar to [`InputStream.decodingWith(Base64)`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.io.encoding/java.io.-input-stream/decoding-with.html) |
| 317 | + and [`OutputStream.encodingWith(Base64)`](https://kotlinlang.org/api/latest/jvm/stdlib/kotlin.io.encoding/java.io.-output-stream/encoding-with.html) |
| 318 | +* Formatting and parsing a `Char` |
| 319 | + * Although `Char` is not a numeric type, it has a `Char.code` associated with it. |
| 320 | + With the proposed API, formatting a `Char` is cumbersome: `Char.code.toShort().toHexString()` |
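Until such an API exists, the conversion chain above could be wrapped in a small extension. This is only a sketch: `toHexStringSketch` is a hypothetical name, not a proposed function, and the opt-in assumes the API is experimental in the Kotlin version in use.

```kotlin
// Sketch only: `toHexStringSketch` is a hypothetical helper, not part of the proposal.
// A Char has 16 bits, so its code is formatted as a Short to get four hex digits.
@OptIn(ExperimentalStdlibApi::class)
fun Char.toHexStringSketch(format: HexFormat = HexFormat.Default): String =
    code.toShort().toHexString(format)

fun main() {
    println('A'.toHexStringSketch())        // 0041
    println('\u2602'.toHexStringSketch())   // 2602
}
```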