Hashing over serialization: MurMur, xxHash, Base64, MD5 #163

In the branch [`murmur_hash`](https://github.com/Kotlin/kotlinx.serialization/blob/a2277b377f2b57ffeb96da559222acb53d3918a4/runtime/jvm/src/test/kotlin/kotlinx/serialization/hashing/MurMurTest.kt#L17) I've introduced a prototype that leverages the serialization infrastructure to implement a Guava-like 128-bit MurMur3 hash.
The main idea is to provide a `KOutput` that hashes the given object instead of writing it somewhere.

The idea looks promising for the following reasons:
1) Once written, `Hasher` can be used from JVM, JS and native
2) The `KOutput` design allows using `Hasher` to hash both standalone objects and streams of objects
3) End users don't have to write adapters to make their classes hashable. Moreover, using the hasher doesn't introduce any additional classes or generated code.

For example, in Guava, for the class `data class Person(val id: Int, val name: String)` to be hashable, the user has to provide a `Funnel` implementation that deconstructs `Person` into primitive fields.
So one should write something like `Hashing.murmur3_128().newHasher().putObject(person, Funnel<Person> { from, sink -> sink.putInt(from.id).putString(from.name, Charsets.UTF_8) }).hash().asLong()`.
Usually this funnel has to be cached in a class field somewhere, so bytecode size and method count grow, and things get even worse when the class contains non-primitive fields.

The current design allows writing `MurMur3_128Hasher().longHash(person)` as long as `Person` is serializable.

Performance looks even more promising: we don't have to allocate an intermediate object, and the `KOutput` can be easily reused.
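To make the idea concrete, here is a minimal, self-contained sketch of an output that hashes primitives instead of writing them. It is not the prototype's code: `FnvHashingOutput` and its `write` extension are hypothetical names, 64-bit FNV-1a stands in for MurMur3 to keep the example short, and the hand-written `write(person)` plays the role that the generated serializer would play with a real `KOutput`.

```kotlin
// Hypothetical stand-in for a hashing output: it receives primitives one by one
// and folds them into a running 64-bit FNV-1a state instead of writing them out.
class FnvHashingOutput {
    private var state = FNV_OFFSET

    private fun writeByte(b: Int) {
        state = (state xor (b and 0xff).toLong()) * FNV_PRIME
    }

    fun writeInt(value: Int) {
        // Feed the four bytes in little-endian order.
        for (shift in 0 until 32 step 8) writeByte(value ushr shift)
    }

    fun writeString(value: String) {
        val bytes = value.encodeToByteArray()
        // The length prefix keeps ("ab", "c") distinct from ("a", "bc").
        writeInt(bytes.size)
        for (b in bytes) writeByte(b.toInt())
    }

    fun longHash(): Long = state

    // Resetting the state lets one instance be reused without allocation.
    fun reset() {
        state = FNV_OFFSET
    }

    private companion object {
        val FNV_OFFSET = 0xcbf29ce484222325uL.toLong()
        const val FNV_PRIME = 0x100000001b3L
    }
}

data class Person(val id: Int, val name: String)

// Assumption: with real serialization this traversal would come from the
// generated serializer, so users would not write it by hand.
fun FnvHashingOutput.write(person: Person) {
    writeInt(person.id)
    writeString(person.name)
}

fun main() {
    val out = FnvHashingOutput()

    // Hash a standalone object.
    out.write(Person(1, "Alice"))
    val single = out.longHash()

    // Reuse the same instance: after reset() the same input gives the same hash.
    out.reset()
    out.write(Person(1, "Alice"))
    check(out.longHash() == single)

    // Hash a stream of objects by feeding them into one output.
    out.reset()
    listOf(Person(1, "Alice"), Person(2, "Bob")).forEach { out.write(it) }
    println("single=$single stream=${out.longHash()}")
}
```

The only per-object work in this sketch is the traversal of fields, which is why no intermediate buffer or funnel object is needed.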

Benchmark results, compared with Guava on a simple data class:

```
HashingBenchmark.guavaHash       thrpt    5   8.982 ± 1.156  ops/us
HashingBenchmark.kxHash          thrpt    5  18.766 ± 4.031  ops/us
HashingBenchmark.kxHashReusable  thrpt    5  20.549 ± 0.636  ops/us
```

Now we should understand whether the community and `kotlinx.serialization` users are interested in this functionality and react accordingly.
