[Feature] I need solutions for re-unicode Chinese character and adjust floating point precision #2922

Closed
SMFDrummer opened this issue Feb 6, 2025 · 5 comments
@SMFDrummer
What happened?

My original data looks like this:

{
    "sd": {
        "n": "\u81ea\u4fe1\u7684\u5927\u5634\u82b1",
        "lmz": 1.000000,
        "rsd": {
            "wr": 0.300000,
            "lr": 0.300000,
            "bc": 0
        }
    }
}

After a round trip through parseToJsonElement and encodeToString, the JSON string comes out like this:

{
    "sd": {
        "n": "自信的大嘴花",
        "lmz": 1.0,
        "rsd": {
            "wr": 0.3,
            "lr": 0.3,
            "bc": 0
        }
    }
}

I'd like to

I am not turning this JSON into a @Serializable data class because it is huge, thousands of lines, and I cannot model it by hand. I have to work with JsonElement (as JsonObject) instead. How can I keep the \uXXXX escapes from being converted, and how can I keep six decimal places on the float values? Any tips would be appreciated.

@SMFDrummer
Author

I tried a custom KSerializer:

import kotlinx.serialization.KSerializer
import kotlinx.serialization.Serializable
import kotlinx.serialization.descriptors.PrimitiveKind
import kotlinx.serialization.descriptors.PrimitiveSerialDescriptor
import kotlinx.serialization.descriptors.SerialDescriptor
import kotlinx.serialization.encodeToString
import kotlinx.serialization.encoding.Decoder
import kotlinx.serialization.encoding.Encoder
import kotlinx.serialization.json.Json
import org.jetbrains.annotations.TestOnly

@Serializable
data class Data(
    val rs: Int,
    @Serializable(with = UnicodeStringSerializer::class)
    val n: String
)

object UnicodeStringSerializer : KSerializer<String> {
    override val descriptor: SerialDescriptor = PrimitiveSerialDescriptor("UnicodeString", PrimitiveKind.STRING)
    override fun deserialize(decoder: Decoder): String {
        val decoded = decoder.decodeString()
        return decoded.split("\\u")
            .filter { it.isNotEmpty() }
            .joinToString("") {
                it.toInt(16).toChar().toString()
            }
    }

    override fun serialize(encoder: Encoder, value: String) {
        val unicodeEscaped = value.toCharArray().joinToString("") {
            "\\u" + it.code.toString(16).padStart(4, '0')
        }
        encoder.encodeString(unicodeEscaped)
    }
}

@TestOnly
fun main() {
    val data = Data(rs = 1703239116, n = "自信的大嘴花")
    val json = Json { encodeDefaults = true }
    val jsonString = json.encodeToString(data)
    println(jsonString)
}

This gives me:

{"rs":1703239116,"n":"\\u81ea\\u4fe1\\u7684\\u5927\\u5634\\u82b1"}

but what I want is:

{"rs":1703239116,"n":"\u81ea\u4fe1\u7684\u5927\u5634\u82b1"}

Can Kotlin not handle this? Please give me some advice.

@sandwwraith
Member

We do not have any setting to control output formatting, unfortunately. You can try to use JsonUnquotedLiteral together with a JsonTransformingSerializer: https://github.com/Kotlin/kotlinx.serialization/blob/master/docs/json.md#encoding-literal-json-content-experimental
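
For readers landing here, a minimal sketch of the JsonUnquotedLiteral idea applied to a whole tree, without writing a data class. It assumes kotlinx-serialization-json 1.5.0 or newer on the JVM (BigDecimal is used for the padding). The helper names toAsciiJsonLiteral and withSourceFormatting are made up for this illustration and are not part of the library. It rewrites string values and numbers that contain a decimal point; object keys are left untouched.

```kotlin
import java.math.RoundingMode
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonArray
import kotlinx.serialization.json.JsonElement
import kotlinx.serialization.json.JsonNull
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.JsonPrimitive
import kotlinx.serialization.json.JsonUnquotedLiteral

// Hypothetical helper: build a quoted JSON string literal in which every
// non-ASCII or control UTF-16 code unit is written as \uXXXX.
fun String.toAsciiJsonLiteral(): String = buildString {
    append('"')
    for (ch in this@toAsciiJsonLiteral) {
        when {
            ch == '"' -> append("\\\"")
            ch == '\\' -> append("\\\\")
            ch.code < 0x20 || ch.code > 0x7E ->
                append("\\u").append(ch.code.toString(16).padStart(4, '0'))
            else -> append(ch)
        }
    }
    append('"')
}

// Hypothetical helper: walk the tree and swap primitives for unquoted literals,
// so the encoder emits our text verbatim instead of re-formatting it.
@OptIn(ExperimentalSerializationApi::class)
fun JsonElement.withSourceFormatting(scale: Int = 6): JsonElement = when (this) {
    is JsonObject -> JsonObject(mapValues { (_, v) -> v.withSourceFormatting(scale) })
    is JsonArray -> JsonArray(map { it.withSourceFormatting(scale) })
    is JsonNull -> this
    is JsonPrimitive -> when {
        isString -> JsonUnquotedLiteral(content.toAsciiJsonLiteral())
        // Only numbers written with a decimal point are padded; 0, true, false pass through.
        '.' in content && content.toBigDecimalOrNull() != null ->
            JsonUnquotedLiteral(
                content.toBigDecimal().setScale(scale, RoundingMode.HALF_UP).toPlainString()
            )
        else -> this
    }
}

fun main() {
    // Raw string: the \u sequences stay as literal backslash-u text, as in the source file.
    val origin = """{"sd":{"n":"\u81ea\u4fe1\u7684\u5927\u5634\u82b1","lmz":1.000000,"rsd":{"wr":0.300000,"lr":0.300000,"bc":0}}}"""
    val tree = Json.parseToJsonElement(origin)
    val restored = Json.encodeToString(JsonElement.serializer(), tree.withSourceFormatting())
    println(restored)
    println(restored == origin)
}
```

The unquoted literal is written to the output as-is, which is why the backslashes are not doubled the way they are when the escaped text goes through encodeString. Note that this pads every decimal number to the same scale, so it only fits data where all floats share one precision.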

@SMFDrummer
Author

Thank you so much!
I used a third-party library, kotlinx-serialization-jsonpath by @nomisRev, which is built on Arrow. This library is great. (Apologies for the mention.)

fun String.addQuotes(): String = "\"$this\""
fun String.ensureAscii(): String = this.toCharArray().joinToString("") { "\\u" + it.code.toString(16).padStart(4, '0') }

val origin = """{"sd":{"n":"\u81ea\u4fe1\u7684\u5927\u5634\u82b1","lmz":1.000000,"rsd":{"wr":0.300000,"lr":0.300000,"bc":0}}}"""

val data = Json.parseToJsonElement(origin) as JsonObject // {"sd":{"n":"自信的大嘴花","lmz":1.0,"rsd":{"wr":0.3,"lr":0.3,"bc":0}}}
// I can modify others here, and do something else..., and then ->
val data1 = JsonPath.path("sd.n").modify(data) {
    JsonUnquotedLiteral(it.jsonPrimitive.content.ensureAscii().escaped().addQuotes())
} // {"sd":{"n":"\u81ea\u4fe1\u7684\u5927\u5634\u82b1","lmz":1.0,"rsd":{"wr":0.3,"lr":0.3,"bc":0}}}

val data2 = JsonPath.path("sd.lmz").modify(data1) {
    JsonUnquotedLiteral(it.jsonPrimitive.content.toBigDecimal().setScale(6).toString())
} // {"sd":{"n":"\u81ea\u4fe1\u7684\u5927\u5634\u82b1","lmz":1.000000,"rsd":{"wr":0.3,"lr":0.3,"bc":0}}}

val data3 = JsonPath.path("sd.rsd.wr").modify(data2) {
    JsonUnquotedLiteral(it.jsonPrimitive.content.toBigDecimal().setScale(6).toString())
} // {"sd":{"n":"\u81ea\u4fe1\u7684\u5927\u5634\u82b1","lmz":1.000000,"rsd":{"wr":0.300000,"lr":0.3,"bc":0}}}

val data4 = JsonPath.path("sd.rsd.lr").modify(data3) {
    JsonUnquotedLiteral(it.jsonPrimitive.content.toBigDecimal().setScale(6).toString())
} // {"sd":{"n":"\u81ea\u4fe1\u7684\u5927\u5634\u82b1","lmz":1.000000,"rsd":{"wr":0.300000,"lr":0.300000,"bc":0}}}

println(Json.encodeToString(data4) == origin) // true

This resolves the issue. However, I am not very familiar with Arrow and am new to the JsonPath library, so I do not know how to chain these modifications without declaring so many intermediate variables. Sorry for taking up your time. As a beginner, I would like to ask: how can I reduce the intermediate variables? (My code is not very polished, please forgive me.)

@sandwwraith
Member

Unfortunately, I'm not familiar with JsonPath or Arrow so I can't help you here

@SMFDrummer
Author

Unfortunately, I'm not familiar with JsonPath or Arrow so I can't help you here

Hi, I'm back. You and other Kotlin users can follow this issue on modifying JsonElement: nomisRev/kotlinx-serialization-jsonpath#69
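
On the intermediate-variables question, one possible approach is to put the paths in a list and fold over them, so each step feeds the next without naming it. The sketch below uses only kotlinx-serialization-json (1.5.0 or newer, JVM) rather than JsonPath. modifyAt and fixedScale are hypothetical helpers written for this example, and the dotted-path handling is deliberately simple (object keys only, no arrays).

```kotlin
import java.math.RoundingMode
import kotlinx.serialization.ExperimentalSerializationApi
import kotlinx.serialization.json.Json
import kotlinx.serialization.json.JsonElement
import kotlinx.serialization.json.JsonObject
import kotlinx.serialization.json.JsonUnquotedLiteral
import kotlinx.serialization.json.jsonPrimitive

// Hypothetical helper: return a copy of the tree in which the element at `path`
// is replaced by transform(element). Paths that do not match leave the tree unchanged.
fun JsonElement.modifyAt(
    path: List<String>,
    transform: (JsonElement) -> JsonElement
): JsonElement {
    if (path.isEmpty()) return transform(this)
    val obj = this as? JsonObject ?: return this
    val child = obj[path.first()] ?: return this
    // Map + Pair keeps the existing key position, so field order is preserved.
    return JsonObject(obj + (path.first() to child.modifyAt(path.drop(1), transform)))
}

// Hypothetical helper: re-emit a numeric primitive with exactly six decimal places.
@OptIn(ExperimentalSerializationApi::class)
fun fixedScale(element: JsonElement): JsonElement =
    JsonUnquotedLiteral(
        element.jsonPrimitive.content.toBigDecimal()
            .setScale(6, RoundingMode.HALF_UP)
            .toPlainString()
    )

fun main() {
    val data = Json.parseToJsonElement("""{"sd":{"lmz":1.0,"rsd":{"wr":0.3,"lr":0.3,"bc":0}}}""")
    val numericPaths = listOf("sd.lmz", "sd.rsd.wr", "sd.rsd.lr")

    // fold threads the accumulated tree through every path; no data1..data4 needed.
    val result = numericPaths.fold(data) { acc, path ->
        acc.modifyAt(path.split('.'), ::fixedScale)
    }
    println(Json.encodeToString(JsonElement.serializer(), result))
}
```

The same fold shape should carry over to the JsonPath calls in the earlier comment, with the lambda body replaced by JsonPath.path(path).modify(acc) { ... } as used there. I have not checked that variant against the library.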