[RFC] Specifying how terminals process Unicode text #8533

kovidgoyal · 2025-04-12T09:13:32Z

Hi all,

Following on from the text-sizing work in #8226, I have decided to specify the exact algorithm terminals should use to split Unicode text into cells, and to implement it in kitty. It is based on the Unicode specification's grapheme segmentation rules, but it also specifies terminal-specific behavior that the Unicode spec does not cover. It fixes various long-standing issues such as #3810 (emoji with ZWJ) and #8433 (Korean text).

The specification is here. Feel free to read it and comment. There might well be differing opinions on the parts not covered by the Unicode spec. I am open to suggestions for modification.

From kitty users, I would appreciate it if some of you could run the nightly build and report any issues. You may run into problems if you use ZWJ-based emoji in your workflows, because the width kitty assigns to these has changed, and terminal programs may assume a width different from the correct one.
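To illustrate why the width of ZWJ sequences can differ between programs, here is a minimal Python sketch. It is not the algorithm from the specification: `simple_clusters` is a hypothetical, heavily simplified stand-in for real grapheme segmentation, and the rule that a cluster takes the width of its first code point is an assumption made for illustration only.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def codepoint_width(ch):
    """Approximate cell width of one code point (illustrative only)."""
    if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
        return 0  # combining marks and format characters such as ZWJ
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def naive_width(text):
    """Width a program gets if it sums per-code-point widths."""
    return sum(codepoint_width(ch) for ch in text)


def simple_clusters(text):
    """Hypothetical simplification: attach zero-width code points, and
    anything that follows a ZWJ, to the previous cluster."""
    clusters = []
    for ch in text:
        extends = codepoint_width(ch) == 0
        after_zwj = bool(clusters) and clusters[-1].endswith(ZWJ)
        if clusters and (extends or after_zwj):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return clusters


def clustered_width(text):
    """Assumption: a cluster is as wide as its first code point."""
    return sum(codepoint_width(c[0]) for c in simple_clusters(text))


# man + ZWJ + woman + ZWJ + girl
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"
print(naive_width(family), clustered_width(family))
```

A program that sums code point widths and a terminal that treats the sequence as a single cluster disagree about where the cursor ends up, which is the kind of mismatch described above.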

In master, there is also a kitten that can be run easily to test a terminal's compliance with the spec. It uses grapheme test data from the Unicode consortium. Run it as:

kitten __width_test__
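The kitten's internals are not shown in this post. As a rough sketch of the general technique of measuring width from outside the terminal, a program can print a string, request a cursor position report with the standard `ESC [ 6 n` query, and compare the reported column before and after. The helper names below (`parse_cpr`, `measured_width`) are hypothetical and are not taken from kitty's code.

```python
import re

# A cursor position report (CPR) has the form ESC [ row ; col R.
# It is sent by the terminal in response to the query ESC [ 6 n.
CPR_RE = re.compile(rb"\x1b\[(\d+);(\d+)R")


def parse_cpr(data):
    """Return every (row, column) pair found in raw terminal input."""
    return [(int(r), int(c)) for r, c in CPR_RE.findall(data)]


def measured_width(before, after):
    """Columns the cursor advanced between two replies.

    Each argument must contain exactly one report; anything else
    raises ValueError, as does a change of row (the text wrapped).
    """
    (row1, col1), = parse_cpr(before)
    (row2, col2), = parse_cpr(after)
    if row1 != row2:
        raise ValueError("cursor moved to another row")
    return col2 - col1


# Simulated replies: cursor at column 1, then at column 3 after printing.
print(measured_width(b"\x1b[3;1R", b"\x1b[3;3R"))
```

A test built this way depends on the terminal sending exactly one report per query, so a terminal that sends extra reports cannot be measured reliably.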

Here are results of running it on various terminals.

Terminal name	Number of tests failed
kitty (master)	0
kitty 0.41.1	45
wezterm 5046fc22	179
foot 1.21.0	186
konsole 24.12.3	280
iTerm2 3.5.13	289
gnome-term 3.56.0	317
kitty-master+tmux-3.5	347
xterm 397	371
Apple terminal 2.14	479

And finally, there is ghostty 1.1.3, on which the test kitten failed to run because ghostty returned far more cursor position reports than expected, so something is badly broken there. I did look at its code, since it claims to do grapheme segmentation, and it does not implement the segmentation algorithm correctly anyway.

kovidgoyal pinned this issue Apr 12, 2025

This was referenced Apr 12, 2025

cursor location seems incorrect following zero-width-joined emoji #3810

Closed

Correctly render Hangul in NFD as NFC - supporting Jamo normalisation #8433

Closed

terminal: Implement grapheme clustering mode (DECSET 2027) coletrammer/ttx#11

Open
