Following on from the text-sizing work in #8226, I have decided to specify the exact algorithm terminals should use to split Unicode text into cells, and to implement it in kitty. It is based on the Unicode specification's grapheme segmentation rules, but it also specifies things particular to terminals that the Unicode spec does not cover. It fixes various long-standing issues such as #3810 (emoji with ZWJ) and #8433 (Korean text).
The specification is here. Feel free to read it and comment. There might well be differing opinions on the parts not covered by the Unicode spec. I am open to suggestions for modification.
From kitty users, I would appreciate it if some of you could run the nightly build and report any issues. You may run into problems if you use ZWJ-based emoji in your workflows, as the width kitty assigns to these has changed, and terminal programs may use a different width than the correct one.
In master, there is also a kitten that can be run easily to test a terminal's compliance with the spec. It uses grapheme test data from the Unicode consortium. Run it as:
kitten __width_test__
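For readers unfamiliar with the Unicode grapheme test data, here is a minimal Python sketch of how one line of `GraphemeBreakTest.txt` can be turned into the expected clusters. In that file, `÷` marks a permitted break, `×` marks a forbidden one, and everything after `#` is a comment. The function name `parse_break_test_line` is hypothetical, and this is not the kitten's actual code.

```python
def parse_break_test_line(line: str) -> list[str]:
    """Return the expected grapheme clusters for one test line.

    Blank lines and comment-only lines yield an empty list.
    """
    # Drop the trailing comment, which itself contains break markers.
    data = line.split("#", 1)[0].strip()
    if not data:
        return []
    clusters: list[str] = []
    current = ""
    for token in data.split():
        if token == "÷":
            # A break is allowed here: close the cluster built so far.
            if current:
                clusters.append(current)
                current = ""
        elif token == "×":
            # No break here: the next code point joins the current cluster.
            continue
        else:
            # Anything else is a code point written in hexadecimal.
            current += chr(int(token, 16))
    if current:
        clusters.append(current)
    return clusters
```

A terminal test can then print each cluster and check that the cursor advances by the expected number of cells.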
Here are the results of running the kitten on various terminals.

| Terminal name | Number of tests failed |
|---|---|
| kitty (master) | 0 |
| kitty 0.41.1 | 45 |
| wezterm 5046fc22 | 179 |
| foot 1.21.0 | 186 |
| konsole 24.12.3 | 280 |
| iTerm2 3.5.13 | 289 |
| gnome-term 3.56.0 | 317 |
| kitty-master+tmux-3.5 | 347 |
| xterm 397 | 371 |
| Apple terminal 2.14 | 479 |
And finally, we have ghostty 1.1.3, on which the test kitten failed to run because ghostty returned far more cursor position reports than expected, so something is badly broken there. I did happen to look at its code, since it claims to do grapheme segmentation, and it doesn't implement the segmentation algorithm correctly anyway.
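The mention of cursor position reports suggests the test measures width by asking the terminal where the cursor is before and after printing a string. Below is a rough Python sketch of that idea, assuming the standard request `CSI 6 n` and reply `CSI row ; col R`, and assuming no line wrap occurs during a test. The names `parse_cursor_reports` and `measured_widths` are hypothetical, and this is not the kitten's actual implementation.

```python
import re

# A cursor position report looks like ESC [ <row> ; <col> R.
CPR_PATTERN = re.compile(r"\x1b\[(\d+);(\d+)R")


def parse_cursor_reports(data: str) -> list[tuple[int, int]]:
    """Extract every (row, col) report found in the terminal's reply."""
    return [(int(row), int(col)) for row, col in CPR_PATTERN.findall(data)]


def measured_widths(data: str, expected_tests: int) -> list[int]:
    """Pair up before/after reports and return the cells used per test.

    Raises ValueError when the terminal sent a different number of
    reports than the two per test that were requested.
    """
    reports = parse_cursor_reports(data)
    if len(reports) != 2 * expected_tests:
        raise ValueError(
            f"expected {2 * expected_tests} reports, got {len(reports)}"
        )
    widths = []
    for i in range(0, len(reports), 2):
        (_, col_before), (_, col_after) = reports[i], reports[i + 1]
        # Assumes both reports are on the same row (no wrapping).
        widths.append(col_after - col_before)
    return widths
```

With this shape, a terminal that sends extra reports fails the count check before any width is compared, which matches the kind of failure described above.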