[RFC] Specifying how terminals process Unicode text #8533

Open
kovidgoyal opened this issue Apr 12, 2025 · 0 comments

kovidgoyal commented Apr 12, 2025

Hi all,

Following on from the text-sizing work in #8226, I have decided to specify the exact algorithm terminals should use to split Unicode text into cells, and to implement it in kitty. It is based on the Unicode specification's grapheme segmentation rules, but it also specifies terminal-specific behavior that the Unicode spec does not cover. It fixes various long-standing issues such as #3810 (emoji with ZWJ) and #8433 (Korean text).

The specification is here. Feel free to read it and comment. There might well be differing opinions on the parts not covered by the Unicode spec. I am open to suggestions for modification.

From kitty users, I would appreciate it if some of you could run the nightly build and report any issues. You may run into problems if you use ZWJ-based emoji in your workflows, as the width kitty assigns to these has changed, and terminal programs may use a different width than the correct one.
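To illustrate why widths for ZWJ-based emoji can disagree between a terminal and the programs running in it, here is a minimal Python sketch. It is not the algorithm from the specification: `naive_width` and `clustered_width` are hypothetical helpers, the clustering rule is a deliberate simplification of the Unicode grapheme rules, and the assumption that a ZWJ sequence takes the width of its first code point is made only for this example.

```python
import unicodedata

ZWJ = "\u200d"  # ZERO WIDTH JOINER


def _is_zero_width(ch):
    # Combining marks and format characters (ZWJ is category Cf) occupy no cell.
    return unicodedata.category(ch) in ("Mn", "Me", "Cf")


def _cell_width(ch):
    # Wide and fullwidth code points take two cells, everything else one.
    return 2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1


def naive_width(text):
    """Sum per-code-point widths, ignoring how code points join into clusters."""
    return sum(_cell_width(ch) for ch in text if not _is_zero_width(ch))


def clustered_width(text):
    """Simplified clustering (an assumption for this sketch): a code point that
    follows a ZWJ joins the previous cluster, and each cluster takes the width
    of its first code point."""
    total = 0
    prev = ""
    for ch in text:
        if not _is_zero_width(ch) and prev != ZWJ:
            total += _cell_width(ch)  # this code point starts a new cluster
        prev = ch
    return total


# MAN, ZWJ, WOMAN, ZWJ, GIRL: a single "family" emoji made of five code points.
family = "\U0001F468\u200d\U0001F469\u200d\U0001F467"
print(naive_width(family), clustered_width(family))  # prints "6 2"
```

A program that counts six cells for this string while the terminal advances the cursor by two will misplace everything that follows on the line, which is the kind of mismatch to watch for when testing.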

In master, there is also a kitten that can be run easily to test a terminal's compliance with the spec. It uses grapheme test data from the Unicode consortium. Run it as:

kitten __width_test__

Here are results of running it on various terminals.

| Terminal name | Number of tests failed |
|---|---|
| kitty (master) | 0 |
| kitty 0.41.1 | 45 |
| wezterm 5046fc22 | 179 |
| foot 1.21.0 | 186 |
| konsole 24.12.3 | 280 |
| iTerm2 3.5.13 | 289 |
| gnome-term 3.56.0 | 317 |
| kitty-master+tmux-3.5 | 347 |
| xterm 397 | 371 |
| Apple terminal 2.14 | 479 |

And finally, we have ghostty 1.1.3, on which the test kitten failed to run because ghostty returned far more cursor position reports than expected, so something is badly broken there. I did happen to look at its code, as it claims to do grapheme segmentation, and it doesn't implement the segmentation algorithm correctly anyway.
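For readers unfamiliar with cursor position reports: a program can ask a terminal where the cursor is by sending `ESC [ 6 n`, and the terminal replies with `ESC [ row ; col R`. The sketch below shows, in Python, how the width a terminal assigned to a string could be derived from such a reply. It is not the test kitten's implementation; `probe`, `parse_cpr_replies`, and `measured_width` are hypothetical names, and it assumes the text does not wrap to the next line.

```python
import re

CPR_REQUEST = "\x1b[6n"  # DSR 6: ask the terminal to report the cursor position
CPR_REPLY = re.compile(rb"\x1b\[(\d+);(\d+)R")  # reply: ESC [ row ; col R


def probe(text):
    """Bytes to send: return to column 1, print the text, request a report."""
    return "\r" + text + CPR_REQUEST


def parse_cpr_replies(data):
    """Return (row, col) for every cursor position report found in data."""
    return [(int(row), int(col)) for row, col in CPR_REPLY.findall(data)]


def measured_width(reply):
    """Cells the cursor advanced after probe(); reported columns are 1-based."""
    reports = parse_cpr_replies(reply)
    if len(reports) != 1:
        # One request should yield exactly one report; anything else means
        # the measurement cannot be trusted.
        raise ValueError(f"expected exactly one report, got {len(reports)}")
    return reports[0][1] - 1


print(measured_width(b"\x1b[12;3R"))  # prints 2
```

Because each request should produce exactly one reply, a terminal that sends extra reports makes this kind of measurement unusable, which is why the sketch raises an error in that case.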
