Steps to reproduce
- Create a new vault.
- Create a new note.
- Paste or start typing text that contains Unicode characters with codepoints greater than U+FFFF, such as
𐑞𐑦𐑕 𐑦𐑟 𐑩 𐑑𐑧𐑕𐑑😀 (“this is a test😀” with the Latin letters replaced by Shavian letters)
- Check the character and word counts.
Did you follow the troubleshooting guide? Y
Expected result
The character/word counter shows 4 words, 14 characters
Actual result
The character/word counter shows 0 words, 25 characters
Environment
SYSTEM INFO:
Operating system: android 16 (Google Pixel 8)
Webview version: 141.0.7390.111
Obsidian version: 1.9.14 (242)
API version: v1.9.14
Login status: not logged in
Language: en
Live preview: on
Base theme: adapt to system
Community theme: none
Snippets enabled: 0
Restricted mode: on
RECOMMENDATIONS:
none
Additional information
It appears that all of the Shavian letters and the emoji are being double counted: there are 11 of them in that snippet plus 3 whitespace characters, and 2 * 11 + 3 = 25. I’ve also tested Hiragana, Hangul, Cyrillic, and the Greek alphabet. None of them have this problem.
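A minimal TypeScript sketch of that arithmetic, assuming the counter reads the UTF-16 length of the string (an assumption; Obsidian's actual counting code isn't known here). `countCodePoints` is a hypothetical helper, and U+1F600 is used as an example emoji from the block:

```typescript
// Sample: Shavian "this is a test" plus one emoji (U+1F600 chosen as an example).
const sample: string =
  "\u{1045E}\u{10466}\u{10455} \u{10466}\u{1045F} \u{10469} \u{10451}\u{10467}\u{10455}\u{10451}\u{1F600}";

// Hypothetical helper: counts code points by treating each
// high+low surrogate pair as a single character.
function countCodePoints(text: string): number {
  let count = 0;
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    if (isHigh && i + 1 < text.length) {
      const next = text.charCodeAt(i + 1);
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // skip the low surrogate so the pair counts once
      }
    }
    count++;
  }
  return count;
}

const utf16Units: number = sample.length;           // 2 * 11 + 3 = 25
const codePoints: number = countCodePoints(sample); // 11 + 3 = 14
console.log(utf16Units, codePoints);
```

Every character above U+FFFF takes two UTF-16 code units, which would explain why Hiragana, Hangul, Cyrillic, and Greek (all below U+FFFF) count correctly.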
There are also sometimes errors when editing in the middle of body text that is entirely or almost entirely made up of these codepoints: a character elsewhere in the buffer becomes an invalid codepoint, usually one that is next to a “normal” character.
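One plausible but unconfirmed cause would be an edit that lands between the two halves of a surrogate pair. The sketch below only illustrates that failure mode, assuming edit offsets are UTF-16 indices; `hasLoneSurrogate` is a hypothetical helper, not part of any Obsidian API:

```typescript
// Hypothetical helper: reports whether a string contains an unpaired surrogate,
// which is what an "invalid codepoint" looks like in a UTF-16 string.
function hasLoneSurrogate(text: string): boolean {
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    if (unit >= 0xd800 && unit <= 0xdbff) {
      const next = i + 1 < text.length ? text.charCodeAt(i + 1) : 0;
      if (next >= 0xdc00 && next <= 0xdfff) {
        i++; // valid pair, skip the low surrogate
        continue;
      }
      return true; // high surrogate with no low surrogate after it
    }
    if (unit >= 0xdc00 && unit <= 0xdfff) {
      return true; // low surrogate with no high surrogate before it
    }
  }
  return false;
}

// One Shavian letter (two UTF-16 units) followed by a Latin "a".
const shavianThenLatin: string = "\u{10451}a";

// Inserting at UTF-16 offset 1 splits the pair; offset 2 is a safe boundary.
const cutInsidePair: string = shavianThenLatin.slice(0, 1) + "x" + shavianThenLatin.slice(1);
const cutOnBoundary: string = shavianThenLatin.slice(0, 2) + "x" + shavianThenLatin.slice(2);

console.log(hasLoneSurrogate(cutInsidePair), hasLoneSurrogate(cutOnBoundary));
```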
The Shavian block is U+10450-U+1047F, and the emoji (Emoticons) block is U+1F600-U+1F64F.
For counting words in Shavian, a normal English word counter should work just fine, with Shavian letters treated the same way as the normal English (Latin) letters.
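As a rough sketch of what that could look like, a whitespace-based counter treats Shavian and Latin text identically. `countWords` is a hypothetical helper and this is only one simple definition of a word, not how Obsidian counts; in this form an emoji attached to a word is counted as part of that word:

```typescript
// Hypothetical helper: counts runs of non-whitespace characters.
// It never inspects individual letters, so the script does not matter.
function countWords(text: string): number {
  const trimmed = text.trim();
  if (trimmed.length === 0) {
    return 0;
  }
  return trimmed.split(/\s+/).length;
}

const latinText: string = "this is a test";
// The same sentence in Shavian, with an example emoji (U+1F600) appended.
const shavianText: string =
  "\u{1045E}\u{10466}\u{10455} \u{10466}\u{1045F} \u{10469} \u{10451}\u{10467}\u{10455}\u{10451}\u{1F600}";

// Both strings have the same whitespace structure, so the counts match.
console.log(countWords(latinText), countWords(shavianText));
```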
IDK how Emojis should be handled for word counting.
While I don’t think it’s currently in use, I suspect there will be a similar problem with character counting and text editing for 4-byte encoded text as well, but I’ve not tested that.