Use case or problem
(1) Breaking between phrases:
Here is a sample Chinese sentence: “这是一句没有任何空格的中文句子示例” (“This is an example of a Chinese sentence without any spaces”).
If you paste this sentence into the Chrome search bar and press ctrl + backspace, you’ll see that only “示例” (“example”) at the end of the sentence gets deleted, which is a single phrase. This behaviour is consistent with pressing ctrl + backspace in an English sentence.
If you paste the same sentence into Obsidian (v0.12.15) and press ctrl + backspace, the entire sentence gets deleted instead of a single phrase.
The same applies to ctrl + left/right.
This problem occurs whenever Obsidian is used with a language, e.g. Chinese or Japanese, that does not separate phrases with spaces.
(2) Breaking between languages:
Obsidian currently does not break phrases at the boundary between characters of different languages.
Pressing ctrl + backspace at the end of a sentence like “这是一句没有任何空格的中文句子Sample” in Obsidian deletes the entire sentence, whereas only the word “Sample” should be deleted.
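To make the expected behaviour in (2) concrete, here is a rough Python sketch of a backward word-delete that stops at script boundaries. This is not how Obsidian or Chrome implement it; the `char_class` and `delete_word_backward` names are made up for illustration, and the code point ranges are a deliberate simplification (basic CJK ideographs plus Hiragana and Katakana only).

```python
def char_class(ch: str) -> str:
    """Coarse character class used to find boundaries (simplified ranges)."""
    if ch.isspace():
        return "space"
    code = ord(ch)
    # Assumption: only the basic CJK block and the kana blocks are treated as CJK.
    if 0x4E00 <= code <= 0x9FFF or 0x3040 <= code <= 0x30FF:
        return "cjk"
    if ch.isalnum():
        return "word"
    return "other"


def delete_word_backward(text: str) -> str:
    """Return `text` with the last run of same-class characters removed."""
    end = len(text)
    # Skip trailing whitespace first, as word-delete usually does.
    while end > 0 and text[end - 1].isspace():
        end -= 1
    if end == 0:
        return ""
    cls = char_class(text[end - 1])
    start = end
    # Walk left until the character class changes.
    while start > 0 and char_class(text[start - 1]) == cls:
        start -= 1
    return text[:start]


print(delete_word_backward("这是一句没有任何空格的中文句子Sample"))
```

With this approach only “Sample” is removed from the mixed sentence. A pure Chinese sentence would still be removed as a whole, because script boundaries alone do not solve problem (1).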
Proposed solution
Add Chinese/Japanese tokenization support.
Pressing ctrl + backspace at the end of a Chinese/Japanese sentence should delete the last phrase instead of the entire sentence.
Libraries such as jieba (for Chinese) and Konoha (for Japanese) might be useful.
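As a rough illustration of the proposal, the Python sketch below deletes the last phrase by first segmenting the text. The segmenter here is a toy greedy maximum-matching function over a tiny hand-written vocabulary (`TOY_WORDS`, `toy_segment` and `delete_last_phrase` are all hypothetical names); it only stands in for a real tokenizer and assumes the tokens concatenate back to the original text.

```python
from typing import Callable, List

# Hypothetical toy vocabulary; a real segmenter ships a large dictionary.
TOY_WORDS = {"这是", "一句", "没有", "任何", "空格", "中文", "句子", "示例"}
MAX_WORD_LEN = max(len(word) for word in TOY_WORDS)


def toy_segment(text: str) -> List[str]:
    """Greedy forward maximum matching against TOY_WORDS."""
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first; fall back to a single character.
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in TOY_WORDS:
                tokens.append(candidate)
                i += size
                break
    return tokens


def delete_last_phrase(
    text: str, segment: Callable[[str], List[str]] = toy_segment
) -> str:
    """Remove the last token reported by `segment` from the end of `text`."""
    tokens = segment(text)
    if not tokens:
        return text
    # Assumption: the tokens concatenate back to the original text.
    return text[: len(text) - len(tokens[-1])]


print(toy_segment("这是一句没有任何空格的中文句子示例"))
print(delete_last_phrase("这是一句没有任何空格的中文句子示例"))
```

The `segment` parameter is there so that a real tokenizer could be swapped in; for example, jieba's `lcut` function returns a list of tokens and could presumably be passed in place of `toy_segment`.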
Current workaround (optional)
None, other than pressing backspace repeatedly to delete a phrase one character at a time.
Related feature requests (optional)
https://forum.obsidian.md/t/chinese-word-segmentation-should-also-be-supported-in-the-editor/13388