Editor (CM5) doesn't support word splitting in CJK languages

Currently, the default “double-click to select a word” behavior does not work for CJK content in edit mode. The built-in CodeMirror editor implements its own word-splitting method, which differs from regular browser behavior and treats a whole line of text as a single word.

For example, double-clicking “河北省” in the Chinese sentence “我从河北省来。” (“I come from Hebei Province.”) should select the word “河北省”. In preview mode, double-clicking works as expected, but in edit mode the whole sentence is selected.

Current workaround

I wrote a plugin that uses registerCodeMirror to patch CodeMirror’s default behavior. It injects a patched version of the findWordAt method that performs additional word splitting when a Chinese character is present. The JS-based word-splitting module is not as efficient as one backed by a native binary, but it is adequate for this use case.
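
As a rough illustration of the idea (not the plugin’s actual code), the sketch below shows the core step a patched findWordAt would need: given a line and a cursor offset, return the boundaries of the CJK word under the cursor, or defer to the default behavior when the line has no Chinese characters. All names here (`WordRange`, `Segmenter`, `makeDictSegmenter`, `wordRangeAt`) are hypothetical, and the tiny dictionary-based longest-match segmenter is only a stand-in for a real word-splitting module.

```typescript
// Hypothetical sketch; none of these names come from cm-chs-patch or CodeMirror.
interface WordRange {
  from: number; // offset of the first character of the word
  to: number;   // offset just past the last character of the word
}

type Segmenter = (text: string) => string[];

// Basic CJK Unified Ideographs block; a real patch may need a wider range.
const CJK = /[\u4e00-\u9fff]/;

// Stand-in segmenter: greedy longest match against a small word list,
// falling back to single characters. A real module would use a full dictionary.
function makeDictSegmenter(dict: string[]): Segmenter {
  // Try longer dictionary words first so that the longest match wins.
  const words = dict.filter((w) => w.length > 0).sort((a, b) => b.length - a.length);
  return (text: string): string[] => {
    const out: string[] = [];
    let i = 0;
    while (i < text.length) {
      let piece = text.charAt(i); // fallback: a single character
      for (const w of words) {
        if (text.slice(i, i + w.length) === w) {
          piece = w;
          break;
        }
      }
      out.push(piece);
      i += piece.length;
    }
    return out;
  };
}

// Returns the range of the segmented word containing offset `ch`,
// or null when the caller should fall back to the default word finder.
function wordRangeAt(line: string, ch: number, segment: Segmenter): WordRange | null {
  if (!CJK.test(line)) return null; // no Chinese characters: keep default behavior
  let offset = 0;
  for (const word of segment(line)) {
    const end = offset + word.length;
    if (ch >= offset && ch < end) return { from: offset, to: end };
    offset = end;
  }
  return null; // offset is past the end of the line
}

const segment = makeDictSegmenter(["河北省"]);
console.log(wordRangeAt("我从河北省来。", 3, segment)); // { from: 2, to: 5 }
```

In a CM5 patch, the returned offsets would still have to be converted into the range object that findWordAt is expected to return for the clicked line; that wiring is omitted here.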

Link to plugin: https://github.com/alx-plugins/cm-chs-patch

However, since Obsidian is currently migrating from CodeMirror 5 to CodeMirror 6, this workaround may break at any time.

Related Pull Request: Add cm-chs-patch by AidenLx · Pull Request #229 · obsidianmd/obsidian-releases (github.com)

Proposed solution

Since CodeMirror is open source, a patched CodeMirror could be used in future releases, especially since CodeMirror 6 is more modular. Using a more efficient module, or the word-segmentation engine built into the system or browser, would be welcome on low-end devices.
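
One candidate for a built-in engine is the standard `Intl.Segmenter` API, which recent browsers and Node.js versions expose. The sketch below is only an assumption about how it could be used for this purpose: the function name `nativeWordRangeAt` and the default locale `"zh"` are hypothetical, and the exact word boundaries depend on the runtime’s segmentation data, so no specific output is claimed.

```typescript
// Hypothetical sketch: find the word under offset `ch` using the runtime's
// built-in segmenter, when one is available.
function nativeWordRangeAt(
  text: string,
  ch: number,
  locale: string = "zh" // assumed locale; a real editor might use the UI language
): { from: number; to: number } | null {
  // Cast so this compiles even when the TypeScript lib lacks Intl.Segmenter typings.
  const SegmenterCtor = (Intl as any).Segmenter;
  if (typeof SegmenterCtor !== "function") return null; // runtime has no Intl.Segmenter

  const segments = new SegmenterCtor(locale, { granularity: "word" }).segment(text);
  // containing() returns the segment covering the given code unit index,
  // or undefined when the index is out of bounds.
  const hit = segments.containing(ch);
  if (!hit) return null;
  return { from: hit.index, to: hit.index + hit.segment.length };
}

const range = nativeWordRangeAt("我从河北省来。", 3);
console.log(range); // boundaries depend on the runtime's dictionary
```

Returning null when the API is missing lets a caller fall back to a JS-based module, so the native path would be an optimization rather than a hard requirement.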

PS:

Default
ob-default-splitting

Patched (Expected Behavior)
ob-patched-splitting

Thanks for your work implementing this.
I am gonna move this to Feature Requests and hope that CM6 will do a better job.