Better word boundary implementation (better double clicking)

windymelt · April 27, 2023, 11:09am

Hi, I am a Japanese user of Obsidian.
This is a proposal to improve Obsidian’s double-click behavior, i.e., its word boundary detection.

Use case or problem

The premise is that Japanese, unlike English and many other languages, does not put spaces between words (the same is true of Chinese and some other languages). In addition, English words are commonly mixed into Japanese text, again with no spaces at the boundaries between words.

Example:

もし月着陸に成功していれば、宇宙開発への商業的な関与において「大変革」となっていただろうとBBCに話した。
(He told the BBC that if he had succeeded in landing on the moon, it would have been a “sea change” in commercial involvement in space exploration.)
from 日本企業の月着陸船、月面に衝突か　ispace - BBCニュース

In the current implementation of Obsidian, the word boundary detection used for double-click word selection seems to look for the nearest space or punctuation mark. Unfortunately, this is hard to use for Japanese users: the selection ends up covering almost the entire sentence, and it does not work correctly even where English words are mixed in.

Proposed solution

On the other hand, some well-known browsers seem to have excellent algorithms for detecting word boundaries. Fortunately, the algorithm is available as a standalone library, ICU, and I believe it can be used in Obsidian. I am not familiar with Obsidian’s implementation language, but if it is C, C++, or Java, you can use the ICU library directly. If it is JavaScript, you can use ICU indirectly through Intl.Segmenter.
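
To make the Intl.Segmenter idea concrete, here is a minimal sketch of how a double-click handler could find the word under the cursor. The function name `wordRangeAt` and the default locale `"ja"` are my own assumptions for illustration; this is not Obsidian's actual code, and the exact splits inside runs of Japanese text depend on the ICU dictionary bundled with the runtime.

```javascript
// Hypothetical helper (not part of Obsidian): returns the word-level segment
// that contains the given UTF-16 offset, or null if the offset is out of range.
function wordRangeAt(text, offset, locale = "ja") {
  // Word-granularity segmentation follows Unicode word-break rules and uses a
  // dictionary for scripts such as Japanese that are written without spaces.
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });

  // containing() returns the segment covering the offset, or undefined
  // when the offset lies outside the string.
  const hit = segmenter.segment(text).containing(offset);
  if (hit === undefined) return null;

  return {
    from: hit.index,                    // start of the selection
    to: hit.index + hit.segment.length, // end of the selection (exclusive)
    word: hit.segment,
    isWordLike: hit.isWordLike,         // false for spaces and punctuation
  };
}

// Example: clicking inside "BBC" in a sentence written without spaces.
const sample = "なっていただろうとBBCに話した。";
const clicked = sample.indexOf("BBC") + 1;
console.log(wordRangeAt(sample, clicked));
```

An editor could then set the selection to `from`..`to`, and perhaps fall back to its existing behavior when `isWordLike` is false.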

Current workaround (optional)

There is no workaround for this.

Related feature requests (optional)

dave888 · November 11, 2023, 4:36pm

This would also be very useful for a particular English use case: selecting decimal numbers. Chances are, if you double-click one of these in your browser, you’ll select the whole number: 1.23, or 56.332. But that doesn’t work in Obsidian on macOS. If you have to do it a lot, double-clicking is much faster than click and drag.
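
For what it's worth, the decimal case can be checked with a short sketch using Intl.Segmenter, mentioned earlier in the thread. The helper name `segmentWords` is made up for illustration. Under the Unicode word-break rules, a period between two digits does not start a new word, so each number should come back as a single segment.

```javascript
// Hypothetical helper: splits text into word-level segments
// (spaces and punctuation come back as their own segments).
function segmentWords(text, locale = "en") {
  const segmenter = new Intl.Segmenter(locale, { granularity: "word" });
  // The Segments object is iterable; keep only the segment strings.
  return Array.from(segmenter.segment(text), (s) => s.segment);
}

// Each decimal number should appear as one segment in the result.
console.log(segmentWords("1.23 or 56.332"));
```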