Unlinked mentions in Korean

In Korean, every noun takes a postposition when it is used in a sentence.

It looks like this:

커피를 마신다.

‘커피’ is a noun and ‘를’ is a postposition, and there is no space between the two.

So when you make a ‘커피’ page, unlinked backlinks don’t work, because in most sentences the noun ‘커피’ is followed directly by a postposition, as in ‘커피에’, ‘커피는’, ‘커피의’, ‘커피로’.

I hope you understand what I mean; my English is quite broken.
Can you add an option for Korean that allows linking one word out of two even when there is no space between them?

Just a suggestion for a workaround: you could use the alias feature to make the unlinked backlinks show up. It’s probably a bit too much work, but if you really need it, that could be one way.
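
To give a feel for how much work the alias workaround is, here is a rough sketch that builds the aliases line for one noun. The helper name `alias_line` and the `POSTPOSITIONS` sample are made up for illustration; the sample is incomplete and only uses forms that attach to 커피.

```python
# Hypothetical helper, not part of the app: builds an aliases line for a noun page.
# POSTPOSITIONS is an illustrative, incomplete sample (assumption).
POSTPOSITIONS = ["를", "는", "가", "도", "의", "에", "로"]

def alias_line(noun, postpositions=POSTPOSITIONS):
    """Return an 'aliases: [...]' line with one alias per noun + postposition."""
    aliases = [noun + p for p in postpositions]
    return "aliases: [" + ", ".join(aliases) + "]"

print(alias_line("커피"))
# aliases: [커피를, 커피는, 커피가, 커피도, 커피의, 커피에, 커피로]
```

Every noun page would need a line like this, which is why it feels like too much work to do by hand.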

We are thinking about this…

Our understanding is that Korean uses whitespace to separate words, like Western languages.
Japanese and Chinese, by contrast, do not use spaces to separate words.

This post about postpositions confused us a bit and we are no longer sure what the best way to handle Korean is. I will create a poll to decide what the best choice is.

Sorry, I accidentally voted.
I dabble a bit in Korean but am not fluent.

I’ll let other Korean speakers respond.
But from my limited understanding, the proposed change should fix @sleepyblue’s issue.

I guess keeping a list of Korean postpositions, either built in or added manually by the user, could also fix the issue:

Unlinked mention behavior for Korean
  • [Current Behaviour]: Unlinked mentions DO require whitespace to match
    A) Only an exact ⌴커피⌴ will be matched by default by the unlinked mention algorithm. (⌴ = whitespace)
    B) Use aliases in the page 커피 if you want to handle the postposition 커피를.
    aliases: [커피를]
  • [Proposed change]: Unlinked mentions DO NOT require whitespace to match
    A) Everything matching ***커피*** will be matched by default by the unlinked mention algorithm. (* stands for any other character)
    B) When you click on ***커피*** you will get ***[[커피]]***
    • No need for aliases to handle the postposition 커피를.
    • When you click on 커피를 you will get [[커피]]를
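
To make the two poll options concrete, here is a minimal sketch of how they differ. This is not the app's actual implementation; the function names are made up, and splitting on whitespace is an assumption about what "requires whitespace" means.

```python
def matches_current(text, page):
    """Current behaviour (assumed): the page name must be a whole whitespace-delimited word."""
    return [word for word in text.split() if word == page]

def matches_proposed(text, page):
    """Proposed change (assumed): any word that contains the page name matches."""
    return [word for word in text.split() if page in word]

def link(word, page):
    """Wrap only the page name in [[...]] and keep the surrounding characters."""
    return word.replace(page, "[[" + page + "]]", 1)

sentence = "나는 커피를 마신다"
print(matches_current(sentence, "커피"))   # []
print(matches_proposed(sentence, "커피"))  # ['커피를']
print(link("커피를", "커피"))               # [[커피]]를
```

With the current behaviour the sentence produces no unlinked mention at all; with the proposed change 커피를 is found and the link wraps only the noun.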

Yes, that is it.
Since postpositions are a kind of declension affix, using aliases is too complicated. If 커피 can be linked to 커피, it would be great for use in Korean.

Thank you for your help.

Oh no. I wanted to vote for the proposed change. How can I cancel this? Sorry.

You have to understand that this will not just affect postpositions, but everything. So choose what you think is best and makes sense for Korean.

Hi, I got here from the Korean Discord channel. I do think this is a complicated thing for the Korean language.

One improvement I could suggest is, instead of matching everything ***커피***, to match only 커피***, since in Korean postpositions only come after the noun. This way there will be fewer mismatches where a word sits inside another word, for example 그네 (swing) and 나그네 (traveler/wanderer). This won’t prevent all the mismatches, but I think it would prevent some.

There could be cases where the extra text in front narrows a broader topic, such as 나무 (tree) and 소나무 (pine tree), but I think in this case the backlinks shouldn’t be shared, since they are different words.
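
A rough sketch of the 커피*** idea, in case it helps: the page name has to start a word, but anything may follow it. The function name and the regex are my own assumptions, not how the app does it.

```python
import re

def find_suffix_only(text, page):
    """Match the page name only at the start of a word; any trailing characters are allowed."""
    # (?<!\S) means "not preceded by a non-whitespace character" (word start).
    # \S* takes whatever follows the page name up to the next whitespace.
    pattern = re.compile(r"(?<!\S)" + re.escape(page) + r"\S*")
    return pattern.findall(text)

print(find_suffix_only("나그네가 그네를 탄다", "그네"))  # ['그네를']
print(find_suffix_only("소나무와 나무를 본다", "나무"))  # ['나무를']
```

나그네 and 소나무 are skipped because the page name is not at the start of the word, while 그네를 and 나무를 still match.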

Also, I think it would be nice if there were a way to dismiss a backlink suggestion when the words are different, but I’m not sure if that’s in the scope of this discussion.

I’m sorry, I’m not used to the forum, so I don’t know the process for feature fixes. You created a poll; does that mean you are going to consider adding this fix to your feature list?

I think the best way to solve this problem for Korean is for the search algorithm (RegEx) to allow any Korean postposition to exist at the end of a string.

So currently, the problem is:
" 커피 " and " 커피를 " will not backlink because the search algorithm looks for " " (Space character) before and after the string (word).

The fix:
Let the search algorithm accept postpositions such as “를”, “도” etc. at the end of a string (word), before the " " (space character).

The result:
“커피”, “커피는”, “커피가”, “커피도”, “커피를”, etc. will all match! (hopefully)

The search algorithm should allow for all of the postpositions listed here:
Postposition Wiki Reference

I’m not sure if the above is a complete list. Maybe super Koreans can confirm. I’m a westernised Korean so I’m not sure :grimacing:
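
Roughly what I have in mind, as a sketch only: the `POSTPOSITIONS` list below is a small sample I picked for illustration (assumption), not the full list from the reference, and the function name is made up.

```python
import re

# Illustrative, incomplete sample of postpositions (assumption).
POSTPOSITIONS = ["으로", "를", "을", "는", "은", "가", "이", "도", "의", "에", "로"]

def find_with_postposition(text, page):
    """Match the page name as a whole word, optionally followed by one listed postposition."""
    particles = "|".join(re.escape(p) for p in POSTPOSITIONS)
    # (?<!\S): word start; (?!\S): whitespace or end of text must follow.
    pattern = re.compile(
        r"(?<!\S)" + re.escape(page) + r"(?:" + particles + r")?(?!\S)"
    )
    return [m.group(0) for m in pattern.finditer(text)]

print(find_with_postposition("커피는 좋다 커피를 마신다 커피머신 커피", "커피"))
# ['커피는', '커피를', '커피']
```

Note that 커피머신 is not matched here, because 머신 is not in the list. A list like this can still over-match when a different, longer word happens to end in a syllable that looks like a postposition.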

Not sure if it’s a complete list, but it seems to cover the major ones.

Postpositions in Korean do come after the noun, but I still think matching like @@커피@@ should be the way. That’s because there are so many compound words in Korean, with no space in between. (Also, many people are clumsy at spacing words.)

Ex) 커피 = coffee
@@커피 : 케냐커피 (Kenyan coffee), 아이스커피 (iced coffee)
커피@@ : 커피머신 (coffee machine), 커피샵 (coffee shop)

Unlinked backlinks could be kind of loose, but that is better than losing the opportunity to link ideas. And by the way, many Koreans are already used to finding words in the @@커피@@ style, because word processors in Korea work that way.
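
Here is a small sketch of what @@커피@@ matching would pick up, grouped by where the extra characters sit. The function and group names are made up for illustration.

```python
def group_mentions(words, page):
    """Group words containing the page name by where the extra characters are."""
    groups = {"exact": [], "before": [], "after": [], "both": []}
    for word in words:
        start = word.find(page)
        if start == -1:
            continue  # the word does not contain the page name at all
        has_before = start > 0
        has_after = start + len(page) < len(word)
        if has_before and has_after:
            groups["both"].append(word)
        elif has_before:
            groups["before"].append(word)   # @@커피
        elif has_after:
            groups["after"].append(word)    # 커피@@
        else:
            groups["exact"].append(word)
    return groups

words = ["케냐커피", "아이스커피", "커피머신", "커피샵", "커피", "녹차"]
print(group_mentions(words, "커피"))
```

A rule that only allows characters after the noun would drop the whole "before" group (케냐커피, 아이스커피), which is the opportunity I would rather not lose.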

There is an open-source, NLP-based CJK parser named Pororo on GitHub: kakaobrain/pororo (PORORO: Platform Of neuRal mOdels for natuRal language prOcessing).

As it is under active development and maintenance by Kakao, one of the biggest IT companies in South Korea, you may consider using it to analyze CJK languages and implement the backlink feature for them.

Moreover, KoNLPy (Korean NLP in Python, KoNLPy 0.5.2 documentation) is a well-known Korean NLP module which has a nice tokenizer.

Though I’m not a software developer myself, I believe these libraries can be used to develop a plugin like ‘CJK backlink’ or something like that. @WhiteNoise

We are not going to go down the path of an external tokenizer.

Chinese and Japanese already work; we will make the change for Korean.

Will be implemented in 0.11.1.