How can I find all occurrences of Unicode characters whose code point starts with "\u2f"?

Things I have tried

The problem I have run into is that, in Chinese text, the characters in the left column below look almost identical to the ones in the right column. The ones on the right, whose code points start with \u2f, are seldom used.

\u4e00::::\u2f00

\u7528::::\u2f64

\u751f::::\u2f63

\u65b9::::\u2f45

(the screenshot of the result of the unicode-to-Chinese conversion)

However, for some reason (OCR, I guess), some of the Chinese characters in my notes are actually wrong, i.e. they should be the characters whose code points do not start with \u2f. It is not possible to spot them in a note, because when you read the note they look just the same. When you search, however, this becomes a huge problem: the character you type into the search and the one you see in the note will never match.
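To make the mismatch concrete, here is a minimal Python sketch using one of the pairs listed above (\u7528 and \u2f64). The variable names and the sample note text are made up for illustration.

```python
# Two characters that render almost identically but have different code points.
ideograph = "\u7528"  # the ordinary character, as typed on a keyboard
radical = "\u2f64"    # the look-alike whose code point starts with \u2f

# Hypothetical note content as OCR might have produced it.
note_text = "OCR produced this: " + radical

# The code points differ, so the two characters never compare equal.
print(hex(ord(ideograph)), hex(ord(radical)))
print(ideograph == radical)

# A plain search for the ordinary character does not find the look-alike.
found = ideograph in note_text
print(found)
```

This only demonstrates the symptom; it does not locate the affected notes.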

And since there are a lot of characters, I cannot search for and replace them one by one based on the codebook.

What I’m trying to do

I think one possible way to fix this problem is, as a first step, to identify every occurrence of a character starting with \u2f in my vault. Optimistically, I expect the total number of occurrences to be small. Then a manual fix can be applied to each one by deleting the wrong character and re-entering the correct one, probably within Obsidian so that links do not break.
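As a possible alternative to retyping each character by hand, Unicode compatibility normalization (NFKC) maps these \u2f look-alikes (the Kangxi Radicals block) to the ordinary characters. The sketch below is only an illustration under assumptions: the function name `replace_kangxi_radicals` is hypothetical, and the replacement is limited to U+2F00 through U+2FD5 so that no other text is normalized. It would be wise to try it on a copy of a note first.

```python
import unicodedata

# Assumed range: the assigned Kangxi radicals, U+2F00 through U+2FD5.
KANGXI_FIRST, KANGXI_LAST = 0x2F00, 0x2FD5


def replace_kangxi_radicals(text):
    """Return text with each Kangxi radical replaced by its NFKC equivalent."""
    out = []
    for ch in text:
        if KANGXI_FIRST <= ord(ch) <= KANGXI_LAST:
            # Normalize only this one character; everything else is left as-is.
            out.append(unicodedata.normalize("NFKC", ch))
        else:
            out.append(ch)
    return "".join(out)


# The four pairs listed earlier in this post, right column to left column.
fixed = replace_kangxi_radicals("\u2f00\u2f64\u2f63\u2f45")
print([hex(ord(ch)) for ch in fixed])
```

Restricting the range is a deliberate choice: applying NFKC to a whole note would also change unrelated characters such as full-width punctuation.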

My question is: how can I accomplish the first step, i.e. find all occurrences of characters starting with \u2f within a folder, including its sub-folders (such as the whole vault)?

Disclaimer: I don’t know any Chinese, so sorry if that makes parts of my answer confusing!

I do however know a little bit about unicode and regex, and that’s my basis for answering your question.

Match part or single characters

Firstly, it’s actually hard to match only part of a multi-byte character, so it’s not easy to locate all the \u2f. To do such a search, you would need a binary editor with search capabilities, and I don’t think that’s the path you should go down.
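To illustrate why, here is a small Python sketch that prints the UTF-8 bytes of three code points from the \u2f range. It assumes the notes are stored as UTF-8, which I believe is the usual case for Markdown files; the choice of sample code points is arbitrary.

```python
# Encode a few code points from the \u2f range and keep their raw UTF-8 bytes.
utf8_bytes = {cp: chr(cp).encode("utf-8") for cp in (0x2F00, 0x2F45, 0x2FD5)}

for cp, raw in utf8_bytes.items():
    # bytes.hex(" ") prints the bytes as space-separated hex pairs.
    print(f"U+{cp:04X} -> {raw.hex(' ')}")

# U+2F00 -> e2 bc 80
# U+2F45 -> e2 bd 85
# U+2FD5 -> e2 bf 95
```

The "2f" of the code point never appears as a byte of its own: it is spread over the first two bytes, and the second byte changes across the range. That is why a search on a character range is much more practical than a search on a prefix.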

However, using regex you can search for character classes, and with a little bit of luck you can search for the entire (or a suitable) range of characters starting with \u2f.

Since I’m rather bad (read: totally blank) on Chinese characters, I’m going to give you an example using ordinary Latin characters, and hope you’re able to translate it into your use case.

Character class regex search

To get regex search, you need to search your entire vault using Cmd+Shift+F, not just Cmd+F (or Ctrl + … if on Windows), or limit the search using something like path:/YourFolder or other means. Then you can enter something like /[a-cru]/. This searches for a single character that must be in the character class, [...], which here is a to c, or r, or u. In other words, either a, b, c, r or u.
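If you want to try the character class outside Obsidian, here is a minimal sketch in Python. Obsidian uses its own regex engine, but for a simple class like this one I would expect the behaviour to be the same; the sample text is made up.

```python
import re

# The same character class as in the search above: a to c, or r, or u.
pattern = re.compile(r"[a-cru]")

# findall returns every single character that falls inside the class.
matches = pattern.findall("unicode search")
print(matches)  # ['u', 'c', 'a', 'r', 'c']
```

Each match is exactly one character, which is what makes this approach suitable for hunting individual stray characters.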

It’ll now show all matching notes, and you can click on any note and it’ll show you all the characters matching your pattern. You can now either change them manually, or possibly do a search-and-replace for the various cases in that particular file.

I don’t think you can do a global search and replace within Obsidian, and I’m not sure from your description whether there are many characters you need to search for, or just the four you mentioned.

Building your character class

You might already have understood how to build your search, but basically you type /[ to start the character class, and then enter either the various single characters you want to search for, or a range like a-c, where you replace the a with the first character in the range (something close to \u2f00) and replace the c with something near the end of the range (something close to \u2fff).

Finally, you end the character class with ]/. Just for the sake of clarity, the / at the start and end denote the regex, and the [ and ] mark the character class.
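The same range idea can also be run outside Obsidian. Below is a rough Python sketch that walks a folder, including its sub-folders, and reports every character in the range. Several things here are assumptions on my part: the function name `find_kangxi_radicals` is hypothetical, the notes are assumed to be UTF-8 `.md` files, and the range ends at U+2FD5, which I believe is the last assigned character in that block.

```python
import re
from pathlib import Path

# Character class written as a range: first character, hyphen, last character.
# In a normal (non-raw) Python string, "\u2f00" is the character itself.
KANGXI = re.compile("[\u2f00-\u2fd5]")


def find_kangxi_radicals(root):
    """Return (file path, line number, character) for every match under root."""
    hits = []
    # rglob descends into sub-folders, so the whole vault is covered.
    for path in sorted(Path(root).rglob("*.md")):
        text = path.read_text(encoding="utf-8", errors="replace")
        for line_no, line in enumerate(text.splitlines(), start=1):
            for match in KANGXI.finditer(line):
                hits.append((str(path), line_no, match.group()))
    return hits
```

Calling `find_kangxi_radicals("path/to/YourVault")` (a placeholder path) would give a list you can work through by hand. The script only reads files; it does not change anything.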

Hope this helps!

Brilliant idea! Thank you!
I will have a try and report the result. :slight_smile:

It works!

  1. I used Numbers to generate the strings from \u2f00 to \u2fe1
  2. Put the results into one cell
  3. Removed the \t between them
\u2f00\u2f01\u2f02\u2f03\u2f04\u2f05\u2f06\u2f07\u2f08\u2f09\u2f0A\u2f0B\u2f0C\u2f0D\u2f0E\u2f0F\u2f10\u2f11\u2f12\u2f13\u2f14\u2f15\u2f16\u2f17\u2f18\u2f19\u2f1A\u2f1B\u2f1C\u2f1D\u2f1E\u2f1F\u2f20\u2f21\u2f22\u2f23\u2f24\u2f25\u2f26\u2f27\u2f28\u2f29\u2f2A\u2f2B\u2f2C\u2f2D\u2f2E\u2f2F\u2f30\u2f31\u2f32\u2f33\u2f34\u2f35\u2f36\u2f37\u2f38\u2f39\u2f3A\u2f3B\u2f3C\u2f3D\u2f3E\u2f3F\u2f40\u2f41\u2f42\u2f43\u2f44\u2f45\u2f46\u2f47\u2f48\u2f49\u2f4A\u2f4B\u2f4C\u2f4D\u2f4E\u2f4F\u2f50\u2f51\u2f52\u2f53\u2f54\u2f55\u2f56\u2f57\u2f58\u2f59\u2f5A\u2f5B\u2f5C\u2f5D\u2f5E\u2f5F\u2f60\u2f61\u2f62\u2f63\u2f64\u2f65\u2f66\u2f67\u2f68\u2f69\u2f6A\u2f6B\u2f6C\u2f6D\u2f6E\u2f6F\u2f70\u2f71\u2f72\u2f73\u2f74\u2f75\u2f76\u2f77\u2f78\u2f79\u2f7A\u2f7B\u2f7C\u2f7D\u2f7E\u2f7F\u2f80\u2f81\u2f82\u2f83\u2f84\u2f85\u2f86\u2f87\u2f88\u2f89\u2f8A\u2f8B\u2f8C\u2f8D\u2f8E\u2f8F\u2f90\u2f91\u2f92\u2f93\u2f94\u2f95\u2f96\u2f97\u2f98\u2f99\u2f9A\u2f9B\u2f9C\u2f9D\u2f9E\u2f9F\u2fA0\u2fA1\u2fA2\u2fA3\u2fA4\u2fA5\u2fA6\u2fA7\u2fA8\u2fA9\u2fAA\u2fAB\u2fAC\u2fAD\u2fAE\u2fAF\u2fB0\u2fB1\u2fB2\u2fB3\u2fB4\u2fB5\u2fB6\u2fB7\u2fB8\u2fB9\u2fBA\u2fBB\u2fBC\u2fBD\u2fBE\u2fBF\u2fC0\u2fC1\u2fC2\u2fC3\u2fC4\u2fC5\u2fC6\u2fC7\u2fC8\u2fC9\u2fCA\u2fCB\u2fCC\u2fCD\u2fCE\u2fCF\u2fD0\u2fD1\u2fD2\u2fD3\u2fD4\u2fD5\u2fD6\u2fD7\u2fD8\u2fD9\u2fDA\u2fDB\u2fDC\u2fDD\u2fDE\u2fDF\u2fE0\u2fE1
  4. Pasted the string from the cell into a website that converts Unicode escapes to Chinese characters.
⼀⼁⼂⼃⼄⼅⼆⼇⼈⼉⼊⼋⼌⼍⼎⼏⼐⼑⼒⼓⼔⼕⼖⼗⼘⼙⼚⼛⼜⼝⼞⼟⼠⼡⼢⼣⼤⼥⼦⼧⼨⼩⼪⼫⼬⼭⼮⼯⼰⼱⼲⼳⼴⼵⼶⼷⼸⼹⼺⼻⼼⼽⼾⼿⽀⽁⽂⽃⽄⽅⽆⽇⽈⽉⽊⽋⽌⽍⽎⽏⽐⽑⽒⽓⽔⽕⽖⽗⽘⽙⽚⽛⽜⽝⽞⽟⽠⽡⽢⽣⽤⽥⽦⽧⽨⽩⽪⽫⽬⽭⽮⽯⽰⽱⽲⽳⽴⽵⽶⽷⽸⽹⽺⽻⽼⽽⽾⽿⾀⾁⾂⾃⾄⾅⾆⾇⾈⾉⾊⾋⾌⾍⾎⾏⾐⾑⾒⾓⾔⾕⾖⾗⾘⾙⾚⾛⾜⾝⾞⾟⾠⾡⾢⾣⾤⾥⾦⾧⾨⾩⾪⾫⾬⾭⾮⾯⾰⾱⾲⾳⾴⾵⾶⾷⾸⾹⾺⾻⾼⾽⾾⾿⿀⿁⿂⿃⿄⿅⿆⿇⿈⿉⿊⿋⿌⿍⿎⿏⿐⿑⿒⿓⿔⿕
  5. Put the mess generated above into the Obsidian search pane, using regex.
/[⼀⼁⼂⼃⼄⼅⼆⼇⼈⼉⼊⼋⼌⼍⼎⼏⼐⼑⼒⼓⼔⼕⼖⼗⼘⼙⼚⼛⼜⼝⼞⼟⼠⼡⼢⼣⼤⼥⼦⼧⼨⼩⼪⼫⼬⼭⼮⼯⼰⼱⼲⼳⼴⼵⼶⼷⼸⼹⼺⼻⼼⼽⼾⼿⽀⽁⽂⽃⽄⽅⽆⽇⽈⽉⽊⽋⽌⽍⽎⽏⽐⽑⽒⽓⽔⽕⽖⽗⽘⽙⽚⽛⽜⽝⽞⽟⽠⽡⽢⽣⽤⽥⽦⽧⽨⽩⽪⽫⽬⽭⽮⽯⽰⽱⽲⽳⽴⽵⽶⽷⽸⽹⽺⽻⽼⽽⽾⽿⾀⾁⾂⾃⾄⾅⾆⾇⾈⾉⾊⾋⾌⾍⾎⾏⾐⾑⾒⾓⾔⾕⾖⾗⾘⾙⾚⾛⾜⾝⾞⾟⾠⾡⾢⾣⾤⾥⾦⾧⾨⾩⾪⾫⾬⾭⾮⾯⾰⾱⾲⾳⾴⾵⾶⾷⾸⾹⾺⾻⾼⾽⾾⾿⿀⿁⿂⿃⿄⿅⿆⿇⿈⿉⿊⿋⿌⿍⿎⿏⿐⿑⿒⿓⿔⿕]*/
  6. Here is the result that I have:

It seems that I’m lucky enough.
The problematic line that I accidentally ran into a few days ago is the only one in my vault. But since I will sometimes use OCR to capture text from books, this trick will be useful, and I will probably use it to check my vault for errors periodically.
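For anyone who wants to skip the spreadsheet and the converter website, the steps above can be sketched in a few lines of Python. This is only an illustration: the variable names are made up, and I assume the range ends at U+2FD5, the last character shown in the converted string above. The sketch also shows that the trailing * in the pattern above lets the regex match an empty string, so leaving it out is probably the safer choice.

```python
import re

# Build the list of characters directly from their code points.
FIRST, LAST = 0x2F00, 0x2FD5
radicals = "".join(chr(cp) for cp in range(FIRST, LAST + 1))

# Two equivalent ways of writing the character class.
explicit_pattern = "[" + radicals + "]"   # every character spelled out
range_pattern = "[\u2f00-\u2fd5]"         # first character, hyphen, last character

sample = "ok \u7528, suspicious \u2f64"
explicit_hits = re.findall(explicit_pattern, sample)
range_hits = re.findall(range_pattern, sample)
print(len(radicals), explicit_hits == range_hits)

# With a trailing *, zero characters also count as a match,
# so the pattern "matches" text that contains no radical at all.
empty_match = re.search(explicit_pattern + "*", "plain text")
print(repr(empty_match.group()))
```

In the Obsidian search pane the range form would be written between slashes. I have not verified whether Obsidian accepts \u escapes there, so pasting the two literal characters around the hyphen may be the more reliable way to enter it.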

@holroy,
I really appreciate your help. :smiley:
