Vault corruption - ANSI characters in filenames

It appears that Obsidian is creating files with non-Unicode characters in their filenames.
As a result, filenames containing non-English characters can get corrupted when the vault is opened on another computer configured with a different codepage.
Consequently, the Markdown links to those notes stop working.

Steps to reproduce

  1. Configure the “Language for non-Unicode applications” in Windows 10 to, e.g., Polish (Poland). A system restart will be required.
    More info here:
    Setting a Language for Non-Unicode Applications | TestLeft Documentation

  2. Use a non-English keyboard layout, e.g. Polish (Programmers).

  3. Create notes in your vault which contain non-English characters in their filenames.
    In the Polish keyboard layout you can type such letters by holding the right Alt key and pressing A, C, E, S, Z or L.

  4. Zip your vault using the Windows built-in tool “Send to > Compressed (zipped) folder”.

  5. Transfer the zipped vault to another computer which has the “Language for non-Unicode applications” set to “English (United States)”.
    Alternatively, you could use the same computer as in step 1 and change its codepage (note that another system restart is required).

  6. Unzip your vault. The filenames (and Markdown links) get corrupted at that point.
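The corruption at the unzip step matches a well-known ZIP pitfall: if an archiver stores a filename without marking it as UTF-8, the name is written in a legacy codepage, and the extracting machine decodes those bytes with its own codepage. The Python sketch below assumes the Polish machine's OEM codepage is 852 and the US English machine's is 437 (the usual defaults for those locales). It simulates the effect rather than reading the attached archive, and the helper name `simulate_zip_transfer` is hypothetical.

```python
def simulate_zip_transfer(name: str, writer_cp: str = "cp852", reader_cp: str = "cp437") -> str:
    """Hypothetical helper: encode a filename in the writer's legacy codepage
    (as a non-UTF-8 archiver might) and decode it with the reader's codepage."""
    raw = name.encode(writer_cp)  # bytes that would be stored in the archive
    return raw.decode(reader_cp)  # what the extracting machine would show

original = "Zażółć gęślą jaźń.md"
garbled = simulate_zip_transfer(original)
print(original, "->", garbled)
# A link such as [[Zażółć gęślą jaźń]] no longer matches the extracted filename.
print(garbled == original)
```

Because CP437 maps every byte to a distinct character, this kind of damage is reversible: re-encoding the garbled name as CP437 and decoding it as CP852 recovers the original.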

Expected result

The filenames should remain unmodified and the links within the vault should continue to be operational.

Actual result

The filenames and links get corrupted.

Environment

Windows 10, Obsidian v0.13.14, no plugins (safe mode).


Additional information

Please find the screenshots below and a sample zip file attached.

test02.zip (1.8 KB)

I am skeptical about this report. This sounds more like a Windows problem.
How do you know it is not the Windows zipper that is creating this problem?
What filesystem is your vault stored on?

Filesystem is NTFS on both machines.
The same thing happens when 7-Zip is used for compressing.

The issue occurs only when Obsidian is used for creating the files.

I mean, everything is fine when such filenames are created with Notepad++, Notepad, MS Office, LibreOffice, etc.

If you use NTFS, this problem should not happen: NTFS stores filenames as UTF-16, not as ANSI plus a codepage, so the Windows setting you are changing should have no effect on files stored on NTFS.
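To illustrate the distinction being drawn here: a UTF-16 name round-trips with no codepage input at all, whereas whether a name can even be expressed in an “ANSI” codepage depends on which codepage is active. A small Python sketch; it does not touch NTFS itself, and the helper names and example filename are hypothetical.

```python
def utf16_roundtrip(name: str) -> bool:
    """NTFS-style storage: UTF-16LE code units, no codepage involved."""
    return name.encode("utf-16-le").decode("utf-16-le") == name

def fits_ansi_codepage(name: str, codepage: str) -> bool:
    """True if `name` can be represented in the given single-byte ANSI codepage."""
    try:
        name.encode(codepage)
        return True
    except UnicodeEncodeError:
        return False

name = "Zażółć.md"
print(utf16_roundtrip(name))               # independent of the system locale
print(fits_ansi_codepage(name, "cp1250"))  # Polish ANSI codepage
print(fits_ansi_codepage(name, "cp1252"))  # US English ANSI codepage (has no "ż")
```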

If this is a codepage issue, simply changing that setting and rebooting should change the way those files appear (without involving the zipper).
I think the problem is the zipper, because .zip did not support Unicode filenames until relatively recent versions of the format.
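One way to test the zipper theory is to inspect the archive itself. The ZIP specification (PKWARE APPNOTE) defines general-purpose bit 11 to mark an entry's name as UTF-8; names without it are decoded with a codepage chosen by the extracting tool. A Python sketch using the standard `zipfile` module; the function name `legacy_encoded_names` is hypothetical.

```python
import io
import zipfile

UTF8_NAME_FLAG = 0x800  # general-purpose bit 11 ("language encoding flag")

def legacy_encoded_names(zip_file) -> list[str]:
    """Return entries whose names are non-ASCII but lack the UTF-8 flag.

    Without bit 11, readers fall back to a codepage (CP437 per the spec,
    in practice often the local OEM codepage), so these are the names at
    risk of being garbled on a machine with a different codepage.
    """
    with zipfile.ZipFile(zip_file) as zf:
        return [
            info.filename
            for info in zf.infolist()
            if not info.flag_bits & UTF8_NAME_FLAG and not info.filename.isascii()
        ]

# Python's own zipfile sets bit 11 for non-ASCII names, so this archive is safe.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("Zażółć.md", "# note\n")
buf.seek(0)
print(legacy_encoded_names(buf))
```

Running `legacy_encoded_names("test02.zip")` on an archive from “Send to > Compressed (zipped) folder” would be expected to list the Polish filenames if that tool does not set the flag; this is an assumption, not something verified in this thread.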

There is actually more.
Somehow Obsidian has also managed to create a file containing a forbidden character in its filename; please find the screenshot below (the error is displayed when trying “Send To > Compressed (zipped) folder”).

I don’t know yet how to reproduce that one though.
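Since the forbidden-character file could not yet be reproduced, one way to track it down is to scan the vault for names that break Microsoft's documented Windows file-naming rules: reserved characters `< > : " / \ | ? *`, control characters 0-31, a trailing space or period, and reserved device names such as `CON` or `NUL`. A Python sketch; the function names are hypothetical and the rule set may not be exhaustive.

```python
from pathlib import Path

# Reserved characters from Microsoft's file-naming rules, plus control characters 0-31.
WINDOWS_FORBIDDEN_CHARS = set('<>:"/\\|?*') | {chr(code) for code in range(32)}
# Reserved device names; they are invalid even with an extension (e.g. "NUL.md").
WINDOWS_RESERVED_NAMES = (
    {"CON", "PRN", "AUX", "NUL"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)

def windows_name_problems(name: str) -> list[str]:
    """Return human-readable reasons why `name` is not a valid Windows filename."""
    problems = []
    bad_chars = sorted({ch for ch in name if ch in WINDOWS_FORBIDDEN_CHARS})
    if bad_chars:
        problems.append("forbidden characters: " + " ".join(repr(ch) for ch in bad_chars))
    if name.endswith((" ", ".")):
        problems.append("ends with a space or period")
    if name.split(".", 1)[0].upper() in WINDOWS_RESERVED_NAMES:
        problems.append("reserved device name")
    return problems

def scan_vault(root: str) -> dict[str, list[str]]:
    """Map each offending path under `root` (the vault folder) to its problems."""
    report = {}
    for path in Path(root).rglob("*"):
        problems = windows_name_problems(path.name)
        if problems:
            report[str(path)] = problems
    return report

for candidate in ["Zażółć.md", "What? Why.md", "notes.", "con.md"]:
    print(candidate, windows_name_problems(candidate))
```

Note that non-ASCII letters such as “ż” are valid on Windows; only the listed characters and names are flagged.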

And I am not disagreeing with you regarding NTFS and Unicode, yet here we are.
I expect the original issue to be fully reproducible.

I will try your suggestion (without .zip) and report back here.

You are right - it does work correctly when zip is not used.

More than that, I was actually wrong in assuming the issue is limited to filenames created with Obsidian; apologies. I have now managed to reproduce it with other tools too.

I guess the remaining, tangential problem is the forbidden character(s) shown in the last screenshot, but that should go into its own issue if and when it can be reproduced.

Many thanks for your time.

This topic was automatically closed 24 hours after the last reply. New replies are no longer allowed.