Handle filename “casing” and “illegal” characters consistently across OS’es

Use case or problem

Files with perfectly legal characters and different “casing” aren’t handled consistently in Obsidian.

Example

We have two files, normally created within Obsidian:

  • Test links upper- and lowercase.md, and
  • Test Links upper- and lowercase.md.

We can create and access them just fine in Obsidian, but

  • all of the links below point to Test links upper- and lowercase.md,
  • and the graph view also shows just Test links upper- and lowercase.

[[Testing/Test links upper- and lowercase]]

[[Testing/Test Links upper- and lowercase]]

[Testing/Test links upper- and lowercase](<Testing/Test links upper- and lowercase.md>)

[Testing/Test Links upper- and lowercase](<Testing/Test Links upper- and lowercase.md>)

I regard this as a bug. Differently “cased” filenames are different, at least on Operating Systems like Linux.

I know that macOS as well as Windows put (rather flaky) file APIs on top, so it “looks like” case-insensitivity even on the case-sensitive file systems they support, supposedly to keep their OS’es compatible with older software.

Proposed solution

In my opinion Obsidian should

  • either reduce their file API to a “common set” that works on all supported OS’es (i.e., case-insensitive file names, restrict common “illegal” characters)
  • or allow users the full capacity of their respective OS’es (like file name case-sensitivity and all otherwise “restricted” characters in file names).

Thinking of syncing vaults between machines with different OS’es, the former might be the better general solution, but I’d still prefer the latter, maybe with a little caution regarding the machines being synced to.

Current workaround (optional)

Be aware of what I’m doing, generally trying to avoid filenames with “conflict potential”. (Though it’s hard on Linux if one is used to being able to do everything. When syncing with Windows, I tend to forget that it doesn’t accept :, ? or names that differ only in casing.)

Related feature requests (optional)

Agree, except that I think it should be a user controlled toggle.

I’m long used to doing the first, but most people aren’t aware of restrictions on OSes they have never used, until they hit a problem when they do use one, or use a program that imposes those limits.

I do agree that it would be good to find some solution to the file name casing problem. But just out of curiosity, I’d like to ask: in what kind of practical situation would you end up with two notes that have the same name but different casing?

Such things happen. From my experience, mainly

  • by mistake (i.e., creating the same note manually again with different casing, say “Human” and “human”)
  • by importing/automatic data processing—this can create a lot of similar notes with different casing
  • by creating a note in Linux (which allows almost anything in a filename), and having that later processed on a filesystem with different filename casing (FAT, for instance)
  • by using “illegal” (for other OS/filesystem) characters and having these automatically removed on save (ok, not casing, but “illegal character” problem) one might end up with a dup.

Just some examples that have happened.

I still say I could live with one consistent system used across OSes and file systems, since normally you’d avoid such duplicates.

Obsi should already be doing this for characters:

So maybe this topic is specific to casing?

Regarding the characters, we are OS specific now (and some people don’t like that either now that we switched, but that’s what most people wanted!).

Regarding casing, if you are within Obsidian, the link creation tool will help you reuse a filename you already have.
If you create a file with a different casing outside Obsidian, what can/should we do about it? They’re your files; you can see them and merge them as you see fit.

Maybe we can add a popup warning and invite you to take action.

It’s always difficult when more than one OS or filesystem comes into play.

I must say, I quite like @Dor’s idea of making it a setting (probably defaulting to an OS-agnostic style). In my opinion, this should handle both situations we can get: casing and special characters.

I realize this might take some programming effort, since it would require an extra layer between Obsidian and the filesystem routines that would need to respect the setting. But anchoring it there might also solve follow-up problems, like deciding whether a link or a file name is “the same”.

As you can see above, one can create different case file names now within Obsidian. If such a hypothetical switch was set to “compatible mode”, I’d expect Obsidian to see “this file already exists”. With the other setting, I’d expect these two files to be legal, and the links and graph respect that they are different files.

And yes, I see the problem should a user start out with the “everything allowed” setting, later buy Sync, use a machine with another OS/filesystem and decide to “flip” the setting. Who or what would then go and do a sanity check (i.e., find all files that are now considered “the same” because of casing, and weren’t before)?
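A rough sketch of such a sanity check, under some assumptions: paths contain no newlines, and `case_collisions` is a made-up helper name, not anything Obsidian ships. It reads paths on stdin and prints those that would become “the same file” once case is ignored. Be aware that `tolower` in some awk implementations only folds ASCII letters, so names with non-ASCII letters may need a different tool.

```bash
#!/bin/bash
# case_collisions: hypothetical helper, reads one path per line on stdin
# and prints every path that collides with another one when case is ignored.
case_collisions() {
    awk '
        {
            key = tolower($0)               # case-folded comparison key
            count[key]++
            group[key] = group[key] $0 "\n" # remember the original spellings
        }
        END {
            # Only keys seen more than once are collisions.
            for (key in count)
                if (count[key] > 1) printf "%s", group[key]
        }'
}
```

It could be fed from `find`, for example `find ~/vault -type f -name '*.md' | case_collisions` (the vault path is only an example). Empty output would mean that flipping to a “compatible mode” finds no case conflicts.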

How? You can do it if you purposely ignore the link suggestion and create a file with a different case.

The solution to all problems cannot be “implement both and add a toggle”. We have 13 toggles for Sync, 13!!!

Which is, by chance, my workflow: I create files first, jot down my notes, and link them later. So it can happen that one (accidentally) creates the same file with different casing.

This would have a multitude of Obsidian Sync benefits, so I am tagging this as Sync and adding the valuable tag. I’m also adding my vote to this issue.

Obsidian support pointed me to this feature request.

I created a file with a question mark (“?”) in its name and synced it from Linux to Android, where I got an error saying that question marks are not allowed.

I was confused, as this looked like a bug, so I opened a support ticket. But it turns out that some users want this behavior if they only use a single OS and want to use its “full potential”.

My recommendation is to add an option in the Sync plugin that shows a warning if a character that is illegal on some OSes was used. You can then still decide for yourself, the whole sync process does not need to be reprogrammed, and the warning can optionally be disabled. I would show the warning right when you are renaming, just in yellow, and add a check in the settings where you can get a list of existing conflicting files (in case you created files earlier and were not aware of this). I would default this to on, as it helps prevent issues in the first place.

I’ve just synced my notes from Mac to Windows and wasn’t able to pull the notes because of ? and " characters in filenames.

I thought it was a bug that Obsidian accepted a file name that will not work on another OS.

It would be great to allow only filenames that work on all OSes, or to offer a choice of Sync-safe file naming. Another idea is to allow filenames to be transformed on save, for example by converting each file name from the header to a slug or something like that.
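To make the “header to slug” idea concrete, here is a minimal sketch. `slugify` is a hypothetical helper, and the rules (lowercase, runs of anything other than a-z and 0-9 collapsed to a single dash) are just one possible choice. Note that non-ASCII letters are also replaced by dashes here, which may not be what everyone wants.

```bash
#!/bin/bash
# slugify: hypothetical helper that turns a note title into a file-name-safe slug.
# LC_ALL=C keeps the character classes byte-based and predictable.
slugify() {
    printf '%s' "$1" |
        LC_ALL=C tr '[:upper:]' '[:lower:]' |
        LC_ALL=C sed -e 's/[^a-z0-9]\{1,\}/-/g' -e 's/^-//' -e 's/-$//'
}
```

With this, a note titled `What is "Zettelkasten"?` could be stored as `"$(slugify 'What is "Zettelkasten"?').md"`. Two titles that differ only in case or punctuation would map to the same slug, so a collision check would still be needed before saving.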

Seconding this issue, though there could be different ways to handle it:

A. A binary toggle between everything allowed by the current OS and compatibility with every (major?) OS.
B. Several toggles for each OS a user wants to keep compatibility with (so people using mac and linux but not windows can use “?” and “:”).
C. A user-definable character denylist and case handling.
D. Allow users to set titles that are not the same as the filename, and prompt the user to use this feature when they enter a filename with a denylisted character.

Personally, option B seems like a nice balance between avoiding compatibility issues and limiting scope.
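Option B might look roughly like the sketch below: one denylist per OS the user wants to stay compatible with, and a check that combines the selected ones. The function names are made up for illustration. The Windows set is the commonly documented list of reserved characters; the macOS and Linux entries assume only `/` is rejected at the file system level, which is an assumption and ignores things like reserved names or trailing dots.

```bash
#!/bin/bash
# denylist_for OS: prints the characters assumed to be illegal in file names.
# These lists are illustrative assumptions, not an authoritative reference.
denylist_for() {
    case "$1" in
        windows) printf '%s' '<>:"/\|?*' ;;
        macos)   printf '%s' '/' ;;
        linux)   printf '%s' '/' ;;
        *)       return 1 ;;   # unknown OS name
    esac
}

# bad_chars NAME OS...: prints each character of NAME that is on the
# denylist of at least one selected OS (each character reported once).
bad_chars() {
    local name="$1"; shift
    local os chars i c found=""
    for os in "$@"; do
        chars=$(denylist_for "$os") || continue
        for (( i = 0; i < ${#chars}; i++ )); do
            c="${chars:i:1}"
            if [[ "$name" == *"$c"* && "$found" != *"$c"* ]]; then
                found+="$c"
            fi
        done
    done
    printf '%s' "$found"
}
```

So someone on Mac and Linux only would call `bad_chars "$name" macos linux` and keep “?” and “:”, while adding `windows` to the list would flag them.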

My thinking aloud (open to corrections and suggestions):

The Problem
Just because I’m paranoid doesn’t mean they aren’t after me

Sanitised filenames (with no blank spaces, upper case letters, special characters etc.) are a safe bet.

The alternative can come back to bite you, especially when things become less-than-ideal (software, hardware, and network bugs). Syncing across multiple computers and locations only increases the chances of problems.

In other words: things going “less-than-ideal” is not a matter of “if”, but “when”.

Potential Solutions
No good deed goes unpunished

One way to create sanitised filenames is to add some sort of metadata field into every .md text file that would be interpreted as the “user-visible note title” (different than the actual filename).

However, if Obsidian were to list the nice user-visible note titles, it would need to read every .md file to get that data.
Alternatively, it would need to save some index (and update it regularly upon every title or filename change).

The first option kills speed, the second option kills simplicity.
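To illustrate the trade-off, here is a minimal sketch of the “index” variant. It assumes the user-visible title lives in a `title:` key inside YAML front matter delimited by `---` lines; that key name and both function names are assumptions for the example, not an existing Obsidian feature. Building the index still reads every file once, which is the speed cost mentioned above, and the result would have to be refreshed on every rename or title change.

```bash
#!/bin/bash
# note_title FILE: prints the value of a "title:" key found in front matter.
# Prints nothing if the file has no front matter or no title key.
note_title() {
    awk '
        NR == 1 && $0 != "---" { exit }   # file does not start with front matter
        NR > 1 && $0 == "---"  { exit }   # reached the end of front matter
        /^title:[ \t]*/ { sub(/^title:[ \t]*/, ""); print; exit }
    ' "$1"
}

# build_title_index DIR: prints "relative-path<TAB>title" for each .md file,
# falling back to the file name (without .md) when no title is set.
build_title_index() {
    local dir="$1" file title
    find "$dir" -type f -name '*.md' -print | sort | while IFS= read -r file; do
        title=$(note_title "$file")
        printf '%s\t%s\n' "${file#"$dir"/}" "${title:-$(basename "$file" .md)}"
    done
}
```

The output could be cached in a file and looked up by sanitised filename whenever the nice title needs to be shown.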

A third option would be to list filenames (in this case they would be sanitised), but enable tooltips on mouse hover (for those who think smartphones and iPads are a good idea… well… reconsider your life choices LOL - joking).

Finally, this feature could be implemented as a “core plugin”, so that users can enable or disable it based on their preferences.

Looking Ahead
This year will have been better than the next!

I’ve read that Obsidian will introduce “Multiplayer” soon:

“Share notes and edit them collaboratively.”

It looks like a promising feature, making Obsidian challenge Notion at what Notion is still better at (while hopefully, if done right, preserving all the pros of Obsidian).

But that could cause major problems if filename sanitisation isn’t handled properly.

Relja

+1 to this, I was caught unawares that unsanitized file names could cause issues with Sync. This is not documented clearly and leads to pernicious battery drain. I ignored the error messages because I didn’t realize this would cause Sync issues and they were not critical files otherwise. Would love for this to be revisited.

Here is a shell script that might help with renaming, if anyone comes across this issue:

#!/bin/bash

# =============================================================================
# Filename Character Fix Script
# =============================================================================
# 
# PURPOSE:
#   Finds and optionally renames files with problematic characters that may 
#   cause sync issues in cloud storage services or version control systems.
#   
# COMMON PROBLEMATIC CHARACTERS:
#   - Colons (:) - Issues on Windows/OneDrive
#   - Square brackets ([]) - Issues with some markdown parsers
#   - Question marks (?) - Reserved on Windows
#   - Angle brackets (<>) - Reserved on Windows
#   - Pipe (|) - Reserved on Windows
#   - Asterisk (*) - Wildcard character
#   - Double quotes (") - String delimiter issues
#
# USAGE:
#   ./fix_filename_characters.sh [OPTIONS]
#
# OPTIONS:
#   -d, --directory DIR     Directory to scan (default: current directory)
#   -e, --extension EXT     File extension to filter (e.g., "md", "txt")
#   -f, --fix              Actually rename files (dry-run by default)
#   -r, --recursive        Scan recursively through subdirectories
#   -c, --chars CHARS      Custom characters to fix (default: ":[]?<>|*\"")
#   -l, --log FILE         Log file path (default: /tmp/filename_fixes_TIMESTAMP.log)
#   -h, --help             Show this help message
#
# EXAMPLES:
#   # Dry run - show what would be renamed in current directory
#   ./fix_filename_characters.sh
#
#   # Fix markdown files in a specific directory
#   ./fix_filename_characters.sh -d ~/Documents/notes -e md --fix
#
#   # Recursively scan and fix all files with custom problematic characters
#   ./fix_filename_characters.sh -r -c ":@#" --fix
#
#   # Scan Obsidian vault for problematic markdown files
#   ./fix_filename_characters.sh -d ~/obsidian -e md -r
#
# =============================================================================

set -e

# Default values
SCAN_DIR="."
FILE_EXTENSION=""
FIX_MODE=false
RECURSIVE=false
PROBLEMATIC_CHARS=':[]?<>|*"'
TIMESTAMP=$(date +%Y%m%d_%H%M%S)
LOG_FILE="/tmp/filename_fixes_${TIMESTAMP}.log"
EXCLUDE_DIRS=(".git" ".obsidian" "node_modules" ".vscode")

# Color codes for output
RED='\033[0;31m'
GREEN='\033[0;32m'
YELLOW='\033[1;33m'
BLUE='\033[0;34m'
NC='\033[0m' # No Color

# Function to display help
show_help() {
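    # Print only the leading comment block (skip the shebang, stop at the first
    # non-comment line); a sed range on the "# =" rulers would run on to EOF.
    awk 'NR > 1 && /^#/ { sub(/^# ?/, ""); print; next } NR > 1 && NF { exit }' "$0"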
    exit 0
}

# Function to log messages
log_message() {
    local level="$1"
    local message="$2"
    local timestamp=$(date '+%Y-%m-%d %H:%M:%S')
    
    case "$level" in
        ERROR)   echo -e "${RED}[ERROR]${NC} $message" | tee -a "$LOG_FILE" ;;
        WARNING) echo -e "${YELLOW}[WARN]${NC} $message" | tee -a "$LOG_FILE" ;;
        SUCCESS) echo -e "${GREEN}[OK]${NC} $message" | tee -a "$LOG_FILE" ;;
        INFO)    echo -e "${BLUE}[INFO]${NC} $message" | tee -a "$LOG_FILE" ;;
        *)       echo "$message" | tee -a "$LOG_FILE" ;;
    esac
    
    echo "[$timestamp] $level: $message" >> "$LOG_FILE"
}

# Function to create regex pattern from character list
create_regex_pattern() {
    local chars="$1"
    local pattern=""
    
    # Escape special regex characters and build pattern
    for (( i=0; i<${#chars}; i++ )); do
        char="${chars:$i:1}"
        case "$char" in
            '[')  pattern="${pattern}\\[" ;;
            ']')  pattern="${pattern}\\]" ;;
            '?')  pattern="${pattern}\\?" ;;
            '*')  pattern="${pattern}\\*" ;;
            '|')  pattern="${pattern}\\|" ;;
            '(')  pattern="${pattern}\\(" ;;
            ')')  pattern="${pattern}\\)" ;;
            '{')  pattern="${pattern}\\{" ;;
            '}')  pattern="${pattern}\\}" ;;
            '.')  pattern="${pattern}\\." ;;
            '^')  pattern="${pattern}\\^" ;;
            '$')  pattern="${pattern}\\$" ;;
            '\\') pattern="${pattern}\\\\" ;;
            '"')  pattern="${pattern}\"" ;;
            *)    pattern="${pattern}${char}" ;;
        esac
    done
    
    echo "[$pattern]"
}

# Function to clean filename
clean_filename() {
    local filename="$1"
    local chars="$2"
    local new_filename="$filename"
    
    # Define replacement rules for each character
    for (( i=0; i<${#chars}; i++ )); do
        char="${chars:$i:1}"
        case "$char" in
            ':')  new_filename="${new_filename//:/—}" ;;      # Colon to em dash
            '[')  new_filename="${new_filename//\[/}" ;;      # Remove opening bracket
            ']')  new_filename="${new_filename//\]/}" ;;      # Remove closing bracket
            '?')  new_filename="${new_filename//\?/-}" ;;     # Question mark to dash
            '<')  new_filename="${new_filename//</}" ;;       # Remove less than
            '>')  new_filename="${new_filename//>/}" ;;       # Remove greater than
            '|')  new_filename="${new_filename//|/-}" ;;      # Pipe to dash
            '*')  new_filename="${new_filename//\*/-}" ;;     # Asterisk to dash
            '"')  new_filename=${new_filename//\"/\'} ;;      # Double quote to single (unquoted so \' yields a bare ')
            '@')  new_filename="${new_filename//@/at}" ;;     # At symbol to "at"
            '#')  new_filename="${new_filename//#/-}" ;;      # Hash to dash
            '&')  new_filename="${new_filename//&/and}" ;;    # Ampersand to "and"
            *)    new_filename="${new_filename//"$char"/-}" ;; # Default: replace with dash (quoted so $char is matched literally)
        esac
    done
    
    # Clean up multiple consecutive spaces or dashes
    new_filename=$(echo "$new_filename" | sed 's/  */ /g' | sed 's/--*/-/g')
    
    # Trim leading/trailing spaces and dashes
    new_filename=$(echo "$new_filename" | sed 's/^[ -]*//' | sed 's/[ -]*$//')
    
    echo "$new_filename"
}

# Function to check if path should be excluded
should_exclude() {
    local path="$1"
    
    for exclude_dir in "${EXCLUDE_DIRS[@]}"; do
        if [[ "$path" == *"/$exclude_dir/"* ]] || [[ "$path" == *"\\$exclude_dir\\"* ]]; then
            return 0
        fi
    done
    
    return 1
}

# Function to safely rename a file
rename_file() {
    local old_path="$1"
    local new_path="$2"
    
    if [ "$old_path" != "$new_path" ] && [ ! -e "$new_path" ]; then
        if [ "$FIX_MODE" = true ]; then
            mv "$old_path" "$new_path"
            log_message "SUCCESS" "Renamed: $(basename "$old_path") -> $(basename "$new_path")"
            return 0
        else
            log_message "INFO" "Would rename: $(basename "$old_path") -> $(basename "$new_path")"
            return 0
        fi
    elif [ "$old_path" != "$new_path" ] && [ -e "$new_path" ]; then
        log_message "WARNING" "Conflict: $(basename "$new_path") already exists - skipping"
        return 1
    fi
    
    return 0
}

# Parse command line arguments
while [[ $# -gt 0 ]]; do
    case $1 in
        -d|--directory)
            SCAN_DIR="$2"
            shift 2
            ;;
        -e|--extension)
            FILE_EXTENSION="$2"
            shift 2
            ;;
        -f|--fix)
            FIX_MODE=true
            shift
            ;;
        -r|--recursive)
            RECURSIVE=true
            shift
            ;;
        -c|--chars)
            PROBLEMATIC_CHARS="$2"
            shift 2
            ;;
        -l|--log)
            LOG_FILE="$2"
            shift 2
            ;;
        -h|--help)
            show_help
            ;;
        *)
            echo "Unknown option: $1"
            echo "Use -h or --help for usage information"
            exit 1
            ;;
    esac
done

# Validate directory
if [ ! -d "$SCAN_DIR" ]; then
    echo -e "${RED}Error: Directory '$SCAN_DIR' does not exist${NC}"
    exit 1
fi

# Convert to absolute path
SCAN_DIR=$(cd "$SCAN_DIR" && pwd)

# Create log file directory if it doesn't exist
LOG_DIR=$(dirname "$LOG_FILE")
mkdir -p "$LOG_DIR"

# Header
echo "=============================================="
echo "     Filename Character Fix Script"
echo "=============================================="
echo
log_message "INFO" "Starting scan of: $SCAN_DIR"
log_message "INFO" "Problematic characters: $PROBLEMATIC_CHARS"
log_message "INFO" "Mode: $([ "$FIX_MODE" = true ] && echo "FIX" || echo "DRY RUN")"
log_message "INFO" "Recursive: $RECURSIVE"
[ -n "$FILE_EXTENSION" ] && log_message "INFO" "File extension filter: .$FILE_EXTENSION"
echo

# Build find command
FIND_CMD="find \"$SCAN_DIR\""

# Add max depth if not recursive
if [ "$RECURSIVE" = false ]; then
    FIND_CMD="$FIND_CMD -maxdepth 1"
fi

FIND_CMD="$FIND_CMD -type f"

# Add extension filter if specified
if [ -n "$FILE_EXTENSION" ]; then
    FIND_CMD="$FIND_CMD -name \"*.$FILE_EXTENSION\""
fi

# Create regex pattern for grep - simpler approach
# Just check if any of the problematic characters exist
REGEX_PATTERN=$(echo "$PROBLEMATIC_CHARS" | sed 's/\]/\\]/g' | sed 's/\[/\\[/g' | sed 's/\*/\\*/g' | sed 's/\?/\\?/g' | sed 's/|/\\|/g')

# Counters
total_files=0
problematic_files=0
renamed_files=0
conflict_files=0
skipped_files=0

# Process files
log_message "INFO" "Scanning for files with problematic characters..."
echo

while IFS= read -r file; do
    # Skip excluded directories
    if should_exclude "$file"; then
        continue
    fi

    # Counters use arithmetic assignment rather than ((var++)): under `set -e`,
    # ((var++)) returns status 1 when var is 0, which aborts the script on
    # recent bash versions.
    total_files=$((total_files + 1))

    filename=$(basename "$file")
    dirname=$(dirname "$file")

    # Check if filename contains problematic characters
    # Use a simple check for each character
    has_problematic=false
    for (( j=0; j<${#PROBLEMATIC_CHARS}; j++ )); do
        check_char="${PROBLEMATIC_CHARS:$j:1}"
        if [[ "$filename" == *"$check_char"* ]]; then
            has_problematic=true
            break
        fi
    done

    if [ "$has_problematic" = true ]; then
        problematic_files=$((problematic_files + 1))

        log_message "INFO" "Found: $filename"

        # Create cleaned filename
        new_filename=$(clean_filename "$filename" "$PROBLEMATIC_CHARS")
        new_path="$dirname/$new_filename"

        if [ "$filename" != "$new_filename" ]; then
            log_message "INFO" "  -> $new_filename"

            # Attempt to rename
            if rename_file "$file" "$new_path"; then
                renamed_files=$((renamed_files + 1))
            else
                conflict_files=$((conflict_files + 1))
            fi
        else
            skipped_files=$((skipped_files + 1))
        fi
    fi
done < <(eval "$FIND_CMD")

# Summary
echo
echo "=============================================="
echo "                 SUMMARY"
echo "=============================================="
log_message "INFO" "Total files scanned: $total_files"
log_message "INFO" "Files with problematic characters: $problematic_files"

if [ "$FIX_MODE" = true ]; then
    log_message "SUCCESS" "Files renamed: $renamed_files"
    [ $conflict_files -gt 0 ] && log_message "WARNING" "Files with conflicts: $conflict_files"
    [ $skipped_files -gt 0 ] && log_message "INFO" "Files skipped: $skipped_files"
else
    echo
    echo -e "${YELLOW}This was a DRY RUN. No files were actually renamed.${NC}"
    echo -e "To actually rename files, run with ${GREEN}--fix${NC} flag:"
    echo -e "  ${BLUE}$0 -d \"$SCAN_DIR\" --fix${NC}"
fi

echo
echo "Full log saved to: $LOG_FILE"
echo

# Show sample of problematic files if any remain
if [ "$FIX_MODE" = false ] && [ $problematic_files -gt 0 ]; then
    echo "Sample of files that would be renamed (max 10):"
    echo "================================================"
    
    eval "$FIND_CMD" | while IFS= read -r file; do
        if should_exclude "$file"; then
            continue
        fi
        
        filename=$(basename "$file")
        # Check if filename contains problematic characters
        has_problematic=false
        for (( j=0; j<${#PROBLEMATIC_CHARS}; j++ )); do
            check_char="${PROBLEMATIC_CHARS:$j:1}"
            if [[ "$filename" == *"$check_char"* ]]; then
                has_problematic=true
                break
            fi
        done
        
        if [ "$has_problematic" = true ]; then
            echo "  • $filename"
        fi
    done | head -10
fi

exit 0

Some kind of warning when naming files, at least, would be good. I think there may be a standalone request for that (or maybe I’m confusing it with one for duplicate names).

Slightly edited the original request to remove OS judgment.

Hey

I think having a toggle, or multiple ones for different OSs, in Obsidian settings as opt-in-by-default would be good; we then can follow a set of system conventions (other than the current OS of course) for future file names, and upon toggling do a quick search on files to find invalid ones and prompt the user to update them.