Search vault for duplicated files (files with the same name)

Exactly what I want

Do a vault-wide search and display all files that have exactly the same name.
Maybe generate a results .md file.
(Searching for identical aliases could be an optional feature.)
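
A rough sketch of the requested check in plain JavaScript, the language dataviewjs blocks use. It assumes each note has already been reduced to a hypothetical { path, aliases } record. The function findNameCollisions and that record shape are made up for illustration and are not part of Obsidian or any plugin. Names are compared case-insensitively, and alias matching is off unless includeAliases is set.

~~~javascript
const path = require("path");

// Hypothetical helper: returns a Map from each lower-cased name (or alias)
// claimed by more than one note to the sorted list of those notes' paths.
function findNameCollisions(notes, { includeAliases = false } = {}) {
  const owners = new Map(); // lower-cased name -> Set of note paths

  function claim(name, notePath) {
    const key = String(name).trim().toLowerCase();
    if (key === "") return;
    if (!owners.has(key)) owners.set(key, new Set());
    owners.get(key).add(notePath); // a Set, so a note matching its own alias counts once
  }

  for (const note of notes) {
    // Vault paths use "/", e.g. "Projects/Todo.md" -> "Todo".
    claim(path.posix.basename(note.path, path.posix.extname(note.path)), note.path);
    if (includeAliases) {
      for (const alias of note.aliases ?? []) claim(alias, note.path);
    }
  }

  // Keep only names claimed by two or more distinct notes.
  const collisions = new Map();
  for (const [key, paths] of owners) {
    if (paths.size > 1) collisions.set(key, [...paths].sort());
  }
  return collisions;
}
~~~

For example, given notes at Inbox/Meeting.md and Archive/meeting.md, the result maps "meeting" to both paths. Inside a dataviewjs block the records could perhaps be built from dv.pages() using file.path and file.aliases, but that wiring is an assumption, not something tested here.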

Case scenario

This is mainly for debugging, as sync service bugs often produce duplicates, leading to tedious work merging them back.

It could also be useful when I’ve forgotten about an old file and unknowingly created a new one with the same name.

Workarounds

None that I can think of.

Workarounds:

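For example (run from the vault folder; -printf requires GNU find):

find . \( -path ./.obsidian -o -path ./.git \) -prune -o -type f -printf '%f\t%p\n' \
  | LC_ALL=C sort \
  | awk -F '\t' '
      # same name as the previous line: print the group, blank line between groups
      $1 == name { if (!dup) { if (groups++) print ""; print path }
                   print $2; dup = 1; next }
      { name = $1; path = $2; dup = 0 }'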

I don’t know of a plugin that checks for duplicate names, but every so often I run this handy dataviewjs query in a callout to make sure there aren’t any I don’t want: https://github.com/claremacrae/obsidian-experiments-plugin/issues/1

(@trainbuffer on Discord, 5 months ago)

Neat solution.

I unmarked the solution so the thread will stay open, because I think it’s still a good plugin idea. 🙂

Just an observation, but if you’re looking for true duplicates, you’d probably want to compare a hash of each file’s contents rather than just the file name.

For future readers, this solution has stopped working. Help would be greatly appreciated.

It works for me.
Of course, I had to turn on the ‘enable dataview JavaScript queries’ and ‘enable dataview JavaScript inline queries’ settings in the Dataview plugin, since it asked for them the first time I ran the script.

This was useful for me; here is my modified version. It renders a table with columns for created date, modified date, file size, and incoming and outgoing links.

~~~dataviewjs
// Lists notes that share a file name (compared case-insensitively), one table row per note.
// dv.pages() covers the notes Dataview indexes (Markdown files), so attachments are not checked.
function listFileNameIssues(dv) {
    const pages = dv.pages();
    const groups = pages.groupBy(p => p.file.name.toLowerCase());
    const tableRows = [];

    // Human-readable size, e.g. 1536 -> "1.5 KB".
    function formatBytes(bytes, decimals = 2) {
        if (bytes === 0) return '0 Bytes';
        const k = 1024;
        const dm = decimals < 0 ? 0 : decimals;
        const sizes = ['Bytes', 'KB', 'MB', 'GB', 'TB'];
        const i = Math.floor(Math.log(bytes) / Math.log(k));
        return parseFloat((bytes / Math.pow(k, i)).toFixed(dm)) + ' ' + sizes[i];
    }

    for (const group of groups) {
        // Only process groups with duplicates (names that occur more than once).
        if (group.rows.length < 2) continue;

        for (const page of group.rows.sort(p => p.file.path, 'asc')) {
            tableRows.push([
                page.file.link,
                page.file.path,
                page.file.cday.toISODate(),
                page.file.mday.toISODate(),
                formatBytes(page.file.size),
                page.file.inlinks.length,
                page.file.outlinks.length
            ]);
        }
    }

    if (tableRows.length > 0) {
        dv.table(
            ["File Name", "Full Path", "Created Date", "Modified Date", "Size", "In-Links", "Out-Links"],
            tableRows
        );
    } else {
        dv.paragraph("No duplicate file names found.");
    }
}

listFileNameIssues(dv);
~~~

I only have ‘Enable JavaScript queries’ enabled; the inline setting isn’t needed. The code block language for Dataview JS is dataviewjs.

Nice idea! I’ve also run into duplicate files with the same name in my vault; it really throws off the organization. Does anyone have a plugin or workflow that automatically detects duplicates (by name or content)? I’d love something more automated than hunting them down manually.

I would still like to see a plugin for this, but here is the command-line workaround I used to generate the list of duplicate names (replace VAULTPATH with the path of your vault folder; I wasn’t sure whether it needs to end in a slash, so I included one).

find "VAULTPATH" -type f | grep -Eo '[^\/]+$' | sort | uniq -d

(The grep matches and prints only the part of each path after the last slash, i.e. the file name. sort puts identical names next to each other, and uniq -d then prints one copy of each name that appears more than once.)

Then I looked up each .md name in Obsidian’s search (file:"example.md"), examined the results, and renamed, merged, or deleted files as appropriate. (The non-.md ones are mostly parts of websites, which I’ll deal with later.)

On mobile I use an app called Obsi, which has a Gemini-based AI assistant that can find sync conflicts and merge files (AI-Powered Merge for Sync Conflicts).

If you’re using the rather wonderful Everything, the following search string finds duplicates:

!FileHistory\ file: dupe:

You can add the directory where your vault lives for a more targeted search, and also refine the search by size, e.g. for files larger than 500 MB:

!FileHistory\ file: dupe: dupesize: size:>500mb