Get all uploaded files from Roam Research to Obsidian

OK, I’ve done the date changes; now I’m onto this one!!

How do I actually do this?

  1. I presume I navigate to the root of the vault?
  2. Do I then run each line as a command or as a .sh file?

Really sorry to keep hassling you, but I’m not sure how to get started with this…

Thanks again!

Jim

They actually only gave you two commands. The first one is very long, so they split it up over multiple lines using backslashes, which tells the shell “don’t run it yet when I press Enter, I’m going to continue typing the command on the next line.” It would work as a script, or as a multiple-line command, or you could run it as a single-line command if you prefer by removing the trailing backslashes (but keep the vertical pipe characters, they’re important).
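
Just as an illustration of the shape (this is a toy example, not the migration command), the following two forms do exactly the same thing:

printf '%s\n' banana apple apple \
  | sort \
  | uniq

printf '%s\n' banana apple apple | sort | uniq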

no worries at all, exactly what @blippybloopy said. these are just two commands. you can select each entire command, copy it, and paste it into your terminal. remember to back everything up first.

@pmbauer thanks for your work on this! Which shell did you write that block (your download ‘script’) in/for? Blindly pasted into zsh (the macOS default shell), it doesn’t parse correctly. It also doesn’t seem to work in bash, sh, or tcsh? (That’s after installing gsed from MacPorts and using it instead of BSD sed).

Edit: I ended up using:

mkdir -p _attachment/img

gsed -n 's/.*\[](\(https:\/\/firebasestorage.*\)).*/\1/gp' *.md */*.md | sort | uniq | gsed 's/.*%2F\(.*\)?alt.*/\1\t\0/' | awk '{system("wget -O _attachment/img/" $1 " " $2) }'

Where gsed is GNU sed, installed from MacPorts. I think there’s at least one spare | in the original script causing it to fail at the start, but it took some playing around and compressing it to one line before zsh was happy.
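
For anyone else on macOS: GNU sed isn’t installed by default. It’s typically available as the gsed port in MacPorts or as gnu-sed in Homebrew (both provide a gsed binary), though the exact package names are worth double-checking for your setup:

sudo port install gsed
brew install gnu-sed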

Edit 2: You may also want to do this for Roam PDF embeds.

gsed -n 's/.*pdf.*\(https:\/\/firebasestorage.*\).*/\1/gp' *.md */*.md | sort | uniq | gsed 's/.*%2F\(.*\)?alt.*/\1\t\0/' | awk '{system("wget -O _attachment/img/" $1 " " $2) }'

gsed -i 's/{{pdf:.*https:\/\/firebasestorage.*%2F\(.*\)?alt.*/![[_attachment\/img\/\1]]/g' *.md */*.md

I definitely suggest using something like a find-all in a text editor you’re familiar with, particularly if you’re going to be deleting the files from firebase afterwards (you don’t want to leave anything behind). Just search for firebasestorage.
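
One quick way to double-check from the shell (run from the vault root; this only reports, it doesn’t change anything):

grep -rl --include='*.md' firebasestorage .

Any file it lists still contains at least one firebasestorage link that hasn’t been rewritten.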

it’s bash; as noted in the instructions, you’ll need a gnu userland

macOS’s default version of bash is pretty ancient. If you can get it to work on bash 3.2.57 (the macOS default), great. I’ve never tried.

I added an extra section for Mac users. The script doesn’t have an extra | and should just work on Linux. You will likely need to tweak some things for your Mac.


Oh, that figures. I didn’t realize that was also horribly out of date. Cheers!

This looks like Greek to me… Can anyone share how to do this in Windows?

you have an extra pipe in there! you can’t pipe mkdir to sed, you already had the && to connect them.

it also didn’t let me run that pipeline (on linux) until I stripped the comments out (and the extra whitespace after the backslash).

mkdir -p _attachment/img && \
  sed -n 's/.*\[](\(https:\/\/firebasestorage.*\)).*/\1/gp' *.md */*.md \
  | sort | uniq \
  | sed 's/.*%2F\(.*\)?alt.*/\1\t\0/' \
  | awk '{system("wget -O _attachment/img/" $1 " " $2) }'
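
If you want to sanity-check before downloading anything, you can run just the first extraction stage on its own and eyeball the output (this only prints URLs, it doesn’t download or change anything):

sed -n 's/.*\[](\(https:\/\/firebasestorage.*\)).*/\1/gp' *.md */*.md | sort | uniq

That should print one firebase URL per line; the second sed then pulls a filename out of the %2F...?alt portion, and awk hands each filename/URL pair to wget.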

congrats, you found a typo, here’s a star :star:

thanks!

for some reason, I had a bunch of image files stored without a filename extension. this confused obsidian on my mac, and it didn’t end up syncing those files. so of course I wrote some scripts to rename them all, adding the extension and updating references to them. after that, they synced.

remember to back up first (or use git)

first, I needed a list of them all (start this in the vault folder). I also checked the files to make sure they really are all .png or .jpg files:

cd _attachment/img
ls | grep -v '\.' | xargs file --extension | grep -Ev 'png|jpeg'

if that gives any output, you have some additional filetypes I didn’t have, and you’ll have to adjust the perl script accordingly.

next I saved the list of filenames:

ls | grep -v '\.' > /tmp/list.txt
cd ../..

then I wrote a perl script to rename the files (using git mv ... because I have the vault tracked in git; if you don’t, take git out of the command in the script)

here’s how to run the script from the top folder of the vault (I called it ext.pl)

./ext.pl < /tmp/list.txt

it took 3 minutes to run on my m1 mac, because I have over 7,000 files, and for every one of the attachments without an extension, it checks every single file in the vault.

here’s the text of the perl script:

#!/usr/bin/perl -w

# takes an image file with no extension, runs it through file, then assumes it will be jpeg or png, updates all references
my $path = './_attachment/img/';
while ( my $line = <> )
{
    chomp $line;
    my $fn = $line;

    # ask file(1) what the attachment really is
    my $file = `file --extension $path$fn`;
    chomp $file;
    print "$file\n";
    unless ( $file =~ m/\Q$fn\E: (png|jpeg)/ ) {
        warn "could not determine extension for $fn, skipping\n";
        next;
    }
    my $ext = $1;

    # rewrite every reference to the bare filename in the vault's .md files
    my $cmd = "find . -name \\*.md -print0 | xargs -0 perl -p -i -e \"s/$fn/$fn.$ext/g\"";
    print $cmd, "\n";
    system($cmd) == 0 or die "$?";

    # rename the attachment itself (drop "git " here if the vault isn't tracked in git)
    my $mv = "git mv \"$path$fn\" \"$path$fn.$ext\"";
    print $mv, "\n";
    system($mv) == 0 or die "$?";
}
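
After the script finishes, a quick way to confirm it caught everything (run from the vault root) is to re-run the no-extension check; it should print nothing:

ls _attachment/img | grep -v '\.'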

I had an issue with the bash script, but it was related to my page names in Roam (I think I had colons or quotes which caused the regexp to fail). Anyway, I should be able to get this to work, thanks!
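
If anyone else hits that, a rough way to spot page files whose names contain characters likely to trip up the sed patterns (colons or quotes, for example) is something like:

find . -name '*.md' | grep -E '[:"]'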

Does anyone know if Roam has changed the way things work here? With this script, I’m seeing:

Resolving 0 (0)... 0.0.0.0
Connecting to 0 (0)|0.0.0.0|:80... failed: Connection refused.

And this makes sense, I think, because my firebase links are landing on a page with this:

{
  "error": {
    "code": 400,
    "message": "Invalid HTTP method/URL pair."
  }
}

I thought maybe they made it so you needed to be logged into Roam before the links would work, but not so much. Anyone know what’s going on here? I’d love to get moved over to Obsidian completely.
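
One way to narrow it down might be to fetch one of the extracted links by hand and look at the response, for example (the URL here is just a placeholder; substitute one of the firebasestorage links from your own notes):

curl -sI '<one of your firebasestorage URLs>'

A 400 there too would suggest the links themselves have changed; a 200 would point at how the script is extracting or quoting the URL.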

Very possibly. I haven’t run this script in nearly a year and all my previous content on there is purged.

I tried Nicole’s approach: Downloading files uploaded to Roam Research for use with Obsidian | Nicole van der Hoeven

It seems to have worked for me.


That is also how I ended up doing it, and then I forgot to report back here. :grimacing:

Thanks @pmbauer and @DorianS!

Nicole’s approach worked perfectly for me on a Mac. Just set the path to your root directory and run the script. Less hassle than installing the userland stuff for the other approach.