Get all uploaded files in Roam Research to Obsidian

Hello, I just moved from Roam Research to Obsidian. I was able to convert the daily note date formats and the date references using a Python script from YouTube.

Now I would like to download all the uploaded files (Roam Research uses Firebase storage) into Obsidian before I cancel my Roam subscription.
I couldn’t find any script for this so far. Has anyone done this already?

You can try these two commands, or modify them for your own use (run them from your vault root).

I recommend making a backup first or managing the vault with git.
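If you don’t already have the vault in git, a minimal sketch for taking a snapshot first (run in the vault root; assumes git is installed):

git init   # skip if the vault is already a repo
git add -A
git commit -m "snapshot before importing Roam uploads"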

First, download a markdown export of all your notes and extract the zip file. The commands below can be run in that extracted folder.
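For example (a sketch; the zip filename here is hypothetical, use whatever Roam gave you):

unzip Roam-Export-1234567890.zip -d roam-export
cd roam-export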

Download all images from Firebase and store them in _attachment/img.

mkdir -p _attachment/img && \
  sed -n 's/.*\[](\(https:\/\/firebasestorage.*\)).*/\1/gp' *.md */*.md \
    | sort | uniq \
    | sed 's/.*%2F\(.*\)?alt.*/\1\t\0/' \
    | awk '{system("wget -O _attachment/img/" $1 " " $2) }'

Then update all the links.

sed -i 's/!\[](https:\/\/firebasestorage.*%2F\(.*\)?alt.*)/![[_attachment\/img\/\1]]/g' *.md */*.md
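To confirm nothing was missed after the rewrite, search for leftover links (any remaining hits need manual attention):

grep -n 'firebasestorage' *.md */*.md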

Note for Mac Users

If you are trying this on a Mac, you will need to install and use gsed/gawk, a GNU userland, and a non-ancient version of bash.

brew install coreutils gnu-sed gawk wget bash # install gnu userland, wget, and a non-ancient bash
/usr/local/bin/bash # run modern bash
export PATH=/usr/local/opt/coreutils/libexec/gnubin:$PATH
export PATH=/usr/local/opt/gnu-sed/libexec/gnubin:$PATH

# now you should be able to run the scripts above
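To verify the GNU tools are now the ones being picked up, a quick check (GNU sed identifies itself; BSD sed does not):

sed --version | head -1   # expect something like: sed (GNU sed) 4.x

Note: on Apple Silicon, Homebrew installs under /opt/homebrew instead of /usr/local, so adjust the paths above accordingly.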

UPDATE: 2021-08-05
These instructions for batch-deleting uploaded Firebase attachments no longer work. Also, the Roam forum (linked below) has since been decommissioned.

After downloading all my uploads and converting them to local assets, I wasn’t comfortable leaving copies of everything in Firebase. Roam has no provision for upload management, but if you can use the Chrome/Firefox dev tools, this may work for you:

https://forum.roamresearch.com/t/delete-files-uploaded-to-firebase/3811

@mikeriss in case any of the answers above solves your question, could you mark it as solved? Thanks!


OK, I’ve done the date changes; now I’m onto this one!

How do I actually do this?

  1. I presume I navigate to the root of the vault?
  2. Do I then run each line as a command or as a .sh file?

Really sorry to keep hassling, but I’m not sure how to get started with this…

Thanks again!

Jim

They actually only gave you two commands. The first one is very long, so they split it up over multiple lines using backslashes, which tell the shell “don’t run this yet when I press Enter; I’m going to continue typing the command on the next line.” It would work as a script (see the sketch below), as a multi-line command, or as a single-line command if you prefer, by removing the trailing backslashes (but keep the vertical pipe characters, they’re important).
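For example, if you’d rather run it as a script, save something like the following (a sketch; the filename download-images.sh is arbitrary) in the vault root and run bash download-images.sh:

#!/usr/bin/env bash
set -euo pipefail   # stop at the first failing command
mkdir -p _attachment/img
sed -n 's/.*\[](\(https:\/\/firebasestorage.*\)).*/\1/gp' *.md */*.md \
  | sort | uniq \
  | sed 's/.*%2F\(.*\)?alt.*/\1\t\0/' \
  | awk '{system("wget -O _attachment/img/" $1 " " $2) }'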

no worries at all, exactly what @blippybloopy said. these are just two commands. you can select the entire command, copy and paste them into your terminal. remember to back everything up first.

@pmbauer thanks for your work on this! Which shell did you write that block (your download ‘script’) in/for? Blindly pasted into zsh (the macOS default shell), it doesn’t parse correctly. It also doesn’t seem to work in bash, sh, or tcsh? (That’s after installing gsed from MacPorts and using it instead of BSD sed).

Edit: I ended up using:

mkdir -p _attachment/img

gsed -n 's/.*\[](\(https:\/\/firebasestorage.*\)).*/\1/gp' *.md */*.md | sort | uniq | gsed 's/.*%2F\(.*\)?alt.*/\1\t\0/' | awk '{system("wget -O _attachment/img/" $1 " " $2) }'

Where gsed is GNU Sed, installed from MacPorts. I think there’s at least one spare | in the original script causing it to fail at the start but it took some playing with it and compressing to one line before zsh was happy.

Edit 2: You may also want to do this for Roam PDF embeds.

gsed -n 's/.*pdf.*\(https:\/\/firebasestorage.*\).*/\1/gp' *.md */*.md | sort | uniq | gsed 's/.*%2F\(.*\)?alt.*/\1\t\0/' | awk '{system("wget -O _attachment/img/" $1 " " $2) }'

gsed -i 's/{{pdf:.*https:\/\/firebasestorage.*%2F\(.*\)?alt.*/![[_attachment\/img\/\1]]/g' *.md */*.md
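Afterwards, a quick way to spot any embeds the patterns above didn’t catch:

grep -n '{{pdf' *.md */*.md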

I definitely suggest using something like a find-all in a text editor you’re familiar with, particularly if you’re going to be deleting the files from Firebase afterwards (you don’t want to leave anything behind). Just search for firebasestorage.

it’s bash; as noted in the instructions, you’ll need a gnu userland

Mac’s default version of bash is pretty ancient. If you can get it to work on bash 3.2.57 (mac’s default) great. I’ve never tried.
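A quick way to check which bash you’re actually in:

echo $BASH_VERSION   # stock macOS prints 3.2.57(1)-release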

I added an extra section for mac users. The script doesn’t have an extra | and should just work on linux. You will likely need to tweak some things for your mac


Oh, that figures. I didn’t realize that was also horribly out of date. Cheers!

This looks like Greek to me… Can anyone share how to do this on Windows?

you have an extra pipe in there! you can’t pipe mkdir to sed; you already had the && to connect them.

it also didn’t let me run that pipeline (on linux) until I stripped the comments out (and the extra whitespace after the backslash).

mkdir -p _attachment/img && \
  sed -n 's/.*\[](\(https:\/\/firebasestorage.*\)).*/\1/gp' *.md */*.md \
  | sort | uniq \
  | sed 's/.*%2F\(.*\)?alt.*/\1\t\0/' \
  | awk '{system("wget -O _attachment/img/" $1 " " $2) }'

congrats, you found a typo, here’s a star :star:

thanks!

for some reason, I had a bunch of image files stored without a filename extension. this confused obsidian on my mac, and it didn’t end up syncing those files. so of course I wrote some scripts to rename them all, adding the extension, and to update references to them. after that, they synced.

remember to back up first (or use git)

first, I needed a list of them all (start this in the vault folder). I looked at the files first to make sure they’re really all .png or .jpg files.

cd _attachment/img
ls | grep -v '\.' | xargs file --extension | grep -Ev 'png|jpeg'

if that gives any output, you have some additional filetypes I didn’t, and you’ll have to adjust the perl script accordingly.

next I saved the list of filenames:

ls | grep -v '\.' > /tmp/list.txt
cd ../..

then I wrote a perl script to rename the files (using git mv ... because I have the vault tracked in git; if you don’t, take git out of the command in the script)

here’s how to run the script from the top folder of the vault (I called it ext.pl; make it executable first with chmod +x ext.pl)

./ext.pl < /tmp/list.txt

it took 3 minutes to run on my m1 mac, because I have over 7,000 files, and for every one of the attachments without an extension, it checks every single file in the vault.

here’s the text of the perl script:

#!/usr/bin/perl -w

# takes an image file with no extension, runs it through file(1),
# assumes it will be jpeg or png, updates all references, then renames it
my $path = './_attachment/img/';
while ( my $line = <> )
{
    chomp $line;
    my $fn = $line;
    my $file = `file --extension $path$fn`;
    chomp $file;
    print "$file\n";
    # skip anything that isn't png or jpeg instead of renaming with a stale $1
    next unless $file =~ m/\Q$fn\E: (png|jpeg)/;
    my $ext = $1;

    # rewrite every reference to the bare filename in every markdown file
    my $cmd = "find . -name \\*.md -print0 | xargs -0 perl -p -i -e \"s/\\Q$fn\\E/$fn.$ext/g\"";
    print $cmd, "\n";
    system($cmd) == 0 or die "$?";
    # rename the file itself (drop 'git' here if the vault isn't tracked in git)
    my $mv = "git mv \"$path$fn\" \"$path$fn.$ext\"";
    print $mv, "\n";
    system($mv) == 0 or die "$?";
}
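afterwards, you can rerun the earlier check to confirm no extensionless files are left:

ls _attachment/img | grep -v '\.' | wc -l   # should print 0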

I had an issue with the bash script, but it was related to my page names in Roam (I think I had colons or quotes, which caused the regexp to fail). Anyway, I should be able to get this to work, thanks!

Does anyone know if Roam has changed the way things work here? With this script, I’m seeing:

Resolving 0 (0)... 0.0.0.0
Connecting to 0 (0)|0.0.0.0|:80... failed: Connection refused.

And this makes sense, I think, because my firebase links are landing on a page with this:

{
  "error": {
    "code": 400,
    "message": "Invalid HTTP method/URL pair."
  }
}

I thought maybe they made it so you needed to be logged into Roam before the links would work, but not so much. Anyone know what’s going on here? I’d love to get moved over to Obsidian completely.

Very possibly. I haven’t run this script in nearly a year, and all my previous content on there has been purged.