My Migration from Roam to Obsidian

For the latest instructions see the official Roam Research import guide


My primary reason for migrating was Obsidian’s native iOS client and the performance improvement that came with it; Obsidian turned out to be much faster in practice. I also needed a local-only graph for work, and it felt better to have actual files stored on disk instead of data in browser local storage. Having the notes as files on disk also meant I could take advantage of other tools.
Once I started the migration, I found I had around 6,200 files and between 100 and 200 folders.

Initial export and import

First, export your graph as JSON (use the “Export All” option from the “dot dot dot” menu at the upper right in Roam).

Next, I used rj2obs, an excellent Python script I found on this forum.

One caveat: in my graph that was an export of my old Day One journal, one page listed all the entries in the journal. Somehow that page got truncated at just under 2,000 lines, so I was missing the last few hundred. I’m not sure what happened to that data; it might have been the Roam JSON export rather than the script. I haven’t noticed this on any other pages, but my next-longest note was 774 lines.
Here’s a step-by-step of what I did on my Mac to run the script:

git clone https://github.com/renerocksai/rj2obs.git
cd rj2obs
pip3 install -r requirements.txt
mkdir -p ../json
cd ../json
unzip ../Roam-Export-1628036047152.zip
cd ..
mkdir md
cd md
python3 ../rj2obs/r2o.py ../json/Amazon.json

If the conversion fails on timestamps like mine did: I found that some of the timestamps were specified in milliseconds instead of the standard seconds, so I ran this perl snippet on the JSON file and then ran rj2obs again:

perl -p -i -e 's/:(1[56]\d{8})\d{3},/:$1,/g' Paradigm.json
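If you’d rather not use perl, the same substitution can be sketched in Python. This is only a translation of the one-liner above: the helper name `ms_to_seconds` and the sample string are made up for illustration, and like the perl it only touches 13-digit numbers starting with 15 or 16 that sit between a colon and a comma.

```python
import re

# Same pattern as the perl one-liner: a colon, a 13-digit number starting
# with 15 or 16 (a millisecond timestamp), then a comma.
MS_TIMESTAMP = re.compile(r":(1[56]\d{8})\d{3},")


def ms_to_seconds(json_text: str) -> str:
    """Drop the last three digits of millisecond timestamps (hypothetical helper)."""
    return MS_TIMESTAMP.sub(r":\1,", json_text)


if __name__ == "__main__":
    # Made-up sample line; the first value is in milliseconds, the second in seconds.
    sample = '{"create-time":1628036047152,"edit-time":1628036047,"uid":"abc"}'
    print(ms_to_seconds(sample))
```

To apply it to a real export you would read the JSON file as text, pass it through the function, and write the result back out, ideally to a copy first.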

Next, open the vault in Obsidian and run the Obsidian Markdown importer.
After that, I committed my files to git. You can skip this if you want, but I’m git obsessed, so it feels nice to have a history and a semi-backup of my working files. I’m also subscribed to the Obsidian sync service now, but I like git as well.

git init && git add . && git commit -m initial

I found some of my files had 4 spaces for indentation instead of tabs, so I used perl to convert them. I also was using a [[2021]] page and sometimes called it #2021, but for some reason rj2obs missed it, so I converted that with perl too.

find . -name \*.md -print0 |xargs -0 perl -p -i -e 's/ {4}/\t/g'
find . -name \*.md -print0 |xargs -0 perl -p -i -e 's/\#2021/[[2021]]/g'
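For comparison, here is a rough Python sketch of the indentation fix. Unlike a blanket search and replace, it only converts groups of 4 spaces at the start of a line, so runs of spaces inside the text of a bullet are left alone. The helper name `spaces_to_tabs` is hypothetical, and the 4-spaces-per-level assumption comes from what I saw in my own files.

```python
import re

# One or more groups of exactly 4 spaces at the start of a line.
LEADING_SPACES = re.compile(r"^(?: {4})+", re.MULTILINE)


def spaces_to_tabs(text: str) -> str:
    """Turn each leading group of 4 spaces into one tab (hypothetical helper)."""
    return LEADING_SPACES.sub(lambda m: "\t" * (len(m.group(0)) // 4), text)


if __name__ == "__main__":
    note = "- top\n    - child\n        - grandchild"
    print(repr(spaces_to_tabs(note)))
```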

Additional Cleanup

One thing I noticed is that a markdown header breaks the outline flow: after each header a new outline starts. Any time a new outline starts, like after a heading, its first bullet point has to start all the way to the left. I have many pages where I have to use the vim command < followed by a cursor movement (in command mode) to outdent a whole section between headers. I tend to just fix those blocks when I come across them naturally.
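The outdenting could also be automated. Below is a minimal sketch, not something I actually ran: it assumes headings start at the left margin with one to six # characters followed by a space, that indentation is tabs, and the function name `outdent_blocks` is made up. For each run of lines between headings it removes the smallest indentation shared by all non-blank lines, so the shallowest bullet ends up at the left edge.

```python
import re

# A markdown heading at the left margin: 1-6 "#" characters, then a space.
HEADING = re.compile(r"^#{1,6} ")


def outdent_blocks(text: str) -> str:
    """Outdent each block between headings by its smallest tab indent."""
    out, block = [], []

    def flush():
        # Count leading tabs on every non-blank line in the block.
        indents = [len(l) - len(l.lstrip("\t")) for l in block if l.strip()]
        shift = min(indents, default=0)
        out.extend(l[shift:] if l.strip() else l for l in block)
        block.clear()

    for line in text.split("\n"):
        if HEADING.match(line):
            flush()           # finish the block above this heading
            out.append(line)
        else:
            block.append(line)
    flush()                   # finish the last block
    return "\n".join(out)


if __name__ == "__main__":
    page = "# A\n\t- one\n\t\t- two\n# B\n- three"
    print(outdent_blocks(page))
```

A block whose first bullet is indented while a later bullet already sits at the left margin is left unchanged by this sketch, so those would still need a manual fix.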

Page name problems

Compound Names

Compound page names (where a page name is embedded in another page name) don’t seem to work in Obsidian. While [[words about [[a]]]] is bad, [[[[words]] about [[a]]]] is even worse. I had 120 or so of the first kind (a single embedded name) and a handful (fewer than 10) of the second kind (multiple embedded names).
First I manually updated the files with multiple embedded names, one at a time, by renaming them in Obsidian and then running this perl to update all references:

find . -name \*.md -print0 |xargs -0 perl -p -i -e 's/\[\[words\]\] about \[\[a\]\]/words about a/'

I found some where I wanted to move the new file to a folder, so I had to use a different regex for those cases:

find . -name \*.md -print0 |xargs -0 perl -p -i -e 's/\[\[page\]\] word/folder\/word\/page/'

After I handled the really hard ones manually, I was left with the 120 single-embedded-name pages, so I wrote a perl script to do them all at once. First I generated a list of them all:

find . -type f | grep '\[' > /tmp/filelist.txt

Then I could run the perl script on that list, in the top folder of the vault:

brack.pl < /tmp/filelist.txt

Here is the text of the script:

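#!/usr/bin/perl -w
use strict;

# Reads file paths (one per line) on stdin. For each one, rewrites every
# reference to the bracketed name across the vault, then renames the file.
while ( my $line = <> )
{
    chomp $line;
    my $fn = $line;
    $line =~ s/\.md$//;
    $line =~ s{^\./}{};    # drop the "./" that find prepends; links don't include it
    # Escape the brackets so the name can be used as a regex pattern.
    my $escaped = $line;
    $escaped =~ s/\[/\\[/g;
    $escaped =~ s/\]/\\]/g;
    # The new name is the old one with all brackets removed.
    my $stripped = $line;
    $stripped =~ s/\[//g;
    $stripped =~ s/\]//g;

    # "#" is used as the delimiter so slashes in folder paths don't end the pattern.
    my $cmd = "find . -name \\*.md -print0 | xargs -0 perl -p -i -e \"s#$escaped#$stripped#g\"";
    print $cmd, "\n";
    system($cmd) == 0 or die "$?";
    my $mv = "git mv \"$fn\" \"$stripped.md\"";
    print $mv, "\n";
    system($mv) == 0 or die "$?";
}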
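To make it clearer what the script does to each name, here is a small Python sketch of the same transformation on a single note. The function names are made up for illustration, and it uses a literal string replace rather than a regex, so no escaping is needed.

```python
def strip_brackets(name: str) -> str:
    """Return the page name with every [ and ] removed."""
    return name.replace("[", "").replace("]", "")


def rewrite_references(text: str, old_name: str) -> str:
    """Replace every occurrence of the bracketed name with the stripped one."""
    return text.replace(old_name, strip_brackets(old_name))


if __name__ == "__main__":
    note = "See [[words about [[a]]]] and [[a]]."
    # The outer link is flattened; the plain [[a]] link is left alone.
    print(rewrite_references(note, "words about [[a]]"))
```

The file itself still has to be renamed to the stripped name afterwards, which is what the git mv step in the perl script handles.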

Invalid page names

Colons (:) and # signs aren’t allowed in page names. > is allowed, but block refs don’t seem to work if it comes at the end of the filename (>]]). I had to search for all those files using find and grep, then change each filename in Obsidian and run one of the shell lines above to convert the references in all the other files. I also had to put double quotes around the old title in the frontmatter that the script had written at the top of each page.
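As an alternative to find and grep, a short Python sketch can list the offending files. It only checks the three problems I ran into (a colon, a # sign, or a trailing >); the function name `name_problems` is hypothetical, and this is not a complete list of what Obsidian rejects.

```python
from pathlib import Path


def name_problems(title: str) -> list[str]:
    """Return the problems found in one page title (hypothetical helper)."""
    problems = []
    for ch in ":#":
        if ch in title:
            problems.append(f"contains {ch!r}")
    if title.endswith(">"):
        problems.append("ends with '>'")
    return problems


if __name__ == "__main__":
    # Run from the top folder of the vault.
    for path in sorted(Path(".").rglob("*.md")):
        problems = name_problems(path.stem)
        if problems:
            print(path, "->", ", ".join(problems))
```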

Ambiguous Names and Namespaces

At first, I wasn’t sure what to do about ambiguous page names (i.e., how to handle namespaces). In Roam, you simply prepend an ambiguous term with a folder name. For a trivial example, to tell a music record from a database record, you might use music/record and/or database/record.
One of the things that slowed me down was that I didn’t understand how Obsidian handles folders. If you refer to [[music/record]] but there is no top-level record.md, Obsidian displays it as [[record]], since there’s not really an ambiguity.
As a follower of PARA, I was having trouble keeping all the pages related to a particular project linked together in Roam. Yes, everything would show up in the linked references, but in this case it’s actually easier to create a folder hierarchy. So now I’ve ended up with a folder structure like Projects/Project A/, and under each project folder I tend to have folders like Meeting, slack, ticket, email, and articles. That way I have both the page links and the folder structure to reach all of a project’s assets. It also makes it easy to move the whole project to the Archives when it’s done.
In conclusion, I find that I tend to use folders more for grouping things rather than disambiguating names.
Another tip: if you use Obsidian to move a file, it will update all the references to the short name (without the folders) as long as it’s not ambiguous.
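If you want to know in advance which short names would be ambiguous, a sketch like the following can help. It assumes a simplified rule (two notes clash when they share a file name once the folders are dropped), which is my reading of the behavior above rather than Obsidian’s documented algorithm; the function name `ambiguous_short_names` is made up.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def ambiguous_short_names(paths):
    """Group note paths by file name (without folders or extension)
    and keep only the names that occur more than once."""
    by_name = defaultdict(list)
    for p in paths:
        by_name[PurePosixPath(p).stem].append(p)
    return {name: ps for name, ps in by_name.items() if len(ps) > 1}


if __name__ == "__main__":
    vault = [
        "music/record.md",
        "database/record.md",
        "Projects/Project A/Meeting/kickoff.md",
    ]
    print(ambiguous_short_names(vault))
```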


Here’s the original forum post about the Python script I used.
