Clip an article straight to your Obsidian with one click

input_sh · July 1, 2020, 12:20pm

I’ve checked out solutions available here, but the ones I’ve tried require too much friction:

click on the button,
copy the text,
create a new file manually,
paste the text.

Some of them do download the actual file, but they place it in ~/Downloads, and you have to move it over manually.

My solution:

click on the button in your browser.

That’s it! It automatically creates the file, adds some metadata at the top of the Markdown file, and you can see it straight from your Obsidian.

Example

I went to the random article on my Medium homepage: https://medium.com/@lancengym/the-endgame-for-linkedin-is-coming-31d4a8b2a76
I clicked on the button, which created a file named 20200701-the_endgame_for_linkedin_is_coming.md whose top looks like this:

# The Endgame for LinkedIn Is Coming

* **Source:** [medium.com](https://medium.com/@lancengym/the-endgame-for-linkedin-is-coming-31d4a8b2a76)
* **Author:** Lance Ng
* **Word count:** 1844
* **Extracted at:** 2020-07-01 14:03

![lead image](https://miro.medium.com/max/1200/1*F13E-o2ErInkVjeIFGw9rA.png)

After two years, Microsoft still hasn’t delivered on its grand vision for LinkedIn. And it may never do so. [...]

It saves the full content, adds some metadata, and keeps the image links pointing at their original URLs, so images are loaded from the web rather than stored locally.
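The file name in the example follows from the date prefix and the title clean-up done by the script further down. A minimal sketch of just that naming step, pulled out as a helper; make_filename is a hypothetical name used for illustration and is not part of the original script:

```python
import datetime


def make_filename(title, when):
    """Build a note file name: date prefix, lowercase title, spaces to underscores."""
    prefix = when.strftime("%Y%m%d-")
    slug = title.lower().replace(" ", "_").replace("(", "").replace(")", "")
    return prefix + slug + ".md"


print(make_filename("The Endgame for LinkedIn Is Coming", datetime.date(2020, 7, 1)))
# prints 20200701-the_endgame_for_linkedin_is_coming.md
```

The date prefix keeps clipped notes sorted chronologically in the file list.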

How to set it up

Make sure you have python3 installed, though it should be available out of the box in most Linux distributions.
Install mercury-parser somewhere in your path: npm -g install @postlight/mercury-parser (yarn also works).
Modify the directory variable near the top of the script below (the line that starts with directory =) to point to the directory where you will store the links. Use an absolute path instead of a relative one (so /home/input_sh/Notes instead of ~/Notes).
Install External Application Button in your preferred browser.
Fill out its preferences like this:

[Screenshot: External Application Button preferences]
There’s also a “Surround arguments with quote characters” option below what’s visible in the screenshot, which is worth ticking. Optionally, you can also upload a custom icon (32x32) to be shown in your browser’s interface, and have the tab close automatically when the button is clicked.

Script

#!/usr/bin/python3
# -*- coding: utf-8 -*-

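import os, sys, json, subprocess
import datetime

link = str(sys.argv[1])
print("Processing " + link)
directory = "/home/input_sh/Notes/Links/"

# Pass the URL as its own argument (no shell involved), so characters like & or ? in the link are not interpreted by the shell
resp = json.loads(subprocess.check_output(["mercury-parser", link, "--format=markdown"]).decode("utf-8"))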
today = datetime.datetime.now()

out_content = resp["content"]
out_title   = resp["title"]
out_url     = resp["url"]
out_domain  = resp["domain"]
out_wc      = resp["word_count"]
if resp["author"]:
    out_author = resp["author"]
else:
    out_author = "Unknown"

# Ewww, www
if "www." in out_domain:
    out_domain = out_domain.replace("www.", "")

if resp["lead_image_url"]:
    out_lead_img = resp["lead_image_url"]
    header = "* **Source:** [" + out_domain + "](" + out_url + ")\n* **Author:** " + out_author + "\n* **Word count:** " + str(out_wc) + "\n* **Extracted at:** " + today.strftime("%Y-%m-%d %H:%M") + "\n\n"
    content = "# " + out_title + "\n\n" + header + "![lead image](" + out_lead_img + ")\n\n" + out_content
else:
    header = "* **Source:** [" + out_domain + "](" + out_url + ")\n* **Author:** " + out_author + "\n* **Word count:** " + str(out_wc) + "\n* **Extracted at:** " + today.strftime("%Y-%m-%d %H:%M") + "\n\n---\n\n"
    content = "# " + out_title + "\n\n" + header + out_content

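# Formats the title of the file (the date is formatted separately so a % in the title cannot break strftime)
title = today.strftime("%Y%m%d-") + out_title
# Slashes are replaced as well, since they would otherwise be read as directory separators
title = title.lower().replace(" ", "_").replace("(", "").replace(")", "").replace("/", "_")

# Writes to the actual file
with open(os.path.join(directory, title + ".md"), "w", encoding="utf-8") as f:
    f.write(content)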

print("Done!")

Don’t forget to make the script executable! If you’ve saved the script as web-clipper.py, you’d do something like chmod +x web-clipper.py. You can test if it works from your terminal by running ./web-clipper.py "https://medium.com/@lancengym/the-endgame-for-linkedin-is-coming-31d4a8b2a76" before you proceed with setting up External Application Button.

Feel free to let me know if you encounter some quirks.

glpelletier · July 2, 2020, 3:41am

Howdy,
Great potential here, I am getting the One Last Step page with a successful check of the connection and the correct path to the folder I want the links in:

D:\Files\Vault\ObsidianVault\02 - Links

Obviously I am doing something wrong.

Guy Pelletier

Philipp · July 2, 2020, 1:34pm

@input_sh
Thanks for that script! I was thinking about a better way to clip articles and you just found the perfect solution for me.

@glpelletier

Make sure you don’t use spaces in the path, that will result in errors.

Before you use the browser extension, test/debug the script once in the terminal with ./path/to/web-clipper.py to make sure it works correctly.

There might be some caveats:

The script is not executable by default (on Linux, just make it executable with chmod a+x web-clipper.py)
The mercury parser is not installed correctly; it might help to install it globally with yarn global add @postlight/mercury-parser (see documentation)

Sorry I can’t give exact instructions for windows but this might be a start.
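The two caveats above can be checked before touching the browser extension. A rough sketch, assuming the script was saved as ./web-clipper.py; check_setup is a hypothetical helper, not something that ships with the script or the extension:

```python
import os
import shutil


def check_setup(script_path, parser_name="mercury-parser"):
    """Return a list of problems found; an empty list means the basics look fine."""
    problems = []
    if not os.path.isfile(script_path):
        problems.append("script not found: " + script_path)
    elif not os.access(script_path, os.X_OK):
        problems.append("script is not executable: " + script_path)
    # shutil.which searches the PATH of the current process, like a shell would.
    if shutil.which(parser_name) is None:
        problems.append(parser_name + " is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in check_setup("./web-clipper.py"):
        print(problem)
```

If nothing is printed, both checks passed for the environment the check was run in.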

glpelletier · July 2, 2020, 8:20pm

Thanks for the reply, I will give it a shot and let you know how it goes.

Guy

danbburg · July 2, 2020, 10:07pm

If you get a “bad interpreter” error, you can change the shebang line (the first line of the Python script) to #!/usr/bin/env python3

If you still get that error, make sure Python 3 is installed.

wodecki · July 17, 2020, 10:34am

Hi,

many thanks

I have a strange problem…

Direct call: ./web-clipper.py "https://medium.com/@lancengym/the-endgame-for-linkedin-is-coming-31d4a8b2a76" works perfectly
The same call from External Application Button displays the error:
Error: /bin/sh: mercury-parser: command not found
So, it calls web-clipper.py correctly, but it looks like the python script can’t call mercury-parser.

I’ve configured External Application Button as you suggested.

Any idea?

Many thanks in advance. Your script is a real killer feature for me…

Yours,

Andy
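One possible explanation, not confirmed anywhere in this thread: programs started from a browser extension often inherit a shorter PATH than an interactive terminal, so a globally installed npm binary may not be found. A sketch of a workaround that resolves mercury-parser to an absolute path by also searching a few extra directories. The name find_parser and the directories listed are assumptions; replace them with whatever directory which mercury-parser reports in your terminal.

```python
import os
import shutil


def find_parser(name="mercury-parser", extra_dirs=()):
    """Search PATH plus some extra directories; return the executable's path or None."""
    search_dirs = os.environ.get("PATH", "").split(os.pathsep) + list(extra_dirs)
    return shutil.which(name, path=os.pathsep.join(d for d in search_dirs if d))


# Hypothetical install locations; replace with the directory your terminal reports.
parser = find_parser(extra_dirs=["/usr/local/bin", os.path.expanduser("~/.npm-global/bin")])
print(parser or "mercury-parser not found")
```

If a path is found, it could be used in the script in place of the bare mercury-parser command name.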

varian93 · October 25, 2020, 1:05pm

Rather than loading the images externally, it would be great if there was the option to clip the images locally as well.

I realize that it can be done manually.

Clipping text only from a web site is essentially 50% a bookmark as you still need access to their web server to load the images.

I prefer to keep the content of everything I clip locally. It wouldn’t do me much good if the pictures contain some information necessary for the text to make sense & I either don’t have an internet connection, the site was taken down, the article was removed, etc.
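The original script does not do this, but local image clipping could be approximated by post-processing the Markdown before it is written. A minimal sketch, assuming images appear as standard Markdown image links with http(s) URLs; localize_images and its parameters are hypothetical names, and the download uses Python's standard urllib.request.urlretrieve:

```python
import os
import re
import urllib.request

# Matches ![alt](http://... or https://...) image links.
IMAGE_PATTERN = re.compile(r"!\[([^\]]*)\]\((https?://[^)\s]+)\)")


def localize_images(markdown, target_dir, download=urllib.request.urlretrieve):
    """Download each remote image into target_dir and point the link at the local copy."""
    os.makedirs(target_dir, exist_ok=True)

    def replace(match):
        alt, url = match.group(1), match.group(2)
        # Use the last URL path segment as the file name (query string dropped).
        name = os.path.basename(url.split("?")[0]) or "image"
        local_path = os.path.join(target_dir, name)
        try:
            download(url, local_path)
        except OSError:
            return match.group(0)  # keep the remote link if the download fails
        return "![" + alt + "](" + local_path + ")"

    return IMAGE_PATTERN.sub(replace, markdown)
```

Two limitations of this sketch: images that share a file name overwrite each other, and the rewritten links use whatever path is passed as target_dir, so a folder inside the vault is the safer choice.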

Zacks · June 26, 2021, 9:34am

Hi,

Did you find a solution to this problem?
I’m having the same issue, and can’t find a way to solve it.

wodecki · June 28, 2021, 6:32am

No. I gave up. Andy

AND · July 26, 2021, 1:02pm

Great job, I’d definitely try it.

Is there a way to extract images and other media together with the article, so that I don’t need a connection to access them?