Python script to scrape articles and add links to topics

Hi everyone,

Inspired by this post: https://forum.obsidian.md/t/clip-an-article-straight-to-your-obsidian-with-one-click/ I decided to write my own little script.
You can find it here: https://github.com/Nebucatnetzer/url2markdown

To be upfront: it is still at a very early stage, but it works well enough to show what I’m trying to achieve.
The main motivation behind this script was that I wanted a way to archive articles after I’ve read them, so that I can reference them again later.
Currently I’m doing this with Wallabag. Wallabag is a great application, but it is a bit clunky and saves the articles into a database instead of into files. It does have a mobile application, though, which is great.

What I’m trying to achieve with this project:

  • Have an extension in the desktop browser to download the article with one click and extend it with the required topics.
  • Collect URLs on the go and possibly link to related topics.

So far the script I created does the following:

  1. Download the content from an article’s URL.
  2. Convert it to Markdown.
  3. If topics are provided, add them to the header as Obsidian wiki-style links.
  4. Batch download all articles listed in a given file and add the related topic links to each header. With this I can add the articles I’ve read to a note on my phone, add the related topics, and later download them from my computer or find a way to automate the download.
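
To make steps 3 and 4 a bit more concrete, here is a minimal sketch. It is not the code from the repository. The function names, the "Source:" and "Topics:" header lines, and the batch file format (one "URL | topic, topic" entry per line) are all assumptions made for illustration. Downloading and the Markdown conversion are left out, so it runs without network access.

```python
from typing import List, Tuple


def topics_to_header(topics: List[str]) -> str:
    """Render topics as one line of Obsidian wiki-style links."""
    links = [f"[[{t.strip()}]]" for t in topics if t.strip()]
    if not links:
        return ""
    return "Topics: " + " ".join(links)


def parse_batch_line(line: str) -> Tuple[str, List[str]]:
    """Split a line of the assumed form 'URL | topic, topic'."""
    url, _, rest = line.partition("|")
    topics = [t.strip() for t in rest.split(",") if t.strip()]
    return url.strip(), topics


def add_header(markdown: str, url: str, topics: List[str]) -> str:
    """Prepend a small header (hypothetical layout) to the article body."""
    header_lines = [f"Source: {url}"]
    topic_line = topics_to_header(topics)
    if topic_line:
        header_lines.append(topic_line)
    return "\n".join(header_lines) + "\n\n" + markdown


if __name__ == "__main__":
    url, topics = parse_batch_line("https://example.com/a | Python, Web Scraping")
    print(add_header("# Some article", url, topics))
```

Because the topics end up as [[...]] links, Obsidian should treat each one as a link to a note of that name, which is what connects an archived article to its topics.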

What is missing:

  1. Better scraping. Currently there is still too much JavaScript and other clutter left in the Markdown.
  2. An easier way to configure the application. At the moment there is no real way to configure it; everything is hard-coded.
  3. Downloading the article’s images and saving them to a related folder or similar.
  4. Packaging it for PyPI.
  5. Making it work with the “External Application Button” extension (this is already halfway there).
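
For the first point, one possible direction (only a sketch, and not what the script currently does) would be to drop script, style and noscript elements from the HTML before it is converted to Markdown. The example below uses only Python’s standard html.parser module. The names ScriptStripper and strip_scripts are made up for illustration. It does not attempt real article extraction, so navigation, ads and similar clutter would still get through.

```python
from html.parser import HTMLParser


class ScriptStripper(HTMLParser):
    """Re-emit HTML while dropping <script>, <style> and <noscript> elements."""

    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        # Keep entities such as &amp; untouched so they can be re-emitted as-is.
        super().__init__(convert_charrefs=False)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif self.skip_depth == 0:
            self.parts.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags like <br/> are copied through unchanged.
        if self.skip_depth == 0:
            self.parts.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif self.skip_depth == 0:
            self.parts.append(f"</{tag}>")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)

    def handle_entityref(self, name):
        if self.skip_depth == 0:
            self.parts.append(f"&{name};")

    def handle_charref(self, name):
        if self.skip_depth == 0:
            self.parts.append(f"&#{name};")


def strip_scripts(html: str) -> str:
    """Return the HTML without script, style and noscript content."""
    parser = ScriptStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


if __name__ == "__main__":
    page = "<p>Hello</p><script>var x = 1;</script><p>A &amp; B</p>"
    print(strip_scripts(page))
```

Comments and the doctype are dropped as a side effect, which should not matter for the Markdown output.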

I’m just leaving this here in case someone wants to test it and provide feedback. Please note that I might not be able to include all ideas and wishes since this is only a little fun project.
