🐍 obsidiantools: Python package for doing advanced analytics of your vault

:wave: Hi all, I’ve released a Python package called obsidiantools, available now through PyPI.

obsidiantools is built upon common libraries in the Python data stack, like Pandas and NetworkX, to enable you to do advanced analytics of your vault and notes.

Check out the GitHub repo:

I am not a software engineer who does JavaScript or TypeScript - if someone wants to collab on integrating this package into an Obsidian plugin I’d be interested in exploring that!

:mechanical_arm: Getting started

In your Python environment, run pip install obsidiantools.

Check out the README for more detail on API usage. It’s incredibly simple.

:sunglasses: Demo

See the functionality through a virtual Binder environment here:

Binder

:bulb: Key features

This is how obsidiantools can complement your workflows for note-taking:

  • Access a networkx graph of your vault (vault.graph)
  • Get summary stats about your notes, e.g. number of backlinks and wikilinks, in a Pandas dataframe
    • Get the dataframe via vault.get_note_metadata()
  • Retrieve detail about your notes’ links as built-in Python types
    • The various types of links:
      • Wikilinks (incl. header links, links with alt text)
      • Backlinks
    • You can access all the links in one place, or you can load them for an individual note:
      • e.g. vault.backlinks_index for all backlinks in the vault
      • e.g. vault.get_backlinks(<NOTE>) for the backlinks of an individual note
    • Check which notes are isolated (vault.isolated_notes)
    • Check which notes do not exist as files yet (vault.nonexistent_notes)
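
To make the link concepts above concrete, here is a minimal, dependency-free sketch of how backlinks, nonexistent notes and isolated notes relate to a plain wikilinks mapping. This is not how obsidiantools implements them: the toy data and the helper names (build_backlinks_index, find_nonexistent_notes, find_isolated_notes) are made up for illustration, and only built-in Python types are used.

```python
from collections import defaultdict

# Hypothetical toy vault: note name -> notes it links to via wikilinks.
# "Delta" is linked to but has no entry of its own, i.e. no file yet.
wikilinks_index = {
    "Alpha": ["Beta", "Gamma"],
    "Beta": ["Gamma", "Delta"],
    "Gamma": [],
    "Lonely": [],
}


def build_backlinks_index(wikilinks):
    """Invert the mapping: target note -> notes that link to it."""
    backlinks = defaultdict(list)
    for source, targets in wikilinks.items():
        for target in targets:
            backlinks[target].append(source)
    return dict(backlinks)


def find_nonexistent_notes(wikilinks):
    """Notes that are linked to but do not exist as keys (files)."""
    linked = {target for targets in wikilinks.values() for target in targets}
    return sorted(linked - set(wikilinks))


def find_isolated_notes(wikilinks):
    """Existing notes with no outgoing links and no backlinks."""
    backlinks = build_backlinks_index(wikilinks)
    return sorted(
        note
        for note, targets in wikilinks.items()
        if not targets and note not in backlinks
    )


backlinks_index = build_backlinks_index(wikilinks_index)
nonexistent_notes = find_nonexistent_notes(wikilinks_index)
isolated_notes = find_isolated_notes(wikilinks_index)
```

In the toy vault, "Gamma" has no outgoing links but is not isolated because other notes link to it, whereas "Lonely" has no links in either direction.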

Using obsidiantools in your note-taking workflow

Through obsidiantools you can recreate your vault’s graph via NetworkX and I’ve shown a basic graph in the demo. NetworkX graphs are no replacement for the Obsidian app’s graphing capabilities! For a start, the app is interactive and NetworkX has limited customisation options.

Where NetworkX graphs really complement workflows is through:

  • Having your vault's graph structured in Python’s most popular network analysis library.
  • Processing the graph into a summary Pandas dataframe.
  • Being able to use the advanced capabilities of the library, e.g. doing deep dive analysis on subgraphs of your vault, applying algorithms to your vault (such as PageRank).

Especially in large vaults, these capabilities can enable you to narrow down which notes you want to focus on.
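
For intuition on what PageRank does with a vault's links, here is a small dependency-free sketch of the power-iteration idea on a toy link mapping. It is an illustration only, not the package's code: the toy_links data and the pagerank helper are hypothetical, and on a real vault you would more likely pass the NetworkX graph to NetworkX's own nx.pagerank function.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-9):
    """Approximate PageRank for a mapping of note -> list of linked notes."""
    # Collect every note, including ones that only appear as link targets.
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)

    # Start from a uniform distribution over notes.
    rank = {node: 1.0 / n for node in nodes}

    for _ in range(iterations):
        # Notes without outgoing links spread their rank evenly over all notes.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # Every other note splits its rank equally among its link targets.
        for source, targets in links.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        # Stop early once the ranks have stopped changing.
        converged = sum(abs(new_rank[node] - rank[node]) for node in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank


# Hypothetical toy vault: most notes point at "Gamma".
toy_links = {
    "Alpha": ["Beta", "Gamma"],
    "Beta": ["Gamma"],
    "Gamma": [],
    "Delta": ["Gamma"],
}
ranks = pagerank(toy_links)
top_note = max(ranks, key=ranks.get)
```

Sorting notes by rank in this way is one route to narrowing down which notes act as hubs in a large vault.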

:framed_picture: Images

A basic chart I made with Matplotlib to mirror the Obsidian vault graph (‘nonexistent’ notes are greyed out):

As you can see from the chart there’s one note that is isolated:

The dataframe df sorted by number of backlinks:

PageRank values of notes (via NetworkX graph):

:building_construction: Future development

I will use the GitHub repo for the project development. I won’t have much time for development for the rest of the year, but I welcome ideas, pull requests etc.

I have tested the functionality on my own vaults of up to 100 notes, but I’m interested to see how other people’s vaults fare with the package.

The key things I’d like to do in future are:

  • More functionality for Zettelkasten vault formats.
    • e.g. neat way to parse any timestamps from note filenames and integrate in the dataframe.
  • More metadata columns in dataframe:
    • e.g. created time could be useful here, though it is not supported on Linux (need to think about design, tests etc.).
    • metadata on the text e.g. word counts.
  • Bring the md file readers into the main API.
    • I’m not sure how the parsing works for non-Latin text currently and whether more config is needed. If the parsing works well e.g. for Chinese then it would be great to bring it in.
  • Bring markdown link counts into the API.
  • Expand documentation for network analysis, e.g. code snippets on different centrality measures. Expand to include basic NLP analysis if supporting md files in the main API.
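
As a rough idea of what the timestamp parsing mentioned above could look like, here is a minimal sketch using only the standard library. It is not part of obsidiantools: the parse_zettel_timestamp helper is hypothetical, and it assumes filenames start with a 12-digit (YYYYMMDDHHMM) or 14-digit (YYYYMMDDHHMMSS) prefix, which is only one of several Zettelkasten naming conventions.

```python
import re
from datetime import datetime

# Assumed filename convention: a leading 14- or 12-digit timestamp.
_TIMESTAMP_FORMATS = {14: "%Y%m%d%H%M%S", 12: "%Y%m%d%H%M"}
_PREFIX = re.compile(r"^(\d{14}|\d{12})(?!\d)")


def parse_zettel_timestamp(filename):
    """Return the datetime encoded at the start of a filename, or None."""
    match = _PREFIX.match(filename)
    if match is None:
        return None
    digits = match.group(1)
    try:
        return datetime.strptime(digits, _TIMESTAMP_FORMATS[len(digits)])
    except ValueError:
        # The digits do not form a valid date (e.g. month 13).
        return None


created = parse_zettel_timestamp("202110151230 Graph theory.md")
missing = parse_zettel_timestamp("Meeting notes.md")
```

Values parsed like this could then be added as an extra column alongside the existing note metadata, with None marking notes whose filenames carry no timestamp.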

This looks interesting. I’ll take a look when I’ve got some time.

I’m already working on the next release, v0.6 - see the Projects page for progress.

I hope to release the new version in the next few weeks. :slightly_smiling_face:

New features are in the dev branch. The main addition coded up is support for front matter.

What I find interesting about front matter is that people seem to be using it in so many ways (if they use it at all). I think the package has to stay fairly simple in functionality as a result, and more of the work over time will become about showing recipes that people can follow using the ‘vault object’ through libraries like Pandas, NetworkX, Matplotlib etc.

:rocket: Hi there, I have done a new release v0.6 with these extra features:

  • Support for front matter
  • Support for embedded files
  • More columns in metadata dataframe
  • get_md_links in main API

I’ve also updated my demo repo on GitHub to showcase the new functionality.

The more obvious next ideas for v0.7 will involve bringing the plaintext of notes into the API, e.g. ability to extract tags.

I’m currently studying and doing all my notes in an Obsidian vault, so that is also a cool personal test case for new features. I have been using embedded images a lot so that was the inspiration for that new functionality and getting this new release done. :sunglasses: