🐍 obsidiantools: Python package for doing advanced analytics of your vault

markf · September 13, 2021, 8:31pm

obsidiantools

Hi all, I’ve released a Python package called obsidiantools, available now through PyPI.

obsidiantools is built upon common libraries in the Python data stack, like Pandas and NetworkX, to enable you to do advanced analytics of your vault and notes.

Check out the GitHub repo:

I am not a software engineer who does JavaScript or TypeScript - if someone wants to collab on integrating this package into an Obsidian plugin I’d be interested in exploring that!

Getting started

In your Python environment, run pip install obsidiantools.

Check out the README for more detail on API usage. It’s incredibly simple.

Demo

See the functionality through a virtual Binder environment here:

Key features

This is how obsidiantools can complement your workflows for note-taking:

Access a networkx graph of your vault (vault.graph)
Get summary stats about your notes, e.g. number of backlinks and wikilinks, in a Pandas dataframe
- Get the dataframe via vault.get_note_metadata()
Retrieve detail about your notes’ links as built-in Python types
- The various types of links:
  - Wikilinks (incl. header links, links with alt text)
  - Backlinks
- You can access all the links in one place, or you can load them for an individual note:
  - e.g. vault.backlinks_index for all backlinks in the vault
  - e.g. vault.get_backlinks(<NOTE>) for the backlinks of an individual note
- Check which notes are isolated (vault.isolated_notes)
- Check which notes do not exist as files yet (vault.nonexistent_notes)
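
A minimal sketch of how the attributes listed above fit together. It assumes the `obsidiantools.api.Vault` entry point and `.connect()` call described in the package README; `connect_vault` and `link_summary` are hypothetical helper names, and return types may differ between versions.

```python
from pathlib import Path


def connect_vault(vault_dir):
    """Connect to a vault directory (assumes obsidiantools is installed)."""
    # Imported lazily so the helper below can be used without the package.
    import obsidiantools.api as otools

    return otools.Vault(Path(vault_dir)).connect()


def link_summary(vault, note):
    """Collect a few link facts about one note from a connected vault."""
    return {
        "backlinks": list(vault.get_backlinks(note)),
        "is_isolated": note in vault.isolated_notes,
        "exists_as_file": note not in vault.nonexistent_notes,
    }


# Typical use (the path and note name are placeholders):
# vault = connect_vault("path/to/vault")
# df = vault.get_note_metadata()   # Pandas dataframe of summary stats
# graph = vault.graph              # NetworkX graph of the vault
# print(link_summary(vault, "My note"))
```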

Using obsidiantools in your note-taking workflow

Through obsidiantools you can recreate your vault’s graph via NetworkX and I’ve shown a basic graph in the demo. NetworkX graphs are no replacement for the Obsidian app’s graphing capabilities! For a start, the app is interactive and NetworkX has limited customisation options.

Where NetworkX graphs really complement workflows is through:

Having a graph structured in Python’s most popular network analysis library.
Processing the graph into a summary Pandas dataframe.
Being able to use the advanced capabilities of the library, e.g. doing deep dive analysis on subgraphs of your vault, applying algorithms to your vault (such as PageRank).

Especially in large vaults, these capabilities can enable you to narrow down which notes you want to focus on.
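
To illustrate the subgraph idea, here is a small sketch on a toy graph that stands in for `vault.graph`. The note names are made up, and a plain `DiGraph` is used for simplicity, which may not match the graph class the package returns.

```python
import networkx as nx

# Toy stand-in for vault.graph: an edge A -> B means note A links to note B.
graph = nx.DiGraph()
graph.add_edges_from([
    ("Index", "Python"),
    ("Index", "Graphs"),
    ("Python", "Pandas"),
    ("Graphs", "NetworkX"),
    ("Pandas", "NetworkX"),
])
graph.add_node("Scratchpad")  # a note with no links in or out

# Notes within two links of "Python", ignoring link direction.
neighbourhood = nx.ego_graph(graph, "Python", radius=2, undirected=True)

# Backlink counts measured inside that subgraph only.
backlinks = dict(neighbourhood.in_degree())

# Notes with no links at all.
isolated = list(nx.isolates(graph))
```

Narrowing to a neighbourhood like this is one way to focus on a cluster of related notes before applying heavier algorithms.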

Images

A basic chart I made with Matplotlib to mirror the Obsidian vault graph (‘nonexistent’ notes are greyed out):

As you can see from the chart there’s one note that is isolated:
isolated-notes

dataframe df sorted by number of backlinks:
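
A sketch of that sort on a toy dataframe. The column names `n_backlinks` and `n_wikilinks` and the `note` index are assumptions about what `vault.get_note_metadata()` returns, so check them against your own dataframe.

```python
import pandas as pd

# Toy stand-in for df = vault.get_note_metadata()
df = pd.DataFrame(
    {"n_backlinks": [3, 0, 1], "n_wikilinks": [1, 2, 0]},
    index=pd.Index(["Hub", "Draft", "Idea"], name="note"),
)

# Most-referenced notes first.
top = df.sort_values("n_backlinks", ascending=False)
```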

PageRank values of notes (via NetworkX graph):
notes-pagerank
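
A minimal sketch of the PageRank calculation on a toy graph, using `networkx.pagerank`. The note names are illustrative. With a real vault you would pass `vault.graph` instead, and recent NetworkX versions need SciPy installed for this call.

```python
import networkx as nx

# Toy link structure: three notes link to "Hub", which links back to "Intro".
links = nx.DiGraph([
    ("Intro", "Hub"),
    ("Ideas", "Hub"),
    ("Todo", "Hub"),
    ("Hub", "Intro"),
])

# PageRank scores sum to 1; higher means more "link authority".
scores = nx.pagerank(links, alpha=0.85)

# Note names ordered from highest to lowest score.
ranked = sorted(scores, key=scores.get, reverse=True)
```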

Future development

I will use the GitHub repo for the project development. I won’t have much time for the rest of the year for development but welcome ideas, pull requests etc.

I have tested the functionality on my own vaults of up to 100 notes, but I’m interested to see how other people’s vaults fare with the package.

The key things I’d like to do in future are:

More functionality for Zettelkasten vault formats.
- e.g. neat way to parse any timestamps from note filenames and integrate in the dataframe.
More metadata columns in dataframe:
- e.g. created time, could be useful here. Not supported for Linux though (need to think about design, tests etc.).
- metadata on the text e.g. word counts.
Bring the md file readers into the main API.
- I’m not sure how the parsing works for non-Latin text currently and whether more config is needed. If the parsing works well e.g. for Chinese then it would be great to bring it in.
Bring markdown link counts into the API.
Expand documentation for network analysis, e.g. code snippets on different centrality measures. Expand to include basic NLP analysis if supporting md files in the main API.
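
As an illustration of the Zettelkasten idea in the list above, here is one way timestamps could be parsed from note names using only the standard library. It is not part of obsidiantools. The 12- or 14-digit prefix convention and the helper name `parse_zettel_timestamp` are assumptions.

```python
import re
from datetime import datetime

# Assumed convention: a 12- or 14-digit prefix, e.g. "202109132031 Title".
_ZK_ID = re.compile(r"^(\d{14}|\d{12})(?!\d)")


def parse_zettel_timestamp(note_name):
    """Return the datetime encoded at the start of a note name, or None."""
    match = _ZK_ID.match(note_name)
    if match is None:
        return None
    digits = match.group(1)
    fmt = "%Y%m%d%H%M%S" if len(digits) == 14 else "%Y%m%d%H%M"
    try:
        return datetime.strptime(digits, fmt)
    except ValueError:
        # The digits do not form a valid date and time.
        return None
```

The parsed values could then be added as a column to the metadata dataframe, keyed by note name.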

tallguyjenks · September 14, 2021, 1:19am

Tomodachi94 · September 14, 2021, 1:45am

This looks interesting. I’ll take a look when I’ve got some time.

markf · September 20, 2021, 7:30pm

Already working on next release v0.6 - see Projects page for the progress.

Hope to release new version in next few weeks.

New features are in the dev branch. The main addition coded up is support for front matter.

What I find interesting about front matter is that people seem to be using it in so many ways (if used at all). I think the package has to stay fairly simple in functionality as a result, and more of the work over time will become about showing recipes that people can do using the ‘vault object’ through libraries like Pandas, NetworkX, Matplotlib etc.

markf · October 19, 2021, 10:52pm

Hi there, I have done a new release v0.6 with these extra features:

Support for front matter
Support for embedded files
More columns in metadata dataframe
get_md_links in main API
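
For readers unsure how wikilinks, markdown links and embedded files differ, here is a simplified regex sketch. It is not the parser obsidiantools uses, and it ignores edge cases such as links inside code blocks. The function name `extract_links` is hypothetical.

```python
import re

# [[Note]], [[Note#Header]] or [[Note|alt text]]; embeds (![[...]]) are skipped.
WIKILINK = re.compile(r"(?<!!)\[\[([^\]|#]+)(?:#[^\]|]*)?(?:\|[^\]]*)?\]\]")
# [text](target); images (![alt](src)) are skipped.
MD_LINK = re.compile(r"(?<!!)\[([^\]]+)\]\(([^)\s]+)\)")


def extract_links(text):
    """Return wikilink targets and (text, target) pairs for markdown links."""
    return {
        "wikilinks": WIKILINK.findall(text),
        "md_links": MD_LINK.findall(text),
    }
```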

I’ve also updated my demo repo on Github to showcase the new functionality.

The more obvious next ideas for v0.7 will involve bringing the plaintext of notes into the API, e.g. ability to extract tags.

I’m currently studying and doing all my notes in an Obsidian vault, so that is also a cool personal test case for new features. I have been using embedded images a lot so that was the inspiration for that new functionality and getting this new release done.

obsimion · February 11, 2022, 11:05pm

…awesome, thanks!

markf · August 7, 2022, 7:31pm

I’ve released v0.8 today (now v0.8.1 to fix a bug relating to a new dependency).

See the v0.8 release notes.

Summary:

Neat functions to get text in different forms (one with formatting preserved, another with it stripped out)
Markdown extensions included to replicate Obsidian experience
LaTeX equations easily accessible

I also didn’t add any info on v0.7 in this topic before, but that added in support for extracting tags from notes (and an earlier approach for extracting text).

v0.8 enables NLP analysis to be done much more easily. I’ve done some prototyping on a concept that auto-generates markdown by analysing text data from a vault. Hopefully I’ll have something to share on that later this month!
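
As a rough sketch of the kind of basic text analysis this opens up, the helpers below work on any mapping of note name to plain text. The helper names are hypothetical. Feeding them the package's stripped-text index (for example a `readable_text_index` attribute after gathering text) is an assumption to check against the release notes.

```python
import re
from collections import Counter


def word_counts(text_index):
    """Map each note name to its number of words."""
    return {
        note: len(re.findall(r"\w+", text))
        for note, text in text_index.items()
    }


def top_terms(text_index, n=3, min_length=4):
    """Most common words across all notes, ignoring very short words."""
    counts = Counter()
    for text in text_index.values():
        words = re.findall(r"\w+", text.lower())
        counts.update(w for w in words if len(w) >= min_length)
    return counts.most_common(n)
```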

louis030195 · August 16, 2022, 5:47pm

Awesome!

I’ve used this lib to build an API for semantic search and fine-tune an embedding model on your vault for Obsidian

louis030195 · August 16, 2022, 7:06pm

Also, I’m trying to play a bit with graph neural networks
https://pytorch-geometric.readthedocs.io/en/latest/modules/utils.html#torch_geometric.utils.from_networkx

It would be fun to do a node classifier for example

markf · August 16, 2022, 8:18pm

@louis030195 it’s cool that you’re doing work with Transformers and Pytorch via the package! I’ve not done much with them so it’ll be interesting to see what can be done there. I was thinking about ngrams for more sophisticated search initially but of course those packages can do something more advanced now.

For my NLP work, I’ve focused on topic modelling via one semi-supervised algorithm so far… so I’ve looked at NLP problems from a different angle. I’ve used that algorithm to make proofs-of-concept for these features:

Auto-generated MOCs (see forum post)
Auto-suggested wikilinks*

They are in this obsidian-nlp-analytics repo.

* I’ve updated my repo now with this notebook.

louis030195 · September 21, 2022, 4:08pm

@markf
Awesome, if I find the time I’ll try your approaches

I also used your library to fine-tune GPT3 on my vault: Language model assistance - #5 by louis030195

markf · January 8, 2023, 2:42pm

Hi everyone, I’ve launched v0.10, which is an exciting release as I’ve added a lot of new features in the past month and I’d say the capabilities of the obsidiantools graph capture the current Obsidian v1 app well. The setup is twice as fast on my largest vault compared to v0.8 of the package.

All these files are supported in obsidiantools:

Notes (md files)
Media files (images, videos, etc.)
Canvas files

Canvas files

Support for canvas files should reflect the current state of the Obsidian app. For example, you can get detail on the backlinks relating to canvas files. You can also recreate the layout of a canvas file in Python:
canvas-file-graph

I’ve added a notebook to my demo repo with info on the support for canvas files and show a comparison with the Obsidian app’s graph view.

Backlinks for notes that come from canvas files aren’t supported, reflecting the current status of the Obsidian app. Though I can see a route to supporting that in future.

Attachment files in graph

There is the option now to include ‘attachment’ files in the obsidiantools graph. This is an example of how my test vault is visualised via Pyvis:

I’ve added a notebook to my demo repo with more recipes for graph visualisation and show a comparison with the Obsidian app’s graph view.

grepinsight · February 28, 2023, 3:28am

This is awesome. There’s so much potential in this. Thank you for creating the package!

I’m curious if there are any plans to include the ability to mutate Markdown files, such as bulk adding tags to multiple files?

markf · March 4, 2023, 7:49pm

There are many great Obsidian plug-ins for editing & reworking notes, so I don’t plan to add any note-editing functionality in this package. This package is focused for now on making it easier to analyse the content in vaults.

nealr · April 21, 2025, 12:47am

The last Github update on this was three years ago and the last Issue cleared was 1/2023. Is development still active here? I’m interested in particular in the graph related capabilities.