obsidiantools: Python package for doing advanced analytics of your vault

obsidiantools

:wave: Hi all, I’ve released a Python package called obsidiantools, available now through PyPI.

obsidiantools is built upon common libraries in the Python data stack, like Pandas and NetworkX, to enable you to do advanced analytics of your vault and notes.

Check out the GitHub repo:

I am not a software engineer who does JavaScript or TypeScript - if someone wants to collab on integrating this package into an Obsidian plugin I’d be interested in exploring that!

:mechanical_arm: Getting started

In your Python environment, run pip install obsidiantools.

Check out the README for more detail on API usage. It’s incredibly simple.

:sunglasses: Demo

See the functionality through a virtual Binder environment here:

Binder

:bulb: Key features

This is how obsidiantools can complement your workflows for note-taking:

  • Access a networkx graph of your vault (vault.graph)
  • Get summary stats about your notes, e.g. number of backlinks and wikilinks, in a Pandas dataframe
    • Get the dataframe via vault.get_note_metadata()
  • Retrieve detail about your notes’ links as built-in Python types
    • The various types of links:
      • Wikilinks (incl. header links, links with alt text)
      • Backlinks
    • You can access all the links in one place, or you can load them for an individual note:
      • e.g. vault.backlinks_index for all backlinks in the vault
      • e.g. vault.get_backlinks(<NOTE>) for the backlinks of an individual note
    • Check which notes are isolated (vault.isolated_notes)
    • Check which notes do not exist as files yet (vault.nonexistent_notes)
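To make the link concepts above concrete, here is a minimal sketch in plain Python. It is not how obsidiantools is implemented, and the toy vault and the helper name `build_backlinks_index` are made up for illustration. It only shows how backlinks, isolated notes and nonexistent notes relate to a wikilinks index.

```python
# Hypothetical toy vault: note name -> wikilinks found in that note.
wikilinks_index = {
    "Home": ["Projects", "Reading list"],
    "Projects": ["Home", "Ideas"],
    "Reading list": [],
    "Scratchpad": [],
}


def build_backlinks_index(wikilinks_index):
    """Invert a wikilinks index: each note maps to the notes that link to it."""
    backlinks = {name: [] for name in wikilinks_index}
    for source, targets in wikilinks_index.items():
        for target in targets:
            backlinks.setdefault(target, []).append(source)
    return backlinks


backlinks_index = build_backlinks_index(wikilinks_index)

# Notes that are linked to but do not exist as files yet.
nonexistent_notes = sorted(set(backlinks_index) - set(wikilinks_index))

# Notes with no links in either direction.
isolated_notes = sorted(
    name
    for name in wikilinks_index
    if not wikilinks_index[name] and not backlinks_index[name]
)

print(backlinks_index["Home"])
print(nonexistent_notes)
print(isolated_notes)
```

In the package itself you would read these straight from the vault object via the attributes listed above rather than computing them by hand.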

Using obsidiantools in your note-taking workflow

Through obsidiantools you can recreate your vault’s graph via NetworkX, and I’ve shown a basic graph in the demo. NetworkX graphs are no replacement for the Obsidian app’s graphing capabilities! For a start, the app is interactive and NetworkX has limited customisation options.

Where NetworkX graphs really complement workflows is through:

  • Having the graph structured in Python’s most popular network analysis library.
  • Processing the graph into a summary Pandas dataframe.
  • Being able to use the advanced capabilities of the library, e.g. doing deep dive analysis on subgraphs of your vault, applying algorithms to your vault (such as PageRank).

Especially in large vaults, these capabilities can enable you to narrow down which notes you want to focus on.
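As a rough illustration of what applying PageRank to a vault means, here is a plain power-iteration sketch using only the standard library. It stands in for NetworkX's own `pagerank` function, which is what you would normally call on the graph. The toy link data, the `pagerank` helper below and its parameter defaults are assumptions for the example.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power-iteration PageRank over a dict of note -> list of linked notes."""
    nodes = set(links)
    for targets in links.values():
        nodes.update(targets)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by notes without outgoing links is spread evenly.
        dangling = sum(rank[node] for node in nodes if not links.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for source in nodes:
            targets = links.get(source) or []
            for target in targets:
                new_rank[target] += damping * rank[source] / len(targets)
        rank = new_rank
    return rank


# Hypothetical toy vault: note name -> wikilinks found in that note.
links = {
    "Home": ["Projects", "Ideas"],
    "Projects": ["Ideas"],
    "Ideas": ["Home"],
    "Scratchpad": [],
}
ranks = pagerank(links)
for note in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{note}: {ranks[note]:.3f}")
```

Sorting notes by a score like this is one way to shortlist the notes worth revisiting in a large vault.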

:framed_picture: Images

A basic chart I made with Matplotlib to mirror the Obsidian vault graph (‘nonexistent’ notes are greyed out):

As you can see from the chart, there’s one note that is isolated:
isolated-notes

dataframe df sorted by number of backlinks:

PageRank values of notes (via NetworkX graph):
notes-pagerank

:building_construction: Future development

I will use the GitHub repo for the project development. I won’t have much time for development for the rest of the year, but I welcome ideas, pull requests etc.

I have tested the functionality on my own vaults of up to 100 notes, but I’m interested to see how other people’s vaults fare with the package.

The key things I’d like to do in future are:

  • More functionality for Zettelkasten vault formats.
    • e.g. neat way to parse any timestamps from note filenames and integrate in the dataframe.
  • More metadata columns in dataframe:
    • e.g. created time could be useful here. Not supported for Linux though (need to think about design, tests etc.).
    • metadata on the text e.g. word counts.
  • Bring the md file readers into the main API.
    • I’m not sure how the parsing works for non-Latin text currently and whether more config is needed. If the parsing works well e.g. for Chinese then it would be great to bring it in.
  • Bring markdown link counts into the API.
  • Expand documentation for network analysis, e.g. code snippets on different centrality measures. Expand to include basic NLP analysis if supporting md files in the main API.
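For the Zettelkasten idea above, a minimal sketch of parsing timestamps from note filenames could look like this. The filename convention (a leading 12- or 14-digit timestamp) and the function name `parse_filename_timestamp` are assumptions for illustration, not something the package provides.

```python
import re
from datetime import datetime

# Assumed convention: filenames start with YYYYMMDDHHMM or YYYYMMDDHHMMSS,
# e.g. "202110151230 Graph theory.md".
TIMESTAMP_PATTERN = re.compile(r"^(\d{14}|\d{12})(?!\d)")


def parse_filename_timestamp(filename):
    """Return the leading timestamp as a datetime, or None if there is none."""
    match = TIMESTAMP_PATTERN.match(filename)
    if match is None:
        return None
    digits = match.group(1)
    fmt = "%Y%m%d%H%M%S" if len(digits) == 14 else "%Y%m%d%H%M"
    try:
        return datetime.strptime(digits, fmt)
    except ValueError:
        # Digits that do not form a valid date, e.g. month 13.
        return None


for name in ["202110151230 Graph theory.md", "20211015123045-ideas.md", "Home.md"]:
    print(name, "->", parse_filename_timestamp(name))
```

The resulting datetimes could then be added as an extra column next to the existing note metadata.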

:eyes:

This looks interesting. I’ll take a look when I’ve got some time.

Already working on next release v0.6 - see Projects page for the progress.

Hope to release new version in next few weeks. :slightly_smiling_face:

New features are in the dev branch. The main addition coded up is support for front matter.

What I find interesting about front matter is that people seem to be using it in so many ways (if used at all). I think the package has to stay fairly simple in functionality as a result, and more of the work over time will become about showing recipes that people can do using the ‘vault object’ through libraries like Pandas, NetworkX, Matplotlib etc.

:rocket: Hi there, I have done a new release v0.6 with these extra features:

  • Support for front matter
  • Support for embedded files
  • More columns in metadata dataframe
  • get_md_links in main API

I’ve also updated my demo repo on GitHub to showcase the new functionality.

The more obvious next ideas for v0.7 will involve bringing the plaintext of notes into the API, e.g. ability to extract tags.

I’m currently studying and doing all my notes in an Obsidian vault, so that is also a cool personal test case for new features. I have been using embedded images a lot so that was the inspiration for that new functionality and getting this new release done. :sunglasses:

…awesome, thanks!

:rocket: I’ve released v0.8 today (now v0.8.1 to fix a bug relating to a new dependency).

See the v0.8 release notes.

Summary:

  • Neat functions to get text in different forms (one with formatting preserved, another with it stripped out)
  • Markdown extensions included to replicate Obsidian experience
  • LaTeX equations easily accessible

I also didn’t add any info on v0.7 in this topic before, but that added support for extracting tags from notes (and an earlier approach for extracting text).

v0.8 makes NLP analysis much easier. I’ve done some prototyping on a concept that auto-generates markdown by analysing text data from a vault. Hopefully I’ll have something to share on that later this month!
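As a rough idea of what extracting tags from a note's text involves, here is a simplified regex sketch. It is not the package's implementation: the pattern and the `extract_tags` helper are assumptions, and it deliberately ignores some tags Obsidian accepts (for example tags that start with a digit).

```python
import re

# A "#" that is not part of a word, URL fragment or heading marker,
# followed by a letter or underscore; nested tags use "/".
TAG_PATTERN = re.compile(r"(?<![\w#/])#([A-Za-z_][\w/-]*)")


def extract_tags(text):
    """Return tags in order of appearance, skipping fenced code blocks."""
    # Drop fenced code blocks first so '#' comments are not picked up.
    without_code = re.sub(r"```.*?```", "", text, flags=re.DOTALL)
    return [match.group(1) for match in TAG_PATTERN.finditer(without_code)]


note_text = (
    "# Heading\n"
    "Notes on #python and #projects/obsidian.\n"
    "Issue #42 is not a tag; see https://example.com/#anchor\n"
    "```\n"
    "# comment\n"
    "#notatag\n"
    "```\n"
)
tags = extract_tags(note_text)
print(tags)
```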

Awesome!

I’ve used this lib to build an API for semantic search and fine-tune an embedding model on your vault for Obsidian

Also, I’m trying to play a bit with graph neural networks
https://pytorch-geometric.readthedocs.io/en/latest/modules/utils.html#torch_geometric.utils.from_networkx

It would be fun to do a node classifier for example

@louis030195 it’s cool that you’re doing work with Transformers and PyTorch via the package! :sunglasses: I’ve not done much with them so it’ll be interesting to see what can be done there. I was thinking about ngrams for more sophisticated search initially but of course those packages can do something more advanced now.

For my NLP work, I’ve focused on topic modelling via one semi-supervised algorithm so far… so I’ve looked at NLP problems from a different angle. I’ve used that algorithm to make proofs-of-concept for these features:

  • Auto-generated MOCs (see :link: forum post)
  • Auto-suggested wikilinks*

They are in this :file_folder: obsidian-nlp-analytics repo.

* I’ve updated my repo now with this notebook.

@markf
Awesome, if I find the time I’ll try your approaches

I also used your library to fine-tune GPT3 on my vault: Language model assistance - #5 by louis030195

Hi everyone, I’ve launched v0.10, which is an exciting release as I’ve added a lot of new features in the past month. I’d say the capabilities of the obsidiantools graph capture the current Obsidian v1 app well. The setup is twice as fast on my largest vault compared to v0.8 of the package.

All these files are supported in obsidiantools:

  • Notes (md files)
  • Media files (images, videos, etc.)
  • Canvas files

Canvas files

Support for canvas files should reflect the current state of the Obsidian app. For example, you can get detail on the backlinks relating to canvas files. You can also recreate the layout of a canvas file in Python:
canvas-file-graph

I’ve added a notebook to my demo repo with info on the support for canvas files, including a comparison with the Obsidian app’s graph view.

Backlinks for notes that come from canvas files aren’t supported, reflecting the current status of the Obsidian app, though I can see a route to supporting that in future.
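For anyone curious what recreating a canvas layout involves, here is a minimal standard-library sketch. Canvas files are JSON, and the example content below is made up. The `canvas_layout` helper is hypothetical and not part of obsidiantools, and the y-axis flip assumes canvas coordinates grow downwards like screen coordinates.

```python
import json

# Hypothetical canvas content in the JSON layout that .canvas files use.
canvas_text = """
{
  "nodes": [
    {"id": "a", "type": "file", "file": "Home.md",
     "x": 0, "y": 0, "width": 400, "height": 200},
    {"id": "b", "type": "file", "file": "Projects.md",
     "x": 500, "y": 300, "width": 400, "height": 200},
    {"id": "c", "type": "text", "text": "Loose thought",
     "x": -300, "y": 300, "width": 250, "height": 100}
  ],
  "edges": [
    {"id": "e1", "fromNode": "a", "toNode": "b"}
  ]
}
"""


def canvas_layout(canvas):
    """Return node id -> (x, y) centre point, with y flipped for plotting."""
    positions = {}
    for node in canvas.get("nodes", []):
        centre_x = node["x"] + node["width"] / 2
        centre_y = node["y"] + node["height"] / 2
        # Assumed: canvas y grows downwards, plot axes grow upwards.
        positions[node["id"]] = (centre_x, -centre_y)
    return positions


canvas = json.loads(canvas_text)
positions = canvas_layout(canvas)
file_nodes = [node["file"] for node in canvas["nodes"] if node["type"] == "file"]
edges = [(edge["fromNode"], edge["toNode"]) for edge in canvas.get("edges", [])]
print(positions)
print(file_nodes, edges)
```

A positions dict of this shape is the kind of thing NetworkX drawing functions accept as their `pos` argument, so the nodes can be plotted where they sit on the canvas.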

Attachment files in graph

There is the option now to include ‘attachment’ files in the obsidiantools graph. This is an example of how my test vault is visualised via Pyvis:

I’ve added a notebook to my demo repo with more recipes for graph visualisation, including a comparison with the Obsidian app’s graph view.

This is awesome. There’s so much potential in this. Thank you for creating the package!

I’m curious if there are any plans to include the ability to mutate Markdown files, such as bulk adding tags to multiple files?

There are many great Obsidian plug-ins for editing & reworking notes, so I don’t plan to add any note-editing functionality in this package. This package is focused for now on making it easier to analyse the content in vaults.