On finding out what to do with a digital diary

I have been writing a digital diary for many years now; each day is a new text file.

Is there a way to extract information from it? How could I use this dataset? Please share your ideas.

I guess that would depend on what kind of information you want to extract, and how you put that kind of information into your diary. There are a lot of examples on this forum of extracting information using Dataview, but you need specific fields in your entries for that.

This is what I am asking.

What I would like:

  • What do I need right now? Topics that are surrounded by a lot of emotional words.
  • What could I write about in a blog post? Topics that are dealt with the most.
  • Which topics have I never addressed that are worth exploring?

And by topics I am not talking about words with a hashtag or the title of the biggest file, but a broader view: a commonality.

E.g., I write

I enjoy fishing


I need to go to the lake soon

=> Topic/Commonality/Meta: Activities around water

What kind of information do you come up with that I could extract from a diary - and how?

What you are talking about is NLP, natural language processing. Obsidian is not the place to try to do this. Feel free to try searching on keywords to see which notes are similar, but I don’t think there are any NLP tools for Obsidian yet.

I’m in no way an NLP expert, but there are a ton of applications that process text and cull information from it. It’s a huge deep learning topic. Here is a link to get you started.


If you are looking for a commonality between notes, you may want to try extracting the so-called nearest neighbors of a given note based on its contents.

I’ve been playing with exactly that lately. Here is the script I use in my research: GitHub - rpast/obsidian_vault_neighbors: Semantic connections recommendation script for Obsidian.md notes.

It is a fairly straightforward NLP technique based on token frequency, but it gives surprisingly accurate semantic results.
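To give a rough idea of what a token-frequency nearest-neighbor search can look like, here is a minimal sketch in plain Python. It is not the code from the linked repository; it assumes TF-IDF weighting with cosine similarity, which is one common way to do this, and the function names and the three sample diary entries (with made-up dates) are purely illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Map each note name to a {token: tf-idf weight} dictionary."""
    counts = {name: Counter(tokenize(text)) for name, text in docs.items()}
    n_docs = len(docs)
    # Document frequency: in how many notes does each token appear?
    df = Counter(token for c in counts.values() for token in c)
    vectors = {}
    for name, c in counts.items():
        total = sum(c.values()) or 1
        vectors[name] = {
            # Smoothed idf, so tokens shared by every note keep a small weight.
            tok: (freq / total) * (math.log((1 + n_docs) / (1 + df[tok])) + 1)
            for tok, freq in c.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(tok, 0.0) for tok, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def nearest_neighbors(docs, target, k=3):
    """Return the k notes most similar to `target` as (name, score) pairs."""
    vectors = tfidf_vectors(docs)
    scores = [
        (name, cosine(vectors[target], vec))
        for name, vec in vectors.items()
        if name != target
    ]
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores[:k]


# Hypothetical diary entries, keyed by an invented file name per day.
notes = {
    "2021-05-01": "I enjoy fishing at the lake",
    "2021-05-02": "I need to go to the lake soon",
    "2021-05-03": "Work meeting ran long and the deadline moved",
}

print(nearest_neighbors(notes, "2021-05-01", k=1))
```

With these sample entries, the second note comes out as the closest neighbor of the first because they share the word "lake". Note that a purely token-based approach like this only links notes that share actual words; it would not connect "fishing" and "lake" on its own, so the broader "activities around water" kind of commonality would likely need something beyond raw token counts.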
