Text Embeddings

Vector text embeddings would allow for semantic search and many other powerful features. I think it would be fun to open up a discussion about what powerful note-taking workflows could be unlocked in Obsidian by using text embeddings.

  • Imagine if we had a graph view that visualized notes by embedding distance
  • Imagine a Chrome extension that surfaced the closest notes to any web page you are on, by embedding vector distance
  • Imagine if, when sharing an item to Obsidian, it suggested potential destination notes based on an embedding search
  • Imagine adding semantic search, so you could describe what you want to find, and the meaning behind your words would be captured even if the exact words used in the note were different.
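To make "closest notes by embedding distance" concrete, here is a minimal sketch of the ranking step that all four ideas share. It assumes the embeddings already exist; the three-dimensional vectors and note titles in `NOTE_VECS` are made up for illustration (real embedding models return hundreds or thousands of dimensions), and `closest_notes` is a hypothetical helper name, not part of Obsidian or any plugin.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def closest_notes(query_vec, note_vecs, top_k=3):
    """Return the top_k (title, score) pairs, most similar first."""
    scored = [
        (title, cosine_similarity(query_vec, vec))
        for title, vec in note_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Made-up vectors standing in for real embeddings (assumption for illustration).
NOTE_VECS = {
    "Sourdough starter log": [0.9, 0.1, 0.0],
    "Bread baking tips": [0.8, 0.2, 0.1],
    "Tax return checklist": [0.0, 0.1, 0.9],
}

if __name__ == "__main__":
    # Pretend this is the embedding of a web page or a search phrase.
    query = [0.85, 0.15, 0.05]
    for title, score in closest_notes(query, NOTE_VECS, top_k=2):
        print(f"{score:.3f}  {title}")
```

The same function would serve the graph view, the browser extension, and the share-target suggestion; only the source of `query_vec` changes. For a large vault, a vector index would likely replace the linear scan.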

What are other ideas for how recent advances in Large Language Models might unlock new workflows in PKM and second-brain development / organization?

Had the same idea, let’s talk! I’m exploring this right now and want to create a prototype.

Sure; I’m down to chat: Calendly - Zach Doty

Interested in building this out, throw me in the loop! Discord djmango#8778

Love this idea! Semantic embedding has also been proposed at Conceptarium - Paul Bricman

+1, I think this is a very fun idea. I saw a TikTok about this which led me to this GitHub repo. I played around with OpenAI and uploaded a bunch of my embeddings to Pinecone’s API. I didn’t put too much thought into it, but I think this has a lot of potential, as I was able to perform a decent fuzzy search. That being said, the tokenization, lemmatization, normalization, and organizing of my Obsidian notes were the limiting factor.
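For anyone hitting the same preprocessing wall, here is a rough sketch of one way to normalize a note before embedding it. This is not the method from the repo mentioned above; it is a guess at a reasonable minimum, and `clean_note`, `chunk_paragraphs`, and the `max_chars` default are hypothetical names and values chosen for illustration. It only covers stripping front matter, wikilinks, and heading markers, then grouping paragraphs into chunks; it does no lemmatization.

```python
import re


def clean_note(text):
    """Strip Obsidian-specific markup so only the prose gets embedded."""
    # Drop a YAML front matter block at the very top of the note.
    text = re.sub(r"\A---\n.*?\n---\n", "", text, flags=re.DOTALL)
    # [[Target|Alias]] becomes "Alias"; [[Target]] becomes "Target".
    text = re.sub(r"\[\[([^\]|]+)\|([^\]]+)\]\]", r"\2", text)
    text = re.sub(r"\[\[([^\]]+)\]\]", r"\1", text)
    # Remove heading markers but keep the heading text.
    text = re.sub(r"^#{1,6}[ \t]+", "", text, flags=re.MULTILINE)
    return text.strip()


def chunk_paragraphs(text, max_chars=500):
    """Group paragraphs into chunks of roughly max_chars characters each."""
    chunks, current = [], ""
    for para in re.split(r"\n\s*\n", text):
        para = para.strip()
        if not para:
            continue
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)  # current chunk is full, start a new one
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    note = (
        "---\ntags: [baking]\n---\n"
        "# Starter\n\n"
        "Feed the [[Sourdough starter|starter]] daily.\n\n"
        "See [[Bread baking tips]]."
    )
    for chunk in chunk_paragraphs(clean_note(note), max_chars=30):
        print(repr(chunk))
```

Each chunk would then be sent to whatever embedding model you use. A single paragraph longer than `max_chars` is kept whole here, so very long paragraphs would need an extra splitting step.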

I noticed other people did the same thing as sidcodes, particularly here along with a local, non-third-party solution here (doing the sidgrep method requires uploading your data to both OpenAI and Pinecone, which some may not like).

More stuff I found:

- https://louis030195.medium.com/search-paul-graham-essays-with-siri-building-an-embedding-powered-product-in-few-lines-of-code-c578b43d741
- https://louis030195.medium.com/fine-tuning-openai-api-gpt3-on-your-second-brain-obsidian-b082afaaeba7
- https://github.com/topics/semantic-search?l=python&o=desc&s=forks

Would love to see more of this come along.