I started tinkering with this using:
- rom1504/clip-retrieval on GitHub: easily compute CLIP embeddings and build a CLIP retrieval system with them
- alternatively https://jina.ai
The code for the Obsidian plugin is about 80% done.
(shown here with a fake API)
My current progress on the semantic search part:
- I have some issues building the dataset of caption <-> text pairs, so indexing does not yet cover the whole vault, and search results are not great yet.
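
To pin down the "not taking the whole vault" problem, here is a minimal sketch of a coverage check. It assumes the vault is a plain folder of `.md` files and that the index can report which note paths it contains. `collect_notes` and `missing_from_index` are hypothetical helper names for illustration; they are not part of clip-retrieval or Jina.

```python
from pathlib import Path


def collect_notes(vault_dir):
    """Return (relative_path, text) for every Markdown note under vault_dir."""
    vault = Path(vault_dir)
    records = []
    for path in sorted(vault.rglob("*.md")):
        rel = path.relative_to(vault)
        # Skip anything hidden, e.g. files inside the .obsidian config folder.
        if any(part.startswith(".") for part in rel.parts):
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        records.append((rel.as_posix(), text))
    return records


def missing_from_index(records, indexed_paths):
    """List notes found on disk that the index does not contain."""
    indexed = set(indexed_paths)
    return [rel for rel, _ in records if rel not in indexed]
```

Running `missing_from_index(collect_notes(vault), paths_in_index)` after each indexing pass should return an empty list once the whole vault is covered; any paths it does return are the notes the dataset builder skipped.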