Khoj: An AI powered Search Assistant for your Second Brain

  • Overview: Khoj is a fast, private, AI powered search assistant for Obsidian

  • Background:

    • The Khoj Obsidian plugin just got added to the Community Plugins store. I wanted to share this with the Obsidian community for feedback and testing.
    • I’ve been (developing and) using Khoj for more than a year now. It’s fast and accurate enough that I now almost exclusively use this to search through my (120K+ lines of org-mode) notes. Hopefully some of you folks find it useful too :innocent:
  • Features: Use natural language to privately explore your markdown notes in Obsidian. Includes Incremental Search and Find Similar Notes capabilities.

  • Quickstart

    1. Install & Start Khoj Backend: pip install khoj-assistant && khoj --no-gui
    2. Install the Khoj Obsidian Plugin in your Obsidian Vault
    3. Open Khoj: Search from the Obsidian Command Palette and type your query to search through markdown content in your Obsidian vault
  • Resources


Screenshot of Khoj Search Modal

After installing and re-installing, I keep getting this error, and I don’t know how to troubleshoot.

Traceback (most recent call last):
  File "/home/ryan/.local/bin/khoj", line 8, in <module>
    sys.exit(run())
  File "/home/ryan/.local/lib/python3.10/site-packages/khoj/main.py", line 80, in run
    configure_server(args, required=False)
  File "/home/ryan/.local/lib/python3.10/site-packages/khoj/configure.py", line 44, in configure_server
    state.model = configure_search(state.model, state.config, args.regenerate)
  File "/home/ryan/.local/lib/python3.10/site-packages/khoj/configure.py", line 82, in configure_search
    model.markdown_search = text_search.setup(
  File "/home/ryan/.local/lib/python3.10/site-packages/khoj/search_type/text_search.py", line 168, in setup
    extract_entries(config.compressed_jsonl) if config.compressed_jsonl.exists() and not regenerate else None
  File "/home/ryan/.local/lib/python3.10/site-packages/khoj/search_type/text_search.py", line 60, in extract_entries
    return list(map(Entry.from_dict, load_jsonl(jsonl_file)))
  File "/home/ryan/.local/lib/python3.10/site-packages/khoj/utils/jsonl.py", line 27, in load_jsonl
    for line in jsonl_file:
  File "/usr/lib/python3.10/gzip.py", line 314, in read1
    return self._buffer.read1(size)
  File "/usr/lib/python3.10/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/usr/lib/python3.10/gzip.py", line 496, in read
    uncompress = self._decompressor.decompress(buf, size)
zlib.error: Error -3 while decompressing data: invalid block type
System Details

Operating System: Kubuntu 22.10
KDE Plasma Version: 5.25.5
KDE Frameworks Version: 5.98.0
Qt Version: 5.15.6
Kernel Version: 5.19.0-31-generic (64-bit)
Graphics Platform: X11
Processors: 16 × 12th Gen Intel® Core™ i5-1240P
Memory: 7.5 GiB of RAM
Graphics Processor: Mesa Intel® Graphics
Manufacturer: Framework
Product Name: Laptop (12th Gen Intel Core)
System Version: A4

@hoperyto it would be good to add some specific details both here and on GitHub as to how this actually functions.

  1. What is Khoj backend? Is this your code, or is Khoj an existing AI/ML platform which you’ve integrated into Obsidian?
  2. Does this all run locally? Is your vault data sent to any third-party or external (non-localhost) service?

it would be good to add some specific details both here and on Github as to how Khoj actually functions.

Does the architecture diagram help clarify how Khoj actually functions?

What is Khoj backend? Is this your code, or is Khoj an existing AI/ML platform which you’ve integrated into Obsidian?

  • The Khoj backend is where most of the indexing and search logic is implemented. The Obsidian plugin provides a frontend to configure and use the Khoj backend
  • Yes, the Khoj project has been created by me
  • The backend/platform allows indexing multiple different content types (currently org-mode, markdown notes, beancount transactions and images) and exposes an API for the different frontends/interfaces (currently Obsidian, Emacs and Web) to interact with the application
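To make the frontend/backend split concrete, here is a minimal sketch of how a frontend might build a search request for the backend. The base URL `http://localhost:8000`, the `/api/search` path and the `q`, `t` and `n` parameter names are assumptions for illustration and are not stated in this thread, so check the Khoj documentation for your installed version before relying on them.

```python
from urllib.parse import urlencode


def build_search_url(
    query: str,
    content_type: str = "markdown",
    results_count: int = 5,
    base_url: str = "http://localhost:8000",  # assumed local backend address
) -> str:
    """Build a search URL for a locally running backend.

    The endpoint path and parameter names are assumptions, not confirmed here.
    """
    params = urlencode({"q": query, "t": content_type, "n": results_count})
    return f"{base_url}/api/search?{params}"


if __name__ == "__main__":
    # Only prints the URL; fetching it requires the backend to be running.
    print(build_search_url("buy a car"))
```

Because the URL points at localhost, a request built this way stays on your machine, which matches the local-only design described in this thread.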

Does this all run locally? Is your vault data sent to any third-party or external (non-localhost) service?

Yes, Khoj runs locally on your machine. Your vault data does not leave your machine and is not sent to any third-party or external service. Only the AI models are downloaded from HuggingFace on first run.

You do have the option to use OpenAI models for search and the (currently beta) chat features. But this is something you’ll have to manually enable/configure. In such cases your note(s) will be sent to OpenAI for processing.


Hey @scwunch, I haven’t seen anyone else hit this zlib decompression issue yet. It seems like your khoj notes index (which is stored in gzipped jsonl format) is corrupted somehow.

One thing to try is to delete your .khoj directory at ~/.khoj, restart khoj and reinstall/re-enable the khoj obsidian plugin. This will index your notes again and hopefully fix your issue.
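Before deleting anything, you could confirm that the index file really is unreadable. The sketch below is only an illustration, not Khoj code: the helper name `index_is_readable` is hypothetical, and you have to pass the path of the gzipped jsonl file yourself, since the exact file name inside ~/.khoj is not given in this thread.

```python
import gzip
import json
import sys
import zlib
from pathlib import Path


def index_is_readable(index_file: Path) -> bool:
    """Return True if every line of a gzipped JSONL file decompresses and parses."""
    try:
        with gzip.open(index_file, "rt", encoding="utf-8") as jsonl_file:
            for line in jsonl_file:
                if line.strip():
                    json.loads(line)
    except (OSError, EOFError, zlib.error, ValueError):
        # OSError covers gzip.BadGzipFile (bad header or checksum),
        # EOFError covers truncated files,
        # ValueError covers JSON and UTF-8 decoding errors.
        return False
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    path = Path(sys.argv[1]).expanduser()
    print(f"{path}: {'readable' if index_is_readable(path) else 'corrupted or unreadable'}")
```

If the check reports a corrupted file, deleting the directory and re-indexing as described above is the simplest fix.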

Also, it’d be great if you could open a new discussion on the Khoj GitHub to investigate this issue further.

That did the trick, thanks :slightly_smiling_face:

I might have been using different/older versions of the backend and the frontend, which may have corrupted the index, if that’s possible. Do you still want me to open a discussion?

Hi, I got Khoj installed today and have been playing around. Are there any examples you have where you said “wow, khoj is perfect for this!” Or things you feel like it enables that wouldn’t otherwise be possible? Like what are the itches for you that this scratches?

Sorry, missed your message. No need to open a discussion if it’s fixed for you. (Such corruption can occur randomly, e.g. if the app is closed in the middle of updating the index.)

Hey @SteveLambert, great that you got to play with Khoj!

Or things you feel like it enables that wouldn’t otherwise be possible? Like what are the itches for you that this scratches?

The specific itches that Khoj alleviates for me are:

  1. I know I’ve written about topic X somewhere in my notes but I can’t find it with my keyword based search tools. So I have to fall back to a global search with Google instead
  2. After researching Y, finding out that I’d already collected notes on that topic
  3. Reducing the general cognitive load of searching, as I can be lax with the words I use in my query. In traditional keyword based search I need to use the exact word used in a note or it won’t show up in the results. E.g. if I search for “Buy a car”, it’ll find notes that mention a Corolla (or Ferrari)
  4. Reducing the general cognitive load of note taking, as I don’t need to worry about using the right tags or words in a note to be able to retrieve it later
  5. The speed to get relevant results opens up more uses for my notes to actually function as a (good) second brain
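The “Buy a car” example above can be illustrated with a toy comparison of keyword matching and similarity-based ranking. This is a sketch under loud assumptions: the thread does not describe how Khoj ranks notes internally, so this assumes an embedding-style approach, and the three-dimensional vectors are made-up numbers rather than the output of any real model.

```python
import math

# Hand-made toy "embeddings" (assumption: axes loosely mean vehicles, food, purchasing).
TOY_VECTORS = {
    "Buy a car": (0.9, 0.0, 0.6),
    "Test drove the Corolla today": (0.8, 0.1, 0.3),
    "Recipe for lentil soup": (0.0, 0.9, 0.1),
}


def keyword_match(query: str, note: str) -> bool:
    """Traditional keyword search: true only if the query and note share a word."""
    return bool(set(query.lower().split()) & set(note.lower().split()))


def cosine_similarity(a, b) -> float:
    """Cosine of the angle between two vectors; closer to 1 means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norms = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norms


def semantic_rank(query: str, notes: list) -> list:
    """Order notes by similarity of their toy vectors to the query's toy vector."""
    query_vector = TOY_VECTORS[query]
    return sorted(
        notes,
        key=lambda note: cosine_similarity(query_vector, TOY_VECTORS[note]),
        reverse=True,
    )


if __name__ == "__main__":
    notes = ["Recipe for lentil soup", "Test drove the Corolla today"]
    print([keyword_match("Buy a car", note) for note in notes])
    print(semantic_rank("Buy a car", notes))
```

With these toy numbers, keyword matching finds neither note because no word is shared, while the similarity ranking puts the Corolla note first.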

As a user I’ve observed myself default to using Khoj instead of keyword based search tools. I think because of the reduced cognitive load of searching it enables.

For context: I have a decade worth of notes in Emacs/Org-mode and my memory/recall capabilities are not that great.


Are there any examples you have where you said “wow, khoj is perfect for this!”

I haven’t (unfortunately) collected good examples (but I should). One concrete recent example I can recall: I was collecting advice for friends about to have their first child. I knew I had notes on this stuff. With Khoj, I could just search for “advice for kids”, get notes on infant care, educational toys for children etc. and use those notes as a jumping-off point to build from.

It makes these kinds of processes easier and more enjoyable than having to start from scratch.

I see. I will mess around with it again. I think because of the way I was approaching the search, I found the results “noisy” and there seemed to be a lot of them.

Is there a way to see accordion folded results that you can expand to see more? I think being able to skim through titles would be helpful.

Thanks for your answer, it’s helpful!

I think because of the way I was approaching the search, I found the results “noisy”

Yeah, this doesn’t work like traditional keyword search. It works better with more verbose, natural language searches. E.g. “What is the meaning of life?” vs “life meaning”

Also, sometimes the index can get corrupted (I haven’t tracked down the exact cause yet). So regenerate the index if the results seem too noisy. You can do that from the khoj obsidian settings page

and there seemed to be a lot of them.

You can set the number of results you want from the settings too. It defaults to 5/6

Is there a way to see accordion folded results that you can expand to see more? I think being able to skim through titles would be helpful.

Ah yeah, makes sense. That’s available in the khoj plugin for emacs. I’ll add that to the khoj obsidian plugin soon.

After I tried the first step (1. Install & Start Khoj Backend: pip install khoj-assistant && khoj --no-gui),
I got this error:

can anyone help me?

the error is continued: