I thought I’d share in case anyone else was looking at doing something similar, because I often have questions about the details.
I have a large vault, which includes a subset that I consider my zettelkasten, with 10,673 files. I wanted AI to create a full list of these notes for me, in the hierarchy I’d already created for them, and summarize each note with a meaningful title. (Yes, it’s better to do this manually, but also extremely time-consuming, so I wanted an AI version that’s “finished” as a starting point, knowing I can tweak it as needed.)
The difficulty I ran into was that it’s too much content to upload to something like ChatGPT, and if I copy/paste it all, the file/folder structure is lost, so the AI never sees the organizational information that structure provides.
After trying a variety of things, what ended up working was having ChatGPT write me a Python script. My knowledge of scripting is VERY minimal; I can swap out values but that’s about it. But ChatGPT wrote the whole thing for me and told me step-by-step how to run it, with calls to the OpenAI API (the one behind ChatGPT) built in to do the summarizing.
There are a handful of glitchy titles (where it, for some reason, seems to have extracted some small part of the note instead of actually creating a title), but a dozen or so of those out of 10,000+ notes is workable.
It produced a markdown file for me with headers and bulleted lists of note links with titles and the word count of each note. (I don’t need these to be precise; I just wanted to be able to see at a glance which notes were long and which were short.)
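For anyone wondering what the folder-walking part of a script like this might look like, here is a minimal sketch. It is not the script ChatGPT wrote for me. The function names `build_outline` and `make_title` are made up for illustration, and `make_title` is only a placeholder that grabs the first non-empty line of the note; in a real run that is where the AI call would go. It assumes the notes are `.md` files and that `[[wiki links]]` are the link style you want.

```python
import os
from pathlib import Path


def make_title(text: str) -> str:
    """Hypothetical placeholder for the AI call: first non-empty line as title."""
    for line in text.splitlines():
        stripped = line.strip().lstrip("#").strip()
        if stripped:
            return stripped[:80]
    return "(empty note)"


def build_outline(vault_dir: str) -> str:
    """Return markdown text: one header per folder, one bullet per note."""
    root = Path(vault_dir).resolve()
    lines = []
    for folder, subfolders, files in os.walk(root):
        subfolders.sort()  # visit subfolders in a stable alphabetical order
        notes = sorted(f for f in files if f.endswith(".md"))
        if not notes:
            continue  # folders without notes of their own get no header
        rel = Path(folder).relative_to(root)
        depth = len(rel.parts) + 1  # the vault root is header level 1
        name = rel.name if rel.parts else root.name
        lines.append(f"{'#' * min(depth, 6)} {name}")  # markdown stops at h6
        for note in notes:
            text = (Path(folder) / note).read_text(encoding="utf-8")
            words = len(text.split())  # rough count, split on whitespace
            title = make_title(text)
            lines.append(f"- [[{Path(note).stem}]]: {title} ({words} words)")
        lines.append("")
    return "\n".join(lines)
```

Calling `build_outline("path/to/your/vault")` returns the outline as a string, which you can then write to a file. The word count is just a whitespace split, which is plenty for seeing at a glance which notes are long and which are short.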
I used the gpt-3.5-turbo model because I was more interested in inexpensive than perfect, and it cost me $1.13 in tokens to summarize ALL of these notes. That total includes one chunk of them I had to run twice because I loused something up.
I’d share the script, but I don’t think it will be helpful for others’ use cases; it’s probably better to just have ChatGPT help you create your own to your own specifications. One suggestion I would make (thanks to my goof) is to tell it that if it skips a file it should still output the markdown file. (I had ONE file in a folder error out, and it didn’t produce the markdown file AT ALL, which is not ideal. I had to run the whole thing over again.) Another thing that ChatGPT helped me work out is that it’s good to build in spots where it “prints” what it’s doing, so you can tell that it’s working, or, if it’s NOT, so you can tell what’s going wrong.
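To make those two suggestions concrete, here is a rough sketch of the pattern, assuming a list of note paths is already in hand. Again, this is not my actual script: `run` and `summarize_note` are hypothetical names, and `summarize_note` just returns the first line of the note as a stand-in for the AI call. The point is the `try`/`except` around each file, the `print` calls, and the fact that the output file is written after the loop no matter how many notes failed.

```python
from pathlib import Path


def summarize_note(path: Path) -> str:
    """Hypothetical stand-in for the per-note AI call."""
    text = path.read_text(encoding="utf-8")  # raises if the file is unreadable
    stripped = text.strip()
    return stripped.splitlines()[0] if stripped else "(empty note)"


def run(note_paths, output_file: str):
    """Summarize each note, skip failures, and always write the output file."""
    bullets, skipped = [], []
    total = len(note_paths)
    for i, path in enumerate(note_paths, start=1):
        print(f"[{i}/{total}] processing {path}")  # progress you can watch
        try:
            bullets.append(f"- [[{path.stem}]]: {summarize_note(path)}")
        except Exception as err:
            # One bad file should not sink the whole run: log it and move on.
            print(f"  skipped {path}: {err}")
            skipped.append(path)
    # Written outside the loop, so it happens even if some notes failed.
    Path(output_file).write_text("\n".join(bullets) + "\n", encoding="utf-8")
    print(f"wrote {len(bullets)} notes to {output_file}, skipped {len(skipped)}")
    return skipped
```

The returned list of skipped paths lets you rerun just the failures instead of the whole folder.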