Some other notes and upgrades of interest
You can add text like this to your prompts: “Detect when the speaker shows slides or draws on a board, and mark these spots with ==Add screenshot here==.”
Pretty good.
I tried the free DeepSeek API via OpenRouter today.
It looked promising, but its maximum context window is only about 1/6th the size of Gemini Flash Thinking Exp's. That may be okay for videos of an hour or less.
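To check whether a particular video will fit before sending it, a rough back-of-the-envelope estimate is enough. This is a minimal sketch that assumes you send the transcript as text. The speaking rate (150 words per minute) and the tokens-per-word ratio (1.3) are illustrative assumptions, not figures from DeepSeek, OpenRouter, or Google. Replace the context length with the limit listed for the model you actually use.

```python
# Rough check of whether a video transcript fits in a model's context window.
# ASSUMPTION: both constants below are illustrative guesses, not published figures.
WORDS_PER_MINUTE = 150   # assumed average speaking rate
TOKENS_PER_WORD = 1.3    # assumed rough tokens-per-English-word ratio


def estimate_transcript_tokens(minutes: float) -> int:
    """Estimate the token count of a transcript for a video of the given length."""
    return round(minutes * WORDS_PER_MINUTE * TOKENS_PER_WORD)


def fits(minutes: float, context_tokens: int, reserved_tokens: int = 0) -> bool:
    """True if the transcript plus reserved tokens (prompt + summary output) fit."""
    return estimate_transcript_tokens(minutes) + reserved_tokens <= context_tokens


def max_video_minutes(context_tokens: int, reserved_tokens: int = 0) -> float:
    """Longest video whose transcript fits after reserving room for prompt and output."""
    usable = max(context_tokens - reserved_tokens, 0)
    return usable / (WORDS_PER_MINUTE * TOKENS_PER_WORD)


if __name__ == "__main__":
    # Hypothetical context size; use the real limit shown for your model.
    context = 64_000
    print(estimate_transcript_tokens(60))
    print(fits(60, context, reserved_tokens=4_000))
    print(round(max_video_minutes(context, reserved_tokens=4_000)))
```

Reserving tokens matters because the prompt and the generated summary share the same window as the transcript.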
In this recent writeup, I offered a different YouTube player to go with your AI summary work:
It's useful for adding details the AI may have missed, or for learning about the topics with the video context available, so you can jot down your own ideas while the video plays. The window can be dragged and pinned.