`setActiveLeaf()` is creating a massive pile-up of workarounds in plugins

pjeby · January 8, 2021, 9:46pm

There are two design flaws in the API method, setActiveLeaf(), that are causing virtually every plugin that in any way deals with the currently active leaf (either by responding to activation, causing it, or just interacting with the active leaf during a click event!) to have to include kludges and workarounds (including the use of setTimeout()!) to try to get things to work properly.

First off, as I’ve previously mentioned on certain bug reports, setActiveLeaf() only sets the workspace .activeLeaf property and sets the .mod-active class on the leaf’s DOM node.

This means that any plugin that wishes to really activate a leaf must re-implement the same code as the various editor:focus-* commands, and hope to get it right. For this reason, either setActiveLeaf() needs to perform all the actions needed to activate the leaf (including focusing the codemirror or preview scroll region – either directly or by delegation to the view), OR there needs to be an API call that can be used to do this.

The second cause of plugins needing to add workarounds is the file-open event. This event is triggered synchronously in setActiveLeaf(), which sets things up for unintended re-entrance on the call stack and unpredictable behavior depending on what part of Obsidian called setActiveLeaf() in the first place.

This is why more than one plugin is using setTimeout(): in order to process the event without conflicts, they need to wait until after the current event loop turn is over, so that whatever was doing the setActiveLeaf() can be finished. The duration of the timeout is of course a guess, as there is no way to know a priori what it should be.

Unfortunately, with more than one plugin hooking this event with its own guesses of when to trigger, and more than one plugin invoking setActiveLeaf() and doing its own focusing workarounds, this is creating a major pileup wherein the behavior of one plugin is very much dependent on what other plugins are installed… and also fairly non-deterministic due to timing issues.

This issue is further compounded by what happens if setActiveLeaf() is called multiple times in the same event turn, with different leaves. A file-open event will be triggered for each one, and then the workaround setTimeout()s will be triggered twice, possibly out-of-order, and possibly interleaved with those of other plugins in unpredictable order!

This entire set of problems could be avoided by simply triggering the file-open event asynchronously, using e.g. process.nextTick(). This would eliminate the need for plugins to use setTimeout(), as they could simply perform their desired action and know that the UI had already been fully updated beforehand, and that they would be executing in an orderly fashion, with no non-deterministic behavior.

Of course, there would still be the issue that if there is more than one setActiveLeaf() call in the same event turn, then plugins could process a file-open event for a leaf that is no longer active. (This condition can also happen now, since by the time a setTimeout() happens, the active leaf could have changed again!)

This could be fixed by having the async trigger fire once, for the active leaf as of the end-of-turn. The downside to this is that the implied semantics of a file-open event are that you should receive it when files are opened, and if that’s really what someone is trying to trap, they might want to receive every instance rather than just the most recent.

But, since the file-open event is also the only way to detect a change in the active pane, there isn’t a way to make a clean distinction without introducing a separate event. (Maybe active-leaf-changed?)

So, yeah. I know I’ve spent hours (today alone) trying to figure out why the heck my plugin wasn’t working, tracking down what other plugins were doing, enabling and disabling things, wading through kludges on top of other kludges in the debugger just so I could figure out a meta-kludge to work around all the other kludges. Not fun.

Especially since my plugin wasn’t even trying to set the focus or be triggered by it! It was just catching a mouse-down in a codemirror and invoking an editor command. But if you happened to click in a non-active pane, things went haywire because of the order in which setActiveLeaf() was being called relative to when the command executed. (And that was just the tip of the iceberg of problems involved.)

I was finally able to work around the issue by literally capturing the mouse event and then telling codemirror to pretend it received the event directly, just so that all the active-leaf-setting fallout would happen before my method ran, instead of after, and I wouldn’t have to guess at a setTimeout() interval that’s somewhat dependent on what other plugins are active.

This problem with the file-open event is also representative of a problem with Obsidian events in general: they are synchronous even when they don’t need to be.

The only time an application event should be triggered synchronously is when it’s important for event handlers to be able to cancel the operation in progress, contribute data or options to that operation, or otherwise interfere with it somehow.

For example, if a menu was going to pop up, and it wanted to give plugins a chance to add dynamic, context-specific menu items, then a synchronous event that’s passed a menu object as an argument makes perfect sense: event handlers can then just add their items, and they’re there for the actual rendering.

But if an event exists merely to notify plugins that “hey, I did this thing y’all”, then it should be triggered asynchronously, as otherwise the kinds of problems we’re seeing with file-open are guaranteed to happen eventually.

In the specific instance of file-open, the problem is that setActiveLeaf() does not really know whether the current operation is “finished”, so trying to tell plugins that, “hey, this thing is done” is premature. In the case of the editor:focus-* commands, for example, the operation is definitely not finished yet, so the event is fired in the middle of the operation. Using process.nextTick() to trigger “notification” events (vs “participation” events) alleviates this problem by waiting until the current UI-level event is over before notifying plugins that it’s over.

One way to think of this, is as a kind of ask vs. tell or command/query separation. If the purpose of an event is to ask plugins something, then it should be synchronous, as the triggering operation needs to get an answer before it can proceed. But if the purpose is just to tell plugins that something happened, then it can (and should) wait until the entire current operation is completed, using process.nextTick() or the like.

Licat · January 9, 2021, 1:12am

That’s… one very long and scary post
All fair points that we should probably address though.

I’ve summed it up into major 3 points:

file-open event should probably fire async to avoid plugins from calling setActiveLeaf within a file-open event.
setActiveLeaf should properly set the focus (a point raised in a previous post).
maybe a new event for active-leaf-change instead of file-open.

In addition, I’d like to argue against “async by default” events. Processing it synchronously gives the event receiver a chance to act synchronously if need be. The receiver can then choose to act asynchronously or with a debounce. Making all notify events async would strip away that choice. However, I do agree that in this specific case, (and in other cases where it causes trouble), we should make it async.

pjeby · January 9, 2021, 5:25am

With regard to the new event, I was thinking more an additional event, rather than just changing it, if there are in fact use cases for tracking file opens independent of the UI aspect. (I don’t know if there are, or if file-open is being strictly used as a proxy for pane activation… though that’s certainly all I use it for.)

Regarding async by default, may I suggest a compromise? Have the official event be async, but also trigger an "unsafe:"+name event synchronously. That would send a sufficiently strong signal that doing anything during the unsafe version is in fact, unsafe.

The truth is that synchronous-by-default events have broadly been considered a serious architectural anti-pattern in Javascript frameworks since before jQuery became unpopular. (In no small part because all it had were synchronous callbacks and events.)

Synchronous events and callbacks are ridiculously hard to reason about, and the problem only gets worse when you are actually composing complex applications – and anything with a plugin system counts as “complex” for this purpose.

In effect, they turn a program into a multithreaded program, where anything can happen at any time, causing unlimited race conditions (of which the current affairs re: setActiveLeaf() are a great example). The problem is that unless the event triggering locations are chosen very carefully, you are virtually guaranteed to be mucking about with some aspect of the application that is currently in flux.

I’m not opposed to having synchronous events at all, mind you, it’s just that I would much rather see explicit synchronous query points built in (ala Wordpress actions and filters), to get more stable application behavior with less nightmare debugging of hard-to-reproduce timing problems.

In the case of this particular event, though, I would say there is a 100% clear-cut case that there is no meaningful way for an event handler to intervene in the operation of setting the active leaf… let alone to intervene in whatever random command or plugin might be invoking that operation!

If you’re going to have synchronous intervention, it needs to be placed much further up the action hierarchy. For example, having a “before-command” event that gets passed a copy of the command object and gets to tweak it beforehand. Such an event is in a saner position to address the operation as a whole, since it knows what the “whole” even is – something that the recipient of file-open cannot determine in any way.

This is why I see the tradeoff in terms of making most events async: most of the events that Obsidian currently provides 1) don’t give you any real idea of what’s going on, and therefore 2) don’t give you a safe way to intervene, even if you wanted to.

IOW, to provide useful synchronous events, they have to be placed at the start of operations, the end of operations, or at well-defined boundary points between operations. That is, at points where the application is in a stable state, or at least one whose instabilities can be identified.

At the start of a well-defined operation, nothing has changed yet, so the app is stable. At the end, any changes have been finished, so again, it’s stable. At well-defined boundary points (like the “add dynamic items to a menu” example), stability can be ensured, or limitations on safe actions documented.

Assuming an efficient event system, such events can be as ubiquitous as you like. In the case of Wordpress, for example, it’s common to trigger “filter” events on the parameters of API calls, so that plugins can intervene in this way. It’s also a reasonable approach to call filters on the results of operations, or on things that will be passed between operations. The scope of intended intervention on filter events is to modify values for a known and well-understood operation, so the system can be understood without global knowledge of the call stack.

So, my point here is that synchronous events need to be placed before, after, or between operations, and be targeted in terms of the API and/or application domain, so that the event recipient can know “what’s going on” at a higher level, and intervene in structured way. Anything outside of that, and the only way to guarantee a stable app state is to go async.

This is why simply saying “let’s just be sync by default” is not really a solution to the problem of designing extension points for a plugin API. The things that people would want to do with a synchronous event are simply not possible to do safely without the firing points being carefully chosen to allow that intervention.

In any event (no pun intended), this will likely become apparent when other synchronous events backfire in the same way in future.

Anyway, as for that being a “long and scary” post, I actually have a much longer essay on general race conditions in Obsidian sitting in my Obsidian notes inbox. It’s not quite finished, and I’m not sure if I will finish it. (Though it might make an interesting blog article at some point, as quite a bit of it is generic to the design of event systems and task co-ordination in asynchronous applications with open plugin archtiectures. ;-))

I had intended to post that essay here at some point, but at the time it was more, “this is going to be a problem for Obsidian in the future that needs careful consideration”, not “OMFGWTFBBQ heisenbug debugger marathon” like it was today. So I decided to just post very narrowly on this very specific instance of the general problem, rather than discussing the other (not event-related) race conditions I’ve encountered in Obsidian (even without any non-core plugins active), and the general implications for the design of plugin APIs and extension points, e.g. wrt the Vault design, how input focus is handled, etc.