Writeup for a DB feature that will run in published sites

Hi! I’m Greg. I’ve designed and developed dozens of open source libraries, some of which enjoy a bit of popularity. I’ve also designed some systems, but that’s beside the point.

I’ve recently started using Obsidian for my D&D campaign, and I’ve noticed a bit of a gap in its plugin ecosystem.

There is a very useful plugin called Dataview, which dynamically generates tables and other views of the data found in the vault. Unfortunately, it doesn’t work on published sites, even if you can run custom JS, which seriously limits its usefulness.

To overcome this issue, some people have taken to using Templater and Dataview together in an unholy union that manages to combine the downsides of both.

I have mixed thoughts on this. On one hand, I am a big fan of Lovecraftian horror. On the other, you need to remember to rerun all the templates every time you make a change, which is incredibly annoying.

My brain has decided I need to tackle this problem, so I set about trying to design a plugin that does the thing. Unfortunately, I’ve hit something of a stumbling block: a small but crucial part of my design would work best as a server-side component.

I really just want this thing to exist, so I’m going to give you a writeup of my design in the hopes you implement it and spare your users from engaging in profane rituals of the type shown above.

I call my solution Databake.

Design

Dataview works by storing vault data in IndexedDB. I looked at the alternatives and decided this is probably the best solution: IndexedDB is pretty fast and exists in every environment where the queries get executed.

Using IndexedDB directly is kind of gross, so I looked around and found Dexie.js. It says it’s a minimalistic wrapper, but that’s a lie; we’re actually going to use some of its extra features. Also, I like its query API.

How files are read

Databake won’t load files as they are requested by queries. Instead, it will parse the whole vault into its DB and update said DB based on changes. Any queries in an open page will run again when the DB is modified. Dataview already has code that does this.

If the DB is being updated, queries will wait for the operation to finish before running.
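To make that concrete, here’s one way the wait-for-updates rule could work. This is a minimal sketch under my own assumptions, not actual Databake (or Dataview) API: a single writer chains DB writes, and queries await the chain before running.

```ts
// Minimal sketch of "queries wait for in-flight DB updates".
// All names here are placeholders.
class UpdateGate {
  private chain: Promise<void> = Promise.resolve();

  // The file watcher wraps every DB write in runUpdate(), so writes run
  // one at a time, in order.
  runUpdate(write: () => Promise<void>): Promise<void> {
    const run = this.chain.then(write);
    // Keep the chain alive even if one write fails, so later writes still run.
    this.chain = run.catch(() => {});
    return run;
  }

  // Query runners await this before touching the DB.
  whenIdle(): Promise<void> {
    return this.chain;
  }
}
```

A query runner would then do `await gate.whenIdle()` before executing, and the change handler would call `gate.runUpdate(...)` for every batch of file events.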

Databake won’t parse markdown, since doing so opens up lots of problems I don’t want to deal with. Instead, it will load the first 100 KB of a markdown file into memory and parse out the frontmatter from that (if any). It will do this in parallel, reading up to 20 files at the same time.

However, if the file is a data file like JSON or YAML, it will parse the entire file. This also encourages storing large amounts of data in dedicated data files, rather than as massive bits of frontmatter, which sounds like a good idea.
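Here’s a sketch of that scan loop. readHead(), readFull(), and parseYaml() are assumed helpers (Obsidian ships a YAML parser; js-yaml would also do), and storeFile() here takes a bare path — the fuller version sketched in the next section takes the file’s stat metadata as well. None of this is real Databake code:

```ts
// Assumed helpers; not real Databake or Obsidian API.
declare function readHead(path: string, bytes: number): Promise<string>;
declare function readFull(path: string): Promise<string>;
declare function parseYaml(text: string): unknown;
declare function storeFile(path: string, data: Record<string, unknown>): Promise<void>;

const HEAD_BYTES = 100 * 1024; // only the first 100 KB of a markdown file
const CONCURRENCY = 20;        // read up to 20 files at the same time

async function extractData(path: string): Promise<Record<string, unknown>> {
  // Data files get parsed in full...
  if (path.endsWith(".json")) {
    return JSON.parse(await readFull(path));
  }
  if (path.endsWith(".yaml") || path.endsWith(".yml")) {
    return parseYaml(await readFull(path)) as Record<string, unknown>;
  }
  // ...markdown only gets a frontmatter scan of its first 100 KB.
  const head = await readHead(path, HEAD_BYTES);
  const match = /^---\r?\n([\s\S]*?)\r?\n---/.exec(head);
  return match ? (parseYaml(match[1]) as Record<string, unknown>) : {};
}

async function scanVault(paths: string[]): Promise<void> {
  // Naive concurrency limit: a fixed pool of workers draining one queue.
  const queue = [...paths];
  const worker = async () => {
    for (let p = queue.shift(); p !== undefined; p = queue.shift()) {
      await storeFile(p, await extractData(p));
    }
  };
  await Promise.all(Array.from({ length: CONCURRENCY }, worker));
}
```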

Loading files into the DB

Each record in the DB will have a UID, $$uid, and the DB will also index some other information about the file. The indexed fields will start with $ so they don’t collide with user data:

$path, $name, $ext, $size, $mtime

When a change notification is received, the file can be skipped entirely if its mtime and size are unchanged. The path is the path from the vault root.

The file’s data content will be merged into the object above, ignoring any fields that share a name with an indexed column.
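Put together, the record shape and merge rule could look something like this in Dexie. The database and table names are my placeholders; in Dexie’s stores() string, the first entry is the primary key and the rest become indexed fields:

```ts
import Dexie, { type Table } from "dexie";

const INDEXED = new Set(["$$uid", "$path", "$name", "$ext", "$size", "$mtime"]);

interface FileMeta {
  $path: string;  // path from the vault root
  $name: string;
  $ext: string;
  $size: number;
  $mtime: number;
}

interface FileRecord extends FileMeta {
  $$uid: string;           // stable per-file ID
  [user: string]: unknown; // frontmatter / data-file fields merged in
}

class DatabakeDB extends Dexie {
  files!: Table<FileRecord, string>;
  constructor() {
    super("databake"); // placeholder DB name
    this.version(1).stores({
      files: "$$uid, $path, $name, $ext, $size, $mtime",
    });
  }
}
export const db = new DatabakeDB();

export async function storeFile(meta: FileMeta, data: Record<string, unknown>) {
  const existing = await db.files.where("$path").equals(meta.$path).first();
  // Skip the load entirely when mtime and size are unchanged.
  if (existing && existing.$mtime === meta.$mtime && existing.$size === meta.$size) {
    return;
  }
  const record: FileRecord = {
    $$uid: existing?.$$uid ?? crypto.randomUUID(), // assumes a modern runtime
    ...meta,
  };
  for (const [key, value] of Object.entries(data)) {
    if (!INDEXED.has(key)) record[key] = value; // user data never shadows indexed columns
  }
  await db.files.put(record);
}
```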

This is also where user-defined indexes could be declared, so that JS queries can run against indexed user fields. At first, this won’t be supported.

Querying

Databake will have two query runtimes: one for the Electron app and one for published sites. I’m not sure what the mobile version is like, so there might need to be a third runtime. However, the runtimes will share a lot of code.

We’ll use Dataview’s code block approach for the query format. So something like:

```databake
table(
	db.where("$name").equals("something")
	.filter(x => x.someUserField > 5)
	.toArray(rows => rows.map(row))
)
```

I don’t want to deal with designing and parsing a custom query language right now, so Databake will allow only JavaScript-based queries. The JavaScript will execute in a sandboxed worker, so it won’t be able to mess with the Electron UI or enable XSS attacks. This will also homogenize the runtimes so that all queries execute in the same environment.
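To illustrate, here’s roughly what the worker hand-off could look like. Everything here is assumption: the worker bundle name, the injected globals, and the message shape. Real sandboxing would also need a CSP and a locked-down worker environment; this only shows the mechanics:

```ts
// --- main thread: ship the query source to the worker, get data back ---
function runQuery(source: string): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const worker = new Worker("databake-worker.js"); // hypothetical bundle
    worker.onmessage = (e) => { resolve(e.data); worker.terminate(); };
    worker.onerror = (err) => { reject(err); worker.terminate(); };
    worker.postMessage({ source });
  });
}

// --- databake-worker.ts (a separate file, bundled for the worker) ---
declare const readOnlyDb: unknown, table: unknown, row: unknown; // injected runtime globals (assumed)

onmessage = async (e: MessageEvent<{ source: string }>) => {
  // Evaluate the query text against the runtime. No DOM, no window, no
  // Obsidian internals exist in here, which is most of the sandbox.
  const query = new Function("db", "table", "row", `return (${e.data.source});`);
  postMessage(await query(readOnlyDb, table, row));
};
```

Because only structured-clone data crosses the boundary, a malicious query can return garbage, but it can’t touch the page.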

At first, Databake will just wrap the Dexie query API, exposing a subset that doesn’t allow updating the DB. Custom query operations might be added later.
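For the read-only subset, one cheap approach (my sketch, not a committed design) is a Proxy that blocks Dexie’s mutating table methods:

```ts
import type { Table } from "dexie";

const BLOCKED = new Set([
  "add", "put", "delete", "update", "clear",
  "bulkAdd", "bulkPut", "bulkDelete",
]);

function readOnly<T>(table: Table<T, string>): Table<T, string> {
  return new Proxy(table, {
    get(target, prop, receiver) {
      if (typeof prop === "string" && BLOCKED.has(prop)) {
        throw new Error(`Databake queries are read-only: ${prop}() is blocked`);
      }
      return Reflect.get(target, prop, receiver);
    },
  });
}
```

A real version would also have to wrap the Collection objects returned by where(), since those expose modify() and delete() too.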

Each query will just return virtual DOM nodes, which can be constructed inside the worker and then passed along to the UI thread. This will protect against XSS attacks while also allowing a lot of flexibility.

The query runtime will include functions for building standard virtual DOM nodes of various kinds. Or maybe it will just be JSX.
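If it isn’t JSX, the nodes could be plain tagged objects, since those survive structured clone. The builder names (el, text, row, table) are placeholders matching the query example above:

```ts
// Serializable virtual DOM: plain data, no functions, no live elements.
type VNode =
  | { kind: "text"; value: string }
  | { kind: "element"; tag: string; attrs: Record<string, string>; children: VNode[] };

const text = (value: string): VNode => ({ kind: "text", value });

const el = (tag: string, attrs: Record<string, string>, ...children: VNode[]): VNode =>
  ({ kind: "element", tag, attrs, children });

// The helpers from the query example are just compositions of el():
const row = (x: Record<string, unknown>): VNode =>
  el("tr", {}, ...Object.values(x).map((v) => el("td", {}, text(String(v)))));

const table = (rows: VNode[] | Promise<VNode[]>): Promise<VNode> =>
  Promise.resolve(rows).then((rs) => el("table", {}, ...rs));
```

The UI thread walks this tree and creates real elements, escaping all text along the way, which is what actually blocks XSS.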

If the user’s query returns markdown, it will be converted into a component for rendering markdown, or maybe just embedded in place of the code block inside an existing markdown component.

There is even the possibility of queries returning custom React components, which will be embedded inside the page’s markdown-sourced content. React inside markdown is something I experimented with in the past.
However, this would require the custom components to exist in the first place, plus custom JS to set them up.

Databake file

This is what allows queries against the vault to run in the client.

The Databake file is an exported IndexedDB, using Dexie’s export functionality. This means it’s a copy of all the data inside the vault, as JSON.

A published site with the Databake script in it will check the local IndexedDB against metadata from the Databake file (this might involve a second file that just has metadata, or maybe the metadata can be embedded into the page). If they differ, the new DB will be imported.
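Assuming the sidecar-metadata option, the client-side check could be as small as this. File names, the meta format, and the localStorage key are all placeholders; importInto() comes from the dexie-export-import addon:

```ts
import { importInto } from "dexie-export-import";
import { db } from "./databake-db"; // the Dexie instance sketched earlier

const META_KEY = "databake:bakeVersion";

async function syncDatabakeFile(): Promise<void> {
  const meta = await (await fetch("/databake-meta.json")).json();
  if (localStorage.getItem(META_KEY) === meta.bakeVersion) {
    return; // local DB already matches the published bake
  }
  const blob = await (await fetch("/databake.json")).blob();
  await importInto(db, blob, { clearTablesBeforeImport: true });
  localStorage.setItem(META_KEY, meta.bakeVersion);
}
```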

But who bakes this file? Well, there’s the rub. The easiest and best solution is for the publish server to bake it. Baking it on the user’s machine has a number of issues:

  1. The file will be baked according to the user’s vault contents, not the published contents. It might be possible to keep track of the published content, but only by introducing a lot more complications.
  2. Baking the file can take a while for large vaults, which will delay the publish process.
  3. The file needs to be rebaked every time the vault is changed, which is annoying and pointless.
  4. The risk of the process crashing during publish or while the file is being generated:
    1. Data corruption.
    2. If part of the vault has been uploaded when the process crashes, the published site would be out of sync.

So, lots of reasons. But the server probably runs Node.js or something similar, which doesn’t support IndexedDB. Well, luckily we don’t actually need any queries to run there. We just need to prepare a JSON file indexing the vault’s contents according to the Dexie format.
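So the bake step reduces to plain JSON generation. The sketch below writes what I believe dexie-export-import’s envelope looks like (a formatName / formatVersion / data wrapper); double-check the addon’s actual format before relying on it, and treat every file name here as a placeholder:

```ts
import { writeFile } from "node:fs/promises";

async function bake(records: Record<string, unknown>[]): Promise<void> {
  const exportJson = {
    formatName: "dexie",
    formatVersion: 1,
    data: {
      databaseName: "databake",
      databaseVersion: 1,
      tables: [
        { name: "files", schema: "$$uid,$path,$name,$ext,$size,$mtime", rowCount: records.length },
      ],
      data: [{ tableName: "files", inbound: true, rows: records }],
    },
  };
  await writeFile("databake.json", JSON.stringify(exportJson));
  // A tiny sidecar lets clients check freshness without downloading the
  // whole bake (see the client-side sketch above).
  await writeFile("databake-meta.json", JSON.stringify({ bakeVersion: Date.now().toString(36) }));
}
```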

Preparing this file might be computationally expensive, but there are ways to improve performance, and if all else fails we can swap the format out for something else. Before finding out about Dexie, I was thinking of having a base JSON file and an append-only ledger of diffs, with the whole thing being compacted periodically.

Conclusion

Well, I had fun writing this. Hope it’s useful to someone. Thanks for a great product!