Treat all YAML scalars as potential link targets

tabulo · April 2, 2021, 12:03am

Hello all,

I have posted the following suggestion as a “reply” on an existing -archived- feature request that had a similar root concern (but not the same solution).

However, I should have probably posted it as a new feature request in the first place. So there we go…

TL;DR;

This is a simple suggestion for supporting links in YAML frontmatter (or within subsequent YAML blocks):

Treat all YAML scalars as potential link targets (implicitly, without any special link syntax)
(This would include those scalars that are deeply buried within nested YAML structures)

This may sound awkward or even stupid at first, but please bear with me…

In a freeform text (like markdown), we obviously do need a special syntax for links and the like.

In a structured text (like YAML), however, information is already entered and then parsed in small chunks.

Quite often, those chunks refer to other entities (whether they already exist as a note entry or not).

In fact, even things that appear like pure attributes (like dates, e.g.: “2021-03-16”) have the potential to become first class notes (entries) some day: if the user so desires.

This kind of approach (or something equivalent) would allow Obsidian to bring together the best of both worlds (structured and unstructured approaches mentioned by @Whitenoise) for managing a knowledge base on any topic.

Any entry may start out as an unstructured freeform markdown note…

With time, as the need arises, structure could emerge… naturally and gradually… all within the same “entry” (file).

WHY ?

#Obsidian is great. It helps you quickly jot down notes in markdown and get back to them easily.

Markdown itself is great. When you mark something down, you can focus on drafting textual content, with just enough syntax to mark up a few presentational aspects.

Markdown links are also good (apart from the lack of link “labels” or “types”):

==> Not only do they end up as hyperlinks in the resulting HTML, they also enable #Obsidian (and similar tools) to function as a personal information manager, enabling each vault to act as a mini knowledge base, like a #wiki or zettelkasten.

For unstructured (freeform) information, like many of the daily quick notes, this may already be considered quite sufficient.

In some cases, however, you need a more structured approach, depending on the task at hand; mostly when some automated mechanism needs to unambiguously grok the stuff.

The YAML frontmatter is usually squatted for this purpose (separate YAML sections for user-data would also be welcome).

But then, you lose all the #Obsidian goodies that come with markdown links (Ctrl-click to browse, [[ for suggested completions, …).

Imho, you shouldn’t have to lose all that.

With a few features like this one, #Obsidian could become the perfect tool for creating and maintaining different kinds of knowledge bases in a granular fashion and using only text files.

I am not suggesting that #Obsidian become a database management system. Nowadays, there are a whole lot of graph databases out there, but most of them focus on data exploration or visualisation; some also bring inference capabilities…

So you still need to create and maintain/curate the data somewhere…

==>

Simple text files living on a ubiquitous filesystem are a perfect place to do that.
Besides, YAML + markdown are already very human friendly…
And #Obsidian could be the tool that facilitates it further.

EXAMPLE

Below is an example that illustrates both a use case and the workings of the above solution.

The demonstrated example happens to be for a hobby knowledge base about plants, but the suggested solution is entirely generic, so it could be applied to any use case involving a semi-structured, granular knowledge base on any topic (movies, books, articles, …, anything).

YAML Front matter (and then some)

---
title: Spinach
author: John Smith
tags: ["gardening", "plant/vegetable/leafy"]

source: 
  - {name: Wikipedia, link: "https://en.wikipedia.org/wiki/Spinach", primary: true }
  - {name: Wikimedia, link: "https://upload.wikimedia.org/wikipedia/commons/c/cd/Spinach.jpg" }
---
type: species
name: Spinach
species: Spinacia oleracea
lifecycle: annual
taxonomy:
    genus: Spinacia
    family: Amaranthaceae
    clade: Tracheophytes/Angiosperms/Eudicots/Caryophyllales
    kingdom: Plantae

…

Followed by the BODY (markdown)

Spinach (Spinacia oleracea) is a leafy green flowering plant native to central and western Asia. It is of the order Caryophyllales, family Amaranthaceae, subfamily Chenopodioideae. Its leaves are a common edible vegetable consumed either fresh, or after storage using preservation techniques by canning, freezing, or dehydration. It may be eaten cooked or raw, and the taste differs considerably; the high oxalate content may be reduced by steaming.

bla bla bla...

Optionally followed by REFERENCE LINKS

[spinach:primary]: https://en.wikipedia.org/wiki/File:Spinacia_oleracea_Spinazie_bloeiend.jpg
[spinach in the fields]: https://upload.wikimedia.org/wikipedia/commons/c/cd/Spinach.jpg

What are the links in the above example ?

All the scalars found within the YAML frontmatter(s) above can be treated as potential -labeled- links:

Spinach
John Smith
gardening
plant/vegetable/leafy
Wikipedia
https://en.wikipedia.org/wiki/Spinach
…
Amaranthaceae
…

It gets better… Those are already labeled links, ready to be represented on a directed labeled graph (or go into a graph database :-), or just generate semantic triples…

For example :

link	label(s)
Spinach	`doc.title`, `name`
John Smith	`doc.author`
Wikipedia	`doc.source[0].name`, `doc.source.name`
Spinacia oleracea	`species`
annual	`lifecycle`
Spinacia	`taxonomy.genus`
…	…

The above mapping convention probably needs some discussion and improvements. But it should give the general idea.
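To make the mapping a bit more concrete, here is a minimal Python sketch of how labels could be derived. It assumes the YAML has already been parsed into plain dicts and lists by some YAML library; the function name `scalar_links`, the label format and the `doc` prefix passed in at the call site are illustrative assumptions only, not an existing Obsidian API.

```python
from typing import Any, Iterator, List, Tuple


def _join(prefix: str, key: Any) -> str:
    """Append a mapping key to a dot-notation path."""
    return f"{prefix}.{key}" if prefix else str(key)


def scalar_links(
    node: Any, indexed: str = "", plain: str = ""
) -> Iterator[Tuple[str, List[str]]]:
    """Yield (scalar, labels) for every scalar in a parsed YAML structure.

    Scalars reached through a list get two labels: one with the list
    index (source[0].name) and one without it (source.name).
    """
    if isinstance(node, dict):
        for key, value in node.items():
            yield from scalar_links(value, _join(indexed, key), _join(plain, key))
    elif isinstance(node, list):
        for i, item in enumerate(node):
            yield from scalar_links(item, f"{indexed}[{i}]", plain)
    elif node is not None:
        labels = [indexed] if indexed == plain else [indexed, plain]
        yield str(node), labels


# Hand-written stand-in for the parsed frontmatter of the Spinach note.
frontmatter = {
    "title": "Spinach",
    "author": "John Smith",
    "tags": ["gardening", "plant/vegetable/leafy"],
    "source": [{"name": "Wikipedia"}, {"name": "Wikimedia"}],
}

# Starting both paths at "doc" applies the assumed "doc." prefix.
for target, labels in scalar_links(frontmatter, "doc", "doc"):
    print(target, labels)
```

Every scalar comes out as a potential link target together with its label(s), no matter how deeply it is nested. Whether a target actually resolves to a note is a separate decision left to the application.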

Attention

Note that the above example is written in a way that assumes the “user-data” feature is available, which would ideally allow an arbitrary number of YAML segments at the beginning of each markdown file, starting with the already supported YAML frontmatter.

The “doc.” prefix (or a similar convention) would help distinguish the data that comes from the “frontmatter” (which is metadata about the document) from the rest of the YAML segments, which may then be safely merged.
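As a rough sketch of that convention, assuming each YAML segment has already been parsed into a dict: the first segment (the frontmatter) is kept under a `doc` key, and the remaining segments are merged at the top level. The function name `merge_segments` and the collision behaviour (later segments win, and the frontmatter always owns `doc`) are assumptions made for illustration.

```python
from typing import Any, Dict, List


def merge_segments(segments: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge parsed YAML segments into one mapping.

    The first segment is treated as the frontmatter and stored under
    "doc"; all later segments are merged at the top level.
    """
    if not segments:
        return {}
    merged: Dict[str, Any] = {}
    for segment in segments[1:]:
        # Assumption: on duplicate keys, the later segment wins.
        merged.update(segment)
    # Assumption: a user-data key literally named "doc" is overwritten here;
    # a real implementation would have to decide how to handle that clash.
    merged["doc"] = segments[0]
    return merged


frontmatter = {"title": "Spinach", "author": "John Smith"}
user_data = {"type": "species", "name": "Spinach", "lifecycle": "annual"}

merged = merge_segments([frontmatter, user_data])
print(merged["doc"]["title"], merged["name"], merged["lifecycle"])
```

With this layout, `doc.title` (metadata about the document) and `name` (data about the plant) stay distinguishable even though both hold the value Spinach.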

How does this approach compare to other solutions ?

PROS:

Brings together the best of both worlds of managing data (structured and unstructured), allowing structure to emerge naturally and gradually.
Generic solution for all kinds of data and metadata, without the need to reinvent the wheel for each and every problem domain (authors, articles, plants, movies, recipes, …)
Totally compatible with existing -and future- YAML syntax, without the need for any custom convention.
YAML data stays clean, void of extra syntactic noise (which might have otherwise been problematic for automated processing of YAML by other scripts that could choke on any special link syntax).
Gives us an easy and natural way of entering labeled “semantic” links (using dot notation or similar)
Probably quite easy to implement without breaking the existing Obsidian code (this is just a guess; I don’t have sufficient knowledge of the code base)

CAVEATS:

Could lead to “link noise”, listing too many things as links or backlinks.

This risk could easily be mitigated by some further implementation choices:
- Ideally, such links could internally be marked as “implicit” and the user could be given the choice of displaying those (or not)
Also, auto-creation of unwanted entries should be avoided.
Could lead to structure frenzy… prompting some users to over-structure their data, unnecessarily or just too early.

Well, when something is so easy (to do)… it takes some discipline to learn not to over-do…
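Regarding the “link noise” caveat, the idea of marking links as “implicit” could be modelled roughly as below. This is only a sketch: the `Link` record, the `show_implicit` setting and the `existing_notes` set are hypothetical names, not part of Obsidian.

```python
from dataclasses import dataclass
from typing import Iterable, List, Set


@dataclass(frozen=True)
class Link:
    source: str             # note that contains the link
    target: str             # text the link points to
    label: str = ""         # e.g. "taxonomy.genus"; empty for ordinary links
    implicit: bool = False  # True when derived from a YAML scalar


def visible_links(
    links: Iterable[Link], show_implicit: bool, existing_notes: Set[str]
) -> List[Link]:
    """Return the links to display.

    Explicit links are always shown. Implicit links are shown only if
    the user opted in AND the target note already exists, so no unwanted
    entries are suggested or created for stray scalars.
    """
    shown = []
    for link in links:
        if not link.implicit:
            shown.append(link)
        elif show_implicit and link.target in existing_notes:
            shown.append(link)
    return shown


links = [
    Link("Spinach", "Gardening tips"),
    Link("Spinach", "Spinacia", "taxonomy.genus", implicit=True),
    Link("Spinach", "annual", "lifecycle", implicit=True),
]
notes = {"Spinach", "Spinacia", "Gardening tips"}

print(len(visible_links(links, show_implicit=False, existing_notes=notes)))
print(len(visible_links(links, show_implicit=True, existing_notes=notes)))
```

Keeping the flag on the link itself means the same data can feed both a quiet default view and a fuller view for users who want to see everything.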

OPEN QUESTIONS :

What impact on performance ?

FUTURE POSSIBILITIES (for later on)

As the careful readers have probably noticed, this concept could be extended in many different ways later on… enabling many other goodies…

Here are just a few (which may need to go into separate feature requests)

Support basic text templating within the markdown body (something like jinja2 syntax or similar)
This would help avoid repeating oneself and also minimize errors.
Detect URLs and possibly treat them differently
More knowledge-base/graph capabilities (or exporting triples into graph databases) thanks to implicit labels that come from YAML
Ability to specify labeled links within the markdown body itself, in a way that is interoperable with the described approach (in YAML).
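Two of these items, URL detection and triple export, are easy to illustrate together. The sketch below uses Python's standard `urllib.parse.urlsplit` to recognise http(s) URLs; the helper names `is_url` and `to_triples`, the choice of the note itself as the subject of each triple, and the "url"/"note" kinds are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple
from urllib.parse import urlsplit


def is_url(scalar: str) -> bool:
    """Return True if the scalar looks like an absolute http(s) URL."""
    try:
        parts = urlsplit(scalar)
    except ValueError:  # raised for some malformed inputs
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def to_triples(
    note: str, labeled_scalars: Iterable[Tuple[str, str]]
) -> Iterator[Tuple[str, str, str, str]]:
    """Turn (label, scalar) pairs into (subject, predicate, object, kind).

    Assumption: the note is the subject, the label is the predicate
    and the scalar is the object of each triple.
    """
    for label, scalar in labeled_scalars:
        kind = "url" if is_url(scalar) else "note"
        yield note, label, scalar, kind


pairs = [
    ("taxonomy.genus", "Spinacia"),
    ("doc.source.link", "https://en.wikipedia.org/wiki/Spinach"),
    ("doc.tags", "plant/vegetable/leafy"),
]

for triple in to_triples("Spinach", pairs):
    print(triple)
```

Tuples like these could then be written out in whatever format a given graph database expects.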

The last two items resemble the #juggl add-in being developed by @Emile (the current version is called Neo4j Graph View)

…

Needless to say, care would be needed to avoid feature-creep and software bloat.

Obsidian was intended as an easy-to-use note-taking app, a role in which it excels. So, it should not evolve into some complicated bloatware.

IMHO, with a few carefully chosen features, it has the potential to fill a huge gap… and still remain simple.

As Larry Wall once put it: “Simple things should be easy, and complicated things should be possible [to accomplish] :-)”

Thank you for reading this far

Tabulo[n]

Related feature requests

WhiteNoise · April 2, 2021, 2:17am

That feature request is not archived. Let’s keep that one.