HTML that better preserves markdown semantics for easier CSS

rab · August 12, 2023, 2:46am

Use case or problem

The HTML that Obsidian generates from markdown loses much of the structure present in the original markdown. For instance, *italic **bold italic*** (the word "italic" in italics, followed by "bold italic" in bold italics) becomes the following HTML:

<div class="cm-active cm-line">
	<img class="cm-widgetBuffer" aria-hidden="true" /><span
		contenteditable="false"
	></span
	><img class="cm-widgetBuffer" aria-hidden="true" /><span class="cm-em">italic </span
	><img class="cm-widgetBuffer" aria-hidden="true" /><span
		contenteditable="false"
	></span
	><img class="cm-widgetBuffer" aria-hidden="true" /><span class="cm-em cm-strong"
		>bold italic</span
	><img class="cm-widgetBuffer" aria-hidden="true" /><span
		contenteditable="false"
	></span
	><img class="cm-widgetBuffer" aria-hidden="true" />
</div>

This produces two separate spans: the italic part and the bold italic part. Whereas the markdown had the bold italic span as a child of the italic span, in the rendered HTML they are siblings whose common ancestor is the line itself.

For another example, the markdown [[link/to/page|page*alias]] is rendered as:

<div class="cm-active cm-line">
	<img class="cm-widgetBuffer" aria-hidden="true" /><span
		contenteditable="false"
	></span
	><img class="cm-widgetBuffer" aria-hidden="true" /><img
		class="cm-widgetBuffer"
		aria-hidden="true"
	/><span contenteditable="false"></span
	><span class="is-unresolved"
		><img class="cm-widgetBuffer" aria-hidden="true" /></span
	><span class="is-unresolved"
		><img class="cm-widgetBuffer" aria-hidden="true" /><span
			contenteditable="false"
		></span></span
	><span class="cm-hmd-internal-link cm-link-alias"
		><img class="cm-widgetBuffer" aria-hidden="true" /><span class="is-unresolved"
			><span class="cm-underline" draggable="true">page</span></span
		></span
	><span
		class="cm-em cm-formatting cm-formatting-em cm-hmd-internal-link cm-link-alias"
		><span class="is-unresolved"
			><span class="cm-underline" draggable="true">*</span></span
		></span
	><span class="cm-hmd-internal-link cm-link-alias"
		><span class="is-unresolved"
			><span class="cm-underline" draggable="true">alias</span></span
		></span
	><img class="cm-widgetBuffer" aria-hidden="true" /><span
		contenteditable="false"
	></span
	><img class="cm-widgetBuffer" aria-hidden="true" />
</div>

The markdown contained a single link, but in the rendered HTML its three components (page, *, and alias) are split into three separate elements whose parent is the line itself. They are not grouped under a common link element.

This makes correctly applying CSS styles incredibly difficult. If you want to do something like displaying an icon before a link, a rule such as .cm-hmd-internal-link::before { ... } does not work as intended, because there is no single .cm-hmd-internal-link element; there are three, and each one gets its own pseudo-element. Correctly targeting the pseudo-elements of the link as a whole requires checking sibling elements, and the CSS quickly turns into a mess.
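
To illustrate the sibling checking described above, here is a rough sketch of what such a workaround might look like. The selectors are inferred only from the markup quoted in this post; the icon characters are placeholders, and the second rule assumes the renderer supports :has(). Treat it as an illustration of the problem, not a tested snippet.

```css
/* Sketch only: selectors are inferred from the Live Preview markup quoted above. */

/* Icon before the link: match only the first span of a run, i.e. a link span
   whose previous sibling is not itself a link span (or that has no previous sibling). */
.cm-line > :not(.cm-hmd-internal-link) + .cm-hmd-internal-link::before,
.cm-line > .cm-hmd-internal-link:first-child::before {
	content: "🔗 "; /* placeholder icon */
}

/* Marker after the link: match only the last span of a run.
   Assumes :has() is available. */
.cm-line > .cm-hmd-internal-link:not(:has(+ .cm-hmd-internal-link))::after {
	content: " ↗"; /* placeholder marker */
}
```

Even this treats two links that sit directly next to each other as a single run, because nothing in the markup marks where one link ends and the next begins.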

Proposed solution

Instead of just splitting the text into adjacent spans of uniform style, respect the hierarchy of the original markdown and keep each span under its parent, as the markdown indicates. For example, the elements within a single markdown link should be contained within a single HTML link element.
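
As a sketch of what this could enable, the comment below shows a hypothetical nested structure (not what Obsidian emits today; it reuses the existing class names purely for illustration), followed by the simple rules that would then be enough.

```css
/* Hypothetical nested markup, assumed for illustration only:

   <span class="cm-em">italic <span class="cm-strong">bold italic</span></span>

   <span class="cm-hmd-internal-link cm-link-alias is-unresolved">
     page<span class="cm-formatting cm-formatting-em">*</span>alias
   </span>
*/

/* One link element, so exactly one ::before per link. */
.cm-hmd-internal-link::before {
	content: "🔗 "; /* placeholder icon */
}

/* Bold nested inside italic can be targeted through its parent. */
.cm-em > .cm-strong {
	letter-spacing: 0.02em; /* arbitrary example style */
}
```

With a single parent element per link, no sibling selectors are needed at all.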

Current workaround (optional)

Spend hours on CSS only to give up :(

CawlinTeffid · August 12, 2023, 10:29pm

Is this from Live Preview or Reading View? Live Preview’s markup is unlikely to ever be nice.

rab · August 14, 2023, 1:19am

Live Preview. I’m less concerned with it being “nice” (i.e., not peppered with <img> tags) than with it preserving the original semantics instead of just outputting contiguous spans of uniform formatting.

CawlinTeffid · August 14, 2023, 4:23pm

I could be mistaken, but I think the team is limited either by CodeMirror (the text editor component Obsidian uses) or more generally by Live Preview primarily needing to function as a text editor.