Obsidian Markdown Parser

dvargas92495 · March 7, 2023, 5:25pm

Has there been any discussion of, or interest in, exposing an API for Obsidian's underlying Markdown parser? For my use case, I have in mind something like:

// Proposed API (hypothetical): parse a Markdown string into a flat token array.
const result = await plugin.app.workspace.parse("Hello *world*");
console.log(result);
/**
[
  { text: "Hello ", type: "text"},
  { text: "world", type: "italics"},
]
*/

Of course, the actual schema of the result array could match whatever Obsidian uses under the hood.

Up to this point, I have been following the recommendations in Format your notes - Obsidian Help, but there are several undocumented nuances that could cause issues. For example, unlike the CommonMark spec, the string **Hello actually renders Hello in bold instead of rendering the full line as plain text. For cases like these, it would simplify our use case immensely to be able to access the output of the parser directly.
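To make the **Hello difference concrete, here is a toy sketch of the two behaviors. It is not Obsidian's parser and not a CommonMark implementation: `tokenizeBold`, the `lenient` flag, and the `Token` shape are made up for illustration, and it only looks at `**` on a single line.

```typescript
// Hypothetical token shape, mirroring the example result array above.
type Token = { text: string; type: "text" | "bold" };

// Toy single-line tokenizer that only understands `**`.
// lenient = true:  an unclosed `**` bolds the rest of the line
//                  (the behavior described above for Obsidian).
// lenient = false: an unclosed `**` stays literal text
//                  (roughly what a CommonMark parser does).
function tokenizeBold(line: string, lenient: boolean): Token[] {
  const tokens: Token[] = [];
  let pos = 0;
  while (pos < line.length) {
    const open = line.indexOf("**", pos);
    if (open === -1) break; // no more markers
    const close = line.indexOf("**", open + 2);
    if (close === -1 && !lenient) break; // strict: leave the rest as text
    if (open > pos) {
      tokens.push({ text: line.slice(pos, open), type: "text" });
    }
    const end = close === -1 ? line.length : close;
    if (end > open + 2) {
      tokens.push({ text: line.slice(open + 2, end), type: "bold" });
    }
    pos = close === -1 ? line.length : close + 2;
  }
  // Whatever is left after the last handled marker is plain text.
  if (pos < line.length) {
    tokens.push({ text: line.slice(pos), type: "text" });
  }
  return tokens;
}
```

With `lenient` set, `tokenizeBold("**Hello", true)` yields a single bold token for `Hello`; without it, the whole string comes back as one text token. Every nuance like this has to be rediscovered and hard-coded by hand, which is the argument for exposing the real parser.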

Happy to expand on more if needed!

coreSpotz · April 15, 2023, 7:21pm

Yes, it would be nice if the developers gave access to the parser and the grammar! Especially if they use a parser generator, because that would make it really easy.

smackesey · August 6, 2024, 1:32pm

I too am looking for this. Currently I use remark with some plugins, but the result is far from complete. For reference:

import { unified } from 'unified';
import remarkParse from 'remark-parse';
// @ts-ignore: No type definitions available for this module
import remarkWikiLink from 'remark-wiki-link';
import remarkGfm from 'remark-gfm';
import remarkRehype from 'remark-rehype';
import rehypeStringify from 'rehype-stringify';
import { VFile } from 'vfile';
import { inspect } from 'unist-util-inspect';

describe('obsidian markdown parsing', () => {
  it('parses a sample markdown text with remark-obsidian', async () => {
    const sampleMarkdown = `
      # Test Note

      This is a [[link]] to another note.

      ![[embedded note]]

      - [ ] Task 1
      - [x] Task 2

      > [!info] Callout
      > This is a callout block
  `.replace(/^[ \t]+/gm, '');

    const file = new VFile(sampleMarkdown);
    const result = unified().use(remarkParse).use(remarkGfm).use(remarkWikiLink).parse(file);
    // shows syntax tree
    console.log(inspect(result));
    // the first top-level node should be the "# Test Note" heading
    const root = result.children[0];
    expect(root.type).toBe('heading');
  });
});

This prints:

      root[5] (1:1-13:1, 0-142)
      ├─0 heading[1] (2:1-2:12, 1-12)
      │   │ depth: 1
      │   └─0 text "Test Note" (2:3-2:12, 3-12)
      ├─1 paragraph[3] (4:1-4:36, 14-49)
      │   ├─0 text "This is a " (4:1-4:11, 14-24)
      │   ├─1 wikiLink "link" (4:11-4:19, 24-32)
      │   │     data: {"alias":"link","permalink":"link","exists":false,"hName":"a","hProperties":{"className":"internal new","href":"#/page/link"},"hChildren":[{"type":"text","value":"link"}]}
      │   └─2 text " to another note." (4:19-4:36, 32-49)
      ├─2 paragraph[1] (6:1-6:19, 51-69)
      │   └─0 text "![[embedded note]]" (6:1-6:19, 51-69)
      ├─3 list[2] (8:1-9:13, 71-96)
      │   │ ordered: false
      │   │ start: null
      │   │ spread: false
      │   ├─0 listItem[1] (8:1-8:13, 71-83)
      │   │   │ spread: false
      │   │   │ checked: false
      │   │   └─0 paragraph[1] (8:7-8:13, 77-83)
      │   │       └─0 text "Task 1" (8:7-8:13, 77-83)
      │   └─1 listItem[1] (9:1-9:13, 84-96)
      │       │ spread: false
      │       │ checked: true
      │       └─0 paragraph[1] (9:7-9:13, 90-96)
      │           └─0 text "Task 2" (9:7-9:13, 90-96)
      └─4 blockquote[1] (11:1-12:26, 98-141)
          └─0 paragraph[1] (11:3-12:26, 100-141)
              └─0 text "[!info] Callout\nThis is a callout block" (11:3-12:26, 100-141)
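Note that the embed (node 2) and the callout (node 4) come through as plain text inside an ordinary paragraph and blockquote. One possible workaround is a post-processing pass over the tree. The following is only a rough sketch: `tagObsidianNodes`, the `embed` and `callout` node types, and the two regexes are my own inventions, not part of remark or Obsidian, and the patterns only cover the simple forms shown in the sample.

```typescript
// Minimal structural type for the nodes touched here (not the full mdast typings).
interface MdNode {
  type: string;
  value?: string;
  children?: MdNode[];
  data?: Record<string, unknown>;
}

// Assumed patterns: `![[target]]` alone in a paragraph, and `[!kind] title`
// at the start of a blockquote. Real Obsidian syntax has more variants.
const EMBED_RE = /^!\[\[([^\]]+)\]\]$/;
const CALLOUT_RE = /^\[!(\w+)\][ \t]*(.*)/;

// Returns a copy of the tree with embeds and callouts retagged.
function tagObsidianNodes(node: MdNode): MdNode {
  const children = node.children?.map(tagObsidianNodes);
  const first = children?.[0];

  // A paragraph whose only child is the text `![[target]]` becomes an embed node.
  if (node.type === "paragraph" && children?.length === 1 && first?.type === "text") {
    const m = EMBED_RE.exec(first.value ?? "");
    if (m) return { type: "embed", value: m[1] };
  }

  // A blockquote whose first paragraph starts with `[!kind]` becomes a callout.
  // The original children are kept as-is; the marker text is not stripped.
  if (node.type === "blockquote" && first?.type === "paragraph") {
    const lead = first.children?.[0];
    const m = lead?.type === "text" ? CALLOUT_RE.exec(lead.value ?? "") : null;
    if (m) {
      return {
        ...node,
        type: "callout",
        children,
        data: { kind: m[1].toLowerCase(), title: m[2] },
      };
    }
  }

  return children ? { ...node, children } : { ...node };
}
```

Calling `tagObsidianNodes(result)` on the tree above would turn node 2 into an `embed` with value `embedded note` and node 4 into a `callout` with kind `info` and title `Callout`. It still amounts to guessing at Obsidian's rules from the outside, so it does not replace access to the real parser.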