Dataview JS Getting The Tree Root

holroy · February 17, 2023, 2:22am

Yes, that is correct.

And no, I didn’t do what you (not?) suggested. Not at all…

Disclaimer: Any picking up of bait is done at your own risk.

I-d-as · February 17, 2023, 2:27am

Unstoppable! I’ll always remember the afternoon/night that @holroy casually created a new graph view, among many other things.

BorisZ · February 17, 2023, 12:03pm

I already see @holroy’s great solution, but here are some additional thoughts:

More optional chaining should be used, since:
- child may not be present at all
- child’s link (value) is just a link, so the file may not have been created yet (aka a ghost link)

About child:: [[File D]], [[File E]]: I don’t quite know how Dataview processes comma-separated values; I personally prefer defining values in separate fields:

child:: [[File D]]
child:: [[File E]]
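A minimal plain-JavaScript sketch of the optional-chaining point. The page objects and the `pagesByName` lookup are hypothetical stand-ins for what dv.page() returns (children are plain name strings here, not Link objects): a missing field yields an empty list, and a ghost link (no page found) falls back to the raw value instead of throwing.

```javascript
// Hypothetical stand-ins for Dataview pages, keyed by note name.
// "File E" is deliberately absent, simulating a ghost link (note not created yet).
const pagesByName = {
  "File A": { name: "File A", child: ["File D", "File E"] },
  "File B": { name: "File B" },                  // no child field at all
  "File C": { name: "File C", child: "File D" }, // single value, not a list
  "File D": { name: "File D" },
};

// Return the children of a page as an array, whatever shape the field has.
function childrenOf(page, key) {
  const raw = page?.[key]; // undefined if the page or the field is missing
  if (raw === undefined || raw === null) return [];
  return Array.isArray(raw) ? raw : [raw];
}

// Resolve each child to its page, falling back to the raw value for ghost links.
function resolveChildren(page, key) {
  return childrenOf(page, key).map(name => pagesByName[name] ?? name);
}

console.log(resolveChildren(pagesByName["File A"], "child"));
```

In a real dataviewjs block the lookup would be dv.page(...) and the children would be Link objects rather than strings.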

Also, I figured out a few more pitfalls:

Anything other than a link doesn’t work right now (e.g. child:: just a text). This can easily be fixed, but the current behaviour is sufficient for my needs.
Links with extra text won’t work either (child:: [[File D]] qwerty). A better solution for me is to add a separate field to [[File D]] (description:: qwerty) and then just display it as shown here. In my opinion, it is a more “structured” approach.

BorisZ · February 17, 2023, 12:12pm

What I want to implement is a step-by-step checklist for my life/work/etc. That’s why I need strict children order (and that’s why I prefer top-down over down-top linking).

However, your solution definitely gives me insights and ideas on how to improve my current solution, especially your approach to avoiding infinite loops. Many thanks!

holroy · February 17, 2023, 12:25pm

What do you mean by “strict children order”? Just the simple list variation? The order does show in the mermaid graph as arrows, and I do believe it can be further enhanced.

However, the mermaid solution can’t add much additional text of any sort, if that’s what you’re looking for.

And what do you mean when saying “top-down” is preferred over “down-top”? That doesn’t really make sense to me just now.

holroy · February 17, 2023, 12:35pm

I’m mainly going to refer to my “mermaid” solution when addressing your issues, just to be clear on that. I was under the impression that you used directive:: to denote the children, and not child::, so even though they differ, I assume you use one of them consistently. (And yes, I do see your response was to the post by I-d-as, but the questions are valid, so I addressed them for my post as well.)

The missing child field I handle through the initial query, where I ignore all notes that don’t have the directive field.

If the child isn’t an existing note yet (a ghost link), it can still be handled as a link target from the directive definition. In other words, it’s doable to let rightLink point to a non-existent note, by changing the rightLink definition to:

```js
rightLink = connection.path.match(/\/?([^\/]+?)(\.md)?$/)[1]
```
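To check what that regular expression pulls out of a path, here is a small stand-alone test. `noteNameFromPath` is a hypothetical wrapper, not part of the original script, and the paths are made up.

```javascript
// Extract the note name from a link path: drop any folders and a trailing ".md".
// Same regular expression as the rightLink definition above.
function noteNameFromPath(path) {
  const match = path.match(/\/?([^\/]+?)(\.md)?$/);
  return match ? match[1] : null; // null if nothing matched (e.g. empty path)
}

console.log(noteNameFromPath("Projects/File D.md")); // File D
console.log(noteNameFromPath("File D"));             // File D (ghost link, no extension)
```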

BorisZ:

about child:: [[File D]], [[File E]]: I don’t quite know how Dataview process comma separated values, I personally prefer defining values in separate fields:
child:: [[File D]]
child:: [[File E]]

These definitions can be unwieldy and cause some headaches. I do believe both of these work, but I tend to use the second variant myself, as I find it a little easier to read.

Do note, however, that you can’t do the following in the frontmatter:

```yaml
---
status: "Will fail!!"
child: "[[File D]]"
child: "[[File E]]"
---
```

This will only keep the last entry, so you have to resort to other ways of describing it, like:

```yaml
---
status: "Untested, but should work"
child:
- "[[File D]]"
- "[[File E]]"
---
```

In my solution, I didn’t check for this either. If your input data is faulty, i.e. text rather than links, you “pay the price” by not getting results. This could possibly be handled by checking whether rightLink is an actual link, and if not, just displaying it.

Following the logic of the previous point, these are not links: they’re text plus links, and as such don’t follow the recipe for success. You’re better off with some additional markup then, as you suggest.
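The check mentioned above (show a child as a link when it is one, otherwise as plain text) might look roughly like this. `FakeLink` is a hypothetical stand-in for Dataview's Link class so the sketch runs outside Obsidian; inside dataviewjs you would test against Link itself.

```javascript
// Hypothetical stand-in for Dataview's Link class, just enough for this sketch.
class FakeLink {
  constructor(path) { this.path = path; }
}

// Render one child value: links become wikilinks, anything else is shown as text.
function renderChild(value) {
  if (value instanceof FakeLink) return `[[${value.path}]]`;
  if (typeof value === "string") return value;
  return String(value); // numbers, dates etc.: just display them
}

const children = [new FakeLink("File D"), "just a text", new FakeLink("File E")];
console.log(children.map(renderChild).join(", "));
// [[File D]], just a text, [[File E]]
```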

BorisZ · February 17, 2023, 1:29pm

Top-Down is when parents define their children (i.e. MoC, Index/Meta Notes):

# Animals MoC
- [[Dog]]
- [[Cat]]

Down-Top (Bottom-Up etc.) is when children define their parents (initial Zettelkasten idea):

# Dog
An [[Animal]]

# Cat
An [[Animal]]

Actually, I prefer the Down-Top approach since it’s faster, more scalable, and takes less effort (for me personally).

However, let’s imagine some step-by-step algorithm:

1. Open the fridge
2. Get the food
3. Close the fridge

With the Top-Down approach, we define these actions in a list, so the order is guaranteed:

# Cooking
action:: [[Open the fridge]]
action:: [[Get the food]]
action:: [[Close the fridge]]

But with Down-Top the order is not guaranteed, since Cooking fetches its actions as backlinks:

# Open the fridge
A [[Cooking]] action

# Get the food
A [[Cooking]] action

# Close the fridge
A [[Cooking]] action

# Cooking
Empty

By the way, having type:: [[Directive]] in my files is also technically Down-Top. I could have used Top-Down:

# Directive
- [[File A]]
- [[File B]]
- [[File C]]

But in this specific case, I don’t need any order, so Down-Top can be used for the benefits I mentioned above.
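The ordering argument in a few lines of plain JavaScript. The notes are hypothetical objects; `collectBacklinks` sorts by name purely to make the result deterministic, standing in for whatever order a backlink query happens to return, since nothing in the Down-Top data records the intended step order.

```javascript
// Top-Down: the parent lists its children, so the list order is the step order.
const cooking = {
  name: "Cooking",
  action: ["Open the fridge", "Get the food", "Close the fridge"],
};

// Down-Top: each child points at its parent; the parent has no list of its own.
const actionNotes = [
  { name: "Get the food", parent: "Cooking" },
  { name: "Close the fridge", parent: "Cooking" },
  { name: "Open the fridge", parent: "Cooking" },
];

// Gather children via "backlinks". Sorting by name is an assumption standing in
// for the (unspecified) order a real query would return.
function collectBacklinks(notes, parentName) {
  return notes
    .filter(n => n.parent === parentName)
    .map(n => n.name)
    .sort();
}

console.log(cooking.action);                           // steps in the intended order
console.log(collectBacklinks(actionNotes, "Cooking")); // alphabetical, not step order
```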

BorisZ · February 17, 2023, 1:31pm

Agreed, but I have tens of children, so making them comma-separated is not the best way for me.

This YAML syntax is visually the best one, but, unfortunately, Obsidian stops treating them as links (graph view, backlinks etc.):

```yaml
child:
- "[[File D]]"
- "[[File E]]"
```

holroy · February 17, 2023, 1:46pm

It’s true that Obsidian stops treating them as links, but dataview does continue to treat them as links, which can be handy sometimes.

First, let it be said that I’m doing what you are doing, defining those lists of links within the body, so that they are updated if renamed, show in the graph view, are used in backlinks, and so on…

Secondly, a plugin has been made which supposedly fixes these issues. I’ve not tried it out, but a very basic test seems to indicate that it does work. See the following if interested:

holroy · February 19, 2023, 10:47pm

I think I’ve addressed these issues, but I’m still struggling with how to make a proper tree structure out of it, so that one could detect the roots automatically.

The current version is:

```dataviewjs
const childrenKey = 'directive',
  rootNames = [
    "A", "I", "Shapes"
  ]

const rootEl = dv.el("ul", "")

// Every node rendered so far, used to avoid infinite loops
let renderedNodes = {}

dv.array( rootNames.map(it => dv.page(it)) )
  .forEach(it => renderNode(it, rootEl))

function renderNode(node, container) {
  let liEl, liText

  if ( typeof(node) == "string" ) {
    // Plain text child: just show it
    liEl = dv.el("li", node, { container })
    return
  } else if ( node instanceof Link ) {
    // dv.page() found no note for this link, so it's a ghost link
    liText = `[[${ node.path }]]`
  } else {
    liText = node?.path ?? node.file?.link
  }

  const alreadyRendered = liText in renderedNodes
  renderedNodes[liText] = 1

  liEl = dv.el("li", liText + (alreadyRendered ? " ➰" : ""), { container })

  // Don't descend into repeated nodes or non-existing notes
  if ( alreadyRendered || node instanceof Link )
    return

  const ulEl = dv.el("ul", "", { container: liEl })
  if ( node[childrenKey] ) {
    dv.array(node[childrenKey])
      .map(it => dv.page(it) || it)
      .forEach(it => renderNode(it, ulEl))
  }
}
```
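Stripped of the Dataview calls, the recursion in this script is a depth-first walk guarded by a record of already-rendered nodes. A minimal sketch over a hypothetical in-memory child map (note names as plain strings), producing an indented markdown list: any node seen before gets the ➰ marker and isn't expanded again, which is also what stops infinite loops. Note that, like the script, it marks repeated nodes and real loops the same way.

```javascript
// Hypothetical child lists: note name -> names of its children.
// "C" points back to "A", forming a loop; "D" is reached twice.
const childMap = {
  A: ["B", "C"],
  B: ["D"],
  C: ["A", "D"],
  D: [],
};

function renderTree(rootNames, childMap) {
  const rendered = new Set();
  const lines = [];

  function renderNode(name, depth) {
    const already = rendered.has(name);
    rendered.add(name);
    lines.push(`${"  ".repeat(depth)}- ${name}${already ? " ➰" : ""}`);
    if (already) return; // don't expand a node twice; this also breaks cycles
    for (const child of childMap[name] ?? []) renderNode(child, depth + 1);
  }

  rootNames.forEach(root => renderNode(root, 0));
  return lines.join("\n");
}

console.log(renderTree(["A"], childMap));
```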

There are still faults with this, as it is indeed a tricky task, but this version returns the following on my definitely evil test setup:

The corresponding “correct” mermaid graph shows:

So the current shortcomings are:

- It doesn’t detect the extra root of “G”
- It doesn’t detect any roots automatically
- It doesn’t allow for collapsing of sub lists

The question is: how bad is it for you guys to define the roots manually? Should I move on, or should I keep trying (and twist my brain even more)?

I-d-as · February 20, 2023, 1:10am

I can’t imagine most people would already have systems of link fields in their existing notes that could easily be refitted to work within this setup. But a decent number do use these types of definitions, whether because they have committed to a Dataview or Breadcrumbs system, or otherwise. I do imagine some could reasonably easily, for example, just change the child or directive field name in either their notes or the script, and use the script as is. If they only have a few roots to identify, it’s easy enough.

But in some ways, I see this solution as most powerful for those with a mess that they want to get a handle on. For that reason, a reasonably effective, low-friction automation of this process would be super helpful for these individuals, myself included. Not to mention, it would of course be nice to have one less thing to worry about when using this tree view going forward.

Thanks again for your shared creativity. It is a beautiful thing to see in action.

BorisZ · February 20, 2023, 11:56am

Nice solution! It provides a lot of ideas to implement.

The question is how bad is it for you guys to define the roots manually?

For me, it is an advantage:

- I realize that the trees will grow eventually.
- I expect ~50 roots for 1000 nodes, so it’s not a problem.
- I like to place a single-root query right within that exact root (I even composed that logic into another view that wraps the old one):

# One of my roots
```dataviewjs
await dv.view("dataviews/treeview", {
  childrenKey: 'child',
  rootNames: [
    "One of my roots",
  ]
});
```

- I like to pick not only actual roots, but also other nodes as roots:

# Structure
- A
  - B
    - C

# Query Result
- B
  - C
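On the view-file side, as far as I know, Dataview exposes the object passed as the second argument of dv.view() to the view script as `input`. A minimal sketch of how such a view might read those parameters with defaults; `readTreeOptions` is a hypothetical helper, and the default values simply mirror the constants used in the scripts in this thread.

```javascript
// Merge the parameters passed to dv.view() with defaults.
// In the real view file you would call readTreeOptions(input).
function readTreeOptions(input) {
  const {
    childrenKey = "child",
    rootNames = [],
    loopIndicator = " ➰",
    consoleDebug = false,
  } = input ?? {};
  return { childrenKey, rootNames, loopIndicator, consoleDebug };
}

// Simulating the call from "One of my roots":
const options = readTreeOptions({ childrenKey: "child", rootNames: ["One of my roots"] });
console.log(options.rootNames); // [ 'One of my roots' ]
```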

holroy · February 20, 2023, 11:33pm

After a major headache, and many twists and turns, I finally have a version which is able to pick roots automagically, detect multiple occurrences of a node in the tree (and avoid re-presenting it), detect loops, and finally let you specify which nodes you want to present as root nodes. This last option does, however, require the roots to be among the nodes collected in the initial query.

The only thing missing now is getting the nodes to collapse, which should be doable by changing the dataviewjs script into a Templater execution command template. But that’s for another day. The code is presented as a standalone script, but I strongly recommend making it into a dv.view() script and passing the parameters at the top into the view script.

The full script in all its glory(?):

```dataviewjs

// Change this to suit your liking
const childrenKey= 'child',
  loopIndicator = " ➰",
  consoleDebug = false   // change to true, if you want to see some debug

let myRootNames = []
// // If you want to use predefined roots, uncomment
// // and adapt the next line to list your note names
// myRootNames = [ "J", "Shapes" ]

let 
  nodes = {}, 
  nodeIdCounter = 0,
  allIds = {},
  markdownOutput = ""

if ( consoleDebug )
  console.log(`\n\n\n******   New run ******\n\n`)

// Query all notes once and for all
// aka build the cache for nodes
dv.pages()
  .where(p => p[childrenKey])
  .forEach(note => handleNote(note))


// Do some setup in order to build the tree
const multipleVisitation = new Set()
const allVisitedNodes = new Set()
let loopDetection = {}
let currentRootNodeId
const DRY_RUN = true


if ( myRootNames.length == 0 ) {

  // Find nodes with no parents
  const noParentNodes =
    Object.keys(nodes)
      .filter(k => nodes[k].parents.length == 0)

  if ( consoleDebug )
    console.log("Automated root nodes: ", noParentNodes)
  
  // Do a dry run to check which nodes are presented
  // multiple times. These are stored in multipleVisitation set
  loopRootNodes(noParentNodes, false, DRY_RUN) // Builds multipleVisitation

  if ( consoleDebug )
    console.log("loopDetection: ", loopDetection)

  // Reset some variables before the real run
  allVisitedNodes.clear()
  loopDetection = {}
  // We're leaving multipleVisitation as is from the DryRun

  loopRootNodes(noParentNodes, false)
} else {

  // Build the nodeId list of your predefined names
  const rootNodes = []
  for (let name of myRootNames) {
    const page = await dv.page(name)
    if ( consoleDebug )
      console.log("name: " + name + ", page: ", page)

    // Skip names that don't resolve to a note or aren't in the node cache
    const nodeId = page ? getNodeId(page.file.link.path, false, false) : false
    if ( nodeId )
      rootNodes.push(nodeId)
  }
  
  if ( consoleDebug )
    console.log("Pre-defined root nodes: ", rootNodes)
  
  loopRootNodes(rootNodes, true, DRY_RUN)
  
  if ( consoleDebug )
    console.log("loopDetection: ", loopDetection)
    
  // Reset some variables before the real run
  allVisitedNodes.clear()
  loopDetection = {}
  // We're leaving multipleVisitation as is from the DryRun

  loopRootNodes(rootNodes, true)
}

/*********************************************/
/*  Function definitions                     */
/*********************************************/

function handleNote(note) {
  // Prepare note[childrenKey] for loop handling
  if ( note[childrenKey] instanceof Link ||
       typeof(note[childrenKey]) == "string" ) 
    note[childrenKey] = [ note[childrenKey] ]
    
  // Parent node 
  const filePath = note.file.path
  const fileId = insertNode(note.file.link)
  if ( consoleDebug )
    console.log(`\n${ fileId } (${ nodes[fileId].type }) = ${ nodes[fileId].value }`) 
      
  for (let connection of note[childrenKey] ) {
    let nodeId

    if ( typeof(connection) == "string" ) {
      nodeId = insertNode(connection, filePath, true)
    } else if ( connection instanceof Link ) {
      nodeId = insertNode(connection, filePath)
    } else {
      continue   // ignore other value types (numbers, dates, ...)
    }
    nodes[fileId].children.push(nodeId)

    if ( consoleDebug )
      console.log(`  ${ nodeId } (${ nodes[nodeId].type }) = ${ nodes[nodeId].value }`)
  }
}

function insertNode(value, parentPath = null, textNode = false) {
  const nodeId = getNodeId(textNode ? value : value.path, textNode)
    
  if ( nodeId in nodes ) {
    nodes[nodeId].occurences += 1 
    
  } else {
    const nodeType = textNode ?
      "text" : ( value.path?.endsWith(".md") ?
        "link" : "newlink" )

    nodes[nodeId] = {
      value: value,
      parents: [],
      occurences: 1,
      children: [],
      type: nodeType
    }
  }
  
  if ( parentPath ) 
    nodes[nodeId].parents.push(parentPath)

  return nodeId
}

function renderNode(nodeId, currentLevel, isDryRun = false) {
  // console.log("pre renderNode: ", nodeId)

  let previouslyRendered = false
  let loopDetected = false
  
  if ( currentLevel == 0 ) {
    currentRootNodeId = nodeId
    loopDetection[currentRootNodeId] = new Set()
  }

  // Check for multiple visitations
  if ( allVisitedNodes.has(nodeId) ) {
    // Don't traverse its children again, but tag it as visited multiple times
    multipleVisitation.add(nodeId)
    previouslyRendered = true
  } else {
    allVisitedNodes.add(nodeId)
  }

  // Check for loops
  if ( loopDetection[currentRootNodeId].has(nodeId) ) {
    loopDetected = true
  } else {
    loopDetection[currentRootNodeId].add(nodeId)
  }

  // Present this node
  if ( !isDryRun ) {
    let itemText = "   ".repeat(currentLevel)
    itemText += "- "
    itemText += nodes[nodeId].value
    
    if ( multipleVisitation.has(nodeId))
      itemText += ` [^${ nodeId }]`
      
    if ( loopDetected )
      itemText += loopIndicator // " loop:" + nodeId
      
    markdownOutput += itemText + "\n"
    if ( consoleDebug )
      console.log(itemText)
  }

  if ( !previouslyRendered && nodes[nodeId].children.length > 0 ) {
    for (let child of nodes[nodeId].children) {
       renderNode(child, currentLevel + 1, isDryRun)  
    }
  }
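  // Only remove the node from the current path if this call added it;
  // otherwise a detected loop would drop the ancestor's entry too early
  if ( !loopDetected )
    loopDetection[currentRootNodeId].delete(nodeId)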

  // console.log("post renderNode: ", nodeId)
}

function loopRootNodes(rootNodes, ignoreOthers = true, dryRun = false) {
  for (let rootNode of rootNodes) {
    renderNode(rootNode, 0, dryRun)
    if ( !dryRun ) 
      markdownOutput += "\n"
  }

  if ( !ignoreOthers ) {
    while (Object.keys(nodes).length != allVisitedNodes.size) {
      let remainingNodes = Object.keys(nodes)
        .filter(k => !allVisitedNodes.has(k))
      renderNode(remainingNodes[0], 0, dryRun)
    
      if ( !dryRun ) 
        markdownOutput += "\n"
    }
  }

  if ( !dryRun) {
    // Prepare for footnote output
    markdownOutput += "\n"
    for (let multi of multipleVisitation) {
      let footnoteText = `[^${ multi }]: `
      footnoteText += nodes[multi].value

      markdownOutput += footnoteText + "\n"
    }
    dv.paragraph(markdownOutput)
  }
  
  if ( consoleDebug )
    console.log("The final output: \n", markdownOutput)
}

function getNodeId(key, textNode, assumeInCache=true) {
  let keyPrefix = ""
  
  if ( textNode ) {
    key = "///" + key
    keyPrefix = "_"
  } 
  
  if ( !(key in allIds) ) {
    if ( assumeInCache ) {
      nodeIdCounter += 1   
      allIds[key] = keyPrefix + toLetters( nodeIdCounter )
    } else
      return false
  } 
  
  return allIds[key]
}

function toLetters(num) {
  // Bijective base-26: 1 -> A, 26 -> Z, 27 -> AA, ...
  let mod = num % 26,
      pow = num / 26 | 0,
      out = mod ? String.fromCharCode(64 + mod) : (--pow, 'Z')
  return pow ? toLetters(pow) + out : out
}
```

A few notes on this script:

It doesn’t fold, since it outputs markdown through dv.paragraph(). I haven’t found a better way to output markdown within dataviewjs. Any ideas are welcome.
The script runs in several phases:
- First it builds the nodes list by traversing all pages with the childrenKey, and this cache is used throughout the rest of the script
- Then it chooses whether to use predefined roots, and ignore other nodes in the list, or to search for nodes without a parent (and then complement with other nodes until all nodes are presented)
- When it’s ready to do the root nodes, it’ll first do a dry run with no output, to check whether any nodes will be presented multiple times, and whether there are any loops in the tree
- After the dry run, it’ll build the markdownOutput, adding footnote references to all nodes presented multiple times, and loop indicators where needed. It’ll also build the footnote list, so one can see where the different presentations of a node are
  - In the tree view you’ll see references like “2”, “2-1”, “2-2”, when it’s referencing the same node from multiple places
  - Through the footnote list, you can access each of these places by clicking the various arrow links after the link presentation
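The dry-run/real-run split can be illustrated without Dataview. A minimal sketch, assuming a hypothetical in-memory map of node ids to child ids: the first pass only records which nodes are reached more than once; the second pass renders the list, tagging those nodes with a footnote reference and marking a node that is already on the current path (a loop) with the loop indicator. Footnote bodies here are just the ids, whereas the script uses each node's value.

```javascript
const childMap = { A: ["B", "C"], B: ["D"], C: ["A", "D"], D: [] };

function buildTree(roots, childMap, loopIndicator = " ➰") {
  const multiple = new Set(); // nodes reached more than once (kept between passes)
  let visited, output;

  function walk(id, depth, path, dryRun) {
    const seenBefore = visited.has(id);
    if (seenBefore) multiple.add(id);
    visited.add(id);
    const inLoop = path.has(id); // already an ancestor on the current path

    if (!dryRun) {
      let line = `${"   ".repeat(depth)}- ${id}`;
      if (multiple.has(id)) line += ` [^${id}]`;
      if (inLoop) line += loopIndicator;
      output.push(line);
    }
    if (seenBefore) return; // never expand a node twice
    path.add(id);
    for (const child of childMap[id] ?? []) walk(child, depth + 1, path, dryRun);
    path.delete(id); // only reached when this call added id to the path
  }

  for (const dryRun of [true, false]) {
    visited = new Set();
    output = [];
    for (const root of roots) walk(root, 0, new Set(), dryRun);
  }
  const footnotes = [...multiple].map(id => `[^${id}]: ${id}`);
  return output.join("\n") + "\n\n" + footnotes.join("\n");
}

console.log(buildTree(["A"], childMap));
```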
It allows children fields to be any of the following:
- A pure link
- A link to a non-existing file
- A link with either a path or an alias (and it’ll keep the alias)
- Pure simple text
- and even embedded image links (to some extent, that is, as well as dataview handles them…)
A little note on building the nodes list: it builds a node for each of the fields referenced by the childrenKey, and each node holds the following information:
- “the key” – It’s a running numerical key as letters, that is A, B, C, … Z, AA, AB, … This format is chosen for ease of reference (and debugging), and for use when building the footnote references
- value - This holds either a full Link variable, or the plain text of the field
- parents[] – An array of the paths of all the notes listing this node as a child, i.e. this node’s parents
- children[] – An array of the nodeIds of all the children of this node
- occurences – An initial attempt to keep track of how many times we’ve encountered this node when building the tree. It’s now deprecated, and will most likely be removed
- type – Either “text” to denote the plain text of the node, or “link” or “newlink” to denote an existing or a non-existing link. In the end, this also turned out to be futile, as a full Link object handles the difference between them by itself when presented.
- There is a redundancy in the node list, keeping track of both the children and the parents, but it makes some of the later decisions a lot easier, so I kept it. Surely a few bytes of memory are wasted, but not a whole lot, and it keeps the logic simpler.
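The parents/children bookkeeping, and how it turns root detection into a one-line filter, can be sketched like this. The input is a hypothetical list of notes with their children already normalised to arrays, and ids are plain note names rather than the letter keys; roots are the nodes nobody lists as a child, the same `parents.length == 0` test the script uses.

```javascript
// Build a node cache storing both directions, as in the script above.
function buildNodes(notes) {
  const nodes = {};
  const getNode = name => {
    if (!nodes[name]) nodes[name] = { value: name, parents: [], children: [] };
    return nodes[name];
  };

  for (const note of notes) {
    const parent = getNode(note.name);
    for (const childName of note.children) {
      parent.children.push(childName);
      getNode(childName).parents.push(note.name);
    }
  }
  return nodes;
}

const notes = [
  { name: "Shapes", children: ["Four sides", "Three sides"] },
  { name: "Four sides", children: ["Square"] },
  { name: "G", children: ["H"] },
  { name: "H", children: ["G"] }, // a loop: neither G nor H is parentless
];

const nodes = buildNodes(notes);
const roots = Object.keys(nodes).filter(k => nodes[k].parents.length === 0);
console.log(roots); // [ 'Shapes' ]
```

Here G and H only point at each other, so neither is parentless; that is why the script, after the parentless roots, keeps rendering any not-yet-visited node until every node has been shown.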

So there you have it, a fully automated tree builder based on the definition of children fields scattered around your vault. To finish off this rather longish post, here are two views of my (somewhat evil) test setup. One using a mermaid graph, and the other as the tree list presented by this script.

Mermaid graph of my test setup

Treeview of my test setup

The image in the middle is a child of “Shapes”, downscaled through alias notation on the embed.

BorisZ · February 21, 2023, 11:47am

Wow, that’s amazing! You didn’t leave a single gap in this problem.

Footnotes are something I wouldn’t even think about.

I think you should create a gist or something, so others can comment on exact code lines. It’s definitely worth it.

avirosso · February 28, 2023, 9:20am

@holroy: Thanks for this GREAT script!

One problem/question: I’m using markdown links [displaytext](link), not wikilinks [[link|displaytext]]. It seems that your script treats markdown links as “text”, not as “link”, so the generated output stops after one level.

I’m a noob in JavaScript, so could you please give me a hint about what must be changed so that markdown links are recognized as links?

Thanks in advance!

holroy · February 28, 2023, 12:06pm

I treat links as links, so it might be your markup. What does your markup for the children entries look like? Is it in the frontmatter or in the text?

avirosso · February 28, 2023, 6:08pm

Thanks for your answer.

I use const childrenKey= 'Sub'

And dataview inline-fields in the text for children entries: - Sub:: [Survivorship Bias](Survivorship%20Bias.md)

holroy · February 28, 2023, 10:48pm

Something does seem to change when using that style of links, even though I don’t actually treat any of the links specially. It might be that there are differences in which path the various link styles produce, so the links will need to be normalised.

Another thing I’m seeing is that using %20 instead of a normal space character also triggers something.

So all in all, the links produced do work, but they’re not identical in the syntax behind the scenes: some include the path and some don’t, and some are URL-encoded while others are not. I need to look a little more into this aspect of my script and see if there is an easy modification to make it work.
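One possible normalisation, sketched under the assumption that note basenames are unique in the vault: decode URL escapes such as %20, drop any folder part and the .md extension, and compare the results. `linkKey` is a hypothetical helper, not how Dataview itself resolves links, and with duplicate note names in different folders it would wrongly merge them.

```javascript
// Normalise different spellings of a link target to one comparable key.
function linkKey(path) {
  let decoded;
  try {
    decoded = decodeURIComponent(path); // "Survivorship%20Bias.md" -> "Survivorship Bias.md"
  } catch {
    decoded = path; // e.g. a lone "%" is not valid escaping; keep the raw path
  }
  const base = decoded.split("/").pop(); // drop folders
  return base.replace(/\.md$/i, "");     // drop the extension
}

const spellings = [
  "Survivorship%20Bias.md",     // markdown link, URL-encoded
  "Survivorship Bias",          // wikilink, shortest path
  "Notes/Survivorship Bias.md", // full vault path
];
console.log(new Set(spellings.map(linkKey)).size); // 1
```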

Just one final question, though: you do see all of the nodes from your vault? It’s just the connections between them that are missing, right?

Like in the image below, there should be a link from “Shapes > with four sides” to “Four sides”, which is missing. But all the shapes are in the image.

avirosso · March 1, 2023, 5:34am

Thanks for your research and your detailed answer!

Regarding path: I use “shortest path when possible” in the settings.
Regarding the %20 (instead of space character): This encoding is done automatically by Obsidian (and conforms to the markdown specification - as far as I know).
Regarding your ‘final question’: Yes, if I leave ‘myRootNames’ empty, I see all of my nodes in the vault but they’re not connected together, just like in your image.

Hope it helps.
