Average of nested tags in dataview not working properly

joaosousa1 · January 18, 2024, 2:53pm

What I’m trying to do

Some months ago I searched for help related to doing some averages in dataview (link), recently I realized that in some cases it does not make the math properly, for example:

#parent/child/john/0
#parent/child/john/1
#parent/child/mary/2
#parent/child/mary/1

The expected outcome would be a table similar to that:

NAME – AVERAGE
john – 0.5
mary – 1.5

But if I add an equal tag the results do not change:

#parent/child/john/0
#parent/child/john/1
#parent/child/john/1
#parent/child/mary/2
#parent/child/mary/1

The expected outcome would be a table similar to that:

NAME – AVERAGE
john – 0.66
mary – 1.5

But what happens is this:
The expected outcome would be a table similar to that:

NAME – AVERAGE
john – 0.5
mary – 1.5

Besides that, I am also trying to get just the last five values for john and mary, but I do not think it is possible to do so using dataview.

Things I have tried

I tried deleting the FLATTEN as a value, but it still does not work, I do not even know what else I can try.

Pch · January 18, 2024, 4:46pm

It seems like Dataview “auto flatten” the double #parent/child/john/1…

I mean, in a note with only this content:

#parent/child/john/0
#parent/child/john/1
#parent/child/john/1
#parent/child/mary/2
#parent/child/mary/1

The following DataviewJS query:

```dataviewjs
dv.list(
  dv.current().file.etags
)
```

… returns :

- #parent/child/john/0
- #parent/child/john/1
- #parent/child/mary/2
- #parent/child/mary/1

parent - child

… for which the averages are indeed : 0.5 for John and 1.5 for Mary.

If Dataview deduplicates the tags from the start, I’m not sure there’s something you can do to make it work (but I could be wrong )

holroy · January 18, 2024, 5:02pm

I believe your quite right that dataview, and possibly Obsidian itself in its tags pane, only store info on the unique tags of any given file. This is specific to the tags field, though, so if it was possible to change to another field all candidates would be saved.

Pch · January 18, 2024, 5:12pm

Obsidian’s Tag pane seems to account for all the #parent/child/john/1 present
(I took the screenshot and forgot to mention it )

Oh yes, you’re right !

I should probably have formulated my thought differently.
What I meant was that if duplicates values are a possibility, tags are apparently not the way to go .

holroy · January 18, 2024, 5:19pm

Are the extra tags in unique files, or multiple equal tags in the same file?

The latter case we’ve already covered with dataview only using unique tags from any given file.

The former case however could possibly require you to change how you calculate the length before doing your average. As you would need to sum up the length of each of the groups in your query.

To verify whether this is the case, please include your current query, and answer the question above.

joaosousa1 · January 18, 2024, 6:10pm

I have both cases happening in my vault, sometimes I could have unique tags to multiple files and sometimes equal tags in one specific file. Something like that:

File 1

#parent/child/john/0
#parent/child/john/1
#parent/child/john/1
#parent/child/mary/2
#parent/child/mary/1

File 2

#parent/child/john/0
#parent/child/mary/2
#parent/child/mary/1

File 3

#parent/child/john/1
#parent/child/mary/1
#parent/child/mary/1

My current query is:

TABLE length(rows.value) as Amount, sum(rows.value)/length(rows.value) as Average
FROM #test 
FLATTEN file.etags as et 
FLATTEN list(split(et, "/")) as p 
FLATTEN number(p[3]) as value 
WHERE p[0] = "#parent" AND p[1] = "child"
GROUP BY p[2] as Person

I thought of implementing something like mary:: and john::, but I do not know if it would work the same way.

joaosousa1 · January 24, 2024, 12:58am

I have tried other options, but the one I shared is the best so far.

I also tried to get the last five for each person (mary, john), but I could not find a way to do that, is that possible?

holroy · January 24, 2024, 11:25am

I kind of had given up on getting the complete lists of tags, and was about to write a post on how you would need to change into using inline fields to be able to gather all the various tags. Then I read this post where they utilise app.metadataCache.getFileCache().tags, which was a function I was wondering about existed but hadn’t located yet.

I then proceeded to build some test files, and in the main file I had these queries:

---
Tags: f75210
---
questionUrl:: http://forum.obsidian.md/t//75210

```dataview
LIST file.etags
FROM #parent
```

```dataviewjs
const parentFiles = dv.pages("#parent")
  .map(p => { return {
    "file": p.file, 
    "etags": p.file.etags.filter(f => f.startsWith("#parent")),
    "allTags": null }
  })
  .mutate(p => {
    const tFile = app.vault.getAbstractFileByPath(p.file.path)
    const allTags = app.metadataCache.getFileCache(tFile)
      .tags
      .filter(f => f.tag.startsWith("#parent"))
      .map(m => m.tag)
    p.allTags = allTags
  })
  .map(p => [p.file.link, p.etags, p.allTags])

dv.table(["Note", "etags", "allTags"], parentFiles)
```

Which produces this output in my test vault:

Here we can see that the first naive approach using file.etags shows the unique tags, but the second output shows each occurrence of the tags present in that file (filtered for #parent... tags).

Based upon the latter query variation, it should be possible to achieve your goals of building a better average for the various cases. Will you be able to convert your existing script based upon this query?

A little explanation on the latter query. I start out by limiting our file set to those having any tags related to #parent (which happens to pick up all the nested tags as well). I then map the result to our specific needs mapping them to an object with the file, file.etags and allTags, where the latter is just a placeholder for stuff to be added later on.

In the mutate() below I then find the tFile for the file in question, and get the cache entries related to this, and filter out those related to #parent mapping only the tag to our temporary list (as we’re not interested in the positions as such). Then I do another map, just to make the output of the table even simpler… I’m a little lazy every now and then.

holroy · January 24, 2024, 11:46am

A bigger question though, am I able to leave this at that? And the answer to that is: Nope. So here is a query doing the averages for each person:

```dataviewjs

const parentFiles = dv.pages("#parent AND -#f66743")
  .map(p => { return {
    "file": p.file, 
    "allTags": null }
  })
  .mutate(p => {
    const tFile = app.vault.getAbstractFileByPath(p.file.path)
    const allTags = app.metadataCache.getFileCache(tFile)
      .tags
      .filter(f => f.tag.startsWith("#parent"))
      .map(m => m.tag)
    p.allTags = allTags
  })
  .allTags
  .map(t => {
    const tagParts = t.split("/")
    return [tagParts[2], tagParts[3]]
  })
  .groupBy(t => t[0])
  .map(t => {
    const sum = t.rows.values
      .reduce((tmp, row) => tmp + parseInt(row[1]), 0)
    return [ t.key, Math.round(sum/t.rows.length*100)/100 ]
  })

dv.table(["Parent", "Average"], parentFiles)
```

Here I’ve reduced my previous query, picked out just the allTags part, split each of the tags, and just keeping the person and amount part, before grouping on each person before finally summing and averaging over the amounts. Reducing my test set to match your original set we now get this result:

Which should correspond nicely to the original request of yours.

joaosousa1 · January 24, 2024, 9:22pm

Amazing! This really did what I needed, thanks.

By the way, I tried to sort using .sort(t => Math.round(sum/t.rows.length*100)/100), “asc”), but it did not work, it just worket when I used .sort(t =>t.rows.length “asc”)

Is there a reason for that?

joaosousa1 · January 24, 2024, 9:25pm

Besides that, is there a way to create another average column, but in this column it gets just the last 3 grades for each person?

I do not understand dataviewjs to try to do something like that.

holroy · January 25, 2024, 1:04pm

Yes!

HeHe… The reason is that we’ve now mapped the object into a list of values, so you don’t have access to the various bits and pieces any more. However, we do have access to the row, so you could add .sort(row => row[1] to sort on the average found in column 1 (aka the second column).

I’m only 99% sure this is the actual last three grades, but using slice(-3) gives us those three grades if the list is in the correct order. So here is a slight variation of the previous script where we also calculate the average for the (up to) last three grades of each person. This time around, I’ve chosen to sort on the third column, aka row[2]:


```dataviewjs

const parentFiles = dv.pages("#parent AND -#f66743")
  .flatMap(p => { 
    const tFile = app.vault.getAbstractFileByPath(p.file.path)
    const allTags = app.metadataCache.getFileCache(tFile)
      .tags
      .filter(f => f.tag.startsWith("#parent"))
      .map(m => m.tag)
    return allTags
  })
  .map(t => {
    //console.log(t)
    const tagParts = t.split("/")
    return [tagParts[2], tagParts[3]]
  })
  .groupBy(t => t[0])
  .map(t => {
    const sumTotal = t.rows.values
      .reduce((tmp, row) => tmp + parseInt(row[1]), 0)

    const lastThree = t.rows.values.slice(-3)
    const sumLastThree = t.rows.values.slice(-3)
      .reduce((tmp, row) => tmp + parseInt(row[1]), 0)
      
    return [ t.key, 
      Math.round(sumTotal/t.rows.length*100)/100 , 
      Math.round(sumLastThree/lastThree.length*100)/100 , 
    ]
  })
  .sort(row => row[2], 'desc') /* */

dv.table(["Parent", "Average", "Last 3"], parentFiles)
```

Which with my test data, where I added quentin with just one entry:

joaosousa1 · January 25, 2024, 2:38pm

That is great. Thank you for your time and patience.

system · February 1, 2024, 2:38pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.