Dataview problem : trying to merge data as I would do with an INNER JOIN in SQL

Hello everybody !

Using Obsidian for some time now, I now want to try more sophisticated tricks. In this case, it is work related and has to do with business and enterprise architecture. So I’ll have to explain a bit.

1. Data structure

I have basically put the HERM model (Higher Education Reference Model) in Obsidian. It is a tree structure, with 4 different levels. The two first are basically only containers with no interesting properties. The third has more properties, but I won’t deal with them for now. So I just focus on the fourth level items.

1st level 2nd level 3rd level 4th level
Recherche
Réaliser
BC074 Mise en œuvre de la recherche
BC075 Création de recherche
BC077 Gestion des données de la recherche
BC097 Gestion des infrastructures de recherche
BC236 Gestion des ressources de la recherche
BC093 Gestion de la recherche
BC069 Gestion des programmes de recherche
BC073 Gestion de la recherche de fonds

Each one of these items, extracted from the HERM, are in the form of a Note within a folder with the same name (because I use the Folder Notes component). And it is made solely for better reading purposes. Each one of the Note reflecting an item has also a property with its level.

Now, a fourth level item would look like this :

BC202 Alignement, planification et organisation (note name)
---
TypeObjetHERM: Capacité métier
NiveauObjetHERM: 4
ResponsableMétierCapacitéMétier:
- "[[TINTIN]]"
CapacitéEnUsage: true
CapacitéCritique: false
ServicesApplicatifsLiés:
tags:
- HERM
---

What I plan people in my organization to do would be to rate the business capabilities, on 4 different dimensions (organization, processes, solutions, global), with a grade scale from 1 to 4 (integer only).
But, you can have multiple evaluations for the same item, but with different roles (Direction or Responsable), on different dates. The purpose is to have a permanent evaluation on our business capabilities to drive our strategy, or at least, to help us targeting the weakest spots in this map, to target coontinuous improvement.

The proper process for one business capability evaluation would be :

  • In parallel : notations from the Head of our school, and the one person in charge of the capability
  • Discussion to reduce the gap in knowledge and appreciation of all 4 dimensions on this capability
  • Negotiation to provide a final evaluation
  • Some time later, rinse and repeat with the same steps

To stick with that philosophy, I therefore built another structure of Notes to handle the evaluation. Another Folders/Notes structure, with one folder containing this time multiple evaluations, and the evaluation Notes contained being like this :

BC202 - AKS - 20241201 (note name)
---
Évaluateur: "[[René DESCARTES]]"
DateÉvaluation: 2024-12-01
TypeÉvaluation: Direction
CapacitéÉvaluée: "[[BC202 Alignement, planification et organisation]]"
NoteOrga: 2
NoteProcessus: 3
NoteSolutions: 1
NoteGlobale: 2
tags:
  - évaluation
---

Obvisously, I have all grades (the NoteSomethings), a date, a type (Direction, Responsable or Finale), and a person related to the evaluation (in case of Direction or Responsable, not Finale). And a tag to help gathering info later on. The obvious thing is that I link the evaluation to the Note of the evaluated Business Capacity (CapacitéÉvaluée).

In this case, I could have also these notes in other folders :

---
Évaluateur: "[[Jean-Jacques ROUSSEAU]]"
DateÉvaluation: 2024-12-05
TypeÉvaluation: Accountable
CapacitéÉvaluée: "[[BC202 Alignement, planification et organisation]]"
NoteOrga: 3
NoteProcessus: 4
NoteSolutions: 2
NoteGlobale: 3
tags:
  - évaluation
---

AND

---
Évaluateur: 
DateÉvaluation: 2024-12-31
TypeÉvaluation: Finale
CapacitéÉvaluée: "[[BC202 Alignement, planification et organisation]]"
NoteOrga: 3
NoteProcessus: 3
NoteSolutions: 2
NoteGlobale: 2
tags:
  - évaluation
---

2. What I could achieve

With this data structure and my personal knowledge of using Dataview in Obsidian, I could build at least a summary of all evaluations for one Business Capability inside the Note of itself. Like this :

dataview
TABLE NoteOrga, NoteProcessus, NoteSolutions, NoteGlobale, Évaluateur, DateÉvaluation, TypeÉvaluation
FROM "Évaluations"
WHERE contains(file.outlinks, this.file.link)
SORT DateÉvaluation ASC

This helps already to compare evaluations between multiple shareholders, and also to have perspective over time on how the capability is evolving.
Side note : all evaluations are in the folder structure starting with a root folder name “Évaluations”, but I could also use the #évaluation instead

3. What I would like to achieve

Given the data structure described above, what I miss now is the possibility to have in just one place a super table listing all my Business Capacities (only on the 4th level) with their latest finale grades.

The final result would look like somthing like :

file.name NoteOrga NoteProcessus NoteSolutions NoteGlobale DateÉvaluation
BC202 Alignement, planification et organisation 3 3 2 2 décembre 31, 2024
BC203 Construction, acquisition et implémentation 3 3 3 3 décembre 31, 2024
BC204 Livraison, service et support 1 1 1 1 décembre 31, 2024
BC205 Suivi évaluation et contrôle 2 3 3 3 décembre 31, 2024
BC205 Suivi évaluation et contrôle 2 3 3 3 décembre 31, 2024

With my dummy data sets, for now, with just one final evaluation, I could gather only this type and it will work. But when I will have multiple final evaluations over the same items, I quite don’t know how to create the proper dataview.
The case of an item not being evaluated (with a Finale evaluation type) would raise occasionally as well.

I used to be a developper, like a thousand years ago, so I feel stupid on these. I have tried to dig into similar issues, and I have the feeling that I probably would have to use the FLATTEN and GROUP BY arguments in the query, but honestly, I do not understand well these concepts.
Examples of threads I tried and failed to understand :
https://forum.obsidian.md/t/dataview-query-using-where-and-flatten/54303
https://forum.obsidian.md/t/dataview-join-tables-and-limit-output-of-ungenerated-notes/50049
https://forum.obsidian.md/t/left-join-right-join-inner-join/53940

So if anyone (maybe holroy ? as he seems to be the most active and proficient on these) could have a look on this and help me with the proper code, it would be great ! And if there is enough explanation to make me more autonomous, it will be even better :smile:

Thanks !

If I understand you correctly you want to look at all the notes your “Évaluations” folder, focusing on those having TypeÉvaluation: Finale, and I reckon you want the average of the values for any given CapacitéÉvaluée. You could also do sum(), max(), min()

Untested, but shouldn’t something like this then work:

```dataview
TABLE average(rows.NoteOrga) as Orga, average(rows.NoteProcessus) as Processus
FROM "Évaluations"
WHERE TypeÉvaluation = "Finale"
GROUP BY CapacitéÉvaluée
```

Why do you think you need to do some inner join or similar? Am I misunderstanding something?

The query could possibly fail due to hat property being a link, but then you could try using string(CapacitéÉvaluée) instead. (You could also use some variant of meta() on the property, and picking the path or similar. ) But I believe it should work with the link as I stated in the query already.


The trick when using GROUP BY is that it’ll collate all the rows where the expression are the same (saved into a key field) into different lists for each unique value. And what was a given row (or note) field/properties are stored within the rows object. So that if you had two notes both having the BC202 as the CapacitéÉvaluée you’d get one row where the key was that BC202 link, and the rows list would hold two elements which was the data from the two notes.

1 Like

Hello @holroy !
Thanks for your time and your answer :slight_smile:

Your code works well, but not for the intended goal I had.
Well, for the part of having only Business Capabilities as the main column, yes.

But what I was looking for is more complex than having the average of each note : I would really like having the latest (in terms of date) finale grade for each dimension, rather than having the average.

And again, what could be tricky is that I will have multiple “Finale” evaluations over time (let’s say it will be a 2-year cycle for a complete or partial rerun of the evaluations), and maybe, sometimes, only partial evaluations (not all capabilities evaluated each time).

So for one Business Capability I could have (with only Final evaluations) :

Évaluation Business Capability Note Orga Note Processus Note Solutions Note Globale Date évaluation
BC202 - Finale - 20231231 BC202 Alignement, planification et organisation 3 2 2 2 décembre, 31, 2023
BC202 - Finale - 20241231 BC202 Alignement, planification et organisation 3 3 3 3 décembre, 31, 2024
BC202 - Finale - 20251231 BC202 Alignement, planification et organisation 2 2 3 2 décembre, 31, 2025

And for another Business Capability

Évaluation Business Capability Note Orga Note Processus Note Solutions Note Globale Date évaluation
BC202 - Finale - 20231231 BC156 Gestion des contrats 1 1 1 1 décembre, 31, 2023
BC202 - Finale - 20241231 BC156 Gestion des contrats 3 2 2 2 décembre, 31, 2024

In that case, I would really like having the following results for my previous dataset :

Business Capability Note Orga Note Processus Note Solutions Note Globale Date évaluation
BC202 Alignement, planification et organisation 2 2 3 2 décembre, 31, 2025
BC156 Gestion des contrats 3 2 2 2 décembre, 31, 2024

So, each Business Capability on a row (obtained through the Group by), but with the grade sets which belongs to the latest date, when existing … and that’s where I don’t know how to do that.

Moreover, where can I find, for example, the ressources on available functions I can apply to the rows of my request. From your proposition of the average, I deducted I could use min or max (for other purposes), but where can I find all existing functions ? I just don’t know where to look for those.

The trick to doing this is to sort the overall list before grouping, and then picking out just one of the rows for display. Note that you can still access the overall grouped result from the rows to do other stuff. This allows for both displaying the latest evaluation like you’re asking for, but you could also check for sums, averages, and so on on the rest of the data set.

Here are two queries to showcase this idea:

## All the relevant evaluations (sorted by date)

```dataview
TABLE NoteOrga, NoteProcessus, NoteSolutions, NoteGlobale, DateÉvaluation
FROM "Évaluations"
WHERE TypeÉvaluation = "Finale"
SORT DateÉvaluation
```

## Only the last evaluations 

```dataview
TABLE length(rows) as TotalCount, lastEval.NoteOrga, lastEval.NoteProcessus, lastEval.NoteSolutions, lastEval.NoteGlobale, lastEval.DateÉvaluation
FROM "Évaluations"
WHERE TypeÉvaluation = "Finale"
SORT DateÉvaluation
GROUP BY CapacitéÉvaluée
FLATTEN slice(rows, -1) as lastEval
```

On my test setup this displays as:

Notice that I’ve added the total number of evaluation available in the second column, just as a proof of concept.

The main documentation site for Dataview is here: Functions - Dataview

Where I’ve focused in on the numeric operations which are what we’re using here. The site has loads of documentation, although there are not a lot of examples on how to use it. But it’s a starting point, and can give you ideas which you can try out, and come back here (or to the Discord) and ask questions to further help your understanding.

2 Likes

@holroy , my man, you’re a god !

That was exactly what I was looking for with your second proposition.

There is still so magic (i.e. parts that I do not understand) in your dataview request, but really, a huge THANK YOU !

And yes, I really need to dig more into the sources you pointed out. :slight_smile:

1 Like