Nested tags in combination with contains & co

holroy · January 10, 2023, 6:44pm

tl;dr: Summary of the summary

Here comes a very long post, with examples of how `contains()`, `icontains()` and `econtains()` (contains and friends from Dataview) work together with tags. I think we somewhat carelessly use `contains()` way too much in combination with tags, and should use one of the following two for most tag matches:

- Use `econtains(file.etags, "#some/tag")` for exact matches on either a normal tag or a nested tag. Tags on deeper levels will not be included. This also keeps working if more tags are added to the notes at a later stage.
- Use `econtains(file.tags, "#some/tag")` for a complete match on the entire tag, including nested tags on deeper levels. This also avoids false matches where the searched value is only part of a tag, which is especially dangerous when the hashtag is omitted (and one uses `contains()`).

A pre-requisite for this post

If you want to follow along in your own vault, you need four files based upon this pattern:

Filename: `Tagged_AABBCC`

```yaml
---
tags: f51705, aa/bb/cc
---
```

The four files needed, each with its last tag:

- `Tagged_AABBCC` with aa/bb/cc
- `Tagged_AABB` with aa/bb
- `Tagged_AABBBB` with aa/bbbb
- `Tagged_DDAABB` with dd/aa/bb

Do remember to include the f51705 tag as well, as it is used to limit the queries to the test set only.

The raw text for the rest of this post

In the block below is the raw text, with the original queries used to produce the images in this post.

### Difference between file.tags and file.etags

The data set used for this post can be described by the following table, which also shows the difference between `file.tags` and `file.etags`. 

```dataview
TABLE file.etags, file.tags
FROM #f51705
```


`file.etags` contains _only_ the **exact** tags as written in the file, whilst `file.tags` contains every tag, including every parent level of the nested tags. Come back to this table if you're unsure which values the different functions will check against.

So for each of these files I've used a frontmatter with the values in `file.etags`, formatted like `tags: f51705, aa/bb`, where the latter part changes according to the file name. Notice there are no hashtags in this definition.
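The relationship shown in that table can be sketched in a few lines of Python. This is a simplified model written for this post, not Dataview's own code: it assumes `file.tags` is every exact tag from `file.etags` plus each of its parent levels, all with a leading `#`. The names `ETAGS` and `expand_tags` are made up for the sketch.

```python
# Hypothetical model of the four test notes: note name -> file.etags
ETAGS = {
    "Tagged_AABBCC": ["#f51705", "#aa/bb/cc"],
    "Tagged_AABB": ["#f51705", "#aa/bb"],
    "Tagged_AABBBB": ["#f51705", "#aa/bbbb"],
    "Tagged_DDAABB": ["#f51705", "#dd/aa/bb"],
}


def expand_tags(etags):
    """Model of file.tags: each exact tag plus all of its parent levels, sorted."""
    levels = set()
    for tag in etags:
        parts = tag.lstrip("#").split("/")
        # "#aa/bb/cc" -> "#aa", "#aa/bb", "#aa/bb/cc"
        levels.update("#" + "/".join(parts[:i]) for i in range(1, len(parts) + 1))
    return sorted(levels)


if __name__ == "__main__":
    for name, etags in ETAGS.items():
        print(name, etags, expand_tags(etags))
```

In this model `Tagged_AABBCC` gets `["#aa", "#aa/bb", "#aa/bb/cc", "#f51705"]` as its `file.tags`, while its `file.etags` keeps just the two tags as written. The real ordering inside Dataview may differ.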

## Using FROM 
### FROM is a prefix of the tags
```dataview
TABLE "FROM aa/bb"
FROM #aa/bb AND #f51705
```

See how it includes both `Tagged_AABB` with #aa/bb and `Tagged_AABBCC` with #aa/bb/cc, whilst still ignoring `Tagged_AABBBB` with #aa/bbbb?

### FROM with a negation
```dataview
TABLE "FROM aa/bb AND -#aa/bb/cc"
FROM #aa/bb AND -#aa/bb/cc 
```

This will include every file with a tag starting with #aa/bb, but leave out all the files with a tag starting with #aa/bb/cc.

---

## file.tags - Tags are split
### Using contains on file.tags
```dataview
TABLE
  contains(file.tags, "#aa/bb/cc"),
  contains(file.tags, "#aa/bb"),
  contains(file.tags, "aa/bb"),
  contains(file.tags, "AA/bB/Cc")
FROM #f51705
```

There are a lot of matches in this table, showing that care should really be taken when matching using `contains()`. Of course, these tags are designed a little to trigger false positives, but I hope I'm also illustrating how one should tread carefully when building these tests.

Especially the second case, testing against #aa/bb, allows the tag to be longer, as shown for the `Tagged_AABBBB` file with #aa/bbbb. This can be a wanted side effect, or in some cases just dead wrong. The third case, without the hashtag, even matches `Tagged_DDAABB` with #dd/aa/bb. Also notice that in the last case nothing matches, as `contains()` does case-sensitive matches!
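Why those columns light up can be reproduced with a small Python model. It assumes, consistent with the results above, that `contains()` on a list is true when any element contains the value as a case-sensitive substring. `expand_tags` and `matching` are hypothetical helpers, not Dataview functions.

```python
# Hypothetical model of the test notes: note name -> file.etags
ETAGS = {
    "Tagged_AABBCC": ["#f51705", "#aa/bb/cc"],
    "Tagged_AABB": ["#f51705", "#aa/bb"],
    "Tagged_AABBBB": ["#f51705", "#aa/bbbb"],
    "Tagged_DDAABB": ["#f51705", "#dd/aa/bb"],
}


def expand_tags(etags):
    """Model of file.tags: each exact tag plus all of its parent levels."""
    levels = set()
    for tag in etags:
        parts = tag.lstrip("#").split("/")
        levels.update("#" + "/".join(parts[:i]) for i in range(1, len(parts) + 1))
    return sorted(levels)


def contains(values, needle):
    """Model of contains(list, value): case-sensitive substring test on each element."""
    return any(needle in value for value in values)


def matching(needle):
    """Notes where contains(file.tags, needle) is true in this model."""
    return {name for name, etags in ETAGS.items() if contains(expand_tags(etags), needle)}


if __name__ == "__main__":
    for needle in ["#aa/bb/cc", "#aa/bb", "aa/bb", "AA/bB/Cc"]:
        print(f"contains(file.tags, {needle!r}):", sorted(matching(needle)))
```

In this model `"#aa/bb"` also hits `#aa/bbbb`, and `"aa/bb"` matches all four notes, because it is a substring of `#dd/aa/bb` too.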

### Using icontains on file.tags
```dataview
TABLE
  icontains(file.tags, "#aa/bb/cc"),
  icontains(file.tags, "#aa/bb"),
  icontains(file.tags, "aa/bb"),
  icontains(file.tags, "AA/bB/Cc")
FROM #f51705
```

Very similar to the previous run, but since we're now using `icontains()` it does case-insensitive matching, so the last column now gives the same result as the first column: true only for `Tagged_AABBCC`.
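The case-insensitive variant can be modelled by lowercasing both sides before the substring test. This is an assumption about `icontains()` based on the results here, not Dataview's source, and the helper names are made up.

```python
# Hypothetical model of the test notes: note name -> file.etags
ETAGS = {
    "Tagged_AABBCC": ["#f51705", "#aa/bb/cc"],
    "Tagged_AABB": ["#f51705", "#aa/bb"],
    "Tagged_AABBBB": ["#f51705", "#aa/bbbb"],
    "Tagged_DDAABB": ["#f51705", "#dd/aa/bb"],
}


def expand_tags(etags):
    """Model of file.tags: each exact tag plus all of its parent levels."""
    levels = set()
    for tag in etags:
        parts = tag.lstrip("#").split("/")
        levels.update("#" + "/".join(parts[:i]) for i in range(1, len(parts) + 1))
    return sorted(levels)


def icontains(values, needle):
    """Model of icontains(list, value): substring test on each element, ignoring case."""
    return any(needle.lower() in value.lower() for value in values)


def matching(needle):
    """Notes where icontains(file.tags, needle) is true in this model."""
    return {name for name, etags in ETAGS.items() if icontains(expand_tags(etags), needle)}


if __name__ == "__main__":
    for needle in ["#aa/bb/cc", "#aa/bb", "aa/bb", "AA/bB/Cc"]:
        print(f"icontains(file.tags, {needle!r}):", sorted(matching(needle)))
```

Only the last column changes compared with `contains()`: `"AA/bB/Cc"` is lowercased to `"aa/bb/cc"`, which is found only in `Tagged_AABBCC`.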

### Using econtains on file.tags
```dataview
TABLE
  econtains(file.tags, "#aa/bb/cc"),
  econtains(file.tags, "#aa/bb"),
  econtains(file.tags, "aa/bb"),
  econtains(file.tags, "AA/bB/Cc")
FROM #f51705
```

The `econtains()` variant only matches when a list element equals the entire value provided. Since we're still using `file.tags`, for a note tagged #aa/bb/cc this means it'll compare against #aa, #aa/bb and #aa/bb/cc, and so on.

To use this, one therefore has to include the hashtag in front, as in the first two columns. Neither of the last two columns will ever match, since they don't have the hashtag.

The only truthy values here are therefore for the `Tagged_AABB` and `Tagged_AABBCC` files, which are the only files _actually starting with #aa/bb_. The two other files either have something else at the start, #dd/aa/bb, or at the end, #aa/bbbb.

In other words, if one wants to check that a tag starts with one or more complete levels, it's best to use `econtains(file.tags, "#.../...")`, and this will include deeper nested tags, as shown for the #aa/bb search.
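The exact-match behaviour can be sketched with a small Python model, treating `econtains()` on a list as "some element equals the value" (an assumption based on the results above). The helper names are hypothetical.

```python
# Hypothetical model of the test notes: note name -> file.etags
ETAGS = {
    "Tagged_AABBCC": ["#f51705", "#aa/bb/cc"],
    "Tagged_AABB": ["#f51705", "#aa/bb"],
    "Tagged_AABBBB": ["#f51705", "#aa/bbbb"],
    "Tagged_DDAABB": ["#f51705", "#dd/aa/bb"],
}


def expand_tags(etags):
    """Model of file.tags: each exact tag plus all of its parent levels."""
    levels = set()
    for tag in etags:
        parts = tag.lstrip("#").split("/")
        levels.update("#" + "/".join(parts[:i]) for i in range(1, len(parts) + 1))
    return sorted(levels)


def econtains(values, needle):
    """Model of econtains(list, value): true only if an element equals the value exactly."""
    return needle in values


def matching(needle):
    """Notes where econtains(file.tags, needle) is true in this model."""
    return {name for name, etags in ETAGS.items() if econtains(expand_tags(etags), needle)}


if __name__ == "__main__":
    for needle in ["#aa/bb/cc", "#aa/bb", "aa/bb", "AA/bB/Cc"]:
        print(f"econtains(file.tags, {needle!r}):", sorted(matching(needle)))
```

Because `#aa/bb` is one of the expanded levels of `#aa/bb/cc`, both `Tagged_AABB` and `Tagged_AABBCC` match, while `#aa/bbbb` and `#dd/aa/bb` never equal `#aa/bb`.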

---

## file.etags  - Full tags

### Using contains on file.etags
```dataview
TABLE
  contains(file.etags, "#aa/bb/cc"),
  contains(file.etags, "#aa/bb"),
  contains(file.etags, "aa/bb"),
  contains(file.etags, "AA/bB/Cc")
FROM #f51705
```
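Using `contains()` on `file.tags` and `file.etags` yields the same result: the extra entries in `file.tags` are just parent levels, which are prefixes of the full tags, so any substring found in them is also found in the full tag in `file.etags`. So this is identical to the one matching using `file.tags`.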

### Using icontains on file.etags
```dataview
TABLE
  icontains(file.etags, "#aa/bb/cc"),
  icontains(file.etags, "#aa/bb"),
  icontains(file.etags, "aa/bb"),
  icontains(file.etags, "AA/bB/Cc")
FROM #f51705
```

Just as for `contains()`, using icontains on `file.tags` and `file.etags` yields the same result, as the full tag is included in both variants. So this is also identical to the one matching using `file.tags`.

### Using econtains on file.etags
```dataview
TABLE
  econtains(file.etags, "#aa/bb/cc"),
  econtains(file.etags, "#aa/bb"),
  econtains(file.etags, "aa/bb")
FROM #f51705
```

This is where the output really differs: since we now _only_ have the full tags available, no sub-matching occurs on the shallower parts of nested tags. The only truthy values we get are for `Tagged_AABBCC` with #aa/bb/cc and `Tagged_AABB` with #aa/bb. No other combinations match, as they either lack the hashtag, are missing levels, or have additional text on the various levels.

In other words, if one wants to check for absolute equality on the tag, one needs to use `econtains(file.etags, "#.../...")`. This will not include any tags which are nested deeper, just the exact match.
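A small Python model puts the two fields side by side for the same search values. The helper names are hypothetical, and `econtains()` is assumed to mean "some list element equals the value exactly".

```python
# Hypothetical model of the test notes: note name -> file.etags
ETAGS = {
    "Tagged_AABBCC": ["#f51705", "#aa/bb/cc"],
    "Tagged_AABB": ["#f51705", "#aa/bb"],
    "Tagged_AABBBB": ["#f51705", "#aa/bbbb"],
    "Tagged_DDAABB": ["#f51705", "#dd/aa/bb"],
}


def expand_tags(etags):
    """Model of file.tags: each exact tag plus all of its parent levels."""
    levels = set()
    for tag in etags:
        parts = tag.lstrip("#").split("/")
        levels.update("#" + "/".join(parts[:i]) for i in range(1, len(parts) + 1))
    return sorted(levels)


def econtains(values, needle):
    """Model of econtains(list, value): true only if an element equals the value exactly."""
    return needle in values


def matching(field, needle):
    """Notes where econtains(file.<field>, needle) is true; field is "tags" or "etags"."""
    result = set()
    for name, etags in ETAGS.items():
        values = etags if field == "etags" else expand_tags(etags)
        if econtains(values, needle):
            result.add(name)
    return result


if __name__ == "__main__":
    for needle in ["#aa/bb/cc", "#aa/bb"]:
        for field in ["tags", "etags"]:
            print(f"econtains(file.{field}, {needle!r}):", sorted(matching(field, needle)))
```

For `#aa/bb` the `file.tags` version also returns `Tagged_AABBCC`, whereas the `file.etags` version returns only `Tagged_AABB`.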

## Summary of useful matches

The main idea behind this note so far has been to showcase the various variants of contains, and show the wide range of testing we can do. Now I want to narrow it down to some specific cases.

## Check for a single tag value

Even though it's somewhat common to see stuff like `task.tags = file.tags` or `file.tags = "..."`, these are very susceptible to errors later on, when someone adds another tag to that file and causes a multitude of queries to fail.

The safe option would be: `econtains(file.etags, "#f51705")`.
Sadly tasks don't have `etags`, so they need to rely on `econtains(task.tags, ...)`, or some combination with `filter`/`map`, which is out of scope for this post.

## What does FROM use?

It seems, based upon my tests here and other experience, that `FROM` uses the `econtains(file.tags, ...)` variant. You need to specify the entire tag, and in the case of nested tags it needs to be a complete sublevel, so you can't use `FROM #aa/bb` to match something tagged with #aa/bbbb.

But `FROM` does allow tags which are nested deeper, hence I think it uses `file.tags` (and not `file.etags`).

Put another way, I think these are equal:
- `FROM #aa/bb AND -#aa/bb/cc`
- `WHERE econtains(file.tags, "#aa/bb") AND !econtains(file.tags, "#aa/bb/cc")`
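If that hypothesis holds, both forms can be modelled the same way. The Python sketch below only models the behaviour observed in this post, assuming `FROM #tag` means "some entry in the expanded `file.tags` equals `#tag`"; `from_tags` and the other helpers are made-up names.

```python
# Hypothetical model of the test notes: note name -> file.etags
ETAGS = {
    "Tagged_AABBCC": ["#f51705", "#aa/bb/cc"],
    "Tagged_AABB": ["#f51705", "#aa/bb"],
    "Tagged_AABBBB": ["#f51705", "#aa/bbbb"],
    "Tagged_DDAABB": ["#f51705", "#dd/aa/bb"],
}


def expand_tags(etags):
    """Model of file.tags: each exact tag plus all of its parent levels."""
    levels = set()
    for tag in etags:
        parts = tag.lstrip("#").split("/")
        levels.update("#" + "/".join(parts[:i]) for i in range(1, len(parts) + 1))
    return sorted(levels)


def from_tags(include, exclude=None):
    """Model of `FROM #include AND -#exclude`, assuming econtains(file.tags, ...) semantics."""
    result = set()
    for name, etags in ETAGS.items():
        tags = expand_tags(etags)
        if include in tags and (exclude is None or exclude not in tags):
            result.add(name)
    return result


if __name__ == "__main__":
    print(sorted(from_tags("#aa/bb")))
    print(sorted(from_tags("#aa/bb", exclude="#aa/bb/cc")))
```

This reproduces the two `FROM` results from earlier: `#aa/bb` picks up `Tagged_AABB` and `Tagged_AABBCC` but not `Tagged_AABBBB`, and adding the negation leaves only `Tagged_AABB`.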

When to use what is left to the reader's discretion.

## Getting multiple values out of nested tags

If one wants to get everyone working in #company/google, no matter which department, or every person-type note, #type/person, no matter other characteristics, or, in my case, everything related to December of last year, #year/2022/12, including days and so on, I would recommend using the `econtains(file.tags, "#...")` variant.

But if you just want those with that exact tag, and nothing on any deeper levels, then you should use the `econtains(file.etags, "#...")` variant. These are a lot safer than `contains()`, and you'll avoid false positives.

## Getting random parts of tags

If one has parts of a tag, or knows that the part can exist on various levels of a nested tag, and wants to find tags like that, one would need to widen the search again. For example, if you wanted to get stuff from #company/google/person and #type/person/page, you'd use a very lenient search such as `contains(file.tags, "person")`, but do be aware that this will also include tags like #personality or #impersonate.
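A tiny Python model shows the problem. It assumes `contains()` on a list does a case-sensitive substring test per element. The note names are invented, and only the full tags are listed, since the parent levels in `file.tags` can't add any new substring matches.

```python
# Hypothetical notes, each with a single tag written out in full
NOTES = {
    "Google contact": ["#company/google/person"],
    "Person page": ["#type/person/page"],
    "Quiz": ["#personality"],
    "Actor": ["#impersonate"],
}


def contains(values, needle):
    """Model of contains(list, value): case-sensitive substring test on each element."""
    return any(needle in value for value in values)


def lenient_matches(needle):
    """Notes where contains(tags, needle) is true in this model."""
    return {name for name, tags in NOTES.items() if contains(tags, needle)}


if __name__ == "__main__":
    print(sorted(lenient_matches("person")))
```

All four notes match, including the #personality and #impersonate ones that were probably not wanted.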

Yet again, if one would like to check for a nested tag having a given sublevel, it'll get a little tricky as one would need to do something along the following lines:

```dataview
TABLE etag, level 
FROM #f51705 
FLATTEN substring(file.etags, 1) as etag
FLATTEN split(etag, "/") as level
WHERE level = "bb"
```

This can most likely be shortened using some combination of `filter`/`map`/... And of course, you don't need to actually output the values of `etag` and `level`.
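The steps of that `FLATTEN` query can be mirrored in Python to see which rows survive. This is a hypothetical model of the query's logic on the test set (one row per exact tag, drop the leading `#`, split on `/`, keep rows whose level equals the wanted value), not how Dataview executes it; `rows_with_level` is a made-up name.

```python
# Hypothetical model of the test notes: note name -> file.etags
ETAGS = {
    "Tagged_AABBCC": ["#f51705", "#aa/bb/cc"],
    "Tagged_AABB": ["#f51705", "#aa/bb"],
    "Tagged_AABBBB": ["#f51705", "#aa/bbbb"],
    "Tagged_DDAABB": ["#f51705", "#dd/aa/bb"],
}


def rows_with_level(wanted):
    """Mimic the FLATTEN query: one (note, etag, level) row per level equal to `wanted`."""
    rows = []
    for name, etags in ETAGS.items():
        for tag in etags:                  # one row per exact tag
            etag = tag[1:]                 # substring(..., 1): drop the leading "#"
            for level in etag.split("/"):  # FLATTEN split(etag, "/") as level
                if level == wanted:        # WHERE level = "bb"
                    rows.append((name, etag, level))
    return rows


if __name__ == "__main__":
    for row in rows_with_level("bb"):
        print(row)
```

Here `Tagged_DDAABB` is found because `bb` is a complete level of `dd/aa/bb`, while `Tagged_AABBBB` is not, since its level is `bbbb`.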

Query for most of the contains and friends tables

In the raw text above every query is included; in the post itself I mostly show pictures of the results, where each column header shows the function used to compute that column.

So each of these result tables follows this template, varying only the function and the field:

```dataview
TABLE
  contains(file.tags, "#aa/bb/cc"),
  contains(file.tags, "#aa/bb"),
  contains(file.tags, "aa/bb"),
  contains(file.tags, "AA/bB/Cc")
FROM #f51705
```

The output is simply true if it matches, and false if it doesn't. The very first column shows which test file the row is for, and from the filename you can tell which tag it has.


oirammui · June 1, 2023, 4:54am

They closed the subtags page, so I couldn't offer a solution there for people who wanted to see only the top tag and not the subtags:

```dataview
table file.tags, hasSubtags
from #writing
FLATTEN any(
    filter(
        file.tags, (x) => startswith(x,"#writing/")
    )
) as hasSubtags
WHERE hasSubtags = false
LIMIT 33
```

This gets all the notes with only the top tag and no subtags. I put it in a tag page.
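The logic of that query can be modelled in Python, assuming `from #writing` matches any note whose expanded `file.tags` contains `#writing` (as discussed in the post above). The note names and `top_tag_only` are made up for illustration; this is a sketch of the query's logic, not Dataview's implementation.

```python
# Hypothetical notes: note name -> file.etags
NOTES = {
    "Draft": ["#writing"],
    "Chapter 1": ["#writing", "#writing/novel"],
    "Poem": ["#writing/poetry"],
    "Recipe": ["#cooking"],
}


def expand_tags(etags):
    """Model of file.tags: each exact tag plus all of its parent levels."""
    levels = set()
    for tag in etags:
        parts = tag.lstrip("#").split("/")
        levels.update("#" + "/".join(parts[:i]) for i in range(1, len(parts) + 1))
    return sorted(levels)


def top_tag_only(top):
    """Model of the query: notes matched by FROM #top that have no #top/... subtag."""
    result = set()
    for name, etags in NOTES.items():
        tags = expand_tags(etags)
        if top not in tags:  # not picked up by FROM #top
            continue
        has_subtags = any(tag.startswith(top + "/") for tag in tags)  # any(filter(...))
        if not has_subtags:  # WHERE hasSubtags = false
            result.add(name)
    return result


if __name__ == "__main__":
    print(sorted(top_tag_only("#writing")))
```

Only "Draft" is kept: "Chapter 1" and "Poem" are picked up by `#writing` but dropped because they carry a `#writing/...` subtag.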