"Group by" Woes

romebot · January 4, 2023, 2:54pm

Things I have tried

every code variation I could think of

What I’m trying to do

I have a very long list of people notes and a dataview table lists them alphabetically, which is great. But it would be nice to group them by occupation and then alphabetically under that. Not sure why it’s not working but here are some of the parameters of my people notes:

There is no “Name” in YAML because the note name is the name. If necessary I could add a “Name” field to all my notes but it seems redundant. Maybe there are more reasons to go back and add this. I would like to list the file name (or if not possible, the “Name” field) and use that as the link to that person’s note. I tried both with and w/o a Name field in YAML. Tried w/ and w/o ID.
What happens if someone has more than one occupation? E.g. Someone is a doctor and an author, etc? How does the grouping work and can that person be repeated in each group? In my case, however, the notes that have multiple Occupations in YAML dont show up as individual Occupations, but rather a long occupation with commas.

This is my current version of the code:

TABLE Name, Occupation, Employment, Born, Death, Birthplace, Note
FROM #note/type/person AND -"Templates"
GROUP by Occupation
SORT Name ASC

This is what my result looks like (occupation is in there 2x, there’s a blank line at the top, and there are no results in any of the other columns.

Thanks!

Craig · January 4, 2023, 7:51pm

There’s a lot going on here, and mastering it will require you to learn some of the ways in which Dataview works differently than a traditional database. In particular, it’s worth studying how GROUP BY and FLATTEN work. The docs provide some help, and there are some good posts on the forum that go into detail.

For the following examples I used the same input data:

We can visualize this data with a simple table:

```dataview
TABLE Name, Occupation, Born
```

One complication is that we have a field, Occupation, that can contain an array of values. It would be nice to be able to query the rows with the occupations split out so they’re easier to group. For this we can use FLATTEN.

FLATTEN is a way to create new columns based on existing ones. A key detail is that if the column being flattened is an array, then Dataview will create a new row for each value of the array. In this example, since a person can have more than one occupation, this creates a new row for each person’s occupation.

```dataview
TABLE Name, Occupation, Born
FLATTEN Occupation
```

Now it’s easier to group the rows by occupation, using the GROUP BY command.

However, unlike in regular relational SQL, GROUP BY does not change the layout of the table, but instead replaces the results with a new set of rows, one for each value of the group, each of which contains a single field, “rows”, which is an array of all the original rows in that group.

```dataview
TABLE rows.Name, rows.Occupation, rows.Born
FLATTEN Occupation
GROUP BY Occupation
```

This works, but may not be quite what we’re looking for, and comes with some other complications with regards to sorting each column.

One approach is to not use GROUP BY, but just to put the field in the first column and sort by it:

```dataview
TABLE WITHOUT ID
	Occupation,
	file.link as "Person",
	Born
FLATTEN Occupation
SORT Occupation, Name
```

The approach I usually use is to use a LIST instead of a TABLE, which organizes the data as an outline:

```dataview
LIST rows.Summary
FLATTEN Occupation
FLATTEN file.link + " (born " + Born + ")" as Summary
GROUP BY Occupation
```

I hope this is helpful as a starting point. Good luck!

romebot · January 4, 2023, 8:04pm

Amazing, I didn’t know about FLATTEN and was asking about that functionality in another post. I’ll have to play around with this and post back if something isn’t working. Thanks so much for your time, effort, and clarity!

romebot · January 4, 2023, 8:42pm

Hello again…

I tried a few things w/ a different scenario and maybe I’m running into something too complex for this, or maybe you know how if it can be done. I’m using inline tags to pull in quotes from various documents from various people. The thing is, when there’s more than one quote in a note, my dataview displays both quotes in the same column—which is hard to read.

So when I read about FLATTEN, I thought bingo, that could separate the quotes into their own rows. But then I have two sources too, and every combination I tried doesn’t really give me the result I want, which is separate rows for each quote and then sorting by the source in alphabetical order. Here is one code option:

TABLE WITHOUT ID Source, link(file.link, Title) AS "Note", Quote
FROM #pin/quote AND #pin/topic/god 
FLATTEN Source
SORT Source ASC

But that gives me this result—I now have separate sources, but now each source has both quotes in the row.

If I try this code:

TABLE WITHOUT ID Source, link(file.link, Title) AS "Note", Quote
FROM #pin/quote AND #pin/topic/god 
FLATTEN Quote
SORT Source ASC

I get an opposite problem-quotes are separated, but each has two sources:

I tried various other options based on your post, but couldn’t get it to work. Anything with GROUP by, resulted in empty fields and “row.” didn’t work.

What might you suggest?

Craig · January 4, 2023, 9:05pm

What happens if you flatten both fields?

TABLE WITHOUT ID Source, link(file.link, Title) AS "Note", Quote
FROM #pin/quote AND #pin/topic/god 
FLATTEN Source
FLATTEN Quote
SORT Source ASC

romebot · January 4, 2023, 9:58pm

I feel like I tried that and didn’t work, but now it does!!! maybe I typed something wrong. Now this is exciting. I will see if I can play w/ grouping now, but this, sorted by source is a really cool way to aggregate quotes or facts from across my database by topic(s). Much appreciated!

romebot · January 4, 2023, 10:01pm

Group by Source and Note give empty results for some reason. (see screeshot depicting both options)… Is there a way to group when doing this?

Craig · January 4, 2023, 10:18pm

Yes, however – remember that GROUP BY replaces the results with new rows. Each row of the group has a a field called rows that’s an array of original matching rows.

The reason your columns are empty is because GROUP BY replaced them; they’re now sub-columns (so to speak) within the new rows column.

To access the “sub-columns”, preface the field you want with rows..

Try something like this:

TABLE WITHOUT ID Source, rows.Note, rows.Quote
FROM #pin/quote AND #pin/topic/god 
FLATTEN Source
FLATTEN Quote
FLATTEN link(file.link, Title) AS Note
SORT Source ASC
GROUP BY Source

romebot · January 5, 2023, 8:18am

It’s a bit confusing, but starting to understand. I’ll have to study up on this.
That said, the code you wrote gives back a parsing error. Tried adding rows.Source and Putting Notes in quotes: FLATTEN link(file.link, Title) AS “Note” and a few other things, but couldn’t get it to work. This is what I get back from the original code you posted:

Dataview: Error:
– PARSING FAILED --------------------------------------------------

1 | TABLE WITHOUT ID Source, rows.Note, rows. Quote
| ^
2 | FROM #pin/quote AND #pin/topic/god
3 | FLATTEN Source

Expected:

variable identifier

Thanks so much for all your time and attention.

romebot · January 5, 2023, 12:10pm

Hey there and sorry for so many questions, but I made what I think is an identical setup to what we did before and that dataview page is great, but then I tried something on a new document where I’m both quoting from someone writing an article (see screenshots)… I’m taking full quotes of a whole block and also snippets where I find a term of art that i want to keep track of. The results are returning double listings or double author and I cant for the life of me figure out why. If I have “Source::” twice this happens, but I have that in the documents we did yesterday. Even if I change one of the sources to “XXX” it lists both.

p.s. dont mean to offend anyone politically, this is all just research and testing.

anon63144152 · January 5, 2023, 1:22pm

What a fantastically useful post. Thanks for all the explanations.

romebot · January 5, 2023, 3:23pm

Now these results are giving me doubles as well, so something went awry… the original documents for the god quotes are the same. pulling hair out.

Craig · January 5, 2023, 3:50pm

It looks like there’s a space between rows. and Quote, which is throwing off the parser.

romebot · January 5, 2023, 3:58pm

Good eye!! That fixed the parsing error, but now getting no results in the table.
Something’s gone awry as what was working yesterday is now showing double results and the new tests I did today also show double. I’m gonna try to start over in a new folder w/ new tags and see if I can figure out what’s wrong. Any ideas?

Craig · January 5, 2023, 4:02pm

Thanks for posting your work. I have to admit I’m getting lost in the details!

For debugging, it would be helpful if you could narrow down the number of files so we can see what’s going on. Could you perhaps post two short source pages along with a Dataview query that shows the double values from those pages? That would help a lot in debugging the issue.

romebot · January 5, 2023, 4:05pm

Sure!! Do you mean upload 2 text files or copy/paste?

Craig · January 5, 2023, 4:13pm

It would be helpful to me if you could copy/paste the source here of the shortest pages that show the problem, along with the Dataview query that’s not working right.

In my field we call that the Minimum Reproducable Example.

romebot · January 5, 2023, 4:19pm

great! thanks! So this first post is the problem we solved yesterday… I simplified it to one note with two quotes on it, and two dataview snippets, one with and one w/o the grouping. I’ll work on the thing from today to simplify it and post it in a bit.

Here is the note:

Quote:: “I’m not so hot on apples.” — [[Kaufmann, Lisa|Lisa Kaufmann]]

#pin/quote
Source:: [[Kaufmann, Lisa]]
Topic:: #pin/topic/apples

Quote:: “Apples make the world go round.” — [[Sagan, Carl|Carl Sagan]]

#pin/quote
Source:: [[Sagan, Carl]]
Topic:: #pin/topic/apples

And here are the dataview snippets

No Grouping - Duplicate Entries for any note with 2 Quotes.

TABLE WITHOUT ID Source, link(file.link, Title) AS "Note", Quote
FROM #pin/quote AND #pin/topic/apples
FLATTEN Source
FLATTEN Quote
SORT Source ASC

Grouping - No Results

TABLE WITHOUT ID Source, rows.Note, rows.Quote
FROM #pin/quote AND #pin/topic/god 
FLATTEN Source
FLATTEN Quote
FLATTEN link(file.link, Title) AS Note
SORT Source ASC
GROUP BY Source

holroy · January 5, 2023, 4:43pm

As long as you don’t use lists/task, these two quotes will be within the page enclosure, and the fields you set will be associated with the page, making both Kaufman and Sagan the source of either quote.

Sure, we can split/flatten/… them, but your base data is just wrong. You’ll never be able to associate one of those sources with just one quote, unless you change your markup.

One way to do that would be to utilise lists, maybe something like

- "(quote:: I'm not so hot on apples)", (source:: Kaufmann, Lisa), (topic::#pin/topic/apples)
- "(quote:: Apples make the world round)", (source:: Sagan, Carl), (topic::#pin/topic/apples)

Then you could do queries like this:

```dataview
table without id
  item.quote as Text,
  item.source as Source,
  item.topic as Topic
flatten file.lists as item
where file.name = this.file.name
```

Which combined produces output like:

And if you removed some of the styling related to inline fields, it could look like:

Or if not using the hidden inline field syntax, aka exchanging the ( ... ) with [ ... ]:

Summary: As long as you don’t use lists (or tasks), your fields stays connected to the page, and not the various quotes, using lists makes your life a whole lot easier, as you can then use one simple FLATTEN and treat each quote individually. And these can be styled a multitude of ways, to make them look nicer.

romebot · January 5, 2023, 5:07pm

Here is my 2nd example with inline pull quotes. Most articles where I tag “Terms of Art” are usually by the author of the article—In this example, Tom Nichols—so I dont want to have to put “Source::” under every block, as there can me many. In any case, when “Source::” is in the same document twice, the results get skewed. There might also be times when a “Term of Art” happens to be from a different person.

Additionally, there might also be full quotes I want to pull from the article by the same author, and sometimes quotes by others. But again, when there is more than one “Source::”, dataview gets confused. Even if the source for the bottom quote is also Tom Nichols, just having “Source::” twice makes the author duplicate in the dataview results.

This is the main note:

#pin/term-of-art/political-test

Quote:: “Quote1. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.”" — [[Nichols, Tom|Tom Nichols]]

#pin/quote-test
Source:: [[Nichols, Tom]]
Topic:: #topic/test

[pt:: “Term of Art1. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. ==Ut enim ad minim veniam==, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.”]

[pt:: “Term of Art2. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do ==eiusmod tempor== incididunt ut labore et dolore magna aliqua.”]

Quote:: “Quote2. Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.” — [[Kaufmann, Lisa|Lisa Kaufmann]]

#pin/quote-test
Source:: [[Kaufmann, Lisa]]
Topic:: #topic/test

And here are my two dataview snippets:

Political Terms of Art - In this example, they should be only from Tom Nichols who wrote the article.

TABLE without ID pt AS "Political Term of Art", Source AS "Author", link(file.link) AS "Note"
FROM #pin/term-of-art/political-test
FLATTEN pt
FLATTEN Author
FLATTEN Quote
SORT Note ASC

Quotes - These should be from the correct respective reference.

TABLE without ID Quote AS "Quote", Source AS "Author", link(file.link) AS "Note"
FROM #pin/term-of-art/political-test
FLATTEN Author
FLATTEN Quote
SORT Note ASC