Automating Timesheet Help - Query embedded fields and sum hours across all files in current folder

Josh_Lawrence · November 13, 2023, 7:41pm

Hello, I’m trying to automate tallying hours worked across project codes.

My Current Workflow

Currently everyday I open a daily note entitled by the days date and I organize my notes by work week folders. For example:

1. Week of May 29th
- 2023-05-29
- 2023-05-30
- etc…
1. Week of June 5th
- 2023-06-05
- 2023-06-06
- etc…

In each of my daily notes I have a list of items which outline what I did and the project code allocated to that task. I’ve added an embedded field for the purpose of tallying hours later. The value of the field corresponds to the amount of hours worked towards that project code.

For example here are example contents for a few files:

/22. Week of May 29th/2023-05-29:

#projectcode1 Completed development task [projectcode1::3.5]
- Random notes I type related to this work activity. This can be ignored in my obsidian processing and is just for personal reference
#projectcode2 Meeting regarding scope [projectcode2::1]
- More random notes taken during this meeting
#projectcode1 Meeting regarding task 179 [projectcode1::0.5]
#projectcode3 New Project meeting [projectcode3::1]

/22. Week of May 29th/2023-05-30:

#projectcode1 Completed development task [projectcode1::3.5]
- Similarly I’ll have random notes nested under the top level list which can be ignored
#projectcode3 Meeting regarding scope [projectcode2::1]
#projectcode1 Meeting regarding task 179 [projectcode1::0.5]
#projectcode4 New Project meeting [projectcode3::1]

What I’m trying to do

I’m trying to create a template file which I can copy into each work week directory which will parse all the files in the current folder (except this file) and produce aggregate information for me to type into my hour logging system.

Ultimately I’d like to be able to produce the following objectives, all can be separate dataview commands:

List all the unique project codes which have been written within the current week. For example if I were to insert this template document in the folder /22. Week of May 29th it should generate the following list:
- projectcode1
- projectcode2
- projectcode3
- projectcode4
List all lines of text in all files in the current directory which contain the project code tag and embedded field and group the output by file name. The purpose of this is to list all high level task descriptions to write into an hour logging system. It should produce the following:
- 2023-05-29
  - #projectcode1 Completed development task [projectcode1::3.5]
  - #projectcode2 Meeting regarding scope [projectcode2::1]
  - #projectcode1 Meeting regarding task 179 [projectcode1::0.5]
  - #projectcode3 New Project meeting [projectcode3::1]
- 2023-05-30
  - #projectcode1 Completed development task [projectcode1::3.5]
  - #projectcode3 Meeting regarding scope [projectcode2::1]
  - #projectcode1 Meeting regarding task 179 [projectcode1::0.5]
  - #projectcode4 New Project meeting [projectcode3::1]
List the sum of hours worked across each unique project code as well as a total number of hours worked. The output should produce the following (formatting can differ):
- Project Code, Hours
- projectcode1: 8
- projectcode2: 1
- projectcode3: 2
- projectcode4: 1
- Total: 12

Things I have tried

As I’ve outlined three separate objectives I’ll share what I’ve done for each by the objective number in the list above:

List of unique project codes
I’ve tried the following, though in each case it produces lists of unique job codes per file grouping them by file when I don’t want duplicates. Otherwise it does extract the embedded tags I want it just does not product a unique list across all files in the folder.

TABLE without id file.etags AS "Job Codes"
from "Daily Notes/2023/22. Week of May 29th"

TABLE file.etags AS "Job Codes"
from "Daily Notes/2023/22. Week of May 29th"

List all Lines which contain
I’ve been able to implement this feature, thought it would be the most difficult but it ended up being alright, heres the solution:

TABLE L.text AS "My Lists"
FROM "Daily Notes/2023/22. Week of May 29th"
FLATTEN file.lists As L
WHERE contains(L.tags, "#") 
SORT file.cday ASC

List of sum of hours across each unique project code
I’ve tried something like the following which has yielded okay results but I’d like to generalize it so I don’t have to change the dataview command when the project codes change week to week. Ultimately it would be ideal to take the result from Objective 1 as a list of project codes which programmatically generates this table with those codes.

TABLE sum(projectcode1) as pc1, sum(projectcode2) as pc2, sum(projectcode3) as pc3, sum(projectcode4) as pc4, sum([sum(projectcode1), sum(projectcode2), sum(projectcode3), sum(projectcode4)]) as Total
from "Daily Notes/2023/22. Week of May 29th"

Thanks so much in advance. I look forward to suggestions on improving my question and resolve it!

holroy · November 14, 2023, 3:25pm

A general tip regarding queries is not to build data structure where the key varies. With this I mean doing stuff like [projectcode1:: ... ] and [projectcode2:: ... ], but rather use [project:: code1] and [project:: code2]. This makes your query a lot easier and more flexible and maintainable.

Along this with, if you’re going to vary what you sum across, like only one project or all projects, you need to use a given property to record the hours. Using both of these tips, I would change the markup for your work session to become of this format:

- (project:: code1) Completed development task [hours::3.5]

Using ( ... ) if you want to hide the key, and [ ... ] if you want the key to show by default. The latter variant also allows for more customisation using CSS, as you then can target a particular set of inline fields, (like hours if you wanted to add h after the value).

Given this kind of syntax, queries to answer the two first issues of yours:

## 1) All unique project codes
```dataview
LIST
FROM "ForumStuff/f71/f71232"
FLATTEN file.lists as item
WHERE item.project
GROUP BY item.project
```

## 2) Text and sum hours worked on each project
```dataview
TABLE rows.cleanText as Tasks, sum(rows.item.hours) as Hours
FROM "ForumStuff/f71/f71232"
FLATTEN file.lists as item
FLATTEN regexreplace(item.text,
  "[\[\(][^\[\)]*[\]\)]", "") as cleanText
WHERE item.project
GROUP BY item.project as "Project code"
```

Regarding your third part it’s not as easy to do since you’re doing a double sum. In most cases if I were to do something like that, I would reuse the partial sum query within dataviewjs and calculate the last sum there, and add it to the table “manually”.

In the thread below we do just that, and it also refers to another thread where I explored these options even further. So check that/those threads and see if you’re able to do your total hour sum based on those.

Josh_Lawrence · November 14, 2023, 4:13pm

Thanks so much for your response. Your advice has been invaluable and has helped greatly!

Your response met all my needs for the first solution and it works perfectly.

Your response for the second item was also very good and helpful. Although I’m wondering if you could help me tweak it. Right now it produces a table which lists the task text, sum of hours across that project code and has grouped the table by project code.

I’m wondering if it would be possible to alter this query to include the following: Group by project code, then group by file name (for the date), tasks item cleantext as it currently shows, hours across each projectcode/filename group.

I’ll review your suggested reading regarding the 3rd item.

Thanks for your help again.

holroy · November 14, 2023, 4:33pm

When using DQL queries, like these, you’re only left with one group option. If switching to dataviewjs you do have the option to also do groupIn() (or similar), in other words to have two group levels.

However, at the cost of the visual appearance, you could combine the group objects you want, so then you could do something like:

```dataview
TABLE rows.cleanText as Tasks, sum(rows.item.hours) as Hours
FROM "ForumStuff/f71/f71232"
FLATTEN file.lists as item
FLATTEN regexreplace(item.text,
  "[\[\(][^\[\)]*[\]\)]", "") as cleanText
WHERE item.project
GROUP BY item.project + " " + file.link as "Project code and date"
```

Here we group on the concatenation of the project code and the file.link/date, and get a result like:

It doesn’t sum up all the hours for entirety of the different project codes, nor the overall total of hours, but that should be doable using the trick hinted to earlier of doing the DQL query from within dataviewjs, and then building a new table to presentation. Either with interleaved sums, or as a new table summing up each project code, and the global total of hours. Can’t do either of those queries just now, as I’m off to another meeting.

Josh_Lawrence · November 14, 2023, 4:40pm

Thank you once again. This will be a big help for now. I’ll continue working through your suggested readings to resolve the final object.

Thank you for all of your assistance and enjoy your meeting!

holroy · November 14, 2023, 9:47pm

I kind of triggered myself with these sum across the board, so here is a rather intense query doing three different sums in two tables. Before I present the query, and an output example, lets re-iterate on the base for the query, and which totals we’re going to build.

Base query requirements (with three different sum variants)

The base data is a standard DQL query holding the columns: project, date, tasks, hours. In all the sums below, we’re aiming to sum the hours in various ways, and the table is sorted (and in this particular case grouped) on the combination of the two first columns
We’re going to calculate the grand total of hours presented in the table. This is displayed at the very bottom of the table. Code related to this is marked with **1**. This sum is done using a .reduce() function on all values, where we start with picking out the column to sum on
We’re going to calculate a running total of hours for each project, where we sum the hours for the current project (until the project changes). Code related to this is marked with **2**, and the running group total is added below the lines for that project. This calculation is done by adding the hours as we tag along, and reset whenever the project key changes.
Lastly we’re going to calculate the hours for any given day, aka a date total, and in this case the values are scattered around in the table, but we want to sum up the hours related to unique values in the second column. Code related to this is marked with **3**. Presentation of these sums is done in a separate table after the main table.

The output

An example output using the query below on some random test data:

Notice how I’ve used slighty different formatting for the various running and grand totals within the table of the main data, and a separate table for the date totals.

The query

## Multiple sums

```dataviewjs

const result = await dv.query(`
  TABLE WITHOUT ID
    rows.item.project[0] as "Project",
    rows.file.link[0] as "Date",
    rows.cleanText as Tasks,
    sum(rows.item.hours) as Hours
  FROM "ForumStuff/f71/f71232"
  FLATTEN file.lists as item
  FLATTEN regexreplace(item.text,
    "[\\[\\(][^\\[\\)]*[\\]\\)]", "") as cleanText
  WHERE item.project
  GROUP BY item.project + " " + file.link as "Project code and date"
`)

if ( result.successful && result.value.values.length > 0) {
  const values = result.value.values
  console.log(values)

  // **1** variable and calculation
  let grandTotal = values
    .map(v => v[3])
    .reduce((a, c) => a + c, 0.0)
    

  // **2** variables
  let runningKey = null
  let runningTotal = 0
  let runningArray = []

  let dateTotals = {} // **3** variable
  
  // The main loop over all elements, remember
  // to replace the variable names according to your
  // column header
  values.forEach( ([project, date, task, hour]) => {
    
    // **2** Keep tally on the running total, 
    // and add a group total line if the key changes
    if ( runningKey == null || runningKey != project ) {
      if (runningKey != null) {
        runningArray.push([
          "", 
          "",
          `_<span style="float: right">Total for ${ runningKey }</span>_`,
          runningTotal
        ])
      }
      runningKey = project
      runningTotal = hour
    } else
      runningTotal += hour
    runningArray.push([ project, date, task, hour ])
    
    // **3** Tally up the day totals
    if ( date in dateTotals ) {
      dateTotals[date] += hour
    } else {
      dateTotals[date] = hour
    }
    
  }) // End of all value loop

  // **2** Add the running total for the last group
  runningArray.push([
    "", 
    "",
    `_<span style="float: right">Total for ${ runningKey }</span>_`,
    runningTotal
  ])

  // **1** Add grand total to the array
  runningArray.push([
    "", 
    "",
    `**_<span style="float: right">Overall total</span>_**`,
    grandTotal
  ])
  
  // **1** and **2** output
  dv.table(result.value.headers, runningArray)

  // **3** output
  dv.table(["Date", "Hours"], Object.entries(dateTotals))
} else
  dv.paragraph("~~~~\n" + result.error + "\n~~~~")
```

Some key comments related to the query:

The actual query, **0**, can (and should) be tested standalone, so one is sure that the data is sound. This query can however be replaced with any query of your own, as long as you keep track of which columns you are using, and update the various totals you want to match the column numbers. Also remember that the first column is column 0.
When doing the grand total, **1**, we first do a .map(a => a[3]) to select the column used for this total, and then we do a reduce( (a, c) => a + c, 0.0 ). This reduce operation loops through all items, and uses c for the current item, and updates a for the accumulated value. The last part of 0.0 sets the initial value
When doing running totals, **2**, a prerequisite is that the table is sorted related to its key. We then detect when the key changes, and resets the intermediate sums and so on. Two tricks related to this is that we need to get it going in the start using a test against null, and since we add new rows to the table every time the key changes, it’s better to use a temporary table to hold the new variant. And after the last line in the table, we need to add that last group as well (think of it as when we change the key to nothing… )
Finally the date total, **3**, which could be on any arbitrary column or value, the order is not important, since we use the value as a key into a table holding the accumulated sum. If the key is present then add current value, and if the key is not present add a new key with the current value. Since these values are scattered across the rows, it’s natural to present the result as a table of its own.
A final note on pushing in new rows into the runningArray (or optionally appended to the original result.value.values array), is that one needs to push an array with as many elements as there are columns in the original table. So you still need to add something into each column, even if only an empty string, "".

So there you have a specific query showing a generic way of adding various sums related to a DQL query of your choice. Hopefully, one can extrapolate how to repeat this process using other queries and/or sums.

system · November 21, 2023, 9:48pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.