DataviewJS to find PDFs Recursively

I have a folder of PDFs that are primarily scanned receipts, organized like “_Resources\Documents\YYYY\YYYY-MM” and named like “YYYYMMDD - Organization Name.pdf”. I’d like to have pages for organizations that I interact with frequently. In the page, I’d like to have a list of receipts from that organization.

I’m currently doing something similar for my daily journals, using this DataviewJS script to grab ‘Documents’ and ‘Photos’:

const filename = dv.current().file.name
let grabdate = filename.split("_")[0]
let grabparts = grabdate.split("-")
let year = grabparts[0]
let month = grabparts[1]
let day = grabparts[2]
let newdate = year + month + day
const docpath = '_Resources/Documents/' + year + '/' + year + '-' + month
const photopath = '_Resources/Photos/' + year + '/' + year + '-' + month
const pdfFiles = app.vault.getFiles().filter(file => file.extension === 'pdf' && file.path.includes(docpath) && file.name.includes(newdate))
const jpgFiles = app.vault.getFiles().filter(file => file.extension === 'jpg' && file.path.includes(photopath) && file.name.includes(newdate))
if(Array.isArray(pdfFiles) && pdfFiles.length){
    dv.header(2,"Documents")
    dv.span("******")
    dv.list(pdfFiles.map(file => dv.fileLink(file.path)))
}
if(Array.isArray(jpgFiles) && jpgFiles.length){
    dv.header(2,"Photos")
    dv.span("******")
    dv.list(jpgFiles.map(file => dv.fileLink(file.path, true)))
}

Then each journal has this at the bottom, which displays a header followed by a list of items for either Documents or Photos, but only if any exist:

```dataviewjs
dv.view("Meta/Scripts/JournalFooter")
```

Seems I should be able to do something similar for receipts related to an organization. The differences between what I want to do with organizations and what I’m already doing with journals are:

  • The organizations script will need to search a folder recursively, since the receipts will be scattered across the monthly folders. Daily journal paths were limited to one folder determined by the date of the host page.
  • The organization script can’t just look for an extension, it needs to look for names that contain a string. The journal search is for every pdf in the path folder.

Before I start hunting down the dark corridors of the internet, does someone have code that does something similar, or know how to code something that will do this?

The getFiles() call used in the previous script returns every file in your vault, so it’s just a matter of deciding which of those files to include or exclude in your new script. That decision is made in the filter() function following the getFiles() call.

Currently your filter function is the following:

file => file.extension === 'pdf' && 
          file.path.includes(docpath) && 
          file.name.includes(newdate)

This is shorthand notation for a function definition that takes one parameter (the file) and uses it in an expression to decide whether to keep or discard each file that getFiles() feeds into filter(). The three parts of the expression that decide whether a file is kept are:

  • file.extension === 'pdf' – Checks whether it’s a PDF file. It seems you can drop this check in the new script, so just remove it.
  • file.path.includes(docpath) – This checks whether file.path, e.g. some/folder/noteName.pdf, has docpath somewhere within it. This could be changed to file.path.startsWith(...) to check that the path starts with something specific, like your organization folder. (You could also consider file.parent.path, which is only the folder part of the path.)
  • file.name.includes(newdate) – The last part checks whether the file name includes newdate. You mention that your file names contain some name or indicator, and this is the place to check for it. So replace newdate with whatever string the name is required to contain.
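
To make the three parts above concrete, here is a minimal sketch that runs outside Obsidian. It assumes plain objects standing in for what app.vault.getFiles() returns; mockFiles, makeReceiptFilter and the organization names are made up for illustration and are not part of the Obsidian or Dataview API.

```javascript
// Hypothetical stand-ins for the file objects returned by app.vault.getFiles().
const mockFiles = [
  { path: "_Resources/Documents/2023/2023-11/20231105 - Acme Hardware.pdf", name: "20231105 - Acme Hardware.pdf", extension: "pdf" },
  { path: "_Resources/Documents/2024/2024-02/20240210 - Acme Hardware.pdf", name: "20240210 - Acme Hardware.pdf", extension: "pdf" },
  { path: "_Resources/Documents/2024/2024-02/20240212 - City Water.pdf", name: "20240212 - City Water.pdf", extension: "pdf" },
  { path: "_Resources/Photos/2024/2024-02/20240210 - Acme Hardware.jpg", name: "20240210 - Acme Hardware.jpg", extension: "jpg" },
  { path: "Organizations/Acme Hardware.md", name: "Acme Hardware.md", extension: "md" },
];

// Hypothetical helper: builds the predicate that gets passed to filter().
// The prefix check is what makes the search "recursive": every file below
// rootFolder has a path that starts with it, however deep the subfolders go.
function makeReceiptFilter(rootFolder, nameFragment) {
  // A trailing slash keeps a sibling such as "_Resources/Documents Old" from matching.
  const prefix = rootFolder.endsWith("/") ? rootFolder : rootFolder + "/";
  return (file) => file.path.startsWith(prefix) && file.name.includes(nameFragment);
}

const acmeReceipts = mockFiles.filter(makeReceiptFilter("_Resources/Documents", "Acme Hardware"));
console.log(acmeReceipts.map((file) => file.name));
// → [ '20231105 - Acme Hardware.pdf', '20240210 - Acme Hardware.pdf' ]
```

In a real DataviewJS block the same predicate would presumably be applied to app.vault.getFiles() instead of mockFiles.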

Hopefully that is enough to get you going on adapting the script to your new requirements. Remove anything that is no longer used, such as the definition of newdate.

Feel free to present your new and improved version, and we’ll help you if there are errors in it. Do try this on your own first, though, as that will help you immensely the next time you encounter a similar situation.
And if you fail, try again, and/or post your code with examples and error messages, so that we can help you on your way.

Thanks for the detailed response! It was far easier than I thought, using almost the identical code I had:

const filename = dv.current().file.name
const docpath = '_Resources/Documents'
const files = app.vault.getFiles().filter(file => file.path.startsWith(docpath) && file.name.includes(filename))
if(Array.isArray(files) && files.length){
    dv.header(2,"Documents")
    dv.span("******")
    dv.list(files.map(file => dv.fileLink(file.path)))
}

Works great, but now I’m realizing in this scenario I need to sort the files by name (which all start with YYYYMMDD). I tried:

const files = app.vault.getFiles().filter(file => file.path.startsWith(docpath) && file.name.includes(filename)).sort(file => file.name, "desc")

Also tried:

const files = app.vault.getFiles().filter(file => file.path.startsWith(docpath) && file.name.includes(filename))
files.sort()

Neither has any effect on the order the files are shown. I’m thinking maybe the sort has to happen down in the dv.list command, but adding variations of .sort(file => file.name, "desc") at different places there didn’t work either (mostly came back with errors). Any advice on sorting a list?
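
One likely explanation, sketched here with plain objects rather than real vault files: getFiles() returns an ordinary JavaScript array, and Array.prototype.sort expects a comparator that takes two elements and returns a negative, zero, or positive number. The (key, "desc") form looks like the sort signature of Dataview's own DataArray type, which a plain array does not have, and calling sort() with no arguments compares the string form of each element, which for objects is presumably the same text every time. The receipts, newestFirst and oldestFirst names below are illustrative only.

```javascript
// Hypothetical stand-ins for the file objects returned by app.vault.getFiles().
const receipts = [
  { name: "20240210 - Acme Hardware.pdf" },
  { name: "20231105 - Acme Hardware.pdf" },
  { name: "20240702 - Acme Hardware.pdf" },
];

// A comparator receives two elements; localeCompare returns the signed number
// that sort() needs. Swapping a and b reverses the order.
// The spread copies the array first, because sort() reorders in place.
const newestFirst = [...receipts].sort((a, b) => b.name.localeCompare(a.name));
const oldestFirst = [...receipts].sort((a, b) => a.name.localeCompare(b.name));

console.log(newestFirst.map((file) => file.name));
console.log(oldestFirst.map((file) => file.name));
```

Because the names start with YYYYMMDD, sorting them as text also sorts them by date. In the script above, this would mean adding .sort((a, b) => b.name.localeCompare(a.name)) after the filter(...) call, before the result is passed to dv.list.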
