DataViewJs Queries for Longform Writers and Researchers

gino_m · November 20, 2023, 3:58pm

I now have 10-12 DV (Dataview) queries, most of them in a Dashboard canvas, that help me find mistakes I make during longform writing.

To use the scripts, you need to install and enable the Dataview community plugin. Once enabled, the plugin starts indexing your vault (relaunching the vault may help). In the plugin’s settings, turn on JavaScript queries if they are not already enabled.
Then create a note and paste in any of the following scripts. They work out of the box, though you may want to customize them for your own folders. Once you move the cursor out of the code block (press Enter after the closing three backticks and cursor down), the query runs.

In case you are unsure how to customize the folderRegex part, I changed those lines to const folderRegex = /(.*)([/\\])/; which targets every folder in your vault on any OS. Because the pattern requires a slash in the path, notes sitting directly in the vault root are not matched. This may not be what you want, but as I said, you can use the scripts as they are.
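To narrow the scope, the part between the slashes can name folders. Here is a minimal sketch in plain JavaScript, assuming two hypothetical top-level folders called Drafts and Essays. Those names and the sample paths are made up for illustration, so swap in your own:

```javascript
// Hypothetical folder names; replace them with folders from your own vault.
// Matches paths that begin with "Drafts/" or "Essays/" (either slash style).
const narrowFolderRegex = /^(Drafts|Essays)[/\\]/;

// Made-up sample paths, shaped like the values of page.file.path.
const samplePaths = [
    "Drafts/chapter-01.md",
    "Essays/on-footnotes.md",
    "Archive/old-note.md",
    "root-note.md",
];

// Keep only the paths inside the named folders.
const matchedPaths = samplePaths.filter(path => narrowFolderRegex.test(path));
console.log(matchedPaths);
```

In the scripts, this pattern would take the place of the folderRegex line; the rest stays the same.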

I mainly decided to showcase these queries because I like how I can target a specific chunk of text in each note and then search within that chunk, so that only the matches inside it are pushed to the table. This cannot be done with a normal Obsidian vault search, hence the JavaScript.
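The match-within-a-match idea can be shown on its own, outside Dataview. This is only a sketch: matchWithin is a hypothetical helper name and the note text is invented for the example.

```javascript
// Stage 1 isolates a chunk of the note; stage 2 searches only inside that chunk.
function matchWithin(content, chunkPattern, innerPattern) {
    const chunk = content.match(chunkPattern);
    if (!chunk) return [];
    // Use capture group 1 when the chunk pattern has one, otherwise the whole match.
    const scope = chunk[1] !== undefined ? chunk[1] : chunk[0];
    // innerPattern should carry the g flag so match() returns every hit.
    return scope.match(innerPattern) || [];
}

// Made-up note content for illustration.
const demoNote = [
    "# Title",
    "> A legitimate quote in the body.",
    "",
    "## Footnotes",
    "[^1]: First footnote.",
    "> A stray quote among the footnotes.",
].join("\n");

// Only the blockquote after the heading is reported; the one in the body is ignored.
const innerHits = matchWithin(demoNote, /##\sFootnotes([\s\S]*)$/, /^>\s.*/gm);
console.log(innerHits);
```

The scripts below do the same thing, with dv.io.load supplying the note content and dv.table rendering the hits.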

Query to Show Files with Blockquoted Lines among Footnotes

One of those mistakes was adding blockquotes to lines where I shouldn’t have (they came from batch conversions I did with regex replacements a year or so ago).
If you add > to a line among your footnotes, the renderer lifts that line above the footnotes. Markdown doesn’t like blockquoted text among your footnotes.

You can tweak the following script to your use case. I first filter my vault down to the folders that I want to publish (A-Z). I always have a ## Footnotes heading above my footnotes, but you can target [^1]: as well:

```dataviewjs
// Define folder names; here we target all folders with regex; but you can add anything between the slashes
const folderRegex = /(.*)([/\\])/;

// Query all pages
const allPages = dv.pages("");

// Filter pages further based on folder names to target notes
const filteredPages = allPages.filter(page => {
    const path = page.file.path;
    return folderRegex.test(path);
});

// Crawl the raw data content and filter pages based on content criteria
const pages = await Promise.all(
    filteredPages.map(async (page) => {
        const content = await dv.io.load(page.file.path);

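        // Capture everything after the "## Footnotes" heading.
        // JavaScript has no \z anchor; $ without the m flag marks the end of the file.
        const mainPattern = /#{2}\sFootnotes([\s\S]*)$/;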
        const mainMatches = content.match(mainPattern);

        if (mainMatches) {
            const blockquotePattern = /^>\s.*/gm;
            const blockquoteMatches = mainMatches[1].match(blockquotePattern);

            if (blockquoteMatches && blockquoteMatches.length > 0) {
                return {
                    link: page.file.link,
                    content: blockquoteMatches.join('\n')
                };
            }
        }

        return null;
    })
);

// Remove null entries and render the result table
const filteredAndProcessedPages = pages.filter(p => p !== null);

dv.table(
    ["Note", "Content"],
    filteredAndProcessedPages.map(p => [p.link, p.content])
);
```
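If your notes have no ## Footnotes heading, the first footnote definition can serve as the anchor instead. Here is a sketch in plain JavaScript, assuming each definition starts at the beginning of a line in the form [^label]:. The helper name footnoteSection is hypothetical and the sample text is made up:

```javascript
// Returns the text from the first footnote definition to the end of the note,
// or null when the note contains no footnote definitions.
function footnoteSection(content) {
    // ^ with the m flag anchors at a line start; [\s\S]* then runs to the end.
    const match = content.match(/^\[\^[^\]]+\]:[\s\S]*/m);
    return match ? match[0] : null;
}

// Made-up note content for illustration.
const sampleNote = [
    "Body text with a reference.[^1]",
    "> A quote in the body.",
    "",
    "[^1]: The footnote text.",
    "> A misplaced blockquote.",
].join("\n");

const section = footnoteSection(sampleNote);
const strayQuotes = section ? section.match(/^>\s.*/gm) || [] : [];
console.log(strayQuotes);
```

The inline reference [^1] in the body is skipped because it is not at the start of a line and has no colon after it.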

Query to Find All Lines Starting with a Lowercase Character Bypassing the YAML Block

A no-no for longform writers. If a line starts with a lowercase character, either it belongs in a list, or the text was not pasted into the editor properly (see solution here), or some other problem occurred.

In the following query, we only match lines from the first Heading 1 to the end of the file, bypassing the YAML block, where lines starting with lowercase characters abound. Again, you may need to change the folder that you want to query. As written, it queries your whole vault.

```dataviewjs
// Define the regular expression to match folder names; here we target all folders; but you can add anything between the slashes
const folderRegex = /(.*)([/\\])/;

const allPages = dv.pages("");

const filteredPages = allPages.filter(page => {
    const path = page.file.path;
    return folderRegex.test(path);
});

const pages = await Promise.all(
    filteredPages.map(async (page) => {
        const content = await dv.io.load(page.file.path);

        // We target chunks of text starting from Heading1 -- change this part if you want to anchor to something else
        const mainPattern = /^#\s[\s\S]*/m;
        const mainMatches = content.match(mainPattern);

        if (mainMatches) {
            // Secondary pattern to target lowercase letter starters in most or any languages with Latin, Cyrillic, Greek, etc. character sets
            // No g flag here: a global regex keeps lastIndex between test() calls and would skip lines
            const secPattern = /^\p{Ll}.*/u;
            const secMatches = mainMatches[0].split('\n').filter(line => secPattern.test(line));

            if (secMatches && secMatches.length > 0) {
                return {
                    link: page.file.link,
                    content: secMatches.join('\n')
                };
            }
        }

        return null;
    })
);

const filteredAndProcessedPages = pages.filter(p => p !== null);

dv.table(
    ["Note", "Content"],
    filteredAndProcessedPages.map(p => [p.link, p.content])
);
```
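The lowercase check can be tried in isolation before running it on a vault. This is a small sketch with made-up sample lines. It uses a non-global pattern on purpose, because test() on a regex with the g flag remembers its position between calls:

```javascript
// \p{Ll} (with the u flag) matches a lowercase letter in any script.
// No g flag, so every test() call starts at the beginning of the line.
const startsLowercase = /^\p{Ll}/u;

// Made-up sample lines in several scripts.
const sampleLines = [
    "This line is fine.",
    "this line starts with a lowercase letter.",
    "élan at the start of a line.",
    "это тоже строчная буква.",
    "- list items start with a marker, so they pass.",
    "Élan capitalised is fine too.",
];

const flaggedLines = sampleLines.filter(line => startsLowercase.test(line));
console.log(flaggedLines);
```

Lines that begin with a list marker, a digit, or punctuation are not flagged, since their first character is not a lowercase letter.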

gino_m · December 26, 2023, 12:19pm

A new script was added here:

In it, the search term accepts regular expressions, so the script can be used for ‘range or so-called proximity search’: e.g. <searchterm1>.*?<searchterm2> will find the first term followed by the nearest occurrence of the second, and the full paragraph will be printed for context. The < and > are not part of the syntax, of course.

The same works in the Obsidian search modal, but there you need to put in at least the opening / slash to indicate that you want to use regex.
A search done in a DV query like this is superior, as one can copy out the results for further processing. I expect later Obsidian versions will have this functionality built in.
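To see what such a proximity pattern does, here is a sketch in plain JavaScript. It is not the linked script, only an illustration of the regex. The helper name proximityRegex, the case-insensitive flag, and the sample text are my assumptions for the example:

```javascript
// Builds a regex that matches term1 followed by term2 with the shortest gap between them.
// Both terms are inserted as regex source, so they may themselves be patterns.
function proximityRegex(term1, term2) {
    // i: case-insensitive. The dot does not cross newlines, so a match stays on one line.
    return new RegExp(`${term1}.*?${term2}`, "i");
}

// Made-up note text; paragraphs are separated by blank lines.
const sampleText = [
    "The archive holds letters from the editor to the printer.",
    "",
    "A printer is mentioned here, but nobody else.",
    "",
    "Later the Editor wrote again, and the printer replied.",
].join("\n");

const proximityPattern = proximityRegex("editor", "printer");
const paragraphs = sampleText.split(/\n\s*\n/);

// Keep the full paragraph for context whenever the pattern matches inside it.
const matchingParagraphs = paragraphs.filter(p => proximityPattern.test(p));
console.log(matchingParagraphs);
```

The middle paragraph is left out because it contains only one of the two terms.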