Find longest notes in vault

Hi all,

Is there a way to identify the largest notes (by word count) in my vault? I assume Dataview can do this, but a search of the docs and this forum doesn’t easily turn up an answer.

I want to do this because when cleaning up my vault I sometimes want to atomise notes, so identifying the largest ones would be useful.

This may be helpful:

There may be a dataviewjs way to do it or other plugins. I’m curious to see the suggestions the community comes up with.

If one doesn’t need an exact word count, the far cheaper solution is just to check the file size. In general, the larger the file, the more words it contains…

Checking the file size is very easy, as it is part of the file stats. Checking the word count requires a full read of the file and deciding what counts as a word and what is a property, Markdown syntax, or a divider. In most cases you’ll only get an approximation anyway, unless you actually render the entire document and then do a word count.
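To illustrate why a word count is only ever approximate, here is a rough sketch of what such a count has to do once the file has been read. The function name `approximateWordCount` and the stripping rules are my own assumptions, not what any plugin actually does: it drops front matter and code fences, unwraps links, and then counts whitespace-separated tokens that contain at least one letter or digit.

```javascript
// Rough word count for the raw text of a Markdown note.
// Assumption: a "word" is a whitespace-separated token containing at least
// one letter or digit, after properties and common syntax are stripped.
function approximateWordCount(markdown) {
  const text = markdown
    // Drop a YAML front matter (properties) block at the very top of the file
    .replace(/^---\r?\n[\s\S]*?\r?\n---\r?\n?/, "")
    // Drop fenced code blocks
    .replace(/`{3}[\s\S]*?`{3}/g, " ")
    // Markdown links and images: keep the link text, drop the URL
    .replace(/!?\[([^\]]*)\]\([^)]*\)/g, "$1")
    // Wiki links: [[target|alias]] -> alias, [[target]] -> target
    .replace(/\[\[(?:[^\]|]*\|)?([^\]]*)\]\]/g, "$1")
    // Remove heading, emphasis, quote, and inline-code markers
    .replace(/[#*_>`~]/g, " ");

  // Tokens without any letter or digit (dividers, list bullets) are ignored
  return text
    .split(/\s+/)
    .filter(token => /[\p{L}\p{N}]/u.test(token))
    .length;
}
```

Even this ignores tables, callouts, comments, and embeds, which is why comparing file sizes is usually good enough for finding the big notes.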

Our Dataview guru had something like this in mind for the file-size approach:

```dataview
TABLE round(file.size / 1024) AS "File Size (KB)"
SORT file.size desc
LIMIT 500
```

Thanks guys

If you really want to sort by word count, and you have Better Word Count installed, you can use its new API within a dataviewjs code block.

When this came out, I implemented a table that relies on this API to get an overview of the word counts and writing progress for the notes that are part of my master’s thesis:

[Image: table of word counts and writing progress]

I adapted that script for your use case below. It displays a table of the top 10 files, sorted by word count. It also shows how long ago you last modified each note. I limited the query to files with the “md” extension. Since this includes Excalidraw files, I added a separate check to filter them out.

```dataviewjs
const bwc = app.plugins.plugins["better-word-count"].api;
const luxon = dv.luxon;

// Get all pages in the vault
const pages = dv.pages('""').where(page => page.file.ext === "md" && !page.file.path.endsWith("excalidraw.md"));

// Array to store the final rows for the table
const tableRows = [];

// Use Promise.all to wait for all word counts
await Promise.all(pages.map(async (page) => {
    const wordCount = await bwc.getWordCountPagePath(page.file.path);
    page.wordCount = wordCount;
}));

// Create a new array that is sorted
const sortedPages = [...pages].sort((pageA, pageB) => pageB.wordCount - pageA.wordCount);

// Iterate through the sorted pages and add rows to the tableRows array
const numberOfNotesToShow = 10; // Number of largest notes to display
for (let i = 0; i < numberOfNotesToShow && i < sortedPages.length; i++) {
    const page = sortedPages[i];
    const wordCount = page.wordCount;
    // file.mtime is already a Luxon DateTime, so no parsing is needed
    const relativeModified = page.file.mtime.toRelative();

    tableRows.push([
        page.file.link,
        wordCount,
        relativeModified,
    ]);
}

// Print the table with the top notes based on word count
dv.table(["Name", "Word Count", "Last modified"], tableRows);
```

If you want to use it, I recommend limiting the query to a particular folder, like your inbox folder, to make it run faster. When run on your whole vault, it can take a while to load (maybe 5 seconds?).
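As a rough sketch of that folder restriction: the helper name `pagesInFolder` and the folder name "Inbox" are placeholders I made up for illustration, and `dv` is the object Dataview provides inside a dataviewjs block. The only real change is passing a quoted folder path as the source instead of `'""'`.

```javascript
// Hypothetical helper: collect the pages of one folder instead of the whole vault.
// `dv` is the object Dataview provides to dataviewjs blocks.
function pagesInFolder(dv, folder) {
  // Dataview folder sources are wrapped in double quotes, e.g. dv.pages('"Inbox"')
  return dv.pages(`"${folder}"`)
    .where(page => !page.file.path.endsWith("excalidraw.md"));
}

// Inside a dataviewjs block ("Inbox" is a placeholder folder name):
//   const pages = pagesInFolder(dv, "Inbox");
```

The rest of the script can stay as it is; it just has fewer files to count.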

This is the solution.
