Best way to count words in quotes?

Things I have tried

I’ve tried messing with regular expressions, e.g. /\“(.*?)\g/, but that only tells me THAT there are words in quotes in certain files; I need to find out how many words that is.

What I’m trying to do

I need to make sure I’m not quoting too much (somewhere it says the text isn’t meant to include more than 10% direct quotes), so I’d like a way to count the words in quotes.

How is this possible?

I’ve also tried /([“”])(?:(?=(\\?))\2.)*?\1/ and the same thing happens. I’ve also tried using grep with both of those regex searches in Terminal (I’m on Linux Mint), but neither produces any output.

Counting words is an icky business, and prone to loads of errors. It’s hard to keep track of how quotes are used. However, I can give you some guidelines, I think.

First, get all the quotes in the file with something like /"[^"]+"/ (with the global flag, so that it matches more than once). This should get all the various quotes in the file, as long as they use straight double quotes.

Then you could split each quote into “words” by splitting on spaces and/or various punctuation characters, count the elements in the resulting array, and add that number to your running total of words in quotes.
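The two steps above (grab the quotes, then split and count) could be sketched roughly like this. It is only a minimal sketch: countQuotedWords is a made-up helper name, it assumes quotes never nest, and it treats straight and curly double quotes the same way.

```javascript
// Sketch: count words inside double quotes (straight or curly).
// countQuotedWords is a hypothetical helper name, not something from this thread.
function countQuotedWords(text) {
    // Match "..." or “...”; [^"“”]+ keeps each match inside one pair of quotes.
    const quotePattern = /["“]([^"“”]+)["”]/g;
    let total = 0;
    for (const match of text.matchAll(quotePattern)) {
        // Split the quoted text on whitespace and drop empty pieces.
        const words = match[1].trim().split(/\s+/).filter(Boolean);
        total += words.length;
    }
    return total;
}

console.log(countQuotedWords('She said "hello there" and “good bye now”.')); // 5
```

Splitting on whitespace only means a hyphenated or slash-joined term counts as one word; splitting on punctuation as well would change the totals slightly.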

For an example of this regex working, see regex101: build, test, and debug regex

In what context are you doing your regular expressions? Is it within a Templater template? Within a DataviewJS query? Just a pure search? Or where?


Once you have the quotes you could just feed them to something that counts words (for example, put them in their own note and use the built-in count, or pipe the result to wc in Terminal), if you’re just checking an individual document and not building a table or something.

I was able to solve this with help over at Stack Overflow: “How do I count the words in quotes in markdown files (using regex or another way)?”

I opened my vault with Visual Studio, did a regex search, highlighted all results (everything in quotes), copied them into a new note, and just used the Obsidian word count.


Hey :cherry_blossom:

It’s a cool question, so I made you a vanilla JS ‘plugin’.

What it does:

  • separates quoted and unquoted strings.
  • sums the words of each
  • calculates the % of quoted words.

For example, this text:

hey look at that
“hey look at that”
“hey look at that”
“hey look at that”

Returns this (in console):

* FINISHED *
QUOTED word count = 12
UNQUOTED word count = 4
QUOTED PERCENTAGE === 75%
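The percentage is just quoted / (quoted + unquoted) × 100, so 12 / 16 gives 75%. As a rough sketch, the check against the 10% limit from the original question could look like this (quotedShare and limitPercent are illustrative names, not part of the plugin):

```javascript
// Sketch: turn the two counts into a percentage and compare it to a limit.
// quotedShare and limitPercent are illustrative names; 10 is the limit mentioned in the question.
function quotedShare(quotedWords, unquotedWords) {
    const total = quotedWords + unquotedWords;
    // Avoid dividing by zero on an empty note.
    return total === 0 ? 0 : (quotedWords / total) * 100;
}

const limitPercent = 10;
const share = quotedShare(12, 4);
console.log(share + "%"); // 75%
console.log(share > limitPercent ? "over the limit" : "within the limit"); // over the limit
```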

**

You can also turn on “MONITOR MODE” which displays details:

=== added to UNQUOTED: 4 || UNQUOTED Total: 4
The LINE is: hey,look,at,that
=== added to QUOTED: 4 || QUOTED Total: 4
The LINE is: hey,look,at,that
=== added to QUOTED: 4 || QUOTED Total: 8
The LINE is: hey,look,at,that
=== added to QUOTED: 4 || QUOTED Total: 12
The LINE is: hey,look,at,that

**

Here is the vanilla JS:

// SET TO TRUE FOR "MONITOR MODE" (to show stats, quoted/unquoted phrases in console)
quoteSift(false);

function quoteSift(x) {
    // get all cm-lines in active tab
    var lineText = document.querySelectorAll(".workspace-leaf.mod-active .cm-content .cm-line");
    let fullText = "";

    //join all lines, remove leading/trailing spaces, remove double spaces
    console.log("======= new run || monitor mode: " + x + " =======");
    let z = lineText.length;
    for (var c = 0; c < z; c++) {
        let liner = lineText[c].innerText.trim();
        if (c > 0) {fullText = fullText + " " + liner;}
        if (c === 0) {fullText = liner;}
    }
    fullText = fullText.replace(/\s+/g, ' ');
    if (x) {console.log("fulltext: " + fullText);}

    // split at " ... count words in each segment. sum non-quoted words. sum quoted words
    let textArray = fullText.split(/(\")/);
    // declare each counter separately (a chained "=" would create implicit globals)
    let d = 0, quoteCount = 0, nonCount = 0;
    for (var c = 0; c < textArray.length; c++) {
        let g = textArray[c].trim();
        let wordCount = g.split(' ').length;
        let gLine = g.split(' ');
        let b = g[0]; // get first character of string
        if (b === '"') { // toggle quoted/unquoted string
            if (d === 0) {d = 1; continue;}
            if (d === 1) {d = 0; continue;}
        }
        if (typeof b === 'undefined') {continue;} // catch hidden characters/breaks
        if (d === 0) {
            nonCount = nonCount + wordCount;
            if (x) {console.log("=== added to UNQUOTED: " + wordCount + " || UNQUOTED Total: " + nonCount + "\nThe LINE is: " + gLine );}
            continue;
        }
        if (d === 1) {
            quoteCount = quoteCount + wordCount;
            if (x) {console.log("=== added to QUOTED: " + wordCount + " || QUOTED Total: " + quoteCount + "\nThe LINE is: " + gLine );}
            if (c === (textArray.length - 1)) {console.log("NOTE - Probably missing end quote");}
        }
    }
    let totalCount = nonCount + quoteCount;
    let qPercent = totalCount === 0 ? 0 : (quoteCount / totalCount) * 100; // get final percentage (0 for an empty note)
    console.log("* FINISHED *\nQUOTED word count = " + quoteCount + "\nUNQUOTED word count = " + nonCount + "\nQUOTED PERCENTAGE === " + qPercent + "%");
}

or private-bin if you prefer.

**

I’m glad you found a solution though.

Figured I should send this anyway, since it’s complete. @holroy @CawlinTeffid

[NOTE: I’m using the “Javascript Init” community-plugin to add that JS to Obsidian]

I hope you have a good day :tropical_fish:


That’s really cool, thank you! Stupid question, but how do I actually run this? Do I make a file and then run it in a folder with my notes?

Also, I use mgmeyers/obsidian-smart-typography (it converts quotes to curly quotes, dashes to em dashes, and periods to ellipses), so my quotation marks look like “ ” instead of " ".
Which part of your code do I need to replace so that curly quotes are included?

Hey, that’s a fine question.

To set it up (first-time only), you just go:

  1. Preferences > community plugins > get “Javascript Init” (great for adding JS)
  2. Preferences > Javascript Init settings > paste this new code (below) in the box

Now - to run it, you just:

  1. Click the “Run Javascript Init” button in the left-ribbon of Obsidian
  2. The results are in your console log (in devTools). Or if you want the results as a direct popup (alert), just uncomment the last line of code (change //alert... to alert...).

**

I’ve added in your requests.

This new version:

  • automatically processes curly-quotes.
  • offers the “results as popup” option (the last line of code).

Version 2:

// SET TO TRUE FOR "MONITOR MODE" (to show stats, quoted/unquoted phrases in console)
quoteSift(false);

function quoteSift(x) {
    // get all cm-lines in active tab
    var lineText = document.querySelectorAll(".workspace-leaf.mod-active .cm-content .cm-line");
    let fullText = "";

    // join all lines, remove leading/trailing spaces
    console.log("======= new run || monitor mode: " + x + " =======");
    let z = lineText.length;
    for (var c = 0; c < z; c++) {
        let liner = lineText[c].innerText.trim();
        if (c > 0) {fullText = fullText + " " + liner;}
        if (c === 0) {fullText = liner;}
    }
    // convert curly-quotes to straight-quotes, remove double spaces
    fullText = fullText.replaceAll('\u201d', '"').replaceAll('\u201c', '"');
    fullText = fullText.replace(/\s+/g, ' ');
    if (x) {console.log("fulltext: " + fullText);}

    // split at " ... count words in each segment. sum non-quoted words. sum quoted words
    let textArray = fullText.split(/(\")/);
    // declare each counter separately (a chained "=" would create implicit globals)
    let d = 0, quoteCount = 0, nonCount = 0;
    for (var c = 0; c < textArray.length; c++) {
        let g = textArray[c].trim();
        let wordCount = g.split(' ').length;
        let gLine = g.split(' ');
        let b = g[0]; // get first character of string
        if (b === '"') { // toggle quoted/unquoted string
            if (d === 0) {d = 1; continue;}
            if (d === 1) {d = 0; continue;}
        }
        if (typeof b === 'undefined') {continue;} // catch hidden characters/breaks
        if (d === 0) {
            nonCount = nonCount + wordCount;
            if (x) {console.log("=== added to UNQUOTED: " + wordCount + " || UNQUOTED Total: " + nonCount + "\nThe LINE is: " + gLine );}
            continue;
        }
        if (d === 1) {
            quoteCount = quoteCount + wordCount;
            if (x) {console.log("=== added to QUOTED: " + wordCount + " || QUOTED Total: " + quoteCount + "\nThe LINE is: " + gLine );}
            if (c === (textArray.length - 1)) {console.log("NOTE - Probably missing end quote");}
        }
    }
    let totalCount = nonCount + quoteCount;
    let qPercent = totalCount === 0 ? 0 : (quoteCount / totalCount) * 100; // get final percentage (0 for an empty note)
    console.log("* FINISHED *\nQUOTED word count = " + quoteCount + "\nUNQUOTED word count = " + nonCount + "\nQUOTED PERCENTAGE === " + qPercent + "%");
    //alert("* FINISHED *\nQUOTED word count = " + quoteCount + "\nUNQUOTED word count = " + nonCount + "\nQUOTED PERCENTAGE === " + qPercent + "%");
}

[or private-bin if you prefer]
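If you want to try the counting logic outside Obsidian (in Node or the browser console, say), the same toggle idea can be written as a pure function over a string. This is only a sketch of the approach, not the plugin itself: siftQuotes is a hypothetical name, and it assumes every double quote mark, straight or curly, opens or closes a quotation.

```javascript
// Sketch: the same quoted/unquoted toggle as a pure function over a string.
// siftQuotes is a hypothetical name; it is not part of the plugin code above.
function siftQuotes(text) {
    // Treat curly double quotes as straight ones, then collapse whitespace.
    const normalized = text.replace(/[“”]/g, '"').replace(/\s+/g, " ").trim();
    // Splitting on " makes the segments alternate: outside, inside, outside, ...
    const segments = normalized.split('"');
    let quoted = 0;
    let unquoted = 0;
    segments.forEach((segment, index) => {
        const words = segment.trim().split(" ").filter(Boolean).length;
        if (index % 2 === 1) {
            quoted += words;
        } else {
            unquoted += words;
        }
    });
    // An even number of segments means an odd number of quote marks.
    const unbalanced = segments.length % 2 === 0;
    return { quoted, unquoted, unbalanced };
}

console.log(siftQuotes('hey look at that “hey look at that” "hey look at that"'));
// { quoted: 8, unquoted: 4, unbalanced: false }
```

The unbalanced flag plays the same role as the “Probably missing end quote” note: when it is true, the quoted count likely includes text that was never closed.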

**

If you have any questions/issues, feel free to say :guitar:


That’s perfect, thank you so much!!

