Sentiment analysis on the text data in the obsidian forums

So I can’t sleep.
It’s 2AM, I got bored.
I ran a slapped-together sentiment analysis on the text data in the Obsidian forums for every post that had >=50 replies.

With an aggregate total of 38.1k words, here are the results:



The word cloud contains words that occurred 25 or more times in the data set
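For anyone curious how a 25-occurrence cutoff like that can be applied before drawing a word cloud, here is a minimal sketch. It is not the code linked in this post; the function name `frequent_words` and the sample text are made up for illustration.

```python
from collections import Counter
import re

def frequent_words(text, min_count=25):
    """Return {word: count} for words that occur at least min_count times."""
    # Lowercase, then keep runs of letters and apostrophes (straight or curly).
    words = re.findall(r"[a-z'\u2019]+", text.lower())
    counts = Counter(words)
    # Keep only the words that meet the threshold.
    return {word: n for word, n in counts.items() if n >= min_count}

# Made-up sample text, only to show the cutoff in action.
sample = "graph " * 30 + "tags " * 25 + "plugin " * 3
print(frequent_words(sample))
```

The resulting dictionary is the kind of word-to-frequency mapping most word cloud tools accept as input.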


For anyone who wants the code, you can grab it HERE.

It looks like this at a glance:


Getting the data was difficult, as the site doesn’t easily give you a ton of JSON, so I had to select individual posts. That was a pain, so I just chose the posts with >50 replies.
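As a rough sketch of the "JSON into usable text" step: the snippet below assumes each topic has already been saved as a dict shaped like Discourse's topic JSON, with a `reply_count` field and post HTML under `post_stream` → `posts` → `cooked`. Those field names are an assumption about the export format, not something confirmed in this thread, and the helper names are hypothetical.

```python
import html
import re

def has_enough_replies(topic, min_replies=50):
    """True if the topic meets the reply threshold (field name is assumed)."""
    return topic.get("reply_count", 0) >= min_replies

def posts_to_text(topic):
    """Return one plain-text string per post in the topic dict."""
    texts = []
    for post in topic.get("post_stream", {}).get("posts", []):
        cooked = post.get("cooked", "")           # rendered HTML of the post (assumed key)
        plain = re.sub(r"<[^>]+>", " ", cooked)   # crude tag stripping, fine for a sketch
        plain = html.unescape(plain)              # turn &amp; etc. back into characters
        texts.append(" ".join(plain.split()))     # collapse runs of whitespace
    return texts

# Made-up sample topic, only to show the shape the helpers expect.
topic = {
    "reply_count": 52,
    "post_stream": {"posts": [{"cooked": "<p>I love the <b>graph</b> view &amp; tags.</p>"}]},
}
if has_enough_replies(topic):
    print(posts_to_text(topic))
```

A real HTML parser would be more robust than the regex here; the regex just keeps the sketch short.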


Hopefully you’ll be able to catch some sleep after this (or not, because this is great material you’re sharing :wink: ).
Nice “analysis”.

Hmm, I might be wrong, but “Graph”, although not louder than “Tags”, receives greater effort from Obsidian. It could be the differentiating “niche” Obsidian is busy building.

Interesting. What is the outcome if you eliminate common words like “it’s”, “I’m”, “don’t”, “var”, etc.?

The relatively high degrees of sadness, disgust, fear, etc., are curious. I don’t get that impression from reading this forum regularly. Esp. “disgust”? What is that all about?

IMO this is due to the analysis ‘AI’.
Don’t believe people are ‘disgusted’ by Obsidian :wink:
I am surely not!!!

Edit: The reason why I believe it will still take a long time before computers can analyse like human beings do…


I personally uploaded this clip from Veep to YouTube a couple of years ago so that I could reference it on occasions like this:

This is one of the annoying things that came from this fun little project. Getting the JSON data from the forums into a usable format was a hassle, but then I still had issues removing ALL the stop words. The text was cut down to about half after the stop word removal, but I still had those creep in, and at around 3AM I was too irritated to bother polishing it xD
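One common way contractions like “it’s” and “don’t” slip past a stop word filter is that forum text uses curly apostrophes while stop word lists use straight ones. That may or may not be what happened here; it is just a guess. A minimal sketch of normalizing before filtering, with a tiny made-up stop word list:

```python
import re

# Tiny illustrative stop word list, not a real lexicon's list.
STOP_WORDS = {"it's", "i'm", "don't", "the", "is", "a", "i", "but", "and"}

def normalize(token):
    """Lowercase and map the curly apostrophe (U+2019) to a straight one."""
    return token.lower().replace("\u2019", "'")

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Tokenize, normalize, and drop any token found in stop_words."""
    # Keep apostrophes inside tokens so contractions stay in one piece.
    tokens = re.findall(r"[A-Za-z'\u2019]+", text)
    return [t for t in map(normalize, tokens) if t not in stop_words]

print(remove_stop_words("It\u2019s great but I don\u2019t use the graph"))
```

Without the `normalize` step, “it’s” and “don’t” would not match the straight-apostrophe entries and would stay in the output.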

As for the ‘disgust’ thing: the NRC sentiment lexicon (according to a quick Google search) overweighs the positive values and has multiple sentiments per word, so some words can be benign but still carry that sentiment. I mostly chose it because the visuals it produced were better than those from the ‘bing’ sentiment lexicon, which is more accurate but just shows a binary ‘positive’ or ‘negative’ :man_shrugging:
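To make the difference concrete, here is a toy comparison of a multi-label lexicon versus a binary one. The entries below are invented for illustration and are NOT taken from the real NRC or ‘bing’ lexicons; the function names are hypothetical too.

```python
from collections import Counter

# Hypothetical toy entries, NOT real NRC or bing lexicon data.
TOY_MULTI = {
    "broken": {"negative", "sadness", "disgust"},
    "love": {"positive", "joy"},
    "crash": {"negative", "fear"},
}
TOY_BINARY = {"broken": "negative", "love": "positive", "crash": "negative"}

def score_multi(words, lexicon=TOY_MULTI):
    """Count every label attached to every matched word."""
    counts = Counter()
    for word in words:
        counts.update(lexicon.get(word, ()))  # one word can add several labels
    return counts

def score_binary(words, lexicon=TOY_BINARY):
    """Count exactly one label per matched word."""
    return Counter(lexicon[word] for word in words if word in lexicon)

words = ["love", "graph", "broken", "crash"]
print(score_multi(words))
print(score_binary(words))
```

With a multi-label lexicon, a single word like "broken" in a bug report bumps several emotion counts at once, which is one way a category like ‘disgust’ can look bigger than the forum actually feels.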

TOO TRUE. definitely did not follow my own data analysis best practices on this late night / early morning escapade :rofl:


Interesting – and thank you for taking time to gin up this analysis. My general perception of this particular forum population is that vocal members (most frequent posters) are often focused urgently on very precise and often complex requirements, resolutely held. I suppose that’s indicative of Obsidian’s recency and maturity phase. Thus no surprise that the sentiment data seems to break evenly positive / negative in a milieu where readers have strong opinions about software.
