Sentiment analysis on the text data in the obsidian forums

tallguyjenks · October 11, 2020, 9:42am

So I can’t sleep.
It’s 2AM, I got bored.
I ran a slapped-together sentiment analysis on the text data in the Obsidian forums for every post that had >=50 replies.

An aggregate total of 38.1k words. Here are the results:

The word cloud contains words that occurred 25 or more times in the data set
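
For readers curious what a "25 or more times" cutoff looks like in practice, here is a minimal sketch in Python. It is not the code used for this analysis; the function name `frequent_words`, the tokenizing regex, and the sample text are all assumptions made up for illustration.

```python
import re
from collections import Counter


def frequent_words(text: str, min_count: int = 25) -> dict[str, int]:
    """Return the words that occur at least `min_count` times (case-insensitive)."""
    # Crude tokenizer (an assumption): runs of letters and straight apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return {word: n for word, n in counts.items() if n >= min_count}


# Made-up sample text: only "graph" and "tags" reach the cutoff of 25.
sample = "graph " * 30 + "tags " * 25 + "plugin " * 3
print(frequent_words(sample))
```

A word cloud library would then size each surviving word by its count.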

for anyone who wants the code you can grab it HERE

it looks like this at a glance:

Getting the data was difficult, as the site doesn’t easily give you a ton of JSON, so I had to select individual posts. That was a pain, so I just chose the posts with >50 replies.
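
As a rough illustration of turning per-topic JSON into plain text, here is a sketch in Python. It assumes the topic JSON has the shape Discourse normally returns when `.json` is appended to a topic URL (a `post_stream.posts` list whose items carry the rendered HTML in `cooked`); the sample payload and the helper names `strip_html` and `topic_text` are hypothetical, and this is not the approach taken in the linked code.

```python
import json
from html.parser import HTMLParser

BLOCK_TAGS = {"p", "div", "li", "blockquote"}


class _TextExtractor(HTMLParser):
    """Collects text nodes, adding a space after block-level elements."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.chunks.append(" ")


def strip_html(html: str) -> str:
    """Drop the tags and collapse whitespace."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


def topic_text(topic_json: str) -> list[str]:
    """Return one plain-text string per post in a topic's JSON."""
    topic = json.loads(topic_json)
    posts = topic.get("post_stream", {}).get("posts", [])
    return [strip_html(post.get("cooked", "")) for post in posts]


# Hypothetical payload standing in for a downloaded topic.
sample = json.dumps({"post_stream": {"posts": [
    {"cooked": "<p>Graph view is <em>great</em>.</p>"},
    {"cooked": "<p>Tags need work.</p>"},
]}})
print(topic_text(sample))
```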

RikD · October 11, 2020, 10:06am

Hopefully you’ll be able to catch some sleep after this (or not, because this is great material you’re sharing).
Nice “analysis”.

Kiroro · October 11, 2020, 12:53pm

Hmm, I might be wrong, but “Graph”, although not yet louder than “Tags”, receives greater effort from Obsidian. It could be the differentiating “niche” Obsidian is busy building toward.

anon27868835 · October 11, 2020, 1:32pm

Interesting. What is the outcome if you eliminate common words like “it’s”, “I’m”, “don’t”, “var”, etc.?

Relatively high degrees of sadness, disgust, fear, etc., are curious. I don’t get that impression from reading this forum regularly. Esp. “disgust”? What is that all about?

RikD · October 11, 2020, 1:54pm

IMO this is due to the analysis ‘AI’.
Don’t believe people are ‘disgusted’ by Obsidian
I am surely not!!!

Edit
The reason why I believe it will still take a long time before computers can analyse like human beings do…

ryanjamurphy · October 11, 2020, 5:18pm

I personally uploaded this clip from Veep to YouTube a couple of years ago so that I could reference it on occasions like this:

tallguyjenks · October 11, 2020, 6:41pm

this is one of the annoying things that came from this fun little project. getting the JSON data from the forums into a usable format was a hassle, but then i still had issues removing ALL the stop words. the text was cut down to about half after the stop word removal, but i still had those creep in, and at around 3AM i was too irritated to bother polishing it xD
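
One common reason contractions like “it’s” or “don’t” survive stop word removal is that forum text uses curly apostrophes while stop word lists use straight ones. That may or may not be what happened here. The sketch below, in Python, normalizes the apostrophe before filtering; the tiny `STOP_WORDS` set and the function name `remove_stop_words` are assumptions for illustration only.

```python
import re

# Tiny illustrative stop word list (an assumption, not a real lexicon).
STOP_WORDS = {"i", "it", "the", "a", "is", "and", "but", "it's", "i'm", "don't"}


def remove_stop_words(text: str, stop_words: set[str] = STOP_WORDS) -> list[str]:
    """Lowercase, normalize curly apostrophes, tokenize, and drop stop words."""
    normalized = text.lower().replace("\u2019", "'")
    # Keep contractions such as "don't" together as one token.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", normalized)
    return [token for token in tokens if token not in stop_words]


print(remove_stop_words("It\u2019s great but I don\u2019t love the graph"))
```

Without the `replace` call, “it’s” would be split at the curly apostrophe and the leftover pieces would not all match the list.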

as for the ‘disgust’ thing, the NRC sentiment lexicon (according to a quick google search) overweighs the positive values and has multiple sentiments per word, so some words can be benign but still carry that sentiment. i mostly chose it because the produced visuals were better than with the ‘bing’ sentiment lexicon, which just shows binary ‘positive’ or ‘negative’ but is more accurate
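
To make the difference between a multi-label lexicon and a binary one concrete, here is a small Python sketch. Both lexicons are hypothetical toy dictionaries invented for this example, not entries taken from the real NRC or Bing lexicons, and the function names are assumptions too.

```python
from collections import Counter

# Hypothetical toy lexicons; NOT the real NRC or Bing entries.
MULTI_LABEL = {
    "bug": ["negative", "disgust", "fear"],
    "broken": ["negative", "sadness"],
    "love": ["positive", "joy"],
}
BINARY = {"bug": "negative", "broken": "negative", "love": "positive"}


def multi_label_counts(tokens: list[str]) -> Counter:
    """Every label attached to a word is counted once per occurrence."""
    counts = Counter()
    for token in tokens:
        counts.update(MULTI_LABEL.get(token, []))
    return counts


def binary_counts(tokens: list[str]) -> Counter:
    """Each known word contributes exactly one label."""
    return Counter(BINARY[token] for token in tokens if token in BINARY)


tokens = ["love", "graph", "bug", "broken", "bug"]
print(multi_label_counts(tokens))
print(binary_counts(tokens))
```

With a multi-label lexicon, an ordinary bug report can add to ‘disgust’ or ‘fear’ even when the poster feels neither, which is one plausible reading of the chart.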

tallguyjenks · October 11, 2020, 6:43pm

TOO TRUE. definitely did not follow my own data analysis best practices on this late night / early morning escapade

anon27868835 · October 11, 2020, 7:07pm

Interesting – and thank you for taking time to gin up this analysis. My general perception of this particular forum population is that vocal members (most frequent posters) are often focused urgently on very precise and often complex requirements, resolutely held. I suppose that’s indicative of Obsidian’s recency and maturity phase. Thus no surprise that the sentiment data seems to break evenly positive / negative in a milieu where readers have strong opinions about software.