This is one of the annoying things that came out of this fun little project. Getting the JSON data from the forums into a usable format was a hassle, but then I still had issues removing ALL the stop words. The text was cut down to about half after the stop word removal, but some still crept in, and at around 3 AM I was too irritated to bother polishing it xD
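For anyone hitting the same thing: one common reason stop words slip through is that the tokens don't exactly match the list (capital letters, attached punctuation, or curly apostrophes like the one in "don’t"). I can't say that is what happened here, so treat this as a rough sketch only. The tiny `STOP_WORDS` set and the `remove_stop_words` helper are made up for illustration, not what the original script used.

```python
import re

# Tiny illustrative stop-word list (an assumption; a real run would use a fuller list).
STOP_WORDS = {"the", "a", "is", "it", "i", "to", "and", "don't", "this"}

def remove_stop_words(text, stop_words=STOP_WORDS):
    """Return the tokens of `text` that are not in `stop_words`."""
    # Normalize case and curly apostrophes so "Don\u2019t" matches "don't".
    normalized = text.lower().replace("\u2019", "'")
    # Keep runs of letters, allowing apostrophes inside a word; punctuation is dropped.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)*", normalized)
    return [t for t in tokens if t not in stop_words]

sample = "The plugin is great, and I don\u2019t miss folders."
print(remove_stop_words(sample))
```

Normalizing before filtering is the whole point: without the lowercase and apostrophe steps, "The" and "don’t" would both survive the filter in this sample.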
As for the ‘disgust’ thing: the NRC sentiment lexicon (according to a quick Google search) overweights the positive values and has multiple sentiments per word, so some words can be benign but still carry that sentiment. I mostly chose it because the visuals it produced were better than those from the ‘bing’ sentiment lexicon, which just shows a binary ‘positive’ or ‘negative’ but is more accurate.
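To make the multi-sentiment point concrete, here is a rough sketch of how a multi-label lexicon inflates counts compared with a binary one. The two lexicons below are toy data invented for this example (they are not real NRC or Bing entries), and `count_multi` / `count_binary` are hypothetical helper names.

```python
from collections import Counter

# Toy lexicons, invented for illustration only (not real NRC or Bing entries).
MULTI_LABEL = {
    "broken": {"negative", "anger", "sadness"},
    "love": {"positive", "joy"},
    "sync": {"positive", "trust"},
}
BINARY = {"broken": "negative", "love": "positive"}

def count_multi(tokens, lexicon):
    """Each token adds one count for every label it carries."""
    counts = Counter()
    for token in tokens:
        counts.update(lexicon.get(token, ()))
    return counts

def count_binary(tokens, lexicon):
    """Each token adds at most one count."""
    return Counter(lexicon[t] for t in tokens if t in lexicon)

tokens = ["love", "sync", "broken", "plugin"]
print(count_multi(tokens, MULTI_LABEL))
print(count_binary(tokens, BINARY))
```

With the same four tokens, the multi-label version records seven sentiment hits and the binary version only two, which is why one lexicon can give richer charts while the other gives a cleaner positive/negative split.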
Interesting – and thank you for taking time to gin up this analysis. My general perception of this particular forum population is that vocal members (most frequent posters) are often focused urgently on very precise and often complex requirements, resolutely held. I suppose that’s indicative of Obsidian’s recency and maturity phase. Thus no surprise that the sentiment data seems to break evenly positive / negative in a milieu where readers have strong opinions about software.