PAHR Sentiment Trajectory

In my previous post on sentiment I discussed building data frames of chapter metrics and word lists. Here I use the word data frame to track sentiment across the book. I am working with the word list that keeps repeated words but drops stop words and words of three characters or fewer (the red line in the previous post). Comparing the word list with the text, I can see that the words are in the order in which they appear in the novel. I will use the Bing sentiment lexicon from the tidytext package to annotate each word as positive or negative, then group the annotated words into runs of 15 and calculate the net sentiment (positive count minus negative count) of each run.

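## Build one data frame of words for all 17 chapters.
## non.stop holds, per chapter, the words with stop words and
## words of <= 3 characters removed.
library(dplyr)     # %>%, filter(), select(), inner_join(), mutate(), count()
library(tidyr)     # spread()
library(tidytext)  # sentiments

word <- non.stop[[1]]
chpt <- rep(1, length(word))
pahr.words <- data.frame(chpt, word, stringsAsFactors = FALSE)

for (i in 2:17) {
  word <- non.stop[[i]]
  chpt <- rep(i, length(word))
  holder <- data.frame(chpt, word, stringsAsFactors = FALSE)
  pahr.words <- rbind(pahr.words, holder)
  rm(holder)
}

## chapter as a factor in reading order, so it maps to discrete plot colors
pahr.words$chpt <- factor(pahr.words$chpt, levels = 1:17)

## I checked and words are in the order that they appear
## in the novel

## Bing lexicon from the older tidytext `sentiments` table, which carried
## several lexicons; newer tidytext releases provide get_sentiments("bing").
bing <- sentiments %>%
  filter(lexicon == "bing") %>%
  select(-score)

## Keep only words found in the lexicon, then number each run of
## 15 matched words (the last run may be shorter).
d2 <- pahr.words %>%
  inner_join(bing, by = "word") %>%
  mutate(group = ceiling(row_number() / 15))

d3 <- count(d2, chpt, group, sentiment)

## One row per chapter/group; fill = 0 avoids NA when a run has
## no positive (or no negative) words.
d4 <- spread(d3, sentiment, n, fill = 0)
d4$sentiment <- d4$positive - d4$negative  # net sentiment of the run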
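To see what the grouping step does, here is a minimal base R sketch on a made-up sequence of sentiment labels. The vector `sentiment` and the run length `size` are hypothetical stand-ins chosen for illustration (the post uses runs of 15 on the real word list); nothing here is taken from the novel.

```r
# Hypothetical sequence of sentiment labels, in reading order.
sentiment <- c("positive", "positive", "negative",
               "negative", "negative", "positive",
               "negative")
size <- 3  # illustrative run length; the post uses 15

# Consecutive run ids: 1 1 1 2 2 2 3 (the last run is shorter)
group <- ceiling(seq_along(sentiment) / size)

# Score +1 for positive and -1 for negative, then sum within each run.
score <- ifelse(sentiment == "positive", 1, -1)
net <- tapply(score, group, sum)
print(net)
```

A run with more positive than negative words gets a score above zero, and the trailing short run is scored on however many words it holds, just as the final group of 4 words is in the real data.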

Plot as a line graph, with odd chapters colored black and even chapters colored grey. I also annotate a few moments of trauma within the narrative.

library(ggplot2)
## 17 chapters: odd chapters black, even chapters grey
mycols <- c(rep(c("black", "darkgrey"), 8), "black")
ggplot(d4, aes(group, sentiment, color = chpt)) +
  geom_line() +
  scale_color_manual(values = mycols) +
  geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
  annotate("text", x = 146, y = -14, label = "Hysteria in the gymnasium") +
  annotate("text", x = 147, y = -13, label = "x") +
  annotate("text", x = 12, y = -11, label = "Edith screams on Rock") +
  annotate("text", x = 35, y = -11, label = "x") +
  annotate("text", x = 68, y = -13, label = "Bad news delivered\n to Ms Appleyard") +
  annotate("text", x = 49, y = -13, label = "x")

We can see that the novel starts with a positive sentiment - “Beautiful day for a picnic…” - which gradually moves into negative territory and remains there for the majority of the book.

Does sentiment analysis really work? That depends on how accurately each word's sentiment is characterized. Consider the word “drag”:

> d2[d2$word=="drag",]
     chpt word sentiment lexicon group
133     1 drag  negative    bing     9
141     1 drag  negative    bing    10
162     1 drag  negative    bing    11
169     1 drag  negative    bing    12
183     1 drag  negative    bing    13
198     1 drag  negative    bing    14
199     1 drag  negative    bing    14
213     1 drag  negative    bing    15
227     1 drag  negative    bing    16
250     1 drag  negative    bing    17
263     1 drag  negative    bing    18
275     2 drag  negative    bing    19
300     2 drag  negative    bing    20
457     3 drag  negative    bing    31
468     3 drag  negative    bing    32
585     4 drag  negative    bing    39
602     4 drag  negative    bing    41
630     4 drag  negative    bing    42
633     4 drag  negative    bing    43
665     4 drag  negative    bing    45
678     4 drag  negative    bing    46
679     4 drag  negative    bing    46
743     5 drag  negative    bing    50
1224    7 drag  negative    bing    82
2978   16 drag  negative    bing   199

The word drag appears many times, always annotated as negative. In a sentence such as “It’s a drag that sentiment analysis isn’t reliable,” drag is indeed negative. In Picnic, however, a drag is a buggy pulled by horses; it is mentioned many times and imparts a lot of undeserved negative sentiment to the novel. Drag in Picnic is neutral and should have been discarded. Inspecting the sentiment-annotated word list turns up many other examples like drag, some contributing negative and some positive sentiment, which on average probably cancel each other out. Properly annotated words are even more abundant and, on balance, may convey the correct sentiment. Still, I would be skeptical of any sentiment analysis without a properly curated word list.
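One way to curate the list might look like the base R sketch below: rank the matched words by frequency to find candidates worth checking by hand, then drop the ones judged neutral before annotating. The objects `bing_toy`, `words_toy` and `neutral_here` are hypothetical toy stand-ins invented for illustration, not the real lexicon or the novel's word list. Note also that `merge()` reorders rows, so this only illustrates the counting, not the order-dependent grouping used above.

```r
# Toy stand-ins (hypothetical data, not the real lexicon or word list).
bing_toy <- data.frame(
  word      = c("drag", "beautiful", "scream", "pleasant"),
  sentiment = c("negative", "positive", "negative", "positive"),
  stringsAsFactors = FALSE
)
words_toy <- data.frame(
  chpt = c(1, 1, 1, 2, 2),
  word = c("beautiful", "drag", "drag", "scream", "pleasant"),
  stringsAsFactors = FALSE
)

# Helper: net sentiment = positive count minus negative count.
net_sentiment <- function(annotated) {
  sum(annotated$sentiment == "positive") - sum(annotated$sentiment == "negative")
}

# Annotate with the uncurated lexicon; merge() acts as an inner join on word.
raw <- merge(words_toy, bing_toy, by = "word")

# Most frequent matched words first: frequent entries deserve a manual check.
freq <- sort(table(raw$word), decreasing = TRUE)
print(freq)

# Words judged neutral in this text; extend after inspecting the matches.
neutral_here <- c("drag")

# Drop the neutral words from the lexicon, then annotate again.
bing_curated <- bing_toy[!bing_toy$word %in% neutral_here, ]
curated <- merge(words_toy, bing_curated, by = "word")

net_raw <- net_sentiment(raw)
net_curated <- net_sentiment(curated)
print(c(raw = net_raw, curated = net_curated))
```

In this toy case the two occurrences of drag are enough to flip the net score from negative to positive, which is the kind of distortion a single frequent, misclassified word can introduce.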

In the next post I will look at what can be done with a corpus.
