Sentiment analysis: does it work?

Social media listening tools claim to measure sentiment online. PRWeek put some to the test, measuring public reaction to a live football match.

Manchester City v Tottenham: we asked five organisations to track opinions during the game
One revolutionary aspect of social media is that they provide an outlet for instant, candid feedback from the public, whether they are in awe at the launch of a games console or lambasting a political address. Such insights can be hugely valuable for PR professionals to assess whether an event, a new product or a speech hits the mark or falls flat.

But one question that continues to dog even the most social-media-savvy PR professional is how on earth this sentiment can be measured accurately. It is one thing to pick out a few comments on Twitter or Facebook to provide a snapshot of opinion. But to produce the kind of analysis that moves much beyond the anecdotal can feel like nailing jelly to a wall.

Thankfully, help is at hand. A number of organisations now claim to be able to measure sentiment on social media and produce reliable data on what the public thinks.

Generic sentiment works for 60 to 70 per cent of the time but you need an approach specific to the event.

Robert Glaesener, chief executive officer at Talkwalker
PRWeek decided to put this to the test. We asked five organisations to track opinions during a live event, namely last month’s Premier League match between Manchester City and Tottenham. We could have chosen any number of high-profile events for our experiment, and the game itself did not matter – although the fact there were five goals, two penalties and one sending off helped to generate plenty of activity on social media.

Our questions were whether sentiment could be measured at all, and what form the results would take. Crucially, we wanted to see whether there was a correlation between the findings of the different agencies. If there was not, how could any of them claim to be a true measurement of sentiment?

Disappointingly, three of the five agencies approached by PRWeek that initially agreed to take part later backed down. They pointed to difficulties such as weeding out sarcasm and dealing with ambiguity. One agency said that while it was easy to search for simple descriptions like ‘great’ or ‘poor’, a phrase like ‘it was not great’ would count as a positive if there was no advanced search facility to scan the words around key adjectives. Such a system is understood to be in development, the agency said.
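To make the agency’s point concrete, here is a minimal sketch in Python of the difference between a bare keyword search and one that scans a small window of words around each key adjective for negation. The word lists and window size are invented for illustration and do not represent any vendor’s actual system.

    # Naive keyword matching counts "not great" as positive; scanning a
    # small window of preceding words catches simple negation.
    POSITIVE = {"great", "brilliant", "superb"}
    NEGATIVE = {"poor", "awful", "dire"}
    NEGATORS = {"not", "never", "hardly"}

    def naive_score(post):
        words = post.lower().split()
        return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

    def window_score(post, window=3):
        words = post.lower().split()
        score = 0
        for i, w in enumerate(words):
            if w in POSITIVE or w in NEGATIVE:
                polarity = 1 if w in POSITIVE else -1
                # Flip polarity if a negator appears shortly before the adjective
                if any(n in words[max(0, i - window):i] for n in NEGATORS):
                    polarity = -polarity
                score += polarity
        return score

    print(naive_score("it was not great"))   # 1: wrongly counted as positive
    print(window_score("it was not great"))  # -1: the negation is caught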

These comments were troubling, suggesting a lack of confidence that the tools available would be up to the job. Nevertheless, two organisations, Manning Gottlieb OMD and Talkwalker, valiantly stepped up to the plate. We asked them to track postings for the two hours before the game, during key incidents such as goals and red cards, and for the couple of hours after the final whistle.



Talkwalker used its own tool to collate the data, looking at Twitter, Facebook, forums, blogs, YouTube and other outlets. CEO Robert Glaesener says that in general, three types of sentiment analysis can be applied. The first is a generic search for positive and negative phrases. The second takes a sample of the findings and examines each one manually. Neither is perfect; the former has the clear limitations highlighted earlier, while the latter is very time-consuming.

For this experiment, a third approach was used that involves inputting search terms that are likely to have completely different connotations depending on the event. "What we’re finding out is the generic sentiment approach works for 60 to 70 per cent of the time, but for other times – for example, Tottenham versus Manchester City – you have to have an approach that is very specific to the event, to the industry," says Glaesener.

This can also be true, he says, when people discuss consumer products. "If you say it ‘breaks easily’, that can be great or not great – your Kit-Kat should break but your car shouldn’t break." For our experiment, each posting was searched for more than one relevant word: for example, ‘Aguero’ (the scorer of all four Manchester City goals) and ‘City’, or ‘Tottenham’ and ‘penalty’. Glaesener believes this approach means 95 per cent of the posts that are examined will relate to the game rather than something else.
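As a rough illustration of this event-specific filtering, the sketch below keeps a post only when at least two match-related terms co-occur. The term list is drawn from the examples in the article; the two-term threshold, and the code itself, are assumptions rather than Talkwalker’s actual implementation.

    import re

    MATCH_TERMS = {"aguero", "city", "mcfc", "tottenham", "spurs", "thfc", "penalty"}

    def is_match_related(post, min_hits=2):
        # Keep a post only if at least two match-related terms co-occur
        words = set(re.findall(r"[a-z']+", post.lower()))
        return len(words & MATCH_TERMS) >= min_hits

    print(is_match_related("Aguero scores again for City!"))    # True: two hits
    print(is_match_related("Traffic is terrible in the city"))  # False: one hit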

A similar nuanced approach was needed for specific key points in the game. For example, the word ‘penalty’ was designated as negative by the tool, but a closer look at posts containing that word found it was frequently linked to positive sentiments such as ‘powerful’, ‘sublime’ and ‘brilliant’. Glaesener says the only way to be certain of achieving 100 per cent accuracy is to examine every social media post manually.



Manning’s research used the tool Radian6 to search for positive and negative words associated with the game, again covering Twitter, Facebook, blogs and non-paywall conversation threads. During the match, Manning added its own searches for hashtags commonly used by followers of the two teams: for example, #MCFC, #THFC, #COYS (come on you Spurs) and key individuals like #Aguero.
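A simple sketch of this hashtag layer might look like the following, which attributes each post to a fan base via the tags named above. It is a generic illustration, not Radian6’s query syntax, and treating #Aguero as a Manchester City tag is also an assumption.

    # Attribute each post to one or both fan bases by the hashtags it carries
    TEAM_TAGS = {"#mcfc": "Man City", "#aguero": "Man City",
                 "#thfc": "Spurs", "#coys": "Spurs"}

    def fan_bases(post):
        text = post.lower()
        return {team for tag, team in TEAM_TAGS.items() if tag in text}

    print(fan_bases("What a hit! #Aguero #MCFC"))  # {'Man City'}
    print(fan_bases("Unlucky lads #COYS"))         # {'Spurs'}
    print(fan_bases("Great game #MCFC #COYS"))     # both fan bases at once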

One perennial problem is weeding out spam, says Nick Pritchard, head of social media at Manning. A new generation of sophisticated spammers reacts quickly to major events, for example by seeding Twitter posts with trending words and links to unrelated commercial websites.
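One crude version of such a spam filter is sketched below: posts that ride a trending match hashtag while linking to a domain unconnected with the event are flagged. The allow-list, hashtags and example URL are illustrative assumptions; real systems rely on far richer signals.

    import re

    TRENDING_TAGS = {"#mcfc", "#thfc", "#coys"}
    KNOWN_DOMAINS = {"bbc.co.uk", "skysports.com", "twitter.com"}

    URL_RE = re.compile(r"https?://(?:www\.)?([^/\s]+)")

    def looks_like_spam(post):
        text = post.lower()
        rides_trend = any(tag in text for tag in TRENDING_TAGS)
        # Any link pointing outside the allow-list counts as suspicious
        odd_links = [d for d in URL_RE.findall(text) if d not in KNOWN_DOMAINS]
        return rides_trend and bool(odd_links)

    print(looks_like_spam("#MCFC #THFC cheap watches http://dealz.example.com"))  # True
    print(looks_like_spam("Aguero is unstoppable #MCFC"))                         # False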

Then there is the question of troublesome hashtags; #City is used by many fans of the Manchester club, but searching for it would be almost pointless because the hashtag is so widely used in unrelated contexts.

What of the results? Direct comparisons are difficult, given the two surveys’ different methodologies and the different channels and search terms each tracked.

In places, there were stark differences in findings. Manning’s research found that 87 per cent of social media posts using both #COYS and #MCFC – in other words, a snapshot of sentiment from fans of the two teams – were positive about the game overall.

There are certain limitations on what we can do, and therefore we have to make best guesses, often, at what would make a fair assumption

Nick Pritchard, head of social media at Manning Gottlieb OMD
For Talkwalker, just 20.3 per cent were positive. However, this included a huge spike in negative comments when Spurs player Roberto Soldado’s controversial penalty was awarded. This analysis is also more likely to factor in people who do not support either team because of the different methodology.

However, reassuringly, there are some correlations in the findings. The Manning research found that 70 per cent believed the offence that led to the penalty took place outside the box (i.e. that the decision to award it was unfair). The Talkwalker survey judged that a very similar proportion of postings – 72 per cent – were negative about that decision.

So what have we learned from this? Perhaps one lesson is that the analysis is more reliable when focused on a particular moment or incident than as a broad look at overall sentiment. Pritchard has his own concerns about the limitations of the tools available. He says there is a "big oversell" in what ‘social listening’ tools can deliver, describing the systems as "poor" in general and saying a human brain is still needed to digest the findings.

"There are certain limitations on what we can do, and therefore we have to make best guesses, often, at what would make a fair assumption," explains Pritchard. "We’re pulling insights from trends rather than absolute data numbers."

That is not to downplay what can be achieved. Both agencies produced a large amount of data (far too much to cover in depth in this feature), which gives a decent snapshot of sentiment and could doubtless be very useful for drilling down into details.

But there are still shortcomings with the tools available, and while people continue to use sarcasm and ambiguous language – in other words, while they are still human – we will still be a yard short of a truly spectacular performance.


My experience of sentiment analysis

Robin Riley, digital transformation lead and head of profession for digital, Ministry of Defence

My first time using social media sentiment analysis (SMSA) was not exactly a success. The report the tool generated showed huge positive sentiment around what had been a bad story for the organisation. The algorithm had mistaken expressions of agreement with negative media stories about the issue for positive sentiment, and so gave a result that wasn’t credible. And uses of language such as idiom and sarcasm had completely wrong-footed the machine.

That was some years ago. Since then SMSA has been on a journey, with significant strides made in algorithms, language processing and machine learning to distinguish positive from negative and pick the signal from the noise. Sentiment analysis is most needed, and so most attractive, when the volume of tweets and posts is very large. Happily (due to the statistical approaches involved) this is also when it is most likely to be accurate.

We see this in the sentiment analysis from the Man City/Tottenham match, which shows a clear rise in negative sentiment among Spurs fans between the start and end of the match, and, to a lesser extent, a rise in positive sentiment among the Man City supporters. It is worth noting that Spurs supporters’ tweets (containing #COYS) stayed more positive than negative despite their 4-1 loss – well, they are supporters, after all. The analysis was also useful for identifying top pundits and for providing a real-time opinion poll on whether an offence was committed inside or outside the penalty box.

Despite the improvements in SMSA tools, there are still pitfalls. How sentiment is expressed can be very specific to a group; they may, for example, have their own language and jargon. This would certainly be true for football fandom, with its distinct tribes.

Challenges remain in interpreting language. In this example, the analysis had to be adjusted for the word ‘penalty’, which the algorithm identified as inherently negative despite it being positive, of course, for Man City.

Sentiment analysis might form part of the picture to show the intermediate effect of a campaign but PRs should always look to real-world outcomes – such as changes in behaviour – as their hard success measures

Robin Riley, digital transformation lead and head of profession for digital, Ministry of Defence
Sentiment expressed online is not necessarily an absolute measure of overall sentiment – people tend to express sentiment relative to expectation; in other words, not whether something was good or bad but whether it was better or worse than they expected. This means that when we use these tools we need to understand the audience’s prior expectations. If fans go into a match expecting to win but then lose one-nil, or expect to lose heavily but only lose one-nil, their sentiment could be very different even though the outcome is the same.
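A toy calculation makes the point: on a notional scale from -1 (negative) to +1 (positive), subtracting a pre-match baseline from the measured sentiment shows how the same one-nil defeat can feel very different. All the numbers here are invented for illustration.

    def expectation_adjusted(observed, prior):
        # How a result feels: measured sentiment minus the pre-match
        # baseline, both on a -1 (negative) to +1 (positive) scale.
        return observed - prior

    # The same 1-0 defeat, read against two different expectations:
    print(expectation_adjusted(-0.2, prior=0.6))   # ~ -0.8: expected a win, feels awful
    print(expectation_adjusted(-0.2, prior=-0.7))  # ~ +0.5: expected a thrashing, feels tolerable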

SMSA undoubtedly provides useful tools for insight. But we should always be clear that an improvement in sentiment reported by these tools is not, in itself, a measure of overall campaign success. Sentiment analysis might form part of the picture to show the intermediate effect of a campaign but PRs should always look to real-world outcomes – such as changes in behaviour – as their hard success measures.

These are some of the reasons why SMSA works best when it is used to augment professional analysis, not replace it. The machines haven’t taken over, at least not just yet.


Risk of distortion

Talkwalker gives an example of how the use of certain words in social media can skew the meaning and distort an analysis of sentiment.

One Twitter user called Opta Joe, who has 590,000 followers, sent this tweet soon after the match ended: "Only two Man City players have scored four goals in a single PL game. Sergio Aguero and Edin Dzeko – deadly."

Talkwalker said: "The same tweet was retweeted hundreds of times, and the word deadly perceived as negative." The upshot was a spike in apparent negative sentiment during a period after the game.

To sift out the anomaly, Glaesener said an additional analysis was undertaken on the same posts that looked for positive words alongside the apparently negative one. This gave a truer picture of the intended sentiment, and the posts were subsequently deemed to be positive.
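The second-pass check Glaesener describes could look something like the sketch below, where a post initially scored negative on a lexicon word such as ‘deadly’ is overturned when clearly positive match vocabulary appears alongside it. Both word lists are illustrative assumptions, not Talkwalker’s actual lexicon.

    import re

    LEXICON_NEGATIVE = {"deadly", "lethal", "killer"}
    POSITIVE_CONTEXT = {"scored", "goals", "hat-trick", "brilliant", "sublime"}

    def rescore(post, initial):
        # Overturn a negative lexicon hit when positive context words co-occur
        words = set(re.findall(r"[a-z\-']+", post.lower()))
        if initial < 0 and words & LEXICON_NEGATIVE and words & POSITIVE_CONTEXT:
            return 1
        return initial

    tweet = ("Only two Man City players have scored four goals in a single PL "
             "game. Sergio Aguero and Edin Dzeko - deadly.")
    print(rescore(tweet, initial=-1))  # 1: re-scored as positive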
