Twitter API, sentiment, text mining and visualization

Example of a (relative) sentiment plot based on Twitter data

I have been taking an online course on Data Science on Coursera. One of the assignments concerned mining Twitter texts: we had to extract live Twitter data and use text mining techniques to compute sentiment scores. Sentiment analysis applies natural language processing, computational linguistics, and text analytics to determine whether a text is positive, negative, or neutral.
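To make the idea of a sentiment score concrete, here is a minimal sketch of one common approach: look up each word of a text in a word list that assigns it a valence, and add up the values. The tiny lexicon, the function names, and the tokenization rule are all made up for illustration; they are not necessarily what the course or my own code uses.

```python
import re

# Hypothetical toy lexicon (word -> valence). A real run would load a
# published sentiment word list instead of these made-up entries.
TOY_LEXICON = {
    "good": 3, "great": 3, "happy": 3,
    "bad": -3, "awful": -3, "sad": -2,
}


def sentiment_score(text, lexicon=TOY_LEXICON):
    """Sum the valence of every known word; unknown words count as 0."""
    # Lowercase the text and keep runs of letters and apostrophes as words.
    words = re.findall(r"[a-z']+", text.lower())
    return sum(lexicon.get(word, 0) for word in words)


def sentiment_label(score):
    """Map a numeric score to positive, negative, or neutral."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for tweet in ["What a great, happy day", "awful and sad", "the cat sat"]:
        score = sentiment_score(tweet)
        print(tweet, "->", score, sentiment_label(score))
```

Note that a text with no known words scores 0, which is indistinguishable from a text whose positive and negative words cancel out.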

I thought it would be neat to extract the sentiment scores for each US state and then plot the relative sentiment scores. So I wrote some Python code to extract live tweets and compute the average sentiment score per state. I fed this data into R to make the visualization above. (This is all combined into a single shell script.)
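The averaging step can be sketched roughly as follows. This is an illustration rather than the project's actual code: it assumes each tweet has already been reduced to a (state code, score) pair, and the function name and the handling of tweets without a usable state are my assumptions here.

```python
from collections import defaultdict


def average_by_state(scored_tweets):
    """Return {state: mean score} for an iterable of (state, score) pairs.

    Assumption: tweets whose state could not be determined carry None
    as their state and are skipped.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for state, score in scored_tweets:
        if state is None:
            continue  # no usable location for this tweet
        totals[state] += score
        counts[state] += 1
    return {state: totals[state] / counts[state] for state in totals}


if __name__ == "__main__":
    # Made-up sample data, purely for demonstration.
    sample = [("CA", 2), ("CA", 4), ("NY", -1), (None, 5)]
    for state, mean in sorted(average_by_state(sample).items()):
        print(state, mean)
```

A state that never appears in the input is simply absent from the result, so whatever produces the plot has to decide how to show states without data.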

First of all, a word of warning: this is more of an exercise in data visualization and in using the Twitter API than a careful analysis. The meaning of a sentiment score is up for debate (see this nice post), and I have not yet done a thorough analysis of the scores. In particular, many states get a score of 0 simply because the data is not very clean. The code for this project is available on GitHub.