Near the end of my PhD I started volunteering for RESAAS. They were interested in modelling the traffic on their website, so I developed some simulation software in C#. They were kind enough to let me publish parts of this program.
The main part of the software is a simulation program that compares a queueing system where customers are served on a First-Come-First-Served basis with a queueing system where some customers get priority over others. A detailed explanation of the program can be found here. The C# code is available on GitHub.
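To give a flavour of this kind of comparison, here is a minimal Python sketch. It is not the published C# program: it assumes a single server, Poisson arrivals, exponential service times, two customer classes, and non-preemptive priority. The function name `simulate` and all parameter names and default values are hypothetical choices made for illustration.

```python
import heapq
import random


def simulate(priority, n_customers=2000, arrival_rate=0.8,
             service_rate=1.0, high_fraction=0.3, seed=1):
    """Mean waiting time per class (0 = high priority, 1 = low priority)
    in a single-server queue. All defaults are illustrative assumptions."""
    rng = random.Random(seed)

    # Generate the customers up front so both disciplines see the same input.
    t = 0.0
    customers = []
    for _ in range(n_customers):
        t += rng.expovariate(arrival_rate)
        cls = 0 if rng.random() < high_fraction else 1
        customers.append((t, cls, rng.expovariate(service_rate)))

    waiting = []              # heap of (key, arrival, cls, service)
    waits = {0: [], 1: []}
    clock = 0.0
    i = 0
    while i < n_customers or waiting:
        # If the queue is empty, the server idles until the next arrival.
        if not waiting:
            clock = max(clock, customers[i][0])
        # Admit everyone who has arrived by the current time.
        while i < n_customers and customers[i][0] <= clock:
            arrival, cls, service = customers[i]
            # Priority: order by class, then arrival. FCFS: arrival only.
            key = (cls, arrival) if priority else (arrival,)
            heapq.heappush(waiting, (key, arrival, cls, service))
            i += 1
        # Serve the next customer to completion (no preemption).
        _, arrival, cls, service = heapq.heappop(waiting)
        waits[cls].append(clock - arrival)
        clock += service

    return {c: (sum(w) / len(w) if w else 0.0) for c, w in waits.items()}


fcfs = simulate(priority=False)
prio = simulate(priority=True)
print("FCFS mean waits by class:    ", fcfs)
print("Priority mean waits by class:", prio)
```

Because both runs use the same seed, they see identical arrivals and service times, so any difference in waiting times comes from the service order alone.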
I have been taking an online course on Data Science at Coursera. One of the assignments concerned the mining of Twitter texts. We had to extract live Twitter data and use text mining techniques to compute sentiment scores. Sentiment analysis applies natural language processing, computational linguistics, and text analytics to determine whether a text is positive, negative, or neutral.
I thought it would be neat to extract the sentiment scores for each state of the US and then plot the relative sentiment scores. Hence, I wrote some Python code to extract live tweets and compute the average sentiment score per state of the US. I then fed this data into R to make the visualization above. (This is all combined into a single shell script.)
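The averaging step can be sketched roughly as follows. This is a simplified illustration rather than the code on GitHub: the tiny word list `SCORES`, the sample tweets, and the helper names `sentiment` and `average_by_state` are all made up for the example, and it assumes each tweet has already been matched to a state code.

```python
from collections import defaultdict

# Hypothetical miniature lexicon; a real run would load a full scored word list.
SCORES = {"good": 3, "great": 3, "happy": 3, "bad": -3, "awful": -3, "sad": -2}


def sentiment(text):
    """Sum the lexicon scores of the words in a text; unknown words score 0."""
    return sum(SCORES.get(word.strip(".,!?#@").lower(), 0)
               for word in text.split())


def average_by_state(tweets):
    """tweets: iterable of (state_code, text) pairs. Returns {state: mean score}."""
    totals = defaultdict(lambda: [0, 0])   # state -> [score sum, tweet count]
    for state, text in tweets:
        totals[state][0] += sentiment(text)
        totals[state][1] += 1
    return {state: total / count for state, (total, count) in totals.items()}


# Made-up sample data, only to show the shape of the input.
sample = [
    ("CA", "Great day, so happy!"),
    ("CA", "Awful traffic"),
    ("NY", "Nothing to report"),
]
print(average_by_state(sample))
```

Note that a state whose tweets contain no words from the lexicon ends up with an average of exactly 0, as NY does in this sample.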
First of all, a word of warning: this is more of an exercise in data visualization and in using Twitter APIs. The meaning of a sentiment score is up for debate (see this nice post) and I have not yet done a thorough analysis of the scores. In particular, many states get a score of 0 simply because the data is not very clean. The code for this project is available on GitHub.