Moved from onmylemon.co.uk
The first part of the data science course I have been taking deals with the analysis of Twitter data, with the aim of discerning the overall sentiment of a tweet or a collection of tweets.
To get going, the first thing you need is a big collection of tweets in a format that is easy for a program to digest. Step in JSON, a text-based format that represents data as nested objects (key/value pairs) and arrays. Once you have this data set you can start experimenting. Maybe try something simple first, like taking ten minutes' worth of samples every hour and working out the average number of tweets per country for that hour.
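As a rough sketch of that experiment, the snippet below averages tweets per country per hour from a handful of JSON lines. It rests on a few assumptions that are not from the course material: each line is one tweet object, the `created_at` field looks like "Mon Jun 03 14:05:10 +0000 2013", and located tweets carry a `place` object with a `country_code`. The helper name `tweets_per_country_per_hour` and the sample data are made up for illustration.

```python
import json
from collections import Counter, defaultdict
from datetime import datetime

# Assumed layout of the "created_at" field, e.g.
# "Mon Jun 03 14:05:10 +0000 2013". The UTC offset is matched literally.
CREATED_AT_FORMAT = "%a %b %d %H:%M:%S +0000 %Y"


def tweets_per_country_per_hour(lines):
    """Return {country_code: average tweets per sampled hour} (hypothetical helper)."""
    counts = defaultdict(Counter)  # hour bucket -> Counter of country codes
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        place = tweet.get("place")
        if not place or "created_at" not in tweet:
            continue  # no location attached, or not a tweet at all
        when = datetime.strptime(tweet["created_at"], CREATED_AT_FORMAT)
        hour = when.replace(minute=0, second=0)
        counts[hour][place.get("country_code", "unknown")] += 1

    # Only hours that contain at least one located tweet are counted.
    hours = len(counts)
    totals = Counter()
    for per_hour in counts.values():
        totals.update(per_hour)
    return {country: total / float(hours) for country, total in totals.items()}


# Made-up sample: two located tweets in one hour, one in the next, one unlocated.
sample = [
    '{"created_at": "Mon Jun 03 14:05:10 +0000 2013", "text": "hi", "place": {"country_code": "GB"}}',
    '{"created_at": "Mon Jun 03 14:07:45 +0000 2013", "text": "yo", "place": {"country_code": "GB"}}',
    '{"created_at": "Mon Jun 03 15:01:00 +0000 2013", "text": "hey", "place": {"country_code": "US"}}',
    '{"created_at": "Mon Jun 03 15:02:00 +0000 2013", "text": "no place", "place": null}',
]
averages = tweets_per_country_per_hour(sample)
print(averages)
```

Many tweets have no location attached, so in practice the averages would probably describe only a small slice of the stream.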
The script below comes from "Introduction to Data Science", a Coursera course with lectures by Bill Howe.
You will need to fill in your own access token and consumer credentials to make it work.
Run it as follows: python twitterstream.py > output.json
    # Python 2 script: it relies on urllib2 and the print statement.
    import oauth2 as oauth
    import urllib2 as urllib

    # Fill in your own credentials before running.
    access_token_key = "<Enter your access token key here>"
    access_token_secret = "<Enter your access token secret here>"

    consumer_key = "<Enter consumer key>"
    consumer_secret = "<Enter consumer secret>"

    _debug = 0  # set to 1 to have urllib2 print the HTTP traffic

    oauth_token = oauth.Token(key=access_token_key, secret=access_token_secret)
    oauth_consumer = oauth.Consumer(key=consumer_key, secret=consumer_secret)

    signature_method_hmac_sha1 = oauth.SignatureMethod_HMAC_SHA1()

    http_handler = urllib.HTTPHandler(debuglevel=_debug)
    https_handler = urllib.HTTPSHandler(debuglevel=_debug)

    def twitterreq(url, method, parameters):
        '''
        Construct, sign, and open a twitter request
        using the hard-coded credentials above.
        '''
        req = oauth.Request.from_consumer_and_token(oauth_consumer,
                                                    token=oauth_token,
                                                    http_method=method,
                                                    http_url=url,
                                                    parameters=parameters)

        req.sign_request(signature_method_hmac_sha1, oauth_consumer, oauth_token)

        if method == "POST":
            # POST requests carry the signed parameters in the body.
            encoded_post_data = req.to_postdata()
        else:
            # GET requests carry the signed parameters in the URL.
            encoded_post_data = None
            url = req.to_url()

        opener = urllib.OpenerDirector()
        opener.add_handler(http_handler)
        opener.add_handler(https_handler)

        response = opener.open(url, encoded_post_data)
        return response

    def fetchsamples():
        url = "https://stream.twitter.com/1/statuses/sample.json"
        parameters = []
        response = twitterreq(url, "GET", parameters)
        # The stream delivers one JSON document per line; print each as it arrives.
        for line in response:
            print line.strip()

    if __name__ == '__main__':
        fetchsamples()
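The redirect in the run command leaves one JSON document per line in output.json, so the file can be read back lazily. The following is a minimal sketch of that, using only the standard library. It assumes that real tweets carry a `text` key while other stream messages (delete notices, for example) do not, and the name `iter_tweets` is hypothetical.

```python
import io
import json


def iter_tweets(path):
    """Yield one dict per tweet stored in a line-delimited JSON file."""
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # blank line
            try:
                message = json.loads(line)
            except ValueError:
                continue  # truncated or corrupt line, e.g. the stream was cut off
            # Assumption: real tweets carry a "text" key; other messages
            # (such as delete notices) do not.
            if isinstance(message, dict) and "text" in message:
                yield message
```

Looping over `iter_tweets("output.json")` then gives one dictionary per tweet without loading the whole file into memory, which matters once the capture has been running for a while.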
I’ll be putting up one or two examples of what you can do with this data in the coming days and weeks.