get a stream of tweets



The first part of the course I have been taking on data science deals with analysis of Twitter data. This is aimed at discerning the overall sentiment of a Tweet or collection of tweets.

To get this going the first thing you need is a big collection of tweets in a format that is easy for a program to digest. Step in JSON, a text based output that represents data in a collection of arrays. Once you have this data set you can then start experimenting. Maybe try something simple first, like taking ten minutes worth of samples every hour and mapping the twitter activity for an average of how many tweets per country for that hour.

Twitter Stream From “Introduction to Data Science” A coursera program with lectures from Bill Howe. You will need to edit the access and consumer credentials to make this application work.

Run as follows: python twitterstream.py > output.json

import oauth2 as oauth import urllib2 as urllib

accesstokenkey = "<Enter your access token key here>" 
accesstokensecret = "<Enter your access token secret here>"

consumerkey = "<Enter consumer key>" 
consumersecret = "<Enter consumer secret>"

_debug = 0

oauthtoken = oauth.Token(key=accesstokenkey, secret=accesstokensecret) 
oauthconsumer = oauth.Consumer(key=consumerkey, secret=consumersecret)

signaturemethodhmacsha1 = oauth.SignatureMethodHMAC_SHA1()

http_method = "GET"

httphandler = urllib.HTTPHandler(debuglevel=debug) 
httpshandler = urllib.HTTPSHandler(debuglevel=debug)

'''Construct, sign, and open a twitter request 
   using the hard-coded credentials above.''' 

def twitterreq(url, method, parameters): 
    req = oauth.Request.fromconsumerandtoken(oauthconsumer,
                                             token=oauthtoken,
                                             httpmethod=httpmethod, 
                                             httpurl=url, 
                                             parameters=parameters)

    req.signrequest(signaturemethodhmacsha1, oauthconsumer, oauthtoken)

    headers = req.to_header()

    if httpmethod == "POST":
        encodedpostdata = req.topostdata()
    else:
        encodedpostdata = None

    url = req.to_url()

    opener = urllib.OpenerDirector()
    opener.addhandler(httphandler) 
    opener.addhandler(httpshandler)

    response = opener.open(url, encodedpostdata)

    return response

def fetchsamples(): 
    url = "https://stream.twitter.com/1/statuses/sample.json"
    parameters = []
    response = twitterreq(url, "GET", parameters)
    
    for line in response:
        print line.strip()

if name  == 'main':
    fetchsamples()