I'm very new to the Twitter API; please help me understand the difference between two things.
As far as I understand, I can get real-time tweets by using tweepy, for example:
import sys
import tweepy

hashtag = ['justinbieber']
# Assumption: `output` was not defined in the original snippet; any writable file works.
output = open('tweets.txt', 'a')

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        try:
            data = status.__getstate__()
            print data
            output.write("%s\n" % data)
        except Exception, e:
            print >> sys.stderr, 'Encountered Exception:', e

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

class Twitter():
    def __init__(self):
        # Credentials were left blank in the original; fill in your own.
        consumer_key = ""
        consumer_secret = ""
        access_key = ""
        access_secret = ""
        self.auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        self.auth.set_access_token(access_key, access_secret)
        self.api = tweepy.API(self.auth)

    def start(self):
        l = CustomStreamListener()
        stream = tweepy.streaming.Stream(self.auth, l, secure=True)
        stream.filter(follow=None, track=hashtag)

if __name__ == "__main__":
    Twitter().start()
But what exactly am I getting if I use python-twitter's api.GetSearch()? For example:
def t_auth(self):
    # Credentials were left blank in the original; fill in your own.
    consumer_key = ""
    consumer_secret = ""
    access_key = ""
    access_secret = ""
    self.api = twitter.Api(consumer_key, consumer_secret, access_key, access_secret)
    self.api.VerifyCredentials()
    return self.api

# Elsewhere in the same class:
self.tweets = []
self.tweets.extend(self.api.GetSearch(self.hashtag, per_page=10))
Imagine that I put the last line in an infinite while loop: will I get the same result as in the first example? What's the difference between the two?
Here's my take.
The first example, with the tweepy stream, is a use case of the Twitter Streaming API.
The second example, using python-twitter, is a use case of the Twitter Search API.
So I understand this question as: should I use the regular Twitter Search API or the Streaming API?
It depends, but long story short: if you want a truly real-time picture, you should use streaming.
I don't have enough experience to explain the pros and cons of both approaches, so I'll just refer you to these threads:
Aggregating tweets: Search API vs. Streaming API
Search API vs Streaming API
Streaming API vs Rest API?
Hope that helps.
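To make the difference concrete, here is a small simulation rather than real API code. It assumes a search call returns the most recent matches at the moment you ask, while a stream delivers each tweet once as it appears. TIMELINE, fake_search and fake_stream are made-up stand-ins; since_id mirrors the search parameter of the same name.

```python
# Hypothetical in-memory timeline: (tweet_id, text) pairs, oldest first.
TIMELINE = [(1, "a"), (2, "b"), (3, "c"), (4, "d"), (5, "e")]


def fake_search(visible_upto, since_id=0, count=3):
    """Stand-in for a search call: newest `count` tweets with id > since_id.

    `visible_upto` simulates how many tweets exist when the call is made.
    """
    matches = [t for t in TIMELINE[:visible_upto] if t[0] > since_id]
    return matches[-count:]


def poll_naive(snapshots):
    """Call search in a loop without since_id: consecutive results overlap."""
    seen = []
    for upto in snapshots:
        seen.extend(fake_search(upto))
    return [tweet_id for tweet_id, _ in seen]


def poll_with_since_id(snapshots):
    """Track the highest id seen so each poll only returns new tweets."""
    seen, since_id = [], 0
    for upto in snapshots:
        batch = fake_search(upto, since_id=since_id)
        if batch:
            since_id = max(tweet_id for tweet_id, _ in batch)
        seen.extend(batch)
    return [tweet_id for tweet_id, _ in seen]


def fake_stream():
    """Stand-in for a stream: every tweet is delivered exactly once."""
    for tweet_id, _ in TIMELINE:
        yield tweet_id


print(poll_naive([3, 4, 5]))
print(poll_with_since_id([3, 4, 5]))
print(list(fake_stream()))
```

The naive loop sees the same tweets again on every pass, so a search loop needs its own de-duplication. Even with since_id, it can miss tweets if more than one page of results arrives between two polls, and every poll counts against the rate limit.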
Related
I am trying to fetch Twitter data in Python using tweepy, and I want to set the number of tweets returned.
Here is my block of code:
from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener

class StdOutListener(StreamListener):
    def on_data(self, data):
        print (data)
        return True

    def on_error(self, status):
        print (status)

if __name__ == '__main__':
    # This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    stream = Stream(auth, l)
    # This line filters the Twitter stream to capture data by the keywords: 'hello', 'javascript', 'python'
    stream.filter(track=['hello', 'javascript', 'python'])
Tweepy provides the convenient Cursor interface to iterate through different types of objects. Twitter limits how far back you can page; for a user timeline, at most 3,200 tweets can be extracted.
Set the number of tweets returned per page to 50:
tweets = api.search(q="place:%s" % place_id, rpp=50)
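For the streaming code in the question, a common approach is to count tweets in the listener and return False from on_data once the limit is reached, since returning False from on_data disconnects the stream. The sketch below uses a plain class and a made-up feed() loop so that it runs without tweepy; in real use the class would subclass StreamListener as in the question. CountingListener and feed are illustrative names, not tweepy API.

```python
class CountingListener(object):
    """Collects at most `limit` items, then asks the stream to stop.

    In real use this would subclass tweepy's StreamListener (tweepy 3.x);
    a plain object is used here so the sketch runs on its own.
    """

    def __init__(self, limit):
        self.limit = limit
        self.received = []

    def on_data(self, data):
        self.received.append(data)
        # False once the limit is reached, which tells the stream to disconnect.
        return len(self.received) < self.limit


def feed(listener, items):
    """Hypothetical stand-in for the stream loop: stop when on_data returns False."""
    for item in items:
        if listener.on_data(item) is False:
            break


listener = CountingListener(limit=3)
feed(listener, ["t1", "t2", "t3", "t4", "t5"])
print(listener.received)
```

Only the first three items are kept; the remaining two are never delivered because the loop stops as soon as on_data returns False.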
Whenever a user logs in to my application and runs a search, I have to start a Streaming API connection to fetch the data they need.
Here is my stream API class:
import tweepy
import json
import sys

class TweetListener(tweepy.StreamListener):
    def on_connect(self):
        # Called initially to connect to the Streaming API
        print("You are now connected to the streaming API.")

    def on_error(self, status_code):
        # On error - if an error occurs, display the error / status code
        print('An Error has occurred: ' + repr(status_code))
        return False

    def on_data(self, data):
        json_data = json.loads(data)
        print(json_data)
Here is the Python file that calls the class above to start Twitter streaming:
import tweepy
from APIs.StreamKafkaApi1 import TweetListener

consumer_key = "***********"
consumer_secret = "*********"
access_token = "***********"
access_secret = "********"
hashtags = ["#ipl"]

def callStream():
    auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_secret)
    api = tweepy.API(auth,wait_on_rate_limit=True)
    tweetListener = TweetListener(userid,projectid)
    streamer = tweepy.Stream(api.auth, tweetListener)
    streamer.filter(track=hashtags, async=True)

if __name__ == "__main__":
    callStream()
But if I hit it more than twice, my application returns error code 420.
I thought of switching the API credentials (using multiple keys) whenever error 420 occurs.
How can I get the error raised by the on_error method of the TweetListener class inside callStream()?
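I would like to add to @Andy Piper's answer. Response 420 means your script is making too many requests and has been rate limited. To resolve this, here is what I do (in class TweetListener, with import time at the top of the file):
    def on_limit(self, status):
        print ("Rate Limit Exceeded, Sleep for 15 Mins")
        time.sleep(15 * 60)
        return True
Do this and the error will be handled. Note that tweepy calls on_limit for limit notices sent inside the stream, while an HTTP status such as 420 is passed to on_error, so the same wait may be needed there as well.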
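Instead of a fixed 15-minute sleep, the wait can grow with each consecutive 420. The sketch below only computes the wait times. The one-minute starting value, the doubling and the 16-minute cap are assumptions chosen for illustration, so check Twitter's current reconnection guidance before relying on them. backoff_delays is a made-up helper name.

```python
def backoff_delays(attempts, base=60, cap=960):
    """Yield wait times in seconds: base, 2*base, 4*base, ... never above `cap`.

    The defaults (60 s start, 960 s cap) are illustrative assumptions.
    """
    delay = base
    for _ in range(attempts):
        yield min(delay, cap)
        delay *= 2


print(list(backoff_delays(6)))
```

Inside a listener, the next value from this generator could be passed to time.sleep() each time a 420 is seen, and the generator recreated after a successful connection.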
If you insist on using multiple keys, I am not sure, but you could try exception handling around TweetListener and streamer for tweepy.error.RateLimitError, and call the function recursively with the next API key:
def callStream(key):
    #authenticate the API keys here
    try:
        tweetListener = TweetListener(userid,projectid)
        streamer = tweepy.Stream(api.auth, tweetListener)
        streamer.filter(track=hashtags, async=True)
    except tweepy.TweepError as e:
        if e.reason[0]['code'] == "420":
            callStream(nextKey)
    return True
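On the narrower question of how the calling function can see what on_error received, one option is to store the status code on the listener and inspect it once the stream call returns. This is only a sketch: RecordingListener and fake_filter are made-up names, the class would subclass tweepy.StreamListener in real use, and the pattern assumes a blocking filter() call rather than an asynchronous one.

```python
class RecordingListener(object):
    """Remembers the last HTTP status passed to on_error.

    In real use this would subclass tweepy.StreamListener (tweepy 3.x).
    """

    def __init__(self):
        self.last_error = None

    def on_error(self, status_code):
        self.last_error = status_code
        return False  # stop the stream so the caller regains control


def fake_filter(listener, status_code):
    """Hypothetical stand-in for a blocking stream.filter() call that fails."""
    listener.on_error(status_code)


def call_stream():
    listener = RecordingListener()
    fake_filter(listener, 420)
    if listener.last_error == 420:
        return "rate limited: wait before reconnecting"
    return "stream ended normally"


print(call_stream())
```

The caller can then decide what to do with the code, for example wait and reconnect with the same credentials.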
Per the Twitter error response code documentation, 420 is:
"Returned when an application is being rate limited for making too many requests."
The Twitter streaming API does not support more than a couple of connections per user and IP address. It is against the Twitter Developer Policy to use multiple application keys to attempt to circumvent this and your apps could be suspended if you do.
I am relatively new to the tweepy Python library.
I want to be sure that my stream script always keeps running on a remote server, so it would be great if someone could share best practices for making that happen.
Right now I am doing it this way:
if __name__ == '__main__':
    while True:
        try:
            # create instance of the tweepy tweet stream listener
            listener = TweetStreamListener()
            # set twitter keys/tokens
            auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
            auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
            # create instance of the tweepy stream
            stream = Stream(auth, listener)
            stream.userstream()
        except Exception as e:
            print "Error. Restarting Stream.... Error: "
            print e.__doc__
            print e.message
            time.sleep(5)
And I return False on each of the methods: on_error(), on_disconnect(), on_timeout().
So, by returning False the stream stops and then reconnects in the infinite loop.
Here's how I do mine. It's been running for almost a year on two computers, handling the errors that stop the stream here and there.
#They don't need to be in the loop.
auth = OAuthHandler(CONSUMER_KEY, CONSUMER_SECRET)
auth.set_access_token(ACCESS_TOKEN, ACCESS_TOKEN_SECRET)
while True:
    listener = TweetStreamListener()
    stream = Stream(auth, listener, timeout=60)
    try:
        stream.userstream()
    except Exception, e:
        print "Error. Restarting Stream.... Error: "
        print e.__doc__
        print e.message
To make sure that it runs forever, you should redefine the on_error method to handle the time between reconnection attempts. Sleeping for only 5 seconds will hurt your chances of a successful reconnect, because Twitter will see that you are retrying too frequently. But that's another question.
Just my two cents.
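A rough sketch of spacing out the reconnection attempts: the loop below retries a connect callable and doubles the wait after each failure, up to a cap. run_with_reconnect, its default values and flaky_connect are made up for illustration; in real code connect would wrap the blocking stream call, and the loop would typically run without an attempt limit.

```python
import time


def run_with_reconnect(connect, max_attempts=5, base=5, cap=320, sleep=time.sleep):
    """Call `connect` until it returns normally, waiting longer after each failure.

    Returns the list of delays that were slept, so the schedule can be inspected.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    delays = []
    delay = base
    for _ in range(max_attempts):
        try:
            connect()
            return delays
        except Exception as exc:
            print("Stream error, retrying in %d s: %s" % (delay, exc))
            delays.append(delay)
            sleep(delay)
            delay = min(delay * 2, cap)
    return delays


# Hypothetical connection that fails twice and then succeeds.
failures = iter([IOError("reset"), IOError("reset")])


def flaky_connect():
    err = next(failures, None)
    if err is not None:
        raise err


slept = run_with_reconnect(flaky_connect, sleep=lambda seconds: None)
print(slept)
```

The design choice here is that the delay only grows while failures are consecutive; a fresh call to run_with_reconnect starts again from the base delay.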
I received lots of Error 420 responses, which was weird because I wasn't passing too many keywords to the stream API.
I figured out that the on_data() method of the stream listener class must never return False.
Mine returned False sometimes, so tweepy cut the connection, and because it was in a loop the connection was recreated immediately. Twitter didn't like that much...
I've also resolved the problem by creating a new stream recursively on exceptions.
Here is my complete code. Just change the mytrack variable, put in your keys, and run it using pm2 or python.
from tweepy import OAuthHandler, Stream, StreamListener
import json

mytrack = ['netmine', 'bitkhar', 'bitcoin']
consumer_key = ""
consumer_secret = ""
access_token = ""
access_token_secret = ""
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

class StdOutListener(StreamListener):
    def __init__(self, listener, track_list, repeat_times):
        self.repeat_times = repeat_times
        self.track_list = track_list
        print('************** initialized : #', self.repeat_times)

    def on_data(self, data):
        print(self.repeat_times, 'tweet id : ', json.loads(data)['id'])

    def on_exception(self, exception):
        print('exception', exception)
        new_stream(auth, self.track_list, self.repeat_times+1)

    def on_error(self, status):
        print('err', status)
        if status == 420:
            # returning False in on_data disconnects the stream
            return False

def new_stream(auth, track_list, repeat_times):
    listener = StdOutListener(StreamListener, track_list, repeat_times)
    stream = Stream(auth, listener).filter(track=track_list, is_async=True)

new_stream(auth, mytrack, repeat_times=0)
I have the following code where I have made some amendments to the class 'CustomStreamListener':
import sys
import tweepy

consumer_key=""
consumer_secret=""
access_key = ""
access_secret = ""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)

class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        for hashtag in status.entities['hashtags']:
            if hashtag == 'turndownforwhat':
                print(hashtag['text'])
                print status.text

    def on_error(self, status_code):
        print >> sys.stderr, 'Encountered error with status code:', status_code
        return True # Don't kill the stream

    def on_timeout(self):
        print >> sys.stderr, 'Timeout...'
        return True # Don't kill the stream

sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(locations=[-122.75,36.8,-121.75,37.8])
The bit I have added is everything within the class from the 'for' statement onwards. What I am trying to do is filter by the text values of the hashtags within text messages and then use some of the standard tweepy filters further down to filter by geolocation.
This was built in Python 2.7. With my amendments the code raises no error; however, it just hangs with no tweets coming through. Have I introduced a logic error somewhere that I have missed?
Thanks
The code has an error in the "if hashtag" condition.
It should be:
    if hashtag['text'] == 'turndownforwhat':
You may need to wait a while to find a tweet that shows up, but if you use a bigger bounding box and a trending hashtag you will see results with this modification.
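To see why the original comparison never matched: each entry in entities['hashtags'] is a dict, so comparing the whole dict to a string is always False. A small self-contained check follows. The sample dict is hand-written to resemble status.entities, has_hashtag is a made-up helper, and the case-insensitive comparison is an addition that goes beyond the fix above.

```python
def has_hashtag(entities, wanted):
    """Return True if `entities` contains the hashtag `wanted` (case-insensitive).

    Each entry in entities['hashtags'] is a dict, so its 'text' field is compared.
    """
    wanted = wanted.lower()
    return any(tag['text'].lower() == wanted for tag in entities.get('hashtags', []))


# Hand-written sample shaped like status.entities; not real API output.
sample = {'hashtags': [{'text': 'TurnDownForWhat', 'indices': [0, 16]}]}

print(has_hashtag(sample, 'turndownforwhat'))
# The original condition compared a dict with a string, which never matches:
print({'text': 'turndownforwhat'} == 'turndownforwhat')
```

Inside on_status, the same helper could be called as has_hashtag(status.entities, 'turndownforwhat') before printing status.text.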
I'm trying to access the Twitter stream which I had working previously while improperly using Tweepy. Now that I understand how Tweepy is intended to be used I wrote the following Stream.py module. When I run it, I get error code 401 which tells me my auth has been rejected. But I had it working earlier with the same consumer token and secret. Any ideas?
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy import TweepError
from tweepy import error

#Removed. I have real keys and tokens
consumer_key = "***"
consumer_secret = "***"
access_token="***"
access_token_secret="***"

class CustomListener(StreamListener):
    """ A listener handles tweets that are received from the stream.
    This is a basic listener that just prints received tweets to stdout."""

    def on_status(self, status):
        # Do things with the post received. Post is the status object.
        print status.text
        return True

    def on_error(self, status_code):
        # If error thrown during streaming.
        # Check here for meaning:
        # https://dev.twitter.com/docs/error-codes-responses
        print "ERROR: ",; print status_code
        return True

    def on_timeout(self):
        # If no post received for too long
        return True

    def on_limit(self, track):
        # If too many posts match our filter criteria and only a subset is
        # sent to us
        return True

    def filter(self, track_list):
        while True:
            try:
                self.stream.filter(track=track_list)
            except error.TweepError as e:
                raise TweepError(e)

def go(self):
    listener = CustomListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    self.stream = Stream(auth,listener,timeout=3600)
    listener.filter(['LOL'])

if __name__ == '__main__':
    go(CustomListener)
For anyone who happens to have the same issue, I should have added this line after auth was initialized:
auth.set_access_token(access_token, access_token_secret)