Apply limits on tweets - python

I am trying to fetch Twitter data in Python using tweepy, and I want to limit the number of tweets returned.
Here is my code:
from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener
class StdOutListener(StreamListener):
    def on_data(self, data):
        print(data)
        return True
    def on_error(self, status):
        print(status)
if __name__ == '__main__':
    # This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_key, access_secret)
    stream = Stream(auth, l)
    # This line filters the Twitter stream to capture data by the keywords: 'hello', 'javascript', 'python'
    stream.filter(track=['hello', 'javascript', 'python'])

Tweepy provides the convenient Cursor interface for iterating through different types of objects. Note that Twitter returns at most about 3,200 of the most recent tweets from a user's timeline.
Set the number of tweets returned per page to 50:
tweets = api.search(q="place:%s" % place_id, rpp=50)
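Because the question uses the streaming API rather than search, another option is to count tweets inside the listener and stop once a limit is reached. The following is a minimal, library-independent sketch: `LimitedCollector` and its `limit` parameter are hypothetical names, and it assumes the tweepy 3.x behaviour that returning False from on_data disconnects the stream. In real use the class would also inherit from StreamListener.

```python
import json


class LimitedCollector(object):
    """Collects at most `limit` tweets, then asks the stream to stop.

    Hypothetical helper: in real use, also inherit from tweepy's
    StreamListener (tweepy 3.x) so that Stream calls on_data for you.
    """

    def __init__(self, limit):
        self.limit = limit
        self.tweets = []

    def on_data(self, data):
        self.tweets.append(json.loads(data))
        # Returning False is assumed to disconnect the stream (tweepy 3.x).
        return len(self.tweets) < self.limit


# Simulate a stream delivering raw JSON messages one at a time.
collector = LimitedCollector(limit=3)
fake_stream = ['{"id": %d, "text": "tweet %d"}' % (i, i) for i in range(10)]
for raw in fake_stream:
    if collector.on_data(raw) is False:
        break

print(len(collector.tweets))
```

The loop at the bottom only stands in for the stream; with a real Stream object the same return value would end stream.filter().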

Related

twitter streaming API using python 3

Is there any method to fetch tweets over a specific span of time using the Twitter Streaming API in Python 3? I am working on a project to fetch tweets dated from April 2017 to June 2017, but all I get is real-time tweets. The following is my code in Python 3.6:
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
access_token = "####"
access_token_secret = "###"
consumer_key = "###"
consumer_secret = "####"
class StdOutListener(StreamListener):
    def on_data(self, data):
        print(data)
        return True
    def on_error(self, status):
        print(status)
if __name__ == '__main__':
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    stream.filter(track=['earthquake', 'Mexico', '2017'])
What changes should I make to the above code?
No. Twitter's Streaming API returns only real-time tweets. Use Twitter's REST API, specifically the search/tweets endpoint, to get historical tweets. However, that gets you only the last week's worth of tweets. To get the older tweets you are interested in, you will need to pay for Twitter's Enterprise service.
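As a rough illustration of that one-week limit, the sketch below checks whether a requested start date is still within reach of the standard search endpoint. The helper name `within_search_window` and the 7-day default are assumptions based on the "last week" figure above; the exact window is decided by Twitter.

```python
from datetime import datetime, timedelta


def within_search_window(start, now, window_days=7):
    """Return True if `start` falls inside the last `window_days` days before `now`."""
    return now - timedelta(days=window_days) <= start <= now


now = datetime(2017, 7, 1)
print(within_search_window(datetime(2017, 6, 28), now))  # inside the window
print(within_search_window(datetime(2017, 4, 1), now))   # too old for standard search
```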

Twitter Scanner w tweepy - Python

I was just wondering if it's possible to make a scanner with tweepy - for instance, a while loop that is constantly searching for certain words. I'm a trader and would find it very useful in case there is any breaking news.
Example:
I want to set my scanner to constantly return tweets that have '$DB' in them. Furthermore, I only want to return tweets of users that have > 5k followers.
Any advice or pointers would be helpful! Thanks.
Edit/Update: As discussed by asongtoruin and qorka, the question asks for new tweets, not existing tweets. The previous edit used the api.search method, which finds only existing messages. The StreamListener reads new messages.
import tweepy
from tweepy import OAuthHandler, Stream
from tweepy.streaming import StreamListener
access_token = 'your_api_token'
access_secret = 'your_api_access_secret'
consumer_key = 'your_api_key'
consumer_secret = 'your_api_secret'
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)
class MyListener(StreamListener):
    def on_status(self, status):
        try:
            # Only report tweets from accounts with more than 5,000 followers
            if status.user.followers_count > 5000:
                print('%s (%s at %s, followers: %d)' % (status.text, status.user.screen_name, status.created_at, status.user.followers_count))
            return True
        except BaseException as e:
            print("Error on_status: %s" % str(e))
        return True
    def on_error(self, status):
        print(status)
        return True
twitter_stream = Stream(auth, MyListener())
twitter_stream.filter(track=['$DB', '$MS', '$C'])
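The follower check can also be tried offline on raw JSON payloads, which is handy when experimenting without a live stream. This is only a sketch: `is_notable`, `min_followers` and `symbols` are hypothetical names, and the payloads are hand-made examples that use the `text` and `user.followers_count` fields of a tweet object.

```python
import json


def is_notable(raw, min_followers=5000, symbols=('$DB',)):
    """Return True if the raw tweet JSON mentions one of `symbols` and the
    author has more than `min_followers` followers."""
    tweet = json.loads(raw)
    user = tweet.get('user') or {}
    text = tweet.get('text', '')
    has_symbol = any(s.lower() in text.lower() for s in symbols)
    return has_symbol and user.get('followers_count', 0) > min_followers


big = json.dumps({'text': 'Breaking: $DB up 3%', 'user': {'followers_count': 12000}})
small = json.dumps({'text': '$DB to the moon', 'user': {'followers_count': 40}})
print(is_notable(big), is_notable(small))
```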

Collecting URIs From Tweets

I am currently writing a python program that utilizes Tweepy & the Twitter API, and extracts URI links from tweets on twitter.
This is currently my code. How do I modify it so that it only outputs the URIs from tweets (if any are included)?
# Import the necessary methods from the tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
# Variables that contain the user credentials to access the Twitter API
access_token = "-"
access_token_secret = ""
consumer_key = ""
consumer_secret = ""
# This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
    def on_data(self, data):
        print(data)
        return True
    def on_error(self, status):
        print(status)
if __name__ == '__main__':
    # This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    # This line filters the Twitter stream to capture data by the keyword: '#NFL'
    twitterator = stream.filter(track=['#NFL'])
    for tweet in twitterator:
        print("(%s) #%s %s" % (tweet["created_at"], tweet["user"]["screen_name"], tweet["text"]))
        for url in tweet["entities"]["urls"]:
            print(" - found URL: %s" % url["expanded_url"])
I've modified your code to only print URLs if present:
# Import the necessary methods from the tweepy library
import json
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
# Variables that contain the user credentials to access the Twitter API
access_token = "-"
access_token_secret = ""
consumer_key = ""
consumer_secret = ""
# This is a basic listener that prints only the URLs found in received tweets.
class StdOutListener(StreamListener):
    def on_data(self, data):
        tweet = json.loads(data)
        # Not every stream message is a tweet, so "entities" may be missing
        for url in tweet.get("entities", {}).get("urls", []):
            print(" - found URL: %s" % url["expanded_url"])
        return True
    def on_error(self, status):
        print(status)
if __name__ == '__main__':
    # This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    # This line filters the Twitter stream to capture data by the keyword: '#NFL'
    stream.filter(track=['#NFL'])
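The URL extraction itself can be pulled out into a small function and tried without a live stream. The sketch below is an assumption-laden illustration: `extract_urls` is a hypothetical name, the payloads are hand-made, and the second one imitates a non-tweet stream message that has no `entities` key at all.

```python
import json


def extract_urls(raw):
    """Return the expanded URLs in one raw stream message (empty list if none)."""
    message = json.loads(raw)
    urls = message.get('entities', {}).get('urls', [])
    return [u['expanded_url'] for u in urls if u.get('expanded_url')]


tweet = json.dumps({
    'text': 'Kickoff! https://t.co/abc',
    'entities': {'urls': [{'url': 'https://t.co/abc',
                           'expanded_url': 'https://example.com/nfl'}]},
})
notice = json.dumps({'limit': {'track': 12}})  # a non-tweet stream message

print(extract_urls(tweet))
print(extract_urls(notice))
```

Inside a listener, on_data would simply call extract_urls(data) and print each result.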

Mine Tweets between two dates in Python

I would like to mine tweets for two keywords over a specific period of time. I currently have the code below, but how do I change it so that it only mines tweets between two dates (10/03/2016 - 10/07/2016)? Thank you!
# Import the necessary methods from the tweepy library
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
# Variables that contain the user credentials to access the Twitter API
access_token = "ENTER YOUR ACCESS TOKEN"
access_token_secret = "ENTER YOUR ACCESS TOKEN SECRET"
consumer_key = "ENTER YOUR API KEY"
consumer_secret = "ENTER YOUR API SECRET"
# This is a basic listener that just prints received tweets to stdout.
class StdOutListener(StreamListener):
    def on_data(self, data):
        print(data)
        return True
    def on_error(self, status):
        print(status)
if __name__ == '__main__':
    # This handles Twitter authentication and the connection to the Twitter Streaming API
    l = StdOutListener()
    auth = OAuthHandler(consumer_key, consumer_secret)
    auth.set_access_token(access_token, access_token_secret)
    stream = Stream(auth, l)
    # This line filters the Twitter stream to capture data by the keywords: 'python', 'javascript', 'ruby'
    stream.filter(track=['python', 'javascript', 'ruby'])
You can't. Have a look at this question; that is the closest you can get.
The Twitter API does not allow searching by time. What you can do is fetch tweets and check their timestamps afterwards in Python, but that is highly inefficient.
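Checking timestamps after the fact could look like the sketch below. It assumes tweets arrive as dictionaries whose `created_at` field uses the format `Wed Oct 05 14:05:00 +0000 2016`; the function name `in_date_range` and the sample tweets are made up for illustration, and the end date is exclusive so that 10/07 is covered in full.

```python
from datetime import datetime, timezone

# Assumed layout of the created_at field, e.g. "Wed Oct 05 14:05:00 +0000 2016"
TWITTER_TIME_FORMAT = '%a %b %d %H:%M:%S %z %Y'


def in_date_range(tweet, start, end):
    """Return True if the tweet's created_at lies in [start, end)."""
    created = datetime.strptime(tweet['created_at'], TWITTER_TIME_FORMAT)
    return start <= created < end


start = datetime(2016, 10, 3, tzinfo=timezone.utc)
end = datetime(2016, 10, 8, tzinfo=timezone.utc)  # exclusive upper bound

tweets = [
    {'created_at': 'Wed Oct 05 14:05:00 +0000 2016', 'text': 'inside'},
    {'created_at': 'Sun Oct 09 09:30:00 +0000 2016', 'text': 'outside'},
]
kept = [t['text'] for t in tweets if in_date_range(t, start, end)]
print(kept)
```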

GetSearch or StreamListener? python

I'm very new to the Twitter API, so please help me understand the difference between two things.
As far as I understand, I can get real-time tweets by using tweepy, for example:
import sys
import tweepy
hashtag = ['justinbieber']
class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        try:
            data = status.__getstate__()
            print(data)
            # `output` is assumed to be a file object opened elsewhere
            output.write("%s\n " % data)
        except Exception as e:
            print('Encountered Exception:', e, file=sys.stderr)
    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True  # Don't kill the stream
    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True  # Don't kill the stream
class Twitter():
    def __init__(self):
        # Credentials omitted in the original post
        consumer_key = ""
        consumer_secret = ""
        access_key = ""
        access_secret = ""
        self.auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
        self.auth.set_access_token(access_key, access_secret)
        self.api = tweepy.API(self.auth)
    def start(self):
        l = CustomStreamListener()
        stream = tweepy.streaming.Stream(self.auth, l, secure=True)
        stream.filter(follow=None, track=hashtag)
if __name__ == "__main__":
    Twitter().start()
But what exactly am I getting if I use python-twitter's api.GetSearch()? For example:
def t_auth(self):
    # Credentials omitted in the original post
    consumer_key = ""
    consumer_secret = ""
    access_key = ""
    access_secret = ""
    self.api = twitter.Api(consumer_key, consumer_secret, access_key, access_secret)
    self.api.VerifyCredentials()
    return self.api
self.tweets = []
self.tweets.extend(self.api.GetSearch(self.hashtag, per_page=10))
Imagine that I put the last line in an infinite while loop. Will I get the same result as in the first example? What's the difference between the two?
Here's my insight.
The first example with tweepy stream is a use case of twitter streaming API.
The second example using python-twitter is a use case of twitter search API.
So, I understand this question as: Should I use twitter regular search API or Streaming API?
It depends, but, long story short, if you want to see the real real-time picture - you should use streaming.
I don't have enough experience to explain the pros and cons of both approaches, so I'll just refer you to:
Aggregating tweets: Search API vs. Streaming API
Search API vs Streaming API
Streaming API vs Rest API?
Hope that helps.
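One practical difference is worth sketching: a search call in a loop is polling, so each iteration returns recent matches again unless you remember what you have already seen, whereas a stream pushes each new tweet once. The example below imitates such a polling loop with a `since_id`-style marker. Everything in it is hypothetical (`poll_once`, `fake_search`, the in-memory `posted` list); a real loop would pass the marker to the search call and sleep between polls to respect rate limits.

```python
def poll_once(search, since_id):
    """Run one search poll and return (new_tweets, updated_since_id).

    `search` is a hypothetical callable standing in for a search API call;
    it takes since_id and returns a list of dicts with an integer 'id'.
    """
    results = search(since_id)
    new = [t for t in results if t['id'] > since_id]
    if new:
        since_id = max(t['id'] for t in new)
    return new, since_id


# Fake backend: each poll sees whatever has been posted so far.
posted = [{'id': 1, 'text': 'a'}, {'id': 2, 'text': 'b'}]


def fake_search(since_id):
    return [t for t in posted if t['id'] > since_id]


first, last_id = poll_once(fake_search, 0)
posted.append({'id': 3, 'text': 'c'})
second, last_id = poll_once(fake_search, last_id)
print([t['id'] for t in first], [t['id'] for t in second])
```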
