I'm very new to Python. I'm using the tweepy library to collect tweets through the Twitter streaming API, but the connection seems to break after running for about an hour. Is there a way to stop the program before the connection breaks? In short, I want to limit the number of tweets it collects.
I tried the .items() method, but it didn't work: it raises a NameError.
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener
ckey="xxxxxxxxxxxxxxxxxxxxxxxxxxx"
csecret="xxxxxxxxxxxxxxxxxxxxxx"
atoken="xxxxxxxxxxxxxxxxxxxxx"
asecret="xxxxxxxxxxxxxxxxxxxxxxxxxxx"
class listener(StreamListener):
    def on_data(self, data):
        print(data)
        return(True)
    def on_error(self, status):
        print(status)
auth = OAuthHandler(ckey, csecret)
auth.set_access_token(atoken, asecret)
twitterStream = Stream(auth, listener())
twitterStream.filter(track=["Obama"])
Thanks.
For the connection issue, see this related question:
Tweepy Connection broken: IncompleteRead - best way to handle exception? or, can threading help avoid?
To limit the number of tweets, return False from your listener's on_data method once the desired number of tweets has been fetched; tweepy then closes the stream. Set the maximum number of tweets in __init__, and use try/except for error handling. Something like this might help:
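import json
from tweepy.streaming import StreamListener
class listener(StreamListener):
    def __init__(self):
        super().__init__()
        self.max_tweets = 10  # stop after this many tweets
        self.tweet_count = 0
    def on_data(self, data):
        try:
            decoded = json.loads(data)
        except ValueError:
            # not a complete JSON tweet (e.g. a keep-alive newline): skip it
            return True
        self.tweet_count += 1
        print(decoded.get("text"))
        if self.tweet_count >= self.max_tweets:
            print("completed")
            return False  # returning False disconnects the stream
        return True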
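To see the stopping behaviour without touching the network, here is a minimal sketch. `run_fake_stream` is a hypothetical stand-in for tweepy's stream loop, which (in tweepy 3.x) disconnects as soon as `on_data` returns `False`; the listener mirrors the counting approach above but collects the texts so the result can be checked.

```python
import json


class CountingListener:
    """Mirrors a tweepy 3.x StreamListener: on_data returns False to stop."""

    def __init__(self, max_tweets=10):
        self.max_tweets = max_tweets
        self.tweet_count = 0
        self.texts = []

    def on_data(self, data):
        try:
            tweet = json.loads(data)
        except ValueError:
            # Keep-alive newlines or malformed chunks: skip, keep streaming
            return True
        self.tweet_count += 1
        self.texts.append(tweet.get("text"))
        # False once the limit is reached, which tells the loop to disconnect
        return self.tweet_count < self.max_tweets


def run_fake_stream(listener, raw_messages):
    """Hypothetical stand-in for the stream loop: feed messages until on_data returns False."""
    for raw in raw_messages:
        if listener.on_data(raw) is False:
            break


messages = [json.dumps({"text": "tweet %d" % i}) for i in range(100)]
listener = CountingListener(max_tweets=3)
run_fake_stream(listener, messages)
print(listener.tweet_count, listener.texts)
# 3 ['tweet 0', 'tweet 1', 'tweet 2']
```

Only three of the hundred messages are consumed, because the loop stops on the first `False`.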
#!/usr/bin/env python
# twitterbots/bots/favretweet.py
import tweepy
import logging
from config import create_api
import seacret
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger()
#stream = tweepy.Stream(seacret.KEY, seacret.SECRET, seacret.TOKEN, seacret.TOKEN_SECRET)
class FavRetweetListener(tweepy.Stream):
    def __init__(self, api):
        self.api = api
        self.user = api.get_user(screen_name='MyGasAndEnergy1')
    def on_status(self, tweet):
        logger.info(f"Processing tweet id {tweet.id}")
        # skip replies and our own tweets
        if tweet.in_reply_to_status_id is not None or tweet.user.id == self.user.id:
            return
        if not tweet.favorited:
            try:
                tweet.favorite()
            except Exception as e:
                logger.error("Error on fav", exc_info=True)
        if not tweet.retweeted:
            try:
                tweet.retweet()
            except Exception as e:
                logger.error("Error on fav and retweet", exc_info=True)
    def on_error(self, status):
        logger.error(status)
def main(keywords):
    api = create_api()
    tweets_listener = FavRetweetListener(api)
    # new way to auth
    stream = tweepy.Stream(seacret.KEY, seacret.SECRET, seacret.TOKEN, seacret.TOKEN_SECRET)
    # old way to auth + important tweets_listener for actions
    stream = tweepy.Stream(api.auth, tweets_listener)
    stream.filter(track=keywords, languages=["en"])
if __name__ == "__main__":
    main(["Python", "Tweepy"])
I have older code that I'm adapting for my own use, but I can't figure out this part because I'm a beginner. The code is supposed to favorite and retweet tweets when it finds a matching keyword.
New code needs:
stream = tweepy.Stream(seacret.KEY, seacret.SECRET, seacret.TOKEN, seacret.TOKEN_SECRET)
Old code needs:
tweets_listener = FavRetweetListener(api)
stream = tweepy.Stream(api.auth, tweets_listener)
But the new tweepy doesn't work with the older api.auth approach; it wants all the secret tokens passed to tweepy.Stream() instead. That means I can't launch the rest of my code through tweets_listener, because Stream won't accept anything more.
How can I continue? I haven't found a solution by googling, and I don't know how to phrase the right question to move forward with this problem.
Tweepy is a Python package for working with Twitter. This script originally comes from realpython.com. The problem is that I don't want to downgrade tweepy.
So I need to include FavRetweetListener, but I don't know how to refactor the code.
I switched to tweepy.Cursor and got it working. Thanks to all. Better question next time.
https://docs.tweepy.org/en/stable/v1_pagination.html#tweepy.Cursor
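For anyone who wants to keep the streaming version, one common way to reconcile the two styles is to make the listener subclass accept the extra `api` argument itself and forward the four credentials to the base class. The sketch below shows that pattern with a hypothetical `Stream` stand-in whose constructor takes the four credentials, as tweepy 4.x's `tweepy.Stream` does; with real tweepy you would subclass `tweepy.Stream` instead and then call `stream.filter(track=keywords, languages=["en"])` on the instance.

```python
class Stream:
    """Hypothetical stand-in for tweepy.Stream in tweepy 4.x, whose constructor
    takes the four credentials directly (no separate listener object)."""

    def __init__(self, consumer_key, consumer_secret, access_token, access_token_secret):
        self.consumer_key = consumer_key
        self.consumer_secret = consumer_secret
        self.access_token = access_token
        self.access_token_secret = access_token_secret


class FavRetweetListener(Stream):
    def __init__(self, api, *args, **kwargs):
        # Forward the credentials to the Stream base class...
        super().__init__(*args, **kwargs)
        # ...and keep the REST API object for favoriting/retweeting
        self.api = api


# The stream object *is* the listener now, so on_status can use self.api
stream = FavRetweetListener("api-object", "KEY", "SECRET", "TOKEN", "TOKEN_SECRET")
print(stream.api, stream.consumer_key)  # api-object KEY
```

In the bot itself this would become `FavRetweetListener(api, seacret.KEY, seacret.SECRET, seacret.TOKEN, seacret.TOKEN_SECRET)`, replacing both `tweepy.Stream(...)` lines in `main`.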
I'm looking for the fastest way to check whether a specific user (TwitterID) has tweeted, in real time. To do this I've used Tweepy and its stream function, which notifies me of a new tweet within roughly 5 seconds. Is there a faster way to check whether someone has tweeted, using another library, plain requests, or code optimization?
Thanks in advance.
import tweepy
TwitterID = "148137271"
class MyStreamListener(tweepy.StreamListener):
    def __init__(self, api):
        self.api = api
        self.me = api.me()
    def on_status(self, tweet):
        # Filter if ID has tweeted
        if tweet.user.id_str == TwitterID:
            print("Tweeted:", tweet.text)
    def on_error(self, status):
        print("Error detected")
        print(status)
# Authenticate to Twitter
auth = tweepy.OAuthHandler("x")
auth.set_access_token("Y", "Z")
# Create API object
api = tweepy.API(auth, wait_on_rate_limit=True,
                 wait_on_rate_limit_notify=True)
tweets_listener = MyStreamListener(api)
stream = tweepy.Stream(api.auth, tweets_listener)
stream.filter(follow=[TwitterID])
I'd say around 5 seconds is a reasonable latency, given that your program is not running on the same server as Twitter's core systems. You're subject to network and API latency and those things are outside of your control. There's no real way to rewrite this logic to change the time between a Tweet being posted and it reaching the API. If you think about the internal stuff going on inside Twitter itself from a Tweet being posted and it being fanned out to potentially millions of followers, the fact that the API - AT THE END OF AN UNKNOWN NETWORK CONNECTION - gets the Tweet data inside of < 5 seconds is pretty crazy in itself.
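If you want to measure that delay rather than estimate it, tweet IDs are Snowflake IDs: shifting the ID right by 22 bits gives milliseconds since Twitter's epoch (1288834974657 ms). A minimal sketch, assuming your machine's clock is NTP-synced (otherwise clock skew ends up in the number); note that 148137271 in the question is a *user* ID, so you would apply this to `tweet.id` inside `on_status`, e.g. `delivery_latency_ms(tweet.id, int(time.time() * 1000))`.

```python
from datetime import datetime, timezone

TWITTER_EPOCH_MS = 1288834974657  # Snowflake epoch used by Twitter IDs


def snowflake_to_ms(tweet_id):
    """Creation time (ms since the Unix epoch) encoded in a Snowflake tweet ID."""
    return (tweet_id >> 22) + TWITTER_EPOCH_MS


def delivery_latency_ms(tweet_id, received_ms):
    """Milliseconds between tweet creation and when your code received it."""
    return received_ms - snowflake_to_ms(tweet_id)


# Made-up ID built from a known offset so the result is easy to check;
# the low 22 bits (worker/sequence) do not affect the timestamp.
tweet_id = (5000 << 22) | 12345
created = snowflake_to_ms(tweet_id)
print(created, datetime.fromtimestamp(created / 1000, tz=timezone.utc))
print(delivery_latency_ms(tweet_id, created + 4200))  # 4200
```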
What I want to do is live-stream Tweets from Twitter's API: only for the hashtag 'Brexit', only in English, and only for a specific number of Tweets (1k to 2k).
So far my code live-streams the Tweets, but however I modify it, it either ignores the count and streams indefinitely, or throws errors. If I change it to stream only a specific user's Tweets, the count works but the hashtag is ignored. If I stream everything for the given hashtag, the count is ignored completely. I've made a decent attempt at fixing it, but I'm quite inexperienced and have hit a brick wall.
Any help with meeting all these requirements at the same time would be much appreciated!
The code below streams 'Brexit' Tweets indefinitely, ignoring count=10.
The bottom of the code is a bit of a mess from my experimenting, apologies:
import numpy as np
import pandas as pd
import tweepy
from tweepy import API
from tweepy import Cursor
from tweepy.streaming import StreamListener
from tweepy import OAuthHandler
from tweepy import Stream
import Twitter_Credentials
import matplotlib.pyplot as plt
# Twitter client - hash out to stream all
class TwitterClient:
    def __init__(self, twitter_user=None):
        self.auth = TwitterAuthenticator().authenticate_twitter_app()
        self.twitter_client = API(self.auth)
        self.twitter_user = twitter_user
    def get_twitter_client_api(self):
        return self.twitter_client
# Twitter authenticator
class TwitterAuthenticator:
    def authenticate_twitter_app(self):
        auth = OAuthHandler(Twitter_Credentials.consumer_key, Twitter_Credentials.consumer_secret)
        auth.set_access_token(Twitter_Credentials.access_token, Twitter_Credentials.access_secret)
        return auth
class TwitterStreamer():
    # Class for streaming and processing live Tweets
    def __init__(self):
        self.twitter_authenticator = TwitterAuthenticator()
    def stream_tweets(self, fetched_tweets_filename, hash_tag_list):
        # this handles Twitter authentication and connection to Twitter API
        listener = TwitterListener(fetched_tweets_filename)
        auth = self.twitter_authenticator.authenticate_twitter_app()
        stream = Stream(auth, listener)
        # This line filters Twitter stream to capture data by keywords
        stream.filter(track=hash_tag_list)
# Twitter stream listener
class TwitterListener(StreamListener):
    # This is a listener class that prints incoming Tweets to stdout
    def __init__(self, fetched_tweets_filename):
        self.fetched_tweets_filename = fetched_tweets_filename
    def on_data(self, data):
        try:
            print(data)
            with open(self.fetched_tweets_filename, 'a') as tf:
                tf.write(data)
            return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True
    def on_error(self, status):
        if status == 420:
            # Return false on data in case rate limit occurs
            return False
        print(status)
class TweetAnalyzer():
    # Functionality for analysing and categorising content from tweets
    def tweets_to_data_frame(self, tweets):
        df = pd.DataFrame(data=[tweet.text for tweet in tweets], columns=['tweets'])
        df['id'] = np.array([tweet.id for tweet in tweets])
        df['len'] = np.array([len(tweet.text) for tweet in tweets])
        df['date'] = np.array([tweet.created_at for tweet in tweets])
        df['source'] = np.array([tweet.source for tweet in tweets])
        df['likes'] = np.array([tweet.favorite_count for tweet in tweets])
        df['retweets'] = np.array([tweet.retweet_count for tweet in tweets])
        return df
if __name__ == "__main__":
    auth = OAuthHandler(Twitter_Credentials.consumer_key, Twitter_Credentials.consumer_secret)
    auth.set_access_token(Twitter_Credentials.access_token, Twitter_Credentials.access_secret)
    api = tweepy.API(auth)
    for tweet in Cursor(api.search, q="#brexit", count=10,
                        lang="en",
                        since="2019-04-03").items():
        fetched_tweets_filename = "tweets.json"
        twitter_streamer = TwitterStreamer()
        hash_tag_list = ["Brexit"]
        twitter_streamer.stream_tweets(fetched_tweets_filename, hash_tag_list)
You're trying to use two different methods of accessing the Twitter API - Streaming is realtime, and searching is a one-off API call.
Since streaming is continuous and realtime, there's no way to apply a count of results to it - the code simply opens a connection, says "hey, send me all the Tweets from now onwards that contain the hash_tag_list", and sits listening. At that point you then drop into the StreamListener, where for each Tweet received, you write them into a file.
You could apply a counter here, but you'd need to wrap it inside your StreamListener on_data handler, and increment the counter for each Tweet received. When you get to 1000 Tweets, stop listening.
For the search option, you have a couple of issues. First, you're asking for Tweets since 2019, but the standard search API can only go back 7 days. Second, count=10 only sets how many Tweets come back per page of results; .items() with no argument keeps paging through everything available, so it doesn't cap the total. And the way you've written the loop, what actually happens is that for each Tweet the search returns, you create a realtime streaming connection and start listening and writing to a file. Because stream.filter() blocks, that first streaming connection never returns, so the loop never gets past the first Tweet. So that's not going to work.
You'll need to choose one - either search for 1000 Tweets and write them to a file (never set up TwitterStreamer()), or, listen for 1000 Tweets and write them to a file (drop the for Tweet in Cursor(api.search... and jump straight to the streamer).
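For the search route, the difference between page size and total is easy to demonstrate offline. This sketch uses a hypothetical `fake_search_pages` generator in place of the real endpoint. With tweepy the equivalent would be `Cursor(api.search, q="#brexit", lang="en", count=100).items(1000)` (`api.search_tweets` in tweepy 4.x), where the argument to `items()` is what caps the total.

```python
from itertools import islice


def fake_search_pages(query, count):
    """Hypothetical stand-in for the search endpoint: yields one page of
    `count` results per request, forever (like .items() with no limit)."""
    page = 0
    while True:
        page += 1
        yield [f"{query} result {page}-{i}" for i in range(count)]


def iter_items(pages):
    """Flatten pages into individual items, as Cursor(...).items() does."""
    for page in pages:
        yield from page


# count=10 sets the page size only; islice(..., 25) is what caps the total,
# analogous to Cursor(..., count=10).items(25)
first_25 = list(islice(iter_items(fake_search_pages("#brexit", count=10)), 25))
print(len(first_25), first_25[-1])  # 25 #brexit result 3-4
```

Because everything is lazy, only three pages are ever requested for 25 items; each collected item could then be written to the file instead of starting a stream.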
Simply add the hashtag symbol to the search phrase in the list, and it'll match tweets that use that specific hashtag. It's case-sensitive, so you may want to add several variants to the list. Using just "Brexit" matches tweets that contain the keyword "Brexit", whether or not they use the hashtag.
hash_tag_list = ["#Brexit"]
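If you also want to confirm on your side that a received tweet really uses the hashtag (rather than just mentioning the word), you can check its hashtag entities. A minimal sketch, assuming decoded v1.1 tweet dicts where hashtags appear under `tweet["entities"]["hashtags"]` as `{"text": ...}` without the leading `#`; this local check compares case-insensitively regardless of how the stream matched.

```python
def has_hashtag(tweet, tag):
    """True if the tweet's hashtag entities include `tag` (case-insensitive)."""
    wanted = tag.lstrip("#").lower()
    hashtags = tweet.get("entities", {}).get("hashtags", [])
    return any(h.get("text", "").lower() == wanted for h in hashtags)


with_tag = {"text": "Vote today #brexit", "entities": {"hashtags": [{"text": "brexit"}]}}
keyword_only = {"text": "Brexit talks continue", "entities": {"hashtags": []}}
print(has_hashtag(with_tag, "#Brexit"), has_hashtag(keyword_only, "#Brexit"))  # True False
```

Inside the listener this would be called on `json.loads(data)` before writing the tweet to the file.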
I'm implementing a Twitter bot for fun using Tweepy.
What I'm trying to build is a bot that tracks a certain keyword and, whenever someone tweets it, replies to that user with a given string.
I considered storing Twitter's stream in a .json file and looping over the Tweet objects for each user, but that seems impractical, since receiving the stream locks the program in a loop.
So how can I track tweets with Twitter's Streaming API for a certain keyword and reply to the users who tweeted it?
Current code:
from tweepy import OAuthHandler
from tweepy import Stream
from tweepy.streaming import StreamListener
class MyListener(StreamListener):
    def on_data(self, data):
        try:
            with open("caguei.json", 'a+') as f:
                f.write(data)
                data = f.readline()
                tweet = json.loads(data)
                text = str("#%s acabou de. %s " % (tweet['user']['screen_name'], random.choice(exp)))
                tweepy.API.update_status(status=text, in_reply_to_status_id=tweet['user']['id'])
                #time.sleep(300)
                return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True
    def on_error(self, status):
        print(status)
        return True
api = tweepy.API(auth)
twitter_stream = Stream(auth, MyListener())
twitter_stream.filter(track=['dengue'])  # When this runs, the program locks in a loop
Tweepy's StreamListener class lets you override its on_data method. That's where your logic should go.
From the tweepy source:
class StreamListener(object):
    ...
    def on_data(self, raw_data):
        """Called when raw data is received from connection.
        Override this method if you wish to manually handle
        the stream data. Return False to stop stream and close connection.
        """
        ...
So in your listener, you can override this method and do your custom logic.
class MyListener(StreamListener):
    def on_data(self, data):
        do_whatever_with_data(data)
You can also override several other methods (on_direct_message, etc) and I encourage you to take a look at the code of StreamListener.
Update
Okay, you can do what you intend with the following:
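import json
import random
from tweepy.streaming import StreamListener
class MyListener(StreamListener):
    # Create it as MyListener(api=api) so StreamListener stores the API as self.api
    def __init__(self, *args, **kwargs):
        super(MyListener, self).__init__(*args, **kwargs)
        self.file = open("whatever.json", "a+")
    def _persist_to_file(self, data):
        try:
            self.file.write(data)
        except BaseException:
            pass
    def on_data(self, data):
        try:
            tweet = json.loads(data)
            # `exp` is the list of reply phrases from the question (not shown);
            # a reply must @mention the author for in_reply_to_status_id to take effect
            text = "@%s acabou de. %s " % (tweet['user']['screen_name'], random.choice(exp))
            # Reply to the tweet itself (its id, not the user's id), via the API instance
            self.api.update_status(status=text, in_reply_to_status_id=tweet['id'])
            self._persist_to_file(data)
            return True
        except BaseException as e:
            print("Error on_data: %s" % str(e))
        return True
    def on_error(self, status):
        print(status)
        return True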
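Since `filter()` blocks and every reply is a network call, one option is to keep `on_data` fast by only queueing the tweet and letting a background thread do the replying. A minimal sketch using the standard library's `queue` and `threading`; the reply function here is a hypothetical placeholder for the `self.api.update_status(...)` call above.

```python
import queue
import threading


def start_reply_worker(reply_fn, jobs):
    """Run reply_fn(tweet) for each tweet put on `jobs`, in a background thread.

    A None sentinel on the queue stops the worker.
    """
    def worker():
        while True:
            tweet = jobs.get()
            try:
                if tweet is None:
                    return
                reply_fn(tweet)
            finally:
                jobs.task_done()

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread


# Hypothetical reply function; in the bot this would call api.update_status(...)
replied = []
jobs = queue.Queue()
worker = start_reply_worker(lambda t: replied.append(t["id"]), jobs)

# The listener's on_data would only do jobs.put(tweet) and return True immediately
for tweet in ({"id": 1}, {"id": 2}, {"id": 3}):
    jobs.put(tweet)

jobs.put(None)  # stop signal
worker.join()
print(replied)  # [1, 2, 3]
```

The queue preserves order and a single worker keeps replies sequential, which also makes it easy to add a delay between replies (like the commented-out `time.sleep(300)` in the question) without stalling the stream.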
I have found the following piece of code that works pretty well for letting me view in Python Shell the standard 1% of the twitter firehose:
import sys
import tweepy
consumer_key=""
consumer_secret=""
access_key = ""
access_secret = ""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)
    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True # Don't kill the stream
    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True # Don't kill the stream
sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
sapi.filter(track=['manchester united'])
How do I add a filter so that only tweets from a certain location are parsed? I've seen people add GPS coordinates to other Twitter-related Python code, but I can't find anything specific to sapi in the Tweepy module.
Any ideas?
Thanks
The streaming API doesn't allow filtering by location AND keyword simultaneously. From the documentation:
Bounding boxes do not act as filters for other filter parameters. For example track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.
Source: https://dev.twitter.com/docs/streaming-apis/parameters#locations
What you can do is ask the streaming API for keyword or located tweets and then filter the resulting stream in your app by looking into each tweet.
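The app-side filtering can be sketched as a pure function over the decoded tweet. This assumes v1.1 tweet dicts, where `coordinates` is a GeoJSON Point with `[longitude, latitude]` order, and a box in the `[sw_lon, sw_lat, ne_lon, ne_lat]` order used by the `locations` parameter. It only checks the exact `coordinates` field, so tweets that matched a location filter only through their `place` would be rejected here.

```python
def in_box(lon, lat, box):
    """box = [sw_lon, sw_lat, ne_lon, ne_lat], the order used by the locations parameter."""
    sw_lon, sw_lat, ne_lon, ne_lat = box
    return sw_lon <= lon <= ne_lon and sw_lat <= lat <= ne_lat


def matches_keyword_and_box(tweet, keyword, box):
    """Apply both conditions locally to a decoded tweet dict."""
    if keyword.lower() not in tweet.get("text", "").lower():
        return False
    point = tweet.get("coordinates")  # GeoJSON Point: {"coordinates": [lon, lat]}
    if not point:
        return False  # most tweets carry no exact coordinates
    lon, lat = point["coordinates"]
    return in_box(lon, lat, box)


UK = [-6.38, 49.87, 1.77, 55.81]
tweet = {"text": "Manchester United win again",
         "coordinates": {"type": "Point", "coordinates": [-2.24, 53.48]}}
print(matches_keyword_and_box(tweet, "manchester united", UK))  # True
```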
If you modify the code as follows, you will capture tweets in the United Kingdom, and those tweets then get filtered to show only the ones that contain "manchester united":
import sys
import tweepy
consumer_key=""
consumer_secret=""
access_key=""
access_secret=""
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_key, access_secret)
api = tweepy.API(auth)
class CustomStreamListener(tweepy.StreamListener):
    def on_status(self, status):
        # keyword filtering happens here, in the app
        if 'manchester united' in status.text.lower():
            print(status.text)
    def on_error(self, status_code):
        print('Encountered error with status code:', status_code, file=sys.stderr)
        return True # Don't kill the stream
    def on_timeout(self):
        print('Timeout...', file=sys.stderr)
        return True # Don't kill the stream
sapi = tweepy.streaming.Stream(auth, CustomStreamListener())
# location filtering happens on Twitter's side: [sw_lon, sw_lat, ne_lon, ne_lat]
sapi.filter(locations=[-6.38,49.87,1.77,55.81])
Juan gave the correct answer. I'm filtering for Germany only using this:
# Bounding boxes for geolocations
# Online-Tool to create boxes (c+p as raw CSV): http://boundingbox.klokantech.com/
GEOBOX_WORLD = [-180,-90,180,90]
GEOBOX_GERMANY = [5.0770049095, 47.2982950435, 15.0403900146, 54.9039819757]
stream.filter(locations=GEOBOX_GERMANY)
This is a pretty crude box that includes parts of some other countries. If you want a finer grain you can combine multiple boxes to fill out the location you need.
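The `locations` parameter accepts several boxes concatenated into one flat list of numbers, so combining boxes is just list concatenation; if you also want to re-check a point locally, test it against each box. A minimal sketch; `BOX_A` and `BOX_B` are made-up illustrative numbers, not real borders.

```python
def split_boxes(flat):
    """Split a flat locations list [sw_lon, sw_lat, ne_lon, ne_lat, ...] into boxes."""
    if len(flat) % 4:
        raise ValueError("locations must contain a multiple of 4 numbers")
    return [flat[i:i + 4] for i in range(0, len(flat), 4)]


def in_any_box(lon, lat, flat):
    """True if (lon, lat) falls inside at least one of the boxes."""
    return any(w <= lon <= e and s <= lat <= n for w, s, e, n in split_boxes(flat))


# Two hypothetical boxes (illustrative numbers only)
BOX_A = [5.0, 47.0, 10.0, 55.0]
BOX_B = [10.0, 47.0, 15.0, 51.0]
LOCATIONS = BOX_A + BOX_B  # the same flat list could go to stream.filter(locations=LOCATIONS)

print(in_any_box(7.0, 52.0, LOCATIONS), in_any_box(13.0, 53.0, LOCATIONS))  # True False
```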
It should be noted, though, that filtering by geotags limits the number of tweets quite a bit. This is from roughly 5 million Tweets in my test database (the query returns the fraction of tweets that actually contain a geolocation):
> db.tweets.find({coordinates:{$ne:null}}).count() / db.tweets.count()
0.016668392651547598
So only 1.67% of my sample of the 1% stream includes a geotag. However, there are other ways of figuring out a user's location:
http://arxiv.org/ftp/arxiv/papers/1403/1403.2345.pdf
You can't filter it while streaming but you could filter it at the output stage, if you were writing the tweets to a file.
sapi.filter(track=['manchester united'],locations=['GPS Coordinates'])
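Filtering at the output stage, as suggested above, could look like this: read the saved file back (one JSON tweet per line, as the listener writes it) and keep only tweets matching both the keyword and a place. A minimal sketch, assuming v1.1 tweet dicts whose `place` (often null) has an ISO `country_code`; the demo writes a throwaway file so the block runs on its own.

```python
import json
import os
import tempfile


def filter_saved_tweets(path, keyword, country_code):
    """Yield saved tweets whose text contains `keyword` (case-insensitive)
    and whose place has the given ISO country code, e.g. "GB"."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank keep-alive lines
            tweet = json.loads(line)
            place = tweet.get("place") or {}  # "place" is often null
            if (keyword.lower() in tweet.get("text", "").lower()
                    and place.get("country_code") == country_code):
                yield tweet


# Demo with a throwaway file in the same one-JSON-object-per-line layout
sample = [
    {"text": "Manchester United win", "place": {"country_code": "GB"}},
    {"text": "manchester united fans", "place": None},
    {"text": "Manchester United in Madrid", "place": {"country_code": "ES"}},
]
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False, encoding="utf-8") as tmp:
    for t in sample:
        tmp.write(json.dumps(t) + "\n")
    path = tmp.name

print([t["text"] for t in filter_saved_tweets(path, "manchester united", "GB")])
# ['Manchester United win']
os.remove(path)
```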