I made a Twitter bot recently which gives me this error. Sometimes it runs correctly and shows tweets with the specified keywords in the console, but other times it prints the tweet without liking it and shows this error between one tweet and the next. This is what I see (the tweets are in Portuguese, which is not relevant):
pois eu reclamo mesmo vcs q se fodam
429 Too Many Requests
Too Many Requests
so fumando 70 mesmo
429 Too Many Requests
Too Many Requests
The code is the following:
import time

import tweepy

TWITTER_API_KEY = "XXXX"
TWITTER_API_KEY_SECRET = "XXXX"
BEARER_TOKEN = "XXXX"
TWITTER_ACCESS_TOKEN = "XXXX"
TWITTER_ACCESS_TOKEN_S = "XXXX"

client = tweepy.Client(BEARER_TOKEN, TWITTER_API_KEY, TWITTER_API_KEY_SECRET,
                       TWITTER_ACCESS_TOKEN, TWITTER_ACCESS_TOKEN_S)
auth = tweepy.OAuth1UserHandler(TWITTER_API_KEY, TWITTER_API_KEY_SECRET,
                                TWITTER_ACCESS_TOKEN, TWITTER_ACCESS_TOKEN_S)
api = tweepy.API(auth, wait_on_rate_limit=True)

alive()  # keep-alive helper defined elsewhere in my project

class MyStream(tweepy.StreamingClient):
    def on_tweet(self, tweet):
        try:
            print(tweet.text)
            client.like(tweet.id)
        except Exception as error:
            print(error)
        time.sleep(10)

stream = MyStream(bearer_token=BEARER_TOKEN)
rule = tweepy.StreamRule("(fumar OR fumando OR fumeque OR fumo) (-is:retweet)")
stream.add_rules(rule)
stream.filter()
I tried increasing the value passed to the sleep function, but it doesn't solve the problem. Does anyone know what may be happening and how I can fix it?
Every request you send to a server requires effort to process, and if too many people are sending requests at the same time, the server processing the API requests basically gets DDoS'd. To prevent this, API implementors will commonly communicate that you need to slow down the rate at which you are making requests by returning a 429 error rather than the thing that was requested.
The problem you're having is that you're trying to comply, but without actually knowing how much to take your foot off the gas with your API requests, which leads you to hit the rate limiter over and over again.
I want to address right off the bat that you need to be careful with this. System administrators and site reliability engineers send those 429 error codes for a reason, and when you don't comply, they have no way of knowing if you're DDoS'ing them on purpose or by accident.
That 429 error code comes with an x-rate-limit-reset header that tells you exactly how long you need to wait before sending another request. You just need to check the status code and headers of each response you receive.
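As a rough sketch of what that looks like with Tweepy's v2 client: in Tweepy 4.x, a 429 surfaces as a tweepy.TooManyRequests exception carrying the HTTP response, so you can read the reset header from it. The like_with_backoff helper below is hypothetical, not part of your code, and you should verify the exception details against your Tweepy version:

import time

import tweepy

def like_with_backoff(client, tweet_id):
    try:
        client.like(tweet_id)
    except tweepy.TooManyRequests as error:  # raised on HTTP 429
        reset = error.response.headers.get("x-rate-limit-reset")
        # Header value is epoch seconds; fall back to a fixed wait if missing.
        wait = max(int(reset) - time.time(), 0) if reset else 60
        print("Rate limited; sleeping for %d seconds" % wait)
        time.sleep(wait)
        client.like(tweet_id)  # retry once after the window resets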
The issue is that the third function (search) never seems to respond.
I haven't been able to find a reason for this in the Telegram documentation.
Please let me know if you have had this issue, or have seen it and know the solution.
Even a post that references an issue like this would help.
Thank you so much for the assistance.
from email import message
import os
import re
import html
import json
import telebot
import requests
import http.client
from pytube import *
from dotenv import load_dotenv

load_dotenv()

# Creating, hiding, and using API keys
API_KEY = os.getenv("API_KEY")
RAPID_KEY = os.getenv("RAPID_API")

bot = telebot.TeleBot(API_KEY)

#bot.message_handler(commands="start")
# Creating a help message for guidance on how to use the bot.
def help(message):
    # Trying to send the help message; if unable to send, show an error message to the user.
    try:
        bot.send_message(message.chat.id, "Use \"Youtube\" and the video name to search for a video.\n")
    except:
        bot.send_message(message.chat.id, "There was an error fetching help, the bot may be offline.\n")

# Checking the data to see if the word "YouTube" was used in order to start the search
def data_validation(message):
    query = message.text.split()
    if("youtube" not in query[0].lower()): # Set flag false if regular text
        return False
    else:
        return True

#bot.message_handler(func=data_validation)
# Searching for YouTube videos using the RAPID API
def search(message):
    query = message.text.split()
    # Check if the data is valid, and lowercase the variable for easy use.
    if(data_validation(message) == True and query[0].lower() == "youtube"):
        try:
            if(data_validation(message) == True and query[1].lower() != "-d"):
                # Removing the word "YouTube" and sending the results to the YouTube search engine.
                for item in query[:]:
                    if(item.lower() == "youtube"):
                        query.remove(item)
                        search_query = ' '.join(query)
                    else:
                        pass # If it's not a term we're looking to convert, ignore it.
                # RAPID API for YouTube
                try:
                    url = "https://youtube-search-results.p.rapidapi.com/youtube-search/"
                    querystring = {"q": search_query}
                    headers = {
                        "X-RapidAPI-Key": RAPID_KEY,
                        "X-RapidAPI-Host": "youtube-search-results.p.rapidapi.com"
                    }
                    response = requests.request("GET", url, headers=headers, params=querystring) # Grabbing response information from the URL
                    request = json.loads(response.text) # Parsing the JSON string for Python use
                    # Testing to see if the RAPID API service responds and is online.
                    if(response.status_code == 503):
                        # If the service is not online, let the user know.
                        bot.send_message(message.chat.id, f"The RAPID API service appears to be offline, try back later.\n")
                    if(response.status_code == 429):
                        # If the service has reached its max quota for the day, let the user know.
                        bot.send_message(message.chat.id, f"Max quota reached, try back in 24 hours.\n")
                    # Grabbing the first link from the JSON text and sending the direct URL and title.
                    first_link = str((request["items"][0]["url"]))
                    bot.send_message(message.chat.id, f"{first_link}\n") # Sending the first link that was queried.
                # If there are no results for the requested video, send an error message to alert the user.
                except:
                    bot.send_message(message.chat.id, "Unable to load video.\n")
        except:
            pass # Ignoring it if it's not the phrase we're looking for.

def test(message):
    string = message.text.split()
    print(string)
    if(string[0] == "test" and data_validation(message) == True):
        print("This is a test and I should be printed")
        bot.send_message(message.chat.id, "Test message")

# Stay-alive function for bot pinging / communication
bot.infinity_polling(1440)
The first problem in your code is your very first line:
from email import message
You import message from email, and you also pass a parameter with that same name to the data_validation function, which then returns False for regular text. Since data_validation is meant to act as the handler's filter, returning False means the handler function will never be executed.
First, give the import on the first line an alias.
Try this:
from email import message as msg
import os
import re
import html
import json
import telebot
import requests
import http.client
from pytube import *
from dotenv import load_dotenv

load_dotenv()

# Creating, hiding, and using API keys
API_KEY = os.getenv("API_KEY")
RAPID_KEY = os.getenv("RAPID_API")

bot = telebot.TeleBot(API_KEY)

# Creating a help message for guidance on how to use the bot.
#bot.message_handler(commands=["start"])
def help(message):
    # Trying to send the help message; if unable to send, show an error message to the user.
    try:
        bot.send_message(message.chat.id, "Use \"Youtube\" and the video name to search for a video.\n")
    except:
        bot.send_message(message.chat.id, "There was an error fetching help, the bot may be offline.\n")

# Checking the data to see if the word "YouTube" was used in order to start the search
def data_validation(message):
    query = message.text.split()
    print(query)
    if("youtube" not in query[0].lower()): # Set flag false if regular text
        return False # if you return False, the handler function will never be executed
    else:
        return True

# Searching for YouTube videos using the RAPID API
#bot.message_handler(func=data_validation)
def search(message):
    query = message.text.split()
    print(query) # if the function executed, you see the query result
    # Check if the data is valid, and lowercase the variable for easy use.
    if(data_validation(message) == True and query[0].lower() == "youtube"):
        try:
            if(data_validation(message) == True and query[1].lower() != "-d"):
                # Removing the word "YouTube" and sending the results to the YouTube search engine.
                for item in query[:]:
                    if(item.lower() == "youtube"):
                        query.remove(item)
                        search_query = ' '.join(query)
                    else:
                        pass # If it's not a term we're looking to convert, ignore it.
                # RAPID API for YouTube
                try:
                    url = "https://youtube-search-results.p.rapidapi.com/youtube-search/"
                    querystring = {"q": search_query}
                    headers = {
                        "X-RapidAPI-Key": RAPID_KEY,
                        "X-RapidAPI-Host": "youtube-search-results.p.rapidapi.com"
                    }
                    response = requests.request("GET", url, headers=headers, params=querystring) # Grabbing response information from the URL
                    request = json.loads(response.text) # Parsing the JSON string for Python use
                    # Testing to see if the RAPID API service responds and is online.
                    if(response.status_code == 503):
                        # If the service is not online, let the user know.
                        bot.send_message(message.chat.id, f"The RAPID API service appears to be offline, try back later.\n")
                    if(response.status_code == 429):
                        # If the service has reached its max quota for the day, let the user know.
                        bot.send_message(message.chat.id, f"Max quota reached, try back in 24 hours.\n")
                    # Grabbing the first link from the JSON text and sending the direct URL and title.
                    first_link = str((request["items"][0]["url"]))
                    bot.send_message(message.chat.id, f"{first_link}\n") # Sending the first link that was queried.
                # If there are no results for the requested video, send an error message to alert the user.
                except:
                    bot.send_message(message.chat.id, "Unable to load video.\n")
        except:
            pass # Ignoring it if it's not the phrase we're looking for.

def test(message):
    string = message.text.split()
    print(string)
    if(string[0] == "test" and data_validation(message) == True):
        print("This is a test and I should be printed")
        bot.send_message(message.chat.id, "Test message")

# Stay-alive function for bot pinging / communication
bot.infinity_polling(1440)
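One thing worth noting about both versions: the bot.message_handler lines are still commented out, so pyTelegramBotAPI never actually registers these functions as handlers. A minimal sketch of how registration usually looks with this library (the token and handler bodies are placeholders, not the full code above):

import telebot

bot = telebot.TeleBot("YOUR_API_KEY")  # placeholder token

def data_validation(message):
    # Predicate: True only when the message starts with "youtube".
    return message.text.split()[0].lower() == "youtube"

@bot.message_handler(commands=["start", "help"])
def send_help(message):
    bot.send_message(message.chat.id, "Use \"Youtube\" and the video name to search for a video.")

# func= handlers only run for messages where the predicate returns True,
# which is why data_validation returning False means search() never fires.
@bot.message_handler(func=data_validation)
def search(message):
    bot.send_message(message.chat.id, "Searching...")  # placeholder body

bot.infinity_polling()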
I found that using if __name__ == '__main__': and keeping all the functions inside a main() function used as the handler made everything run smoothly.
I'm still trying to figure out why this works.
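For reference, a rough sketch of that layout, assuming the same telebot setup as above (the token and handler body are placeholders):

import telebot

bot = telebot.TeleBot("YOUR_API_KEY")  # placeholder token

def main():
    # Register handlers and start polling only when run as a script,
    # not when the module is imported.
    @bot.message_handler(commands=["help"])
    def send_help(message):
        bot.send_message(message.chat.id, "Help text goes here.")

    bot.infinity_polling()

if __name__ == '__main__':
    main()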
OK, so I have done a bunch of searching, but I think I am just banging my head against the wall. I'm having a bear of a time trying to figure this problem out, but here is what I am trying to do:
Program Overview
I am using tweepy to gather a bunch of tweets from a particular target user. Those values are stored in a MySQL database. I am storing the tweets themselves and the tweet ID (status_id in the context of the API) so I have something to mine later. In the context of this program, I am trying to get the list of users that have retweeted a particular tweet, and eventually create a list of a particular Twitter user's "Top whatever Re-Tweeters." I have the ingestion portion of the target-tweets problem solved; that information gets logged into the database without any issues.
Where I'm running into problems
When I query the database for the list of tweet IDs to look up against the API for who retweeted which of those main tweets, I have issues looping through the data as it comes out of the database query. The query comes out as tuples, and when I try to loop through them in a for loop, the API gets angry and kicks back:
tweepy.error.TweepError: [{'code': 34, 'message': 'Sorry, that page does not exist.'}]
My best guess at this point is that I'm not looping through this right, or I am having some issues with data types.
My Code So Far
import os
import tweepy as tw
import pandas as pd
import json
from contextlib import redirect_stdout
import mysql.connector
import time
# Define Database Connectivity
cpalsDb = mysql.connector.connect(
    host="localhost",
    user="vagrant",
    password="password",
    database="cpals"
)
# Authenticate to Twitter
consumer_key= '#######'
consumer_secret= '###########'
access_token= '###########'
access_token_secret= '#####################'
auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)
mycursorread = cpalsDb.cursor()
mycursorread.execute("SELECT tweetID FROM tweets")
myresult = mycursorread.fetchall()
def queryCleanup(singleColumnData):
    for x in singleColumnData:
        print("TWEET: ", x)
        print(type(x))
        for y in x:
            api.retweeters(y)
            print("Retweeters: ", y)
    return

queryCleanup(singleColumnData=myresult)
Conclusion
My code is kind of a mess, and I am a hobbyist in Python, so loops still evade me, as does the general order of things.
Do I need separate loops? Am I just a fool? I'm essentially asking Python: "Hey, pull this row from the database. For each item, tell me who retweeted that tweet, then move on to the next."
Thank you in advance, I do appreciate it!
After messing around for a good long while, I managed to sort the issue out.
I changed the function to look like this:
def queryCleanup(singleColumnData):
    for x in singleColumnData:
        print("HEADTWEETS: ")
        print(x)
        for y in x:
            print("RETWEET USERS: ")
            retweetersList = api.retweeters(y)
            print(retweetersList)
    return
Called it like this:
queryCleanup(singleColumnData=myresult)
Which produced this output:
HEADTWEETS:
(1377362181499482115,)
RETWEET USERS:
[517650988, 209633971, 2998940037, 557773370, 331270224, 482554207, 2530841186, 1236678962899824640, 27342363, 4134412067, 378132278, 289881627, 1014962042980225027, 2294193683, 147233205, 1156431, 1318958426744082432, 942782498932711424, 343431172, 560821229, 1317232193228464128, 178063571, 826619148, 936706135, 325340700, 840326004, 587987252,
1068897042297208832, 115775544, 268810091, 185780563, 43137775, 479224989, 376937152, 36946774, 20952823, 1017105484678123526, 42400785, 23933931, 17428614, 2959876373, 39891241, 19277857, 47416273, 20458833, 478719002, 340770461, 1364297002016768009, 1195727038964879362, 1184079800, 741307184500219904, 83186542, 829432901439741952, 1480959499, 1290009503824568320, 3084709625, 1373237390152716288, 720819722787364864, 57443302, 489979941, 152252818, 209327707, 346107530, 384128700, 1077705300650614785, 419288157, 833776431528284161, 97000212, 30498849, 796894834002980864, 76444062, 39250142, 1271237346143416321, 60077913, 2815794381, 1356329370751791106, 31054493, 3025642511, 598757743, 1000154893200576512, 826903009884004352, 94908878, 1028095152940883968, 42491102, 999904518723653633, 2404795357, 1212825449958539264]
So now it loops through the tweet ID of each tweet that I have in my database, then it produces all the retweeting users for each individual tweet.
My full working example looks like this:
import os
import tweepy as tw
import pandas as pd
import json
from contextlib import redirect_stdout
import mysql.connector
import time

cpalsDb = mysql.connector.connect(
    host="localhost",
    user="vagrant",
    password="password",
    database="cpals"
)

# Authenticate to Twitter
consumer_key=
consumer_secret=
access_token=
access_token_secret=

auth = tw.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
api = tw.API(auth, wait_on_rate_limit=True)

# Grab a list of tweets that we need to action through...
mycursorread = cpalsDb.cursor()
mycursorread.execute("SELECT tweetID FROM tweets")
myresult = mycursorread.fetchall()
cpalsDb.commit()

def queryCleanup(singleColumnData):
    for x in singleColumnData:
        print("HEADTWEETS: ")
        print(x)
        for y in x:
            print("RETWEET USERS: ")
            retweetersList = api.retweeters(y)
            print(retweetersList)
    return

queryCleanup(singleColumnData=myresult)
My script has a way to go still, so it's still a work in progress - but thanks to everyone who looked at this!
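One small simplification worth considering: each row from a single-column fetchall() is a one-element tuple, so you can unpack it in the for statement instead of nesting a second loop. A sketch using the same api and myresult as above (a stylistic alternative, not a behavior change):

def queryCleanup(singleColumnData):
    # Each row looks like (1377362181499482115,), so unpack the single
    # tweet ID directly rather than iterating over the tuple.
    for (tweet_id,) in singleColumnData:
        print("HEADTWEETS: ")
        print(tweet_id)
        print("RETWEET USERS: ")
        print(api.retweeters(tweet_id))

queryCleanup(singleColumnData=myresult)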
I am coding a program to get the followers of a given user, then use that followers list to get their followers, and so on. The code I have so far is shown below:
import tweepy
import time

# insert your Twitter keys here
consumer_key = 'key'
consumer_secret = 'secret'
access_token = 'accesstoken'
access_secret = 'accesssecret'

auth = tweepy.auth.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
api = tweepy.API(auth)

list = open('/home/acrocephalus/GitHub/TweetViz/list.txt','w')

if(api.verify_credentials):
    print '-------------------------\n*** You are logged in ***\n-------------------------'

# Set starting Twitter handle
username = ['moixera']
user = tweepy.Cursor(api.followers, screen_name=username).items()

# Set the number of levels to follow
depth = 3

# Start extracting each level's followers
while depth != 0:
    for handle in username:
        print '\n\n Getting followers from: #' + handle + '\n\n'
        user = tweepy.Cursor(api.followers, screen_name=handle).items()
        while True:
            try:
                u = next(user)
                list.write(u.screen_name + '\n')
                print u.screen_name
            except:
                time.sleep(15*60)
                print 'We have exceeded the rate limit. Sleeping for 15 minutes'
                u = next(user)
                list.write(u.screen_name + '\n')
                print u.screen_name
    username = list.read().splitlines()
    print 'Moving to next username'
    depth = depth - 1

list.close()
The problem is that it starts with the first user and gets her followers, but it doesn't continue with her followers list. I think the problem is in the while loop: when it finishes getting the followers, it jumps to the except part. The desired behaviour would be that, when it has finished retrieving followers, it jumps back to the beginning of the for loop. The program should only jump to the except part of the while loop when it hits Twitter's API rate limit, and then sleep for 15 minutes. Can anyone help?
Cheers!
Dani
Use a for loop instead of the while loop:
user_list = open('/home/acrocephalus/GitHub/TweetViz/list.txt','w')
for user in tweepy.Cursor(api.followers, screen_name=handle).items():
    user_list.write(user.screen_name + '\n')
    print user.screen_name
N.B. don't use list as a variable name because it hides the list builtin.
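A two-line illustration of the pitfall:

list = open('list.txt', 'w')   # "list" now names a file object
squares = list((1, 4, 9))      # TypeError: the builtin is shadowed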
I think that the API has some support for rate limiting, although I don't see it detailed in the documentation. You can enable it when initialising with tweepy.API(), see wait_on_rate_limit and wait_on_rate_limit_notify:
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
A very quick glance at the source code suggests that the API will figure out an appropriate waiting period based on headers returned from Twitter, e.g. x-rate-limit-reset, but I have not used this API so I can't be sure whether it works.
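Putting the two suggestions together, a sketch of what the loop could look like with the built-in rate limiting (this matches the tweepy 3.x era API used in the question; it reuses the credential variables and handle from your code, and the output path is a placeholder):

import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
# tweepy sleeps until the rate-limit window resets instead of raising.
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)

user_list = open('followers.txt', 'w')  # placeholder path
for user in tweepy.Cursor(api.followers, screen_name=handle).items():
    user_list.write(user.screen_name + '\n')
user_list.close()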
There are other problems with your code, however, these go beyond your question.
While running this program to retrieve Twitter data using Python 2.7.8:
# imports
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener

# setting up the keys
consumer_key = '…………...'
consumer_secret = '………...'
access_token = '…………...'
access_secret = '……………..'

class TweetListener(StreamListener):
    # A listener handles tweets as they are received from the stream.
    # This is a basic listener that just prints received tweets to standard output.
    def on_data(self, data):
        print (data)
        return True

    def on_error(self, status):
        print (status)

# printing all the tweets to the standard output
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)
stream = Stream(auth, TweetListener())
t = u"سوريا"
stream.filter(track=[t])
After running this program for 5 hours, I got this error message:
Traceback (most recent call last):
File "/Users/Mona/Desktop/twitter.py", line 32, in <module>
stream.filter(track=[t])
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 316, in filter
self._start(async)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 237, in _start
self._run()
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 173, in _run
self._read_loop(resp)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/tweepy/streaming.py", line 225, in _read_loop
next_status_obj = resp.read( int(delimited_string) )
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 543, in read
return self._read_chunked(amt)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 612, in _read_chunked
value.append(self._safe_read(chunk_left))
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py", line 660, in _safe_read
raise IncompleteRead(''.join(s), amt)
IncompleteRead: IncompleteRead(0 bytes read, 976 more expected)
I honestly don't know what to do about this problem!
You should check to see if you're failing to process tweets quickly enough using the stall_warnings parameter.
stream.filter(track=[t], stall_warnings=True)
These messages are handled by Tweepy (check out implementation here) and will inform you if you're falling behind. Falling behind means that you're unable to process tweets as quickly as the Twitter API is sending them to you. From the Twitter docs:
Setting this parameter to the string true will cause periodic messages to be delivered if the client is in danger of being disconnected. These messages are only sent when the client is falling behind, and will occur at a maximum rate of about once every 5 minutes.
In theory, you should receive a disconnect message from the API in this situation. However, that is not always the case:
The streaming API will attempt to deliver a message indicating why a stream was closed. Note that if the disconnect was due to network issues or a client reading too slowly, it is possible that this message will not be received.
The IncompleteRead could also be due to a temporary network issue and may never happen again. If it happens reproducibly after about 5 hours though, falling behind is a pretty good bet.
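If you want to react to those warnings rather than just enable them, StreamListener in the tweepy versions of that era exposed an on_warning callback; verify the exact hook name against your tweepy version, as this is a sketch rather than a guarantee:

from tweepy.streaming import StreamListener

class TweetListener(StreamListener):
    def on_data(self, data):
        print(data)
        return True

    def on_warning(self, notice):
        # Invoked for stall warnings when stall_warnings=True is set.
        print('STALL WARNING:', notice)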
I've just had this problem. The other answer is factually correct: it's almost certainly that your program isn't keeping up with the stream, and you get a stall warning if that's the case.
In my case, I was reading the tweets into postgres for later analysis, across a fairly dense geographic area as well as keywords (London, in fact, and about 100 keywords). It's quite possible that, even though you're just printing it, your local machine is doing a bunch of other things, and system processes get priority, so the tweets back up until Twitter disconnects you. (This typically manifests as an apparent memory leak: the program increases in size until it gets killed, or Twitter disconnects, whichever comes first.)
The thing that made sense here was to push off the processing to a queue. So, I used a redis and django-rq solution - it took about 3 hours to implement on dev and then my production server, including researching, installing, rejigging existing code, being stupid about my installation, testing, and misspelling things as I went.
Install redis on your machine
Start the redis server
Install Django-RQ (or just Install RQ if you're working solely in python)
Now, in your django directory (where appropriate - ymmv for straight python applications) run:
python manage.py rqworker &
You now have a queue! You can add jobs to the queue by changing your handler like this:
(At top of file)
import django_rq
Then in your handler section:
def on_data(self, data):
    django_rq.enqueue(print, data)
    return True
As an aside - if you're interested in stuff emanating from Syria, rather than just mentioning Syria, then you could add to the filter like this:
stream.filter(track=[t], locations=[35.6626, 32.7930, 42.4302, 37.2182])
That's a very rough geobox centred on Syria, but which will pick up bits of Iraq/Turkey around the edges. Since this is an optional extra, it's worth pointing this out:
Bounding boxes do not act as filters for other filter parameters. For example, track=twitter&locations=-122.75,36.8,-121.75,37.8 would match any tweets containing the term Twitter (even non-geo tweets) OR coming from the San Francisco area.
From this answer, which helped me, and the twitter docs.
Edit: I see from your subsequent posts that you're still going down the road of using Twitter API, so hopefully you got this sorted anyway, but hopefully this will be useful for someone else! :)
This worked for me.
# ProtocolError lives in urllib3; older requests versions also vendor it
# as requests.packages.urllib3.exceptions.ProtocolError.
from urllib3.exceptions import ProtocolError

l = StdOutListener()
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
stream = Stream(auth, l)

while True:
    try:
        stream.filter(track=['python', 'java'], stall_warnings=True)
    except (ProtocolError, AttributeError):
        continue
A solution is to restart the stream immediately after catching the exception.
# imports
from tweepy import Stream
from tweepy import OAuthHandler
from tweepy.streaming import StreamListener

# setting up the keys
consumer_key = "XXXXX"
consumer_secret = "XXXXX"
access_token = "XXXXXX"
access_secret = "XXXXX"

# printing all the tweets to the standard output
auth = OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_secret)

class TweetListener(StreamListener):
    # A listener handles tweets as they are received from the stream.
    # This is a basic listener that just prints received tweets to standard output.
    def on_data(self, data):
        print(data)
        return True

    def on_exception(self, exception):
        print('exception', exception)
        start_stream()

    def on_error(self, status):
        print(status)

def start_stream():
    stream = Stream(auth, TweetListener())
    t = u"سوريا"
    stream.filter(track=[t])

start_stream()
In my case, the back-end application the URL points to was returning a plain string directly. At the start, I just returned the text like this:
return original_message
I changed it to:
return Response(response=original_message, status=200, content_type='application/text')
I think this answer works only for my case.
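For context, a minimal sketch of that change in a Flask-style app (Flask and the route are assumptions on my part, since the answer doesn't name the framework, and original_message is a placeholder):

from flask import Flask, Response

app = Flask(__name__)

@app.route("/echo")
def echo():
    original_message = "hello"  # placeholder payload
    # Returning an explicit Response (instead of a bare string) sets the
    # status code and content type the caller expects.
    return Response(response=original_message, status=200,
                    content_type='application/text')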