My Flask API looks like this:
@app.route('/getCuentaReferencia', methods=['GET'])
def getFollowers():
    x = set()
    try:
        for user in tweepy.Cursor(
            api.get_followers,
            screen_name=request.args.get('cuentaReferencia'),
        ).items(int(request.args.get('cantidadCuenta'))):
            x.add(user._json['screen_name'])
    except Exception as e:
        print(e)
    with open(request.args.get("cuentaReferencia") + ".json", "w") as f:
        json.dump(list(x), f)
    return jsonify({
        "cuentaReferencia": request.args.get("cuentaReferencia") + request.args.get("region") + ".json",
    })
From the frontend I send the screen name of the account and the number of followers I'm looking for:
const response = await fetch(
  `http://localhost:5000/getCuentaReferencia?cuentaReferencia=${cuentaReferencia}&cantidadCuenta=${
    cantidadCuenta ?? ""
  }&region=${region}`
);
But when the Twitter API reaches the rate limit, the code doesn't continue looking for followers after the cooldown and only returns a few of them (if I ask for 1000, it returns 300, because that's where it hits the limit). I need to get the full number of followers that I request from the frontend. How can I fix this?
You can set the count argument of get_followers() to 200, its maximum (the default is only 20). That will allow you to get 3000 followers without reaching the rate limit.
You can read more about this argument in the Twitter documentation here.
If you want to get even more followers, you should probably use the Twitter API V2 instead.
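For illustration, a minimal sketch of the loop from the question with the count argument applied (assuming the same api, request, and x as above; note that count is the page size per request, not the total):

for user in tweepy.Cursor(
    api.get_followers,
    screen_name=request.args.get('cuentaReferencia'),
    count=200,  # maximum page size per request; the default is only 20
).items(int(request.args.get('cantidadCuenta'))):
    x.add(user._json['screen_name'])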
I'm looking for the fastest way to check in real time whether a specific user (Twitter ID) has tweeted. To achieve this I have used Tweepy and its stream function, which results in a notification of the new Tweet after about 5 seconds. Is there a faster way to check if someone has tweeted, using another library, raw requests, or code optimization?
Thanks in advance.
import tweepy

TwitterID = "148137271"

class MyStreamListener(tweepy.StreamListener):
    def __init__(self, api):
        self.api = api
        self.me = api.me()

    def on_status(self, tweet):
        # Filter if the ID has tweeted
        if tweet.user.id_str == TwitterID:
            print("Tweeted:", tweet.text)

    def on_error(self, status):
        print("Error detected")
        print(status)

# Authenticate to Twitter (credentials redacted)
auth = tweepy.OAuthHandler("X", "X")
auth.set_access_token("Y", "Z")

# Create API object
api = tweepy.API(auth, wait_on_rate_limit=True,
                 wait_on_rate_limit_notify=True)

tweets_listener = MyStreamListener(api)
stream = tweepy.Stream(api.auth, tweets_listener)
stream.filter(follow=[TwitterID])
I'd say around 5 seconds is reasonable latency, given that your program is not running on the same server as Twitter's core systems. You're subject to network and API latency, and those things are outside of your control. There's no real way to rewrite this logic to shrink the time between a Tweet being posted and it reaching the API. Considering everything happening inside Twitter between a Tweet being posted and it being fanned out to potentially millions of followers, the fact that the API, at the end of an unknown network connection, delivers the Tweet data in under 5 seconds is impressive in itself.
I'm calling the Udemy external API to build a simple REST service for experimental purposes.
https://www.udemy.com/developers/affiliate/
Here is my get_all() courses method.
class Courses(object):
    """
    Handles all requests related to courses,
    i.e. gets the courses list, course detail, and course-reviews list.
    """

    def __init__(self, api):
        self.api = api
        logger.debug("courses initialized")

    def get_all(self):
        page = 1
        per_page = 20
        while True:
            res = self._get_courses(page, per_page)
            if not res['results']:
                break
            try:
                for one in res['results']:
                    yield one
            except Exception as e:  # <-- the exception handling in question
                print(e)
                break
            page += 1

    def _get_courses(self, page, per_page):
        resource = "courses"
        params = {'page': page, 'per_page': per_page,
                  # 'fields[course]': '#all'
                  }
        res = self.api.get(resource, params)
        return res
Now, is it reasonable to handle an exception in the get_all() method, assuming there could be some error in the data returned by the API?
Or is handling the exception in get_all() not needed here, so it should be handled by the calling function instead?
Most of the open-source projects I see don't handle this exception.
I'm sharing the opinion in this answer: catch the exception as early as possible, and rethrow it to the next layer if needed.
With practice and experience with your code base it becomes quite easy to judge when to add additional context to errors, and where it's most sensible to actually, finally handle the errors.
Catch → Rethrow
Do this where you can usefully add more information that would save a developer having to work through all the layers to understand the problem.
Catch → Handle
Do this where you can make final decisions on what is an appropriate, but different execution flow through the software.
Catch → Error Return
Do this where the caller expects a status or result value instead of an exception, typically at an outer API boundary.
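As a hedged illustration in the question's own code, the catch → rethrow pattern in get_all() might look like the following (the RuntimeError wrapper and its message are my own choices, not a fixed convention):

def get_all(self):
    page = 1
    per_page = 20
    while True:
        try:
            res = self._get_courses(page, per_page)
        except Exception as e:
            # Catch early, add context the caller can't easily
            # reconstruct, then rethrow to the next layer.
            raise RuntimeError(
                "fetching courses page {} (per_page={}) failed".format(page, per_page)
            ) from e
        if not res['results']:
            break
        for one in res['results']:
            yield one
        page += 1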
I am using an aiohttp session along with a semaphore within a custom class:
async def get_url(self, url):
    async with self.semaphore:
        async with self.session.get(url) as response:
            try:
                text_response = await response.text()
                read_response = await response.read()
                json_response = await response.json()
                await asyncio.sleep(random.uniform(0.1, 0.5))
            except aiohttp.client_exceptions.ContentTypeError:
                json_response = {}
            return {
                'json': json_response,
                'text': text_response,
                'read': read_response,
                'status': response.status,
                'url': response.url,
            }
I have two questions:
Is it correct or incorrect to have multiple await statements within a single async function? I need to return both response.text() and response.read(). However, depending on the URL, response.json() may or may not be available, so I've thrown everything into a try/except block to catch this exception.
Since I am using this function to loop through a list of different RESTful API endpoints, I am controlling the number of simultaneous requests through the semaphore (set to a maximum of 100), but I also need to stagger the requests so they don't overwhelm the host machine. So I thought I could accomplish this by adding an asyncio.sleep randomly chosen between 0.1 and 0.5 seconds. Is this the best way to enforce a small wait between requests? Should I move it to the beginning of the function instead of near the end?
It is absolutely fine to have multiple awaits in one async function, as long as you know what you are awaiting, and each of them is awaited one by one, just like normal sequential execution. One thing to mention about aiohttp: you'd better call read() first and catch UnicodeDecodeError too, because internally text() and json() call read() first and process its result, and you don't want that processing to prevent returning at least read_response. You don't have to worry about read() being called multiple times; its result is simply cached in the response instance on the first call.
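A minimal sketch of that ordering, assuming the same class attributes as the question (semaphore, session); the fallback values are illustrative:

async def get_url(self, url):
    async with self.semaphore:
        async with self.session.get(url) as response:
            # read() first: text() and json() reuse its cached bytes,
            # so a decoding failure can no longer lose the raw body.
            read_response = await response.read()
            try:
                text_response = await response.text()
            except UnicodeDecodeError:
                text_response = None
            try:
                json_response = await response.json()
            except (aiohttp.ContentTypeError, ValueError):
                json_response = {}
            return {
                'json': json_response,
                'text': text_response,
                'read': read_response,
                'status': response.status,
                'url': response.url,
            }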
Random stagger is an easy and effective solution for sudden traffic. However if you want to control exactly the minimum time interval between any two requests - for academic reasons, you could set up two semaphores:
def __init__(self):
    # something else
    self.starter = asyncio.Semaphore(0)
    self.ender = asyncio.Semaphore(30)
Then change get_url() to use them:
async def get_url(self, url):
    await self.starter.acquire()
    try:
        async with self.session.get(url) as response:
            ...  # your code
    finally:
        self.ender.release()
Because starter was initialized with zero, all get_url() coroutines will block on it. We'll use a separate coroutine to control it:
async def controller(self):
    last = 0
    while self.running:
        await self.ender.acquire()
        sleep = 0.5 - (self.loop.time() - last)  # at most 2 requests per second
        if sleep > 0:
            await asyncio.sleep(sleep)
        last = self.loop.time()
        self.starter.release()
And your main program should look something like this:
def run(self):
    for url in [...]:
        self.loop.create_task(self.get_url(url))
    self.loop.create_task(self.controller())
So at first, the controller will release starter 30 times evenly over 15 seconds (30 releases × 0.5 s), because that is the initial value of ender. After that, the controller releases starter as soon as any get_url() ends, provided 0.5 seconds have passed since the last release; otherwise it waits out the remainder.
One issue here: if the URLs to fetch are not a constant list in memory (e.g. they arrive from the network with unpredictable delays between URLs), the RPS limiter will fail (starter is released too early, before there is actually a URL to fetch). You'll need further tweaks for this case, even though the chance of a traffic burst is already very low.
I saw in a question on Stack Exchange that the limit can be a function of the number of requests per 15 minutes and can also depend on the complexity of the algorithm, except that this one is not complex.
So I use this code:
import tweepy
import sqlite3
import time

db = sqlite3.connect('data/MyDB.db')

# Get a cursor object
cursor = db.cursor()
cursor.execute('''CREATE TABLE IF NOT EXISTS MyTable(id INTEGER PRIMARY KEY, name TEXT, geo TEXT, image TEXT, source TEXT, timestamp TEXT, text TEXT, rt INTEGER)''')
db.commit()

consumer_key = ""
consumer_secret = ""
key = ""
secret = ""

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(key, secret)
api = tweepy.API(auth)

search = "#MyHashtag"

for tweet in tweepy.Cursor(api.search,
                           q=search,
                           include_entities=True).items():
    while True:
        try:
            cursor.execute('''INSERT INTO MyTable(name, geo, image, source, timestamp, text, rt) VALUES(?,?,?,?,?,?,?)''',
                           (tweet.user.screen_name, str(tweet.geo), tweet.user.profile_image_url,
                            tweet.source, tweet.created_at, tweet.text, tweet.retweet_count))
        except tweepy.TweepError:
            time.sleep(60 * 15)
            continue
        break

db.commit()
db.close()
I always get the Twitter rate limit error:
Traceback (most recent call last):
File "stream.py", line 25, in <module>
include_entities=True).items():
File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 153, in next
self.current_page = self.page_iterator.next()
File "/usr/local/lib/python2.7/dist-packages/tweepy/cursor.py", line 98, in next
data = self.method(max_id = max_id, *self.args, **self.kargs)
File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 200, in _call
return method.execute()
File "/usr/local/lib/python2.7/dist-packages/tweepy/binder.py", line 176, in execute
raise TweepError(error_msg, resp)
tweepy.error.TweepError: [{'message': 'Rate limit exceeded', 'code': 88}]
For anyone who stumbles upon this on Google, tweepy 3.2+ has additional parameters for the tweepy.API class, in particular:
wait_on_rate_limit – Whether or not to automatically wait for rate limits to replenish
wait_on_rate_limit_notify – Whether or not to print a notification when Tweepy is waiting for rate limits to replenish
Setting these flags to True will delegate the waiting to the API instance, which is good enough for most simple use cases.
The problem is that your try: except: block is in the wrong place. Inserting data into the database will never raise a TweepError - it's iterating over Cursor.items() that will. I would suggest refactoring your code to call the next method of Cursor.items() in an infinite loop. That call should be placed in the try: except: block, as it can raise an error.
Here's (roughly) what the code should look like:
# above omitted for brevity
c = tweepy.Cursor(api.search,
                  q=search,
                  include_entities=True).items()
while True:
    try:
        tweet = c.next()
        # Insert into db
    except tweepy.TweepError:
        time.sleep(60 * 15)
        continue
    except StopIteration:
        break
This works because when Tweepy raises a TweepError, it hasn't updated any of the cursor data. The next time it makes the request, it will use the same parameters as the request which triggered the rate limit, effectively repeating it until it goes through.
Just replace
api = tweepy.API(auth)
with
api = tweepy.API(auth, wait_on_rate_limit=True)
If you want to avoid errors and respect the rate limit, you can use the following function, which takes your api object as an argument. It retrieves the number of remaining requests of the same type as the last request and, if desired, waits until the rate limit has been reset.
from datetime import datetime
from time import sleep

def test_rate_limit(api, wait=True, buffer=.1):
    """
    Tests whether the rate limit of the last request has been reached.
    :param api: The `tweepy` api instance.
    :param wait: A flag indicating whether to wait for the rate limit reset
                 if the rate limit has been reached.
    :param buffer: A buffer time in seconds that is added on to the waiting
                   time as an extra safety margin.
    :return: True if it is ok to proceed with the next request. False otherwise.
    """
    # Get the number of remaining requests
    remaining = int(api.last_response.getheader('x-rate-limit-remaining'))

    # Check if we have reached the limit
    if remaining == 0:
        limit = int(api.last_response.getheader('x-rate-limit-limit'))
        reset = int(api.last_response.getheader('x-rate-limit-reset'))
        # Parse the UTC time
        reset = datetime.fromtimestamp(reset)
        # Let the user know we have reached the rate limit
        print("0 of {} requests remaining until {}.".format(limit, reset))
        if wait:
            # Determine the delay and sleep
            delay = (reset - datetime.now()).total_seconds() + buffer
            print("Sleeping for {}s...".format(delay))
            sleep(delay)
            # We have waited for the rate limit reset. OK to proceed.
            return True
        else:
            # We have reached the rate limit.
            # The caller needs to handle the rate limit manually.
            return False

    # We have not reached the rate limit
    return True
import tweepy

auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)

# Notifies on rate limits and waits by itself; no need for sleep().
api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)
I suggest using the new API v2: use the Client object with the flag wait_on_rate_limit=True. v1 will be deprecated soon.
client = tweepy.Client(consumer_key=auth.consumer_key,
                       consumer_secret=auth.consumer_secret,
                       access_token=auth.access_token,
                       access_token_secret=auth.access_token_secret,
                       bearer_token=twitter_bearer_token,
                       wait_on_rate_limit=True)
It will all be automatic.
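For example, a hypothetical follow-up call; with wait_on_rate_limit=True the Client sleeps through any reset on its own (get_users_followers caps max_results at 1000):

user = client.get_user(username="Twitter")
followers = client.get_users_followers(id=user.data.id, max_results=1000)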