Tweepy OpenSSL.SSL.WantReadError - python

Python 3.6. I use Tweepy's streamer to get tweets. It works well, but sometimes, if I leave it open for more than 24 hours, I get this error:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\contrib\pyopenssl.py", line 277, in recv_into
return self.connection.recv_into(*args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\OpenSSL\SSL.py", line 1547, in recv_into
self._raise_ssl_error(self._ssl, result)
File "C:\ProgramData\Anaconda3\lib\site-packages\OpenSSL\SSL.py", line 1353, in _raise_ssl_error
raise WantReadError()
OpenSSL.SSL.WantReadError
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\contrib\pyopenssl.py", line 277, in recv_into
return self.connection.recv_into(*args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\OpenSSL\SSL.py", line 1547, in recv_into
self._raise_ssl_error(self._ssl, result)
File "C:\ProgramData\Anaconda3\lib\site-packages\OpenSSL\SSL.py", line 1370, in _raise_ssl_error
raise SysCallError(errno, errorcode.get(errno))
OpenSSL.SSL.SysCallError: (10054, 'WSAECONNRESET')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\response.py", line 302, in _error_catcher
yield
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\response.py", line 384, in read
data = self._fp.read(amt)
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 449, in read
n = self.readinto(b)
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 483, in readinto
return self._readinto_chunked(b)
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 578, in _readinto_chunked
chunk_left = self._get_chunk_left()
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 546, in _get_chunk_left
chunk_left = self._read_next_chunk_size()
File "C:\ProgramData\Anaconda3\lib\http\client.py", line 506, in _read_next_chunk_size
line = self.fp.readline(_MAXLINE + 1)
File "C:\ProgramData\Anaconda3\lib\socket.py", line 586, in readinto
return self._sock.recv_into(b)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\contrib\pyopenssl.py", line 293, in recv_into
return self.recv_into(*args, **kwargs)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\contrib\pyopenssl.py", line 282, in recv_into
raise SocketError(str(e))
OSError: (10054, 'WSAECONNRESET')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\ProgramData\Anaconda3\lib\threading.py", line 916, in _bootstrap_inner
self.run()
File "C:\ProgramData\Anaconda3\lib\threading.py", line 864, in run
self._target(*self._args, **self._kwargs)
File "twitter_aspi_v0.8.py", line 179, in _init_stream
tweepy.Stream(auth, listener).userstream()
File "C:\ProgramData\Anaconda3\lib\site-packages\tweepy\streaming.py", line 396, in userstream
self._start(async)
File "C:\ProgramData\Anaconda3\lib\site-packages\tweepy\streaming.py", line 363, in _start
self._run()
File "C:\ProgramData\Anaconda3\lib\site-packages\tweepy\streaming.py", line 296, in _run
raise exception
File "C:\ProgramData\Anaconda3\lib\site-packages\tweepy\streaming.py", line 265, in _run
self._read_loop(resp)
File "C:\ProgramData\Anaconda3\lib\site-packages\tweepy\streaming.py", line 315, in _read_loop
line = buf.read_line().strip()
File "C:\ProgramData\Anaconda3\lib\site-packages\tweepy\streaming.py", line 180, in read_line
self._buffer += self._stream.read(self._chunk_size)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\response.py", line 401, in read
raise IncompleteRead(self._fp_bytes_read, self.length_remaining)
File "C:\ProgramData\Anaconda3\lib\contextlib.py", line 100, in __exit__
self.gen.throw(type, value, traceback)
File "C:\ProgramData\Anaconda3\lib\site-packages\requests\packages\urllib3\response.py", line 320, in _error_catcher
raise ProtocolError('Connection broken: %r' % e, e)
requests.packages.urllib3.exceptions.ProtocolError: ('Connection broken: OSError("(10054, \'WSAECONNRESET\')",)', OSError("(10054, 'WSAECONNRESET')",))
My code is pretty long, and judging by the traceback the error comes from the way urllib3, OpenSSL, and Tweepy access the Twitter API. I could wrap the streamer launch in a try block, but is there a better fix that would help me understand and avoid this error? Thanks!

This looks like a temporary connection timeout that is not handled by Tweepy, so you should write a wrapper around the stream, catch the exception, and restart it. I don't think the exception can be avoided as such, because you are connecting to an external site and the connection can occasionally time out.
You should look at the error-handling section of the Tweepy streaming docs, http://docs.tweepy.org/en/v3.5.0/streaming_how_to.html#handling-errors, and see whether on_error gets called in your case when a connection timeout happens:
class MyStreamListener(tweepy.StreamListener):
    def on_error(self, status_code):
        if status_code == 420:
            #returning False in on_data disconnects the stream
            return False
If this doesn't help, then use the wrapper approach sketched below.
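A minimal sketch of that wrapper, assuming auth and listener are the objects you already build elsewhere in your script (the exception class matches the ProtocolError at the end of your traceback):

import time
import tweepy
from requests.packages.urllib3.exceptions import ProtocolError

while True:
    try:
        # userstream() blocks until the connection drops or an exception escapes
        tweepy.Stream(auth, listener).userstream()
    except ProtocolError:
        # Transient connection reset: wait briefly, then reconnect
        time.sleep(5)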

According to the Twitter Developer documentation on rate limiting, a connection reset/failure is expected when you cross your usage limit.
Requests / 15-min window (user auth) = 900
Requests / 15-min window (app auth) = 1500
It also clearly states the following.
If the initial reconnect attempt is unsuccessful, your client should continue attempting to reconnect using an exponential back-off pattern until it successfully reconnects.
(Update)
Regardless of how your client gets disconnected, you should configure
your app to reconnect immediately. If your first reconnection attempt
is unsuccessful, we recommend that your app implement an exponential
back-off pattern in subsequent reconnection attempts (e.g. wait 1
second, then 2 seconds, then 4, 8, 16, etc), with some reasonable
upper limit. If this upper limit is reached, you should configure your
client to notify your team so that you can investigate further.
The standard (free) Twitter APIs, i.e. what Tweepy wraps, consist of a REST API and a Streaming API. The Streaming API provides low-latency access to Tweets. The Ads API has other limits when whitelisted.
REST API Limit
Clients may access a theoretical maximum of 3,200 statuses via the
page and count parameters for the user_timeline REST API methods.
Other timeline methods have a theoretical maximum of 800 statuses.
Requests for more than the limit will result in a reply with a status
code of 200 and an empty result in the format requested. Twitter still
maintains a database of all the Tweets sent by a user. However, to
ensure performance, this limit is in place on the API calls.
The simple reason this is enforced could be to keep users from spamming the API.
Solution:
You may catch the exception, re-establish the connection to Twitter, and continue reading tweets.
Unfortunately, there is currently no alternative to obtaining a better usage allowance from Twitter.
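A rough illustration of the recommended back-off pattern, where start_stream is a hypothetical function that launches your stream and the doubling/cap follows the quote above:

import time

def reconnect_with_backoff(start_stream, max_wait=64):
    wait = 1
    while True:
        try:
            start_stream()  # blocks until the stream disconnects
            wait = 1        # reset the back-off after a successful run
        except Exception:
            time.sleep(wait)
            wait = min(wait * 2, max_wait)  # 1, 2, 4, 8, 16, ... capped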

Related

Heroku python app (telegram bot) crashes monthly with requests exception

I have two telegram bots made with PyTelegramBotApi (not the best library, I know) deployed on Heroku. One big difference that matters: one is running with threading to periodically send notifications to "subscribers" (around 20 people). And it crashes approximately every month with the following exception trace in the Heroku console:
2022-01-14T08:00:11.068553+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/telebot/apihelper.py", line 139, in _make_request
2022-01-14T08:00:11.068739+00:00 app[worker.1]: result = _get_req_session().request(
2022-01-14T08:00:11.068748+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 542, in request
2022-01-14T08:00:11.068983+00:00 app[worker.1]: resp = self.send(prep, **send_kwargs)
2022-01-14T08:00:11.068994+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/requests/sessions.py", line 655, in send
2022-01-14T08:00:11.069233+00:00 app[worker.1]: r = adapter.send(request, **kwargs)
2022-01-14T08:00:11.069242+00:00 app[worker.1]: File "/app/.heroku/python/lib/python3.9/site-packages/requests/adapters.py", line 529, in send
2022-01-14T08:00:11.069448+00:00 app[worker.1]: raise ReadTimeout(e, request=request)
2022-01-14T08:00:11.069478+00:00 app[worker.1]: requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.telegram.org', port=443): Read timed out. (read timeout=25)
I didn't record previous exception traces (my bad), but I remember them looking pretty much the same (with the most recent exception being ReadTimeout). Moreover, none of my own scripts ever appear in this trace (which makes me think it's incomplete due to some Heroku log console bug). I don't use the requests module directly myself. It also definitely can't be caused by dyno sleep, as I only use a worker dyno (which never sleeps). Both this bot and the one that doesn't crash use Heroku Postgres.
Any ideas on what could be causing this exception?
EDIT:
Here is the full exception trace I accidentally got in debug mode while testing something else on my machine:
Traceback (most recent call last):
File "D:\Deadliner0307\lib\site-packages\urllib3\connectionpool.py", line 445, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "D:\Deadliner0307\lib\site-packages\urllib3\connectionpool.py", line 440, in _make_request
httplib_response = conn.getresponse()
File "C:\Users\vva07\AppData\Local\Programs\Python\Python39-32\lib\http\client.py", line 1347, in getresponse
response.begin()
File "C:\Users\vva07\AppData\Local\Programs\Python\Python39-32\lib\http\client.py", line 307, in begin
version, status, reason = self._read_status()
File "C:\Users\vva07\AppData\Local\Programs\Python\Python39-32\lib\http\client.py", line 268, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "C:\Users\vva07\AppData\Local\Programs\Python\Python39-32\lib\socket.py", line 704, in readinto
return self._sock.recv_into(b)
File "C:\Users\vva07\AppData\Local\Programs\Python\Python39-32\lib\ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "C:\Users\vva07\AppData\Local\Programs\Python\Python39-32\lib\ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
socket.timeout: The read operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Deadliner0307\lib\site-packages\requests\adapters.py", line 439, in send
resp = conn.urlopen(
File "D:\Deadliner0307\lib\site-packages\urllib3\connectionpool.py", line 755, in urlopen
retries = retries.increment(
File "D:\Deadliner0307\lib\site-packages\urllib3\util\retry.py", line 532, in increment
raise six.reraise(type(error), error, _stacktrace)
File "D:\Deadliner0307\lib\site-packages\urllib3\packages\six.py", line 770, in reraise
raise value
File "D:\Deadliner0307\lib\site-packages\urllib3\connectionpool.py", line 699, in urlopen
httplib_response = self._make_request(
File "D:\Deadliner0307\lib\site-packages\urllib3\connectionpool.py", line 447, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "D:\Deadliner0307\lib\site-packages\urllib3\connectionpool.py", line 336, in _raise_timeout
raise ReadTimeoutError(
urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='api.telegram.org', port=443): Read timed out. (read timeout=25)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\Users\vva07\OneDrive\Документы\проекты\Deadliner0307\main.py", line 450, in <module>
bot.polling(none_stop=True, interval=1)
File "C:\Users\vva07\OneDrive\Документы\проекты\Deadliner0307\main.py", line 446, in deadliner0307
else:
File "D:\Deadliner0307\lib\site-packages\telebot\__init__.py", line 633, in polling
self.__threaded_polling(non_stop, interval, timeout, long_polling_timeout, allowed_updates)
File "D:\Deadliner0307\lib\site-packages\telebot\__init__.py", line 692, in __threaded_polling
raise e
File "D:\Deadliner0307\lib\site-packages\telebot\__init__.py", line 654, in __threaded_polling
polling_thread.raise_exceptions()
File "D:\Deadliner0307\lib\site-packages\telebot\util.py", line 100, in raise_exceptions
raise self.exception_info
File "D:\Deadliner0307\lib\site-packages\telebot\util.py", line 82, in run
task(*args, **kwargs)
File "D:\Deadliner0307\lib\site-packages\telebot\__init__.py", line 391, in __retrieve_updates
updates = self.get_updates(offset=(self.last_update_id + 1),
File "D:\Deadliner0307\lib\site-packages\telebot\__init__.py", line 371, in get_updates
json_updates = apihelper.get_updates(self.token, offset, limit, timeout, allowed_updates, long_polling_timeout)
File "D:\Deadliner0307\lib\site-packages\telebot\apihelper.py", line 312, in get_updates
return _make_request(token, method_url, params=payload)
File "D:\Deadliner0307\lib\site-packages\telebot\apihelper.py", line 139, in _make_request
result = _get_req_session().request(
File "D:\Deadliner0307\lib\site-packages\requests\sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "D:\Deadliner0307\lib\site-packages\requests\sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "D:\Deadliner0307\lib\site-packages\requests\adapters.py", line 529, in send
raise ReadTimeout(e, request=request)
requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='api.telegram.org', port=443): Read timed out. (read timeout=25)
I have found a way to at least keep the bot working. Be careful: this may not fit your application (in your case, restarting straight away after a specific exception may corrupt data or break something else). In the main file, I made this construction, which basically restarts the bot after an unhandled exception that would otherwise just stop it:
def exception_handler(count: int = 0):
    """Relaunching bot unless exceptions occur more than 2 times a day
    (script is reset daily on Heroku)"""
    if count < 3:
        if count > 0:
            print("An exception occurred, relaunching . . .")
            time.sleep(5)
        try:
            deadliner0307()  # bot main function (bot.polling starts there)
        except Exception as ex:
            count += 1
            # Notifying myself about the exception via the free Airbrake addon:
            notifier = pybrake.Notifier(project_id=399289,
                                        project_key='129d3450356965175fda762b69e1babf',
                                        environment='production')
            notifier.notify(ex)
            exception_handler(count)
    else:
        print("Too much exceptions occurred, shutting down . . .")

if __name__ == '__main__':
    exception_handler()
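For what it's worth, the same construction can also be written as a loop instead of recursion; here is a sketch reusing the same deadliner0307 entry point (the Airbrake notification is omitted for brevity):

def exception_handler_loop(max_failures: int = 3):
    failures = 0
    while failures < max_failures:
        try:
            deadliner0307()
            return  # clean exit if the bot ever returns normally
        except Exception:
            failures += 1
            print("An exception occurred, relaunching . . .")
            time.sleep(5)
    print("Too many exceptions occurred, shutting down . . .")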

"chunkedEncodingError, Connection broken: OSError 104" when transferring ~20 tables from BigQuery into MongoDB (via Python)

To preface the question, I am not sure if this is an issue with:
(a) Airflow and the GCP compute engine that it is deployed / running on,
(b) Python's connection to BigQuery (using the BigQuery API client library for python)
(c) Python's connection to MongoDB (using pymongo / MongoClient)
... and so tagging the question accurately was difficult. To summarize quickly, we have an Airflow DAG with 20 tasks, where each task queries data from a single BigQuery table into a pandas dataframe, then cleans up the data quickly, and then inserts the data into a MongoDB collection. Each task is running this same function for a different table_name from BigQuery:
def transfer_all_by_partitioned_key(table_name, partition_key):
    start_time = time.time()
    print(partition_key)
    # (1) Connect to BigQuery + Mongo DB
    bq = bigquery.Client()
    cluster = MongoClient(MONGO_URI)
    db = cluster["mydb"]
    print(f'====== (1) Connected to BQ + Mongo: {round(time.time() - start_time, 5)}')
    # (2) Create list of options, in order to transfer data in chunks
    get_partitions_query = f"select distinct {partition_key} from myprojectid.mydataset.{table_name}"
    partitions_df = bq.query(get_partitions_query).to_dataframe()
    # (3) Loop DF, query + write to Mongo for each partition
    db[table_name].drop()  # drop table, as we are recreating it entirely here
    for index, row in partitions_df.iterrows():
        # Run Query
        iter_value = row[partition_key]
        iter_query = f"select * from `myprojectid.mydataset.{table_name}` where {partition_key} = {iter_value}"
        iter_df = bq.query(iter_query).to_dataframe()
        print(f'====== {index}: ({iter_value}) {iter_df.shape[0]} rows updated on this date')
        # Handle Dates
        row1_types = iter_df.iloc[0].apply(type)  # missing data in 1st row may be issue
        date_cols = [key for key in dict(row1_types) if row1_types[key] == datetime.date]
        for col in date_cols:
            iter_df[[col]] = iter_df[[col]].astype(str)  # .where(iter_df[[col]].notnull(), None)
        # Handle DateTimes
        datetime_cols = [key for key in dict(iter_df.dtypes) if is_datetime(iter_df[key])]
        for col in datetime_cols:
            iter_df[[col]] = iter_df[[col]].astype(object)  # .where(iter_df[[col]].notnull(), None)
        # Insert into MongoDB
        db[table_name].insert_many(iter_df.to_dict('records'))
Because of the chunking/looping (to avoid memory issues in Airflow), each task can query from BigQuery and write to MongoDB up to 500 times. As for the issue: seemingly at random, I receive the following error from time to time for these tasks:
requests.exceptions.ChunkedEncodingError: ('Connection broken: OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')"))
This has happened at different times for 3 of the 20 tasks, and it seems like this error is prone to happen for any task at any time... however, when the task is cleared and rerun in Airflow, it generally works the second time (not so reproducible).
Maybe I am querying from BigQuery, or inserting into MongoDB, too frequently?
Here is the full error from the Airflow logs:
[2020-07-29 22:39:20,735] {{taskinstance.py:1145}} ERROR - ('Connection broken: OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')"))
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 313, in recv_into
return self.connection.recv_into(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1840, in recv_into
self._raise_ssl_error(self._ssl, result)
File "/usr/local/lib/python3.7/site-packages/OpenSSL/SSL.py", line 1663, in _raise_ssl_error
raise SysCallError(errno, errorcode.get(errno))
OpenSSL.SSL.SysCallError: (104, 'ECONNRESET')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 437, in _error_catcher
yield
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 764, in read_chunked
self._update_chunk_length()
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 694, in _update_chunk_length
line = self._fp.fp.readline()
File "/usr/local/lib/python3.7/socket.py", line 589, in readinto
return self._sock.recv_into(b)
File "/usr/local/lib/python3.7/site-packages/urllib3/contrib/pyopenssl.py", line 318, in recv_into
raise SocketError(str(e))
OSError: (104, 'ECONNRESET')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 751, in generate
for chunk in self.raw.stream(chunk_size, decode_content=True):
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 572, in stream
for line in self.read_chunked(amt, decode_content=decode_content):
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 793, in read_chunked
self._original_response.close()
File "/usr/local/lib/python3.7/contextlib.py", line 130, in __exit__
self.gen.throw(type, value, traceback)
File "/usr/local/lib/python3.7/site-packages/urllib3/response.py", line 455, in _error_catcher
raise ProtocolError("Connection broken: %r" % e, e)
urllib3.exceptions.ProtocolError: ('Connection broken: OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')"))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/airflow/models/taskinstance.py", line 983, in _run_raw_task
result = task_copy.execute(context=context)
File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 113, in execute
return_value = self.execute_callable()
File "/usr/local/lib/python3.7/site-packages/airflow/operators/python_operator.py", line 118, in execute_callable
return self.python_callable(*self.op_args, **self.op_kwargs)
File "/usr/local/airflow/plugins/tasks/geniussports/transfers.py", line 223, in transfer_all_by_partitioned_key
iter_df = bq.query(iter_query).to_dataframe()
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/job.py", line 3374, in to_dataframe
create_bqstorage_client=create_bqstorage_client,
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1741, in to_dataframe
for frame in self.to_dataframe_iterable(dtypes=dtypes):
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1435, in _to_page_iterable
for item in tabledata_list_download():
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/_pandas_helpers.py", line 563, in download_dataframe_tabledata_list
for page in pages:
File "/usr/local/lib/python3.7/site-packages/google/api_core/page_iterator.py", line 249, in _page_iter
page = self._next_page()
File "/usr/local/lib/python3.7/site-packages/google/api_core/page_iterator.py", line 369, in _next_page
response = self._get_next_page_response()
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/table.py", line 1367, in _get_next_page_response
method=self._HTTP_METHOD, path=self.path, query_params=params
File "/usr/local/lib/python3.7/site-packages/google/cloud/bigquery/client.py", line 556, in _call_api
return call()
File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 286, in retry_wrapped_func
on_error=on_error,
File "/usr/local/lib/python3.7/site-packages/google/api_core/retry.py", line 184, in retry_target
return target()
File "/usr/local/lib/python3.7/site-packages/google/cloud/_http.py", line 419, in api_request
timeout=timeout,
File "/usr/local/lib/python3.7/site-packages/google/cloud/_http.py", line 277, in _make_request
method, url, headers, data, target_object, timeout=timeout
File "/usr/local/lib/python3.7/site-packages/google/cloud/_http.py", line 315, in _do_request
url=url, method=method, headers=headers, data=data, timeout=timeout
File "/usr/local/lib/python3.7/site-packages/google/auth/transport/requests.py", line 450, in request
**kwargs
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 530, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 683, in send
r.content
File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 829, in content
self._content = b''.join(self.iter_content(CONTENT_CHUNK_SIZE)) or b''
File "/usr/local/lib/python3.7/site-packages/requests/models.py", line 754, in generate
raise ChunkedEncodingError(e)
requests.exceptions.ChunkedEncodingError: ('Connection broken: OSError("(104, \'ECONNRESET\')")', OSError("(104, 'ECONNRESET')"))
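Since a cleared task generally succeeds when rerun, one low-effort mitigation (a sketch, not a root-cause fix) is to let Airflow retry the task automatically on these transient resets. Parameter names below are from Airflow 1.10 (the version in the traceback); the task_id, table name, and dag object are placeholders:

from datetime import timedelta
from airflow.operators.python_operator import PythonOperator

transfer_task = PythonOperator(
    task_id='transfer_my_table',                       # placeholder id
    python_callable=transfer_all_by_partitioned_key,
    op_kwargs={'table_name': 'my_table', 'partition_key': 'my_key'},
    retries=2,                                         # rerun up to twice on failure
    retry_delay=timedelta(minutes=1),                  # wait between attempts
    dag=dag,                                           # your existing DAG object
)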

Understanding raise RemoteDisconnected("Remote end closed connection"

I'm scraping Twitter, trying to get the friends/users being followed for a list of Twitter users. I'm using Tweepy and Python 3.6.5 on OS X 10.13. An abbreviated code chunk:
def get_friends_for_each_twitter_user(UserL=None, Name=None):
    .
    .  # Auth keys and such
    .
    for user in UserL:  ### This is a list of USER class with the below fields ###
        ### Handle protected users ###
        if(user.protected == True):
            user.friendsL = "protected"
            continue
        screenNameL = []
        friendIDL = []
        friendL = []
        friendScreenNameL = []
        ### Get IDs of people that this user follows (i.e. 'friends') ###
        for page in tweepy.Cursor(api.friends_ids, screen_name=user.screenName).pages():
            friendIDL.extend(page)
            time.sleep(60)
        ### Loop through IDs, get user profile, keep only friends' screen name ###
        for i in range(0, len(friendIDL), 100):
            friendL.extend(api.lookup_users(user_ids=friendIDL[i:i+100]))
        ### Keep only screen name ###
        for friend in friendL:
            friendScreenNameL.append(friend._json['screen_name'])
        user.friendsL = friendScreenNameL
When I do this, after collecting the friends (i.e. profiles that the user follows) for about a dozen users, I get the following errors:
Traceback (most recent call last):
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 383, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1331, in getresponse
response.begin()
File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 297, in begin
version, status, reason = self._read_status()
File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 266, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/requests/adapters.py", line 440, in send
timeout=timeout
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 639, in urlopen
_stacktrace=sys.exc_info()[2])
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/urllib3/util/retry.py", line 357, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/urllib3/packages/six.py", line 685, in reraise
raise value.with_traceback(tb)
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 387, in _make_request
six.raise_from(e, None)
File "<string>", line 2, in raise_from
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/urllib3/connectionpool.py", line 383, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 1331, in getresponse
response.begin()
File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 297, in begin
version, status, reason = self._read_status()
File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/http/client.py", line 266, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
urllib3.exceptions.ProtocolError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/tweepy/binder.py", line 190, in execute
proxies=self.api.proxy)
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/requests/adapters.py", line 490, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pdb.py", line 1667, in main
pdb._runscript(mainpyfile)
File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/pdb.py", line 1548, in _runscript
self.run(statement)
File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/bdb.py", line 434, in run
exec(cmd, globals, locals)
File "<string>", line 1, in <module>
File "/Users/myusername/Code/Python/hair_prod/src/main.py", line 170, in <module>
main()
File "/Users/myusername/Code/Python/hair_prod/src/main.py", line 141, in main
get_friends_for_each_twitter_user(UserL=tresemmeUserL, Name="Tresemme")
File "src/twitter_scraper.py", line 187, in get_friends_for_each_twitter_user
friendL.extend(api.lookup_users(user_ids=friendIDL[i:i+100]))
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/tweepy/api.py", line 336, in lookup_users
return self._lookup_users(post_data=post_data)
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/tweepy/binder.py", line 250, in _call
return method.execute()
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/tweepy/binder.py", line 192, in execute
six.reraise(TweepError, TweepError('Failed to send request: %s' % e), sys.exc_info()[2])
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/six.py", line 692, in reraise
raise value.with_traceback(tb)
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/tweepy/binder.py", line 190, in execute
proxies=self.api.proxy)
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/requests/adapters.py", line 490, in send
raise ConnectionError(err, request=request)
tweepy.error.TweepError: Failed to send request: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response',))
Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /Users/myusername/.local/virtualenvs/python3.6/lib/python3.6/site-packages/requests/adapters.py(490)send()
-> raise ConnectionError(err, request=request)
It appears that the error occurs on the line friendL.extend(api.lookup_users(user_ids=friendIDL[i:i+100])) in the get_friends_for_each_twitter_user() function.
QUESTION:
Why is this error occurring?
How do I avoid/work around it?
Any number of things could cause the error to appear, but if the cause is not permanent, then retrying an occasional failed API call could make the script work alright.
According to the Tweepy docs, the API client constructor accepts a retry_count parameter, which defaults to 0. Try setting retry_count to something above 0 and see if your script is able to complete successfully, something like this:
api = tweepy.api.API(..., retry_count=3)
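If one knob isn't enough, the Tweepy 3.x constructor also accepts retry_delay (seconds between retries) and retry_errors (HTTP status codes to retry on); a hedged example combining them, with auth being your existing handler:

api = tweepy.API(auth, retry_count=3, retry_delay=5,
                 retry_errors={503},       # retry only on 503 responses
                 wait_on_rate_limit=True)  # let Tweepy sleep through rate limits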
After messing with this for a while, I believe that this issue was caused by my network connection. When this happened, I was connected to my 5GHz wireless network. When I connected to my 2.4GHz wireless network, these errors were less frequent. The proper thing to do in this situation is to handle the exception, wait a few seconds, and try again. Below is the appropriate code fragment:
def get_friends_for_each_twitter_user(UserL=None, Name=None):
    consumerKey = ""     # your value here
    consumerSecret = ""  # your value here
    auth = tweepy.AppAuthHandler(consumerKey, consumerSecret)  ### Supposedly faster
    api = tweepy.API(auth, wait_on_rate_limit=True, wait_on_rate_limit_notify=True)  ## Now I don't have to handle rate limiting myself
    for user in UserL:
        accountStatus = 'active'
        if(user.protected == True):
            user.friendsL = "protected"
            continue
        screenNameL = []
        friendIDL = []
        friendL = []
        friendScreenNameL = []
        #### TWITTER LIMITS US #####
        try:
            for page in tweepy.Cursor(api.friends_ids, screen_name=user.screenName).pages():
                friendIDL.extend(page)
        except tweepy.TweepError as error:
            if(error.__dict__['api_code'] == 34):
                accountStatus = 'dead'
                print("...{} is dead".format(user.screenName))
                continue
            else:
                raise
        for i in range(0, len(friendIDL), 100):
            ### This handles an exception when it occurs (probably due to connection issues).
            ### When an exception occurs, sleep, then retry. I don't notice this error
            ### when I'm running on corporate Wifi, maybe my router just sucks
            while True:
                try:
                    friendL.extend(api.lookup_users(user_ids=friendIDL[i:i+100]))
                except tweepy.TweepError as error:
                    print("...Exception for {} : api_code {}".format(user.screenName,
                                                                     error.__dict__['api_code']))
                    time.sleep(5)
                    continue
                break

How do I increase the timeout for imaplib requests?

I'm using imaplib to query Gmail's IMAP, but some requests are taking more than 60 seconds to return. This is already done in a task, so I have a full 10 minutes to do the request, but my tasks are failing due to the 60-second limit on urlfetch.
I've tried setting urlfetch.set_default_fetch_deadline(600), but it doesn't seem to do anything.
Here's a stacktrace:
The API call remote_socket.Receive() took too long to respond and was cancelled.
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/imaplib.py", line 760, in uid
typ, dat = self._simple_command(name, command, *args)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/imaplib.py", line 1070, in _simple_command
return self._command_complete(name, self._command(name, *args))
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/imaplib.py", line 897, in _command_complete
typ, data = self._get_tagged_response(tag)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/imaplib.py", line 999, in _get_tagged_response
self._get_response()
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/imaplib.py", line 916, in _get_response
resp = self._get_line()
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/imaplib.py", line 1009, in _get_line
line = self.readline()
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/imaplib.py", line 1171, in readline
return self.file.readline()
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/socket.py", line 445, in readline
data = self._sock.recv(self._rbufsize)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/ssl.py", line 301, in recv
return self.read(buflen)
File "/base/data/home/runtimes/python27/python27_dist/lib/python2.7/ssl.py", line 220, in read
return self._sslobj.read(len)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/remote_socket/_remote_socket.py", line 864, in recv
return self.recvfrom(buffersize, flags)[0]
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/remote_socket/_remote_socket.py", line 903, in recvfrom
apiproxy_stub_map.MakeSyncCall('remote_socket', 'Receive', request, reply)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 94, in MakeSyncCall
return stubmap.MakeSyncCall(service, call, request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 328, in MakeSyncCall
rpc.CheckSuccess()
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/api/apiproxy_rpc.py", line 133, in CheckSuccess
raise self.exception
DeadlineExceededError: The API call remote_socket.Receive() took too long to respond and was cancelled.
Which kind of DeadlineExceededError?
There are three kinds of DeadlineExceededError in AppEngine.
https://developers.google.com/appengine/articles/deadlineexceedederrors
google.appengine.runtime.DeadlineExceededError: raised if the overall request times out, typically after 60 seconds, or 10 minutes
for task queue requests.
google.appengine.runtime.apiproxy_errors.DeadlineExceededError: raised if an RPC exceeded its deadline. This is typically 5 seconds,
but it is settable for some APIs using the 'deadline' option.
google.appengine.api.urlfetch_errors.DeadlineExceededError: raised if the URLFetch times out.
As you can see, the 10-minute limit of the task queue only helps with google.appengine.runtime.DeadlineExceededError. The type of DeadlineExceededError can be identified via the traceback (listed below). In this case, it is google.appengine.runtime.apiproxy_errors.DeadlineExceededError, which is raised after 5 seconds by default. (I will update the post after I figure out how to change it.)
TYPE 1. google.appengine.runtime.DeadlineExceededError
The traceback looks like
Traceback (most recent call last):
File "/base/data/home/runtimes/python27/python27_lib/versions/1/google/appengine/runtime/wsgi.py", line 266, in Handle
result = handler(dict(self._environ), self._StartResponse)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1505, in __call__
rv = self.router.dispatch(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1253, in default_dispatcher
return route.handler_adapter(request, response)
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 1077, in __call__
return handler.dispatch()
File "/base/data/home/runtimes/python27/python27_lib/versions/third_party/webapp2-2.3/webapp2.py", line 545, in dispatch
return method(*args, **kwargs)
File "/base/data/home/apps/s~tagtooadex2/test.371204033771063679/index.py", line 9, in get
pass
DeadlineExceededError
Solution
This exception can be avoided by using the task queue (10-minute limit), backends, or manual scaling options.
https://developers.google.com/appengine/docs/python/modules/#Python_Instance_scaling_and_class
TYPE 2. google.appengine.runtime.apiproxy_errors.DeadlineExceededError
The traceback looks like
DeadlineExceededError: The API call remote_socket.Receive() took too long to respond and was cancelled.
TYPE 3. google.appengine.api.urlfetch_errors.DeadlineExceededError
The traceback looks like
DeadlineExceededError: Deadline exceeded while waiting for HTTP response from URL: http://www.sogi.com.tw/newforum/article_list.aspx?topic_ID=6089521
Solution:
This exception can be solved by extending the deadline:
urlfetch.fetch(url, deadline=10*60)
https://developers.google.com/appengine/docs/python/urlfetch/fetchfunction
There's no mention of a timeout in the imaplib sources, so there are several options:
IMAP uses the socket library to establish the connection; consider using socket.setdefaulttimeout(timeoutValue), but be aware of side effects if you do so.
The second option is to create your own IMAP4 subclass with a tunable timeout, shall we say in the open function; a sketch follows.
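A minimal sketch of that second option, written against the Python 2.7 imaplib that appears in the traceback (the sslobj attribute is specific to that version, and behavior under App Engine's remote_socket shim may differ):

import imaplib

class TimeoutIMAP4_SSL(imaplib.IMAP4_SSL):
    def __init__(self, host, port=imaplib.IMAP4_SSL_PORT, timeout=300):
        self._timeout = timeout  # must be set before __init__ calls open()
        imaplib.IMAP4_SSL.__init__(self, host, port)

    def open(self, host='', port=imaplib.IMAP4_SSL_PORT):
        # Establish the connection as usual, then apply the timeout to the
        # wrapped SSL socket that imaplib actually reads from
        imaplib.IMAP4_SSL.open(self, host, port)
        self.sslobj.settimeout(self._timeout)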
From the Google App Engine documentation, it seems like there are many
possible causes for DeadlineExceededError.
In your case, it seems that it may be one of the last two (out of three) types of DeadlineExceededError presented on the page.

Google Appengine URLFetch Timeouts - Any Best Practices?

New to Python and App Engine. I've got a little toy app I've been playing with, and I ran into some script timeouts last night. I know you're capped at 10 seconds. What's the best practice for dealing with this?
Edit:
Sorry, I should have been more clear: the URLFetch timeout is the issue I am having. By default it is set to 5 seconds; the max is 10.
Traceback (most recent call last):
File "/base/python_runtime/python_lib/versions/1/google/appengine/ext/webapp/__init__.py", line 636, in __call__
handler.post(*groups)
File "/base/data/home/apps/netlicense/3.349495357411133950/main.py", line 235, in post
graph.put_wall_post(message=body, attachment=attch, profile_id=self.request.get("fbid"))
File "/base/data/home/apps/netlicense/3.349495357411133950/facebook.py", line 149, in put_wall_post
return self.put_object(profile_id, "feed", message=message, **attachment)
File "/base/data/home/apps/netlicense/3.349495357411133950/facebook.py", line 131, in put_object
return self.request(parent_object + "/" + connection_name, post_args=data)
File "/base/data/home/apps/netlicense/3.349495357411133950/facebook.py", line 179, in request
file = urllib2.urlopen(urlpath, post_data)
File "/base/python_runtime/python_dist/lib/python2.5/urllib2.py", line 124, in urlopen
return _opener.open(url, data)
File "/base/python_runtime/python_dist/lib/python2.5/urllib2.py", line 381, in open
response = self._open(req, data)
File "/base/python_runtime/python_dist/lib/python2.5/urllib2.py", line 399, in _open
'_open', req)
File "/base/python_runtime/python_dist/lib/python2.5/urllib2.py", line 360, in _call_chain
result = func(*args)
File "/base/python_runtime/python_dist/lib/python2.5/urllib2.py", line 1115, in https_open
return self.do_open(httplib.HTTPSConnection, req)
File "/base/python_runtime/python_dist/lib/python2.5/urllib2.py", line 1080, in do_open
r = h.getresponse()
File "/base/python_runtime/python_dist/lib/python2.5/httplib.py", line 197, in getresponse
self._allow_truncated, self._follow_redirects)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/urlfetch.py", line 260, in fetch
return rpc.get_result()
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/apiproxy_stub_map.py", line 592, in get_result
return self.__get_result_hook(self)
File "/base/python_runtime/python_lib/versions/1/google/appengine/api/urlfetch.py", line 361, in _get_fetch_result
raise DeadlineExceededError(str(err))
DeadlineExceededError: ApplicationError: 5
You have not told us what your application does, so here are some generic suggestions:
You can trap the timeout exception with the exception class google.appengine.api.urlfetch.DownloadError and gently alert the users to retry (see the sketch after this list).
Web request run time is capped at 30 seconds; if what you are trying to download is relatively small, you could probably trap the exception and resubmit the urlfetch (just one more time) inside the same web request.
If working offline is not a problem for your app, you can move the urlfetch call to a worker task served by a task queue; one advantage of using the taskqueue API is that App Engine automatically retries the urlfetch task until it succeeds.
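A small sketch combining the first two suggestions above, using only the exception class and deadline parameter already mentioned (the helper name is made up):

from google.appengine.api import urlfetch

def fetch_with_one_retry(url):
    try:
        return urlfetch.fetch(url, deadline=10)  # raise the 5s default to the max
    except urlfetch.DownloadError:
        # One retry inside the same web request, as suggested above;
        # re-raises if the second attempt also times out
        return urlfetch.fetch(url, deadline=10)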
