problems downloading large files with requests? - python

I'm trying to download a video file using an API, the equivalent curl command works without problem, the python code below works without error for small videos:
with requests.get("http://username:password#url/Download/", data=data, stream=True) as r:
r.raise_for_status()
with open("deliverables/video_output34.mp4", "wb") as f:
for chunk in r.iter_content(chunk_size=1024):
f.write(chunk)
it fails for large videos (failed for video ~34M) (the equivalent curl command works for this one)
Traceback (most recent call last):
File "/home/nabil/.local/lib/python3.7/site-packages/requests/adapters.py", line 479, in send
r = low_conn.getresponse(buffering=True)
TypeError: getresponse() got an unexpected keyword argument 'buffering'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/nabil/.local/lib/python3.7/site-packages/requests/adapters.py", line 482, in send
r = low_conn.getresponse()
File "/usr/local/lib/python3.7/http/client.py", line 1321, in getresponse
response.begin()
File "/usr/local/lib/python3.7/http/client.py", line 296, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.7/http/client.py", line 265, in _read_status
raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/nabil/.local/lib/python3.7/site-packages/requests/api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "/home/nabil/.local/lib/python3.7/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/home/nabil/.local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/home/nabil/.local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/home/nabil/.local/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: Remote end closed connection without response
I've checked links like the following without success

Thanks to SilentGhost on IRC#python who pointed out to this suggesting I should upgrade my requests, which solved it(from 2.22.0 to 2.24.0).
upgrading the package is done like this:
pip install requests --upgrade
Another source that may help someone looking at this question is to use pycurl, here is a good starting point: https://github.com/rajatkhanduja/PyCurl-Downloader
or/and you can use --libcurl to your curl command to get a good indication on how to use pycurl

Related

Receiving EPIPE error when streaming from PSQL copy function

I am trying to write a streaming implementation of dumping a table from psql into a pre-signed URL on S3. Unfortunately, it seems to error out at a seemingly random time in the upload. I have tried many combinations of opening/closing the file descriptors at different times. I for the life of me cannot figure out why this is occurring.
The strangest thing is when I mock the requests library and analyze the sent data, it works as intended. The socket is raising an EPIPE error at a certain amount through the stream
from psycopg2 import connect
import threading
import requests
import requests_mock
import traceback
from base64 import b64decode
from boto3 import session
r_fd, w_fd = os.pipe()
connection = connect(host='host', database='db',
user='user', password='pw')
cursor = connection.cursor()
b3_session = session.Session(profile_name='profile', region_name='us-east-1')
url = b3_session.client('s3').generate_presigned_url(
ClientMethod='put_object',
Params={'Bucket': 'bucket', 'Key': 'test_streaming_upload.txt'},
ExpiresIn=3600)
rd = os.fdopen(r_fd, 'rb')
wd = os.fdopen(w_fd, 'wb')
def stream_data():
print('Starting stream')
with os.fdopen(r_fd, 'rb') as rd:
requests.put(url, data=rd, headers={'Content-type': 'application/octet-stream'})
print('Ending stream')
to_thread = threading.Thread(target=stream_data)
to_thread.start()
print('Starting copy')
with os.fdopen(w_fd, 'wb') as wd:
cursor.copy_expert('COPY table TO STDOUT WITH CSV HEADER', wd)
print('Ending copy')
to_thread.join()
The output is always the same:
Starting stream
Starting copy
Exception in thread Thread-1:
Traceback (most recent call last):
File "/venv/lib/python3.9/site-packages/urllib3/contrib/pyopenssl.py", line 342, in _send_until_done
return self.connection.send(data)
File "/venv/lib/python3.9/site-packages/OpenSSL/SSL.py", line 1718, in send
self._raise_ssl_error(self._ssl, result)
File "/venv/lib/python3.9/site-packages/OpenSSL/SSL.py", line 1624, in _raise_ssl_error
raise SysCallError(errno, errorcode.get(errno))
OpenSSL.SSL.SysCallError: (32, 'EPIPE')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/venv/lib/python3.9/site-packages/requests/adapters.py", line 473, in send
low_conn.send(b'\r\n')
File "/Users/me/.pyenv/versions/3.9.7/lib/python3.9/http/client.py", line 995, in send
self.sock.sendall(data)
File "/venv/lib/python3.9/site-packages/urllib3/contrib/pyopenssl.py", line 354, in sendall
sent = self._send_until_done(
File "/venv/lib/python3.9/site-packages/urllib3/contrib/pyopenssl.py", line 349, in _send_until_done
raise SocketError(str(e))
OSError: (32, 'EPIPE')
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/me/.pyenv/versions/3.9.7/lib/python3.9/threading.py", line 973, in _bootstrap_inner
self.run()
File "/Users/me/.pyenv/versions/3.9.7/lib/python3.9/threading.py", line 910, in run
self._target(*self._args, **self._kwargs)
File "/Users/me/Library/Application Support/JetBrains/PyCharm2021.2/scratches/scratch_60.py", line 37, in stream_data
requests.put(url, data=rd, headers={'Content-type': 'application/octet-stream'})
File "/venv/lib/python3.9/site-packages/requests/api.py", line 131, in put
return request('put', url, data=data, **kwargs)
File "/venv/lib/python3.9/site-packages/requests/api.py", line 60, in request
return session.request(method=method, url=url, **kwargs)
File "/venv/lib/python3.9/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/venv/lib/python3.9/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/venv/lib/python3.9/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: (32, 'EPIPE')
Am I missing something obvious? Is this a memory error? I appreciate any insight I can get because this is killing me. I can verify that the socket is being written to anywhere from 1.5 to 2.5k times before this error occurs.

Python Azure SDK - Unable to download zip file

I am using the latest version of Azure Storgae SDK on Python 3.5.2.
I want to download a zip file from a blob on Azure storage cloud.
My Code:
self.azure_service= BlockBlobService(account_name = ACCOUNT_NAME,
account_key = KEY)
with open(local_path, "wb+") as f:
self.azure_service.get_blob_to_stream(blob_container,
file_cloud_path,
f)
The Error:
AzureException: ('Received response with content-encoding: gzip, but failed to decode it.,, error('Error -3 while decompressing data: incorrect header check',))
The error is probably coming from the requests package and i don't seem to have access for changing the headers or something like that.
What exactly is the problem and how can i fix it?
Just as summary,I tried to verify the above exception with Microsoft Azure Storage Explorer Tool.
When user upload a zip type file , if set the EncodingType property for gzip.
at the time of download the client will check whether the file type can be to depressed to EncodingType , if dismatch will occur the exception as below:
Traceback (most recent call last):
File "D:\Python35\lib\site-packages\urllib3\response.py", line 266, in _decode
data = self._decoder.decompress(data)
File "D:\Python35\lib\site-packages\urllib3\response.py", line 66, in decompress
return self._obj.decompress(data)
zlib.error: Error -3 while decompressing data: incorrect header check
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Python35\lib\site-packages\requests\models.py", line 745, in generate
for chunk in self.raw.stream(chunk_size, decode_content=True):
File "D:\Python35\lib\site-packages\urllib3\response.py", line 436, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "D:\Python35\lib\site-packages\urllib3\response.py", line 408, in read
data = self._decode(data, decode_content, flush_decoder)
File "D:\Python35\lib\site-packages\urllib3\response.py", line 271, in _decode
"failed to decode it." % content_encoding, e)
urllib3.exceptions.DecodeError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Python35\lib\site-packages\azure\storage\storageclient.py", line 222, in _perform_request
response = self._httpclient.perform_request(request)
File "D:\Python35\lib\site-packages\azure\storage\_http\httpclient.py", line 114, in perform_request
proxies=self.proxies)
File "D:\Python35\lib\site-packages\requests\sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "D:\Python35\lib\site-packages\requests\sessions.py", line 658, in send
r.content
File "D:\Python35\lib\site-packages\requests\models.py", line 823, in content
self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
File "D:\Python35\lib\site-packages\requests\models.py", line 750, in generate
raise ContentDecodingError(e)
requests.exceptions.ContentDecodingError: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check',))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:/PythonWorkSpace/AzureStorage/BlobStorage/CreateContainer.py", line 20, in <module>
f)
File "D:\Python35\lib\site-packages\azure\storage\blob\baseblobservice.py", line 1932, in get_blob_to_stream
_context=operation_context)
File "D:\Python35\lib\site-packages\azure\storage\blob\baseblobservice.py", line 1659, in _get_blob
operation_context=_context)
File "D:\Python35\lib\site-packages\azure\storage\storageclient.py", line 280, in _perform_request
raise ex
File "D:\Python35\lib\site-packages\azure\storage\storageclient.py", line 252, in _perform_request
raise AzureException(ex.args[0])
azure.common.AzureException: ('Received response with content-encoding: gzip, but failed to decode it.', error('Error -3 while decompressing data: incorrect header check',))
Process finished with exit code 1
Solution:
As #Gaurav Mantri sail, you could set the EncodingType property to None or ensure that the EncodingType setting matches the type of the file itself.
Also,you could refer to the SO thread python making POST request with JSON data.

Python Geocoders Not Working

I am trying to access geolocation of addresses input by the User in my Django App.
The App was working fine with pygeocoder. Suddenly, the app has started giving problems.
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/home/ishaan/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pygeocoder.py", line 129, in geocode
return GeocoderResult(Geocoder.get_data(params=params))
File "/home/ishaan/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/pygeocoder.py", line 204, in get_data
response = session.send(request.prepare())
File "/home/ishaan/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/requests/sessions.py", line 576, in send
r = adapter.send(request, **kwargs)
File "/home/ishaan/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/requests/adapters.py", line 370, in send
timeout=timeout
File "/home/ishaan/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 544, in urlopen
body=body, headers=headers)
File "/home/ishaan/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 344, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
File "/home/ishaan/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/requests/packages/urllib3/connectionpool.py", line 314, in _raise_timeout
if 'timed out' in str(err) or 'did not complete (read)' in str(err): # Python 2.6
TypeError: __str__ returned non-string (type Error)
I am unable to unserstand this problem, and have a close to no knowledge about SSL and its errors.
I dont know, how relevant it is, but the app was working until I installed Redis Server for another app in the same project. I have uninstalled it, but it is still not working.
Lastly, No other Geocoder is working - Every geocoder gives the same error. I tried to use GeoPy, geocoder 1.6.4
Thanks in advance

Getting error when using praw to login to reddit

I've seen a couple of questions that already asked this but there were no responses, so I'll give it a try. When I use the following code:
import praw, time
r = praw.Reddit(user_agent="Bot experiment by redacted")
r.login('redacted', 'redacted')
I get a connection error that has the following traceback:
Traceback (most recent call last):
File "redacted", line 5, in <module>
r.login('redacted', 'redacted')
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/praw/__init__.py", line 1263, in login
self.request_json(self.config['login'], data=data)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/praw/decorators.py", line 161, in wrapped
return_value = function(reddit_session, *args, **kwargs)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/praw/__init__.py", line 519, in request_json
response = self._request(url, params, data)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/praw/__init__.py", line 383, in _request
_raise_response_exceptions(response)
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/praw/internal.py", line 172, in _raise_response_exceptions
response.raise_for_status()
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/requests/models.py", line 831, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 403 Client Error: Forbidden
I have tried this from python 3.4 and 2.7, I've tried running from IDLE and from the terminal. I've tried leaving my username and password out and logging in when prompted. I've tried from my Mac in my hotel room and a Windows machine from work and I get the same error everytime. I've tried from my bot account that I just made and my normal account. Does anyone have any ideas?
The issue was that I had the word 'bot' in my user_agent string. After it was removed, there were no problems.

How to catch exception for which the name is not defined in context?

I am seeing the python-requests library crash with the following traceback:
Traceback (most recent call last):
File "/usr/lib/python3.2/http/client.py", line 529, in _read_chunked
chunk_left = int(line, 16)
ValueError: invalid literal for int() with base 16: b''
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "./app.py", line 507, in getUrlContents
response = requests.get(url, headers=headers, auth=authCredentials, timeout=http_timeout_seconds)
File "/home/dotancohen/code/lib/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/home/dotancohen/code/lib/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/home/dotancohen/code/lib/requests/sessions.py", line 338, in request
resp = self.send(prep, **send_kwargs)
File "/home/dotancohen/code/lib/requests/sessions.py", line 441, in send
r = adapter.send(request, **kwargs)
File "/home/dotancohen/code/lib/requests/adapters.py", line 340, in send
r.content
File "/home/dotancohen/code/lib/requests/models.py", line 601, in content
self._content = bytes().join(self.iter_content(CONTENT_CHUNK_SIZE)) or bytes()
File "/home/dotancohen/code/lib/requests/models.py", line 542, in generate
for chunk in self.raw.stream(chunk_size, decode_content=True):
File "/home/dotancohen/code/lib/requests/packages/urllib3/response.py", line 222, in stream
data = self.read(amt=amt, decode_content=decode_content)
File "/home/dotancohen/code/lib/requests/packages/urllib3/response.py", line 173, in read
data = self._fp.read(amt)
File "/usr/lib/python3.2/http/client.py", line 489, in read
return self._read_chunked(amt)
File "/usr/lib/python3.2/http/client.py", line 534, in _read_chunked
raise IncompleteRead(b''.join(value))
http.client.IncompleteRead: IncompleteRead(0 bytes read)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/lib/python3.2/threading.py", line 740, in _bootstrap_inner
self.run()
File "./app.py", line 298, in run
self.target(*self.args)
File "./app.py", line 400, in provider_query
url_contents = getUrlContents(str(providerUrl), '', authCredentials)
File "./app.py", line 523, in getUrlContents
except http.client.IncompleteRead as error:
NameError: global name 'http' is not defined
As can be seen, I've tried to catch the http.client.IncompleteRead: IncompleteRead(0 bytes read) error that requests is throwing with the line except http.client.IncompleteRead as error:. However, that is throwing a NameError due to http not being defined. So how can I catch that exception?
This is the code throwing the exception:
import requests
from requests_oauthlib import OAuth1
authCredentials = OAuth1('x', 'x', 'x', 'x')
response = requests.get(url, auth=authCredentials, timeout=20)
Note that I am not including the http library, though requests is including it. The error is very intermittent (happens perhaps once every few hours, even if I run the requests.get() command every ten seconds) so I'm not sure if added the http library to the imports has helped or not.
In any case, in the general sense, if included library A in turn includes library B, is it impossible to catch exceptions from B without including B myself?
To answer your question
In any case, in the general sense, if included library A in turn includes library B, is it impossible to catch exceptions from B without including B myself?
Yes. For example:
a.py:
import b
# do some stuff with b
c.py:
import a
# but you want to use b
a.b # gives you full access to module b which was imported by a
Although this does the job, it doesn't look so pretty, especially with long package/module/class/function names in real world.
So in your case to handle http exception, either try to figure out which package/module within requests imports http and so that you'd do raise requests.XX.http.WhateverError or rather just import it as http is a standard library.
It's hard to analyze the problem if you don't give source and just the stout,
but check this link out : http://docs.python-requests.org/en/latest/user/quickstart/#errors-and-exceptions
Basically,
try and catch the exception whereever the error is rising in your code.
Exceptions:
In the event of a network problem (e.g. DNS failure, refused connection, etc),
Requests will raise a **ConnectionError** exception.
In the event of the rare invalid HTTP response,
Requests will raise an **HTTPError** exception.
If a request times out, a **Timeout** exception is raised.
If a request exceeds the configured number of maximum redirections,
a **TooManyRedirects** exception is raised.
All exceptions that Requests explicitly raises inherit
from **requests.exceptions.RequestException.**
Hope that helped.

Categories