I asked a question about how to throttle a Python upload, which led me to an answer recommending a small helper library called socket-throttle. That works for regular HTTP and probably for most plain uses of a socket. However, I'm trying to throttle an SSL connection, and combining socket-throttle with the stock ssl library (used implicitly by requests) raises an exception deep in the guts of the library:
File "***.py", line 590, in request
r = self.session.get(url, headers=extra_headers)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 394, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 382, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 485, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 324, in send
timeout=timeout
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py", line 478, in urlopen
body=body, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py", line 285, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python2.7/httplib.py", line 973, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1007, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 969, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 829, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 791, in send
self.connect()
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connection.py", line 95, in connect
ssl_version=resolved_ssl_version)
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util.py", line 643, in ssl_wrap_socket
ssl_version=ssl_version)
File "/usr/lib/python2.7/ssl.py", line 487, in wrap_socket
ciphers=ciphers)
File "/usr/lib/python2.7/ssl.py", line 211, in __init__
socket.__init__(self, _sock=sock._sock)
File "***/socket_throttle.py", line 54, in __getattr__
return getattr(self._wrappedsock, attr)
AttributeError: '_socket.socket' object has no attribute '_sock'
Well, that's a downer. As you can tell, the ssl package is trying to use one of the socket's private fields, _sock, rather than the socket itself. (Isn't the point of private fields that you're not supposed to access them from the outside? Grr.) If I try to inject myself into that field on my ThrottledSocket object, I run into this problem:
File "/home/alex/dev/jottalib/src/jottalib/JFS.py", line 590, in request
r = self.session.get(url, headers=extra_headers)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 394, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 382, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 485, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 324, in send
timeout=timeout
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py", line 478, in urlopen
body=body, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connectionpool.py", line 285, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python2.7/httplib.py", line 973, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1007, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 969, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 829, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 791, in send
self.connect()
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/connection.py", line 95, in connect
ssl_version=resolved_ssl_version)
File "/usr/local/lib/python2.7/dist-packages/requests/packages/urllib3/util.py", line 643, in ssl_wrap_socket
ssl_version=ssl_version)
File "/usr/lib/python2.7/ssl.py", line 487, in wrap_socket
ciphers=ciphers)
File "/usr/lib/python2.7/ssl.py", line 241, in __init__
ciphers)
TypeError: must be _socket.socket, not ThrottledSocket
Now what? Is there somewhere else in this stack where I could rate-limit the Python communication? Or is there a cleaner way to do it than overriding the socket implementation, which turns out to be moot anyway, since the ssl package just bypasses it altogether?
Depending on your requirements, you can and maybe should solve this particular problem on the OS level instead of on the application level.
Approaching this on the OS level has two advantages. First, it does not make a difference how the sockets involved are used (HTTP or HTTPS or IRC or some ping of death packets -- it does not matter). Secondly, the more you decouple the different components of your system, the easier it is to make changes afterwards and to debug issues.
There are tools (at least for POSIX-compliant systems) for throttling bandwidth of network interfaces and/or processes. You might want to have a look at these, for example:
trickle (for shaping traffic of processes)
wondershaper (for shaping traffic of entire network interfaces; I have used this on a modern Ubuntu, and it works perfectly fine)
These discussions might be relevant for you:
https://superuser.com/questions/66574/how-to-throttle-bandwidth-on-a-linux-network-interface
http://jwalanta.blogspot.de/2009/04/easy-bandwidth-shaping-in-linux.html
https://unix.stackexchange.com/questions/28198/how-to-limit-network-bandwidth
It looks like you're trying to throttle HTTP requests. If that's the case, you can try RequestsThrottler instead. Python requests is way nicer than httplib too.
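If the traffic to limit is an upload body, another application-level option is to throttle the data source rather than the socket, which avoids the ssl wrapping problem entirely. The following is a minimal sketch, not taken from socket-throttle or RequestsThrottler: throttled_chunks and its parameters are hypothetical names, and it assumes the body can be streamed from a file-like object.

```python
import io
import time


def throttled_chunks(fileobj, max_bytes_per_sec, chunk_size=8192,
                     clock=time.monotonic, sleep=time.sleep):
    """Yield chunks of fileobj no faster than max_bytes_per_sec on average."""
    start = clock()
    sent = 0
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk
        sent += len(chunk)
        # Time at which the bytes yielded so far fit within the rate budget.
        due = start + sent / max_bytes_per_sec
        delay = due - clock()
        if delay > 0:
            sleep(delay)


if __name__ == "__main__":
    # Local demonstration only: 40 KiB at 20 KiB/s takes roughly two seconds.
    payload = io.BytesIO(b"x" * 40 * 1024)
    started = time.monotonic()
    total = sum(len(c) for c in throttled_chunks(payload, 20 * 1024))
    print(total, round(time.monotonic() - started, 1))
```

Requests accepts a generator as the data argument and sends it with chunked transfer encoding, so something like requests.put(url, data=throttled_chunks(open(path, 'rb'), 100 * 1024)) should work, provided the server accepts chunked uploads. This only limits the request body, not the response download.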
Trace below.
The relevant Python snippet:
bucket = _get_bucket(location['bucket'])
blob = bucket.blob(location['path'])
blob.upload_from_filename(source_path)
Which ultimately triggers (from the ssl library):
OverflowError: string longer than 2147483647 bytes
I assume there is some special configuration option I'm missing?
This is possibly related to this ~1.5yr old apparently still-open issue: https://github.com/googledatalab/datalab/issues/784.
Help appreciated!
Full trace:
File "/usr/src/app/gcloud/download_data.py", line 109, in *******
blob.upload_from_filename(source_path)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 992, in upload_from_filename
size=total_bytes)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 946, in upload_from_file
client, file_obj, content_type, size, num_retries)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 867, in _do_upload
client, stream, content_type, size, num_retries)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 700, in _do_multipart_upload
transport, data, object_metadata, content_type)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/upload.py", line 97, in transmit
retry_strategy=self._retry_strategy)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request
func, RequestsMixin._get_status_code, retry_strategy)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
response = func()
File "/usr/local/lib/python3.5/dist-packages/google/auth/transport/requests.py", line 186, in request
method, url, data=data, headers=request_headers, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 440, in send
timeout=timeout
File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 357, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.5/http/client.py", line 1106, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
self.endheaders(body)
File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
self._send_output(message_body)
File "/usr/lib/python3.5/http/client.py", line 936, in _send_output
self.send(message_body)
File "/usr/lib/python3.5/http/client.py", line 908, in send
self.sock.sendall(data)
File "/usr/lib/python3.5/ssl.py", line 891, in sendall
v = self.send(data[count:])
File "/usr/lib/python3.5/ssl.py", line 861, in send
return self._sslobj.write(data)
File "/usr/lib/python3.5/ssl.py", line 586, in write
return self._sslobj.write(data)
OverflowError: string longer than 2147483647 bytes
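The issue is that it is attempting to read the entire file into memory and send it in one request. Following the chain from upload_from_filename shows that it stats the file and then passes that size along, so the whole file goes out as a single upload part (note _do_multipart_upload in the trace), and the ssl layer cannot write more than 2147483647 bytes at once.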
Instead, specifying a chunk_size when creating the object will trigger it to upload in multiple parts:
# Must be a multiple of 256KB per docstring
CHUNK_SIZE = 10485760 # 10MB
blob = bucket.blob(location['path'], chunk_size=CHUNK_SIZE)
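Put together, the upload could be wrapped as in the sketch below. The helper name upload_in_chunks is hypothetical, and bucket is assumed to be a google.cloud.storage Bucket (anything exposing blob(name, chunk_size=...) would do); the 256 KiB check mirrors the docstring requirement mentioned above.

```python
CHUNK_UNIT = 256 * 1024  # chunk_size must be a multiple of 256 KiB


def upload_in_chunks(bucket, blob_path, source_path, chunk_size=10 * 1024 * 1024):
    """Upload source_path to blob_path in pieces of chunk_size bytes.

    `bucket` is assumed to be a google.cloud.storage Bucket or compatible.
    """
    if chunk_size <= 0 or chunk_size % CHUNK_UNIT != 0:
        raise ValueError("chunk_size must be a positive multiple of 256 KiB")
    # Passing chunk_size makes the client upload in several parts.
    blob = bucket.blob(blob_path, chunk_size=chunk_size)
    blob.upload_from_filename(source_path)
    return blob
```

With the names from the question this would be called as upload_in_chunks(_get_bucket(location['bucket']), location['path'], source_path).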
Happy Hacking!
I am trying to run a very simple insertion to Elasticsearch in Python:
es = Elasticsearch({'host': 'localhost', 'port': 9200})
res = es.index(index='data-client_dev', doc_type='test', id=2, body={'author': 'Christophe'}, timeout=60)
print(res['created'])
But I keep having the error pasted at the end.
I am under Ubuntu 14 and am using PyCharm (if this might help). The ES node is up and running locally on my computer.
I tried changing the timeout (and also request_timeout), but it makes no difference. What is weird is that the query works from the terminal, so maybe the problem is coming from PyCharm.
I am a beginner so maybe I missed something obvious.
Thanks a lot for your help!
WARNING:elasticsearch:PUT http://port:9200/data-client_dev/test/2?timeout=60 [status:N/A request:20.040s]
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/elasticsearch/connection/http_urllib3.py", line 78, in perform_request
response = self.pool.urlopen(method, url, body, retries=False, headers=self.headers, **kw)
File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 608, in urlopen
_stacktrace=sys.exc_info()[2])
File "/usr/local/lib/python2.7/dist-packages/urllib3/util/retry.py", line 224, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 558, in urlopen
body=body, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 353, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python2.7/httplib.py", line 979, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1013, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 835, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 797, in send
self.connect()
File "/usr/local/lib/python2.7/dist-packages/urllib3/connection.py", line 162, in connect
conn = self._new_conn()
File "/usr/local/lib/python2.7/dist-packages/urllib3/connection.py", line 142, in _new_conn
(self.host, self.timeout))
ConnectTimeoutError: (<urllib3.connection.HTTPConnection object at 0x7fe723d7e550>, u'Connection to port timed out. (connect timeout=10)')
WARNING:elasticsearch:Connection <Urllib3HttpConnection: http://port:9200> has failed for 1 times in a row, putting on 60 second timeout.
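The client expects a list of hosts. Given a bare dict, it appears to iterate over the keys and treat 'host' and 'port' as host names, which would explain the http://port:9200 in your log. You need to create your Elasticsearch client like this, i.e. by putting the host dict in a list: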
es = Elasticsearch([{'host': 'localhost', 'port': 9200}])
                   ^                                   ^
                   |                                   |
               add this                            and this
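To see why the bare dict goes wrong, here is a simplified model of how a client might normalize its hosts argument. normalize_hosts is a hypothetical function written for illustration, not the library's actual code, but its behaviour is consistent with the log line above.

```python
def normalize_hosts(hosts):
    """Simplified, hypothetical model of how a client might read `hosts`."""
    if hosts is None:
        return [{}]
    if isinstance(hosts, str):
        hosts = [hosts]
    normalized = []
    for host in hosts:  # iterating over a dict yields its keys
        if isinstance(host, str):
            normalized.append({"host": host})
        else:
            normalized.append(host)
    return normalized


if __name__ == "__main__":
    # Bare dict: the keys become host names.
    print(normalize_hosts({"host": "localhost", "port": 9200}))
    # Dict inside a list: the dict is kept as one host definition.
    print(normalize_hosts([{"host": "localhost", "port": 9200}]))
```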
This request never returns (or at least not within my patience):
import requests
r = requests.get('http://en.wikipedia.org/w/api.php?rcprop=ids&format=json&action=query&rclimit=10&rctype=edit&list=recentchanges&rcnamespace=0', headers={'user-agent': 'api test'})
Hitting Ctrl+C always produces this traceback:
^CTraceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/usr/lib/python2.7/dist-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 383, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 486, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 330, in send
timeout=timeout
File "/usr/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 542, in urlopen
body=body, headers=headers)
File "/usr/lib/python2.7/dist-packages/urllib3/connectionpool.py", line 367, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python2.7/httplib.py", line 973, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1007, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 969, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 829, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 791, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 772, in connect
self.timeout, self.source_address)
File "/usr/lib/python2.7/socket.py", line 562, in create_connection
sock.connect(sa)
File "/usr/lib/python2.7/socket.py", line 224, in meth
return getattr(self._sock,name)(*args)
Adding timeout=5 to the request makes the request succeed once the timeout has expired (i.e. the correct data is returned from the API request). But of course that adds five seconds of latency to every API request in my application.
What's going wrong here?
This was due to IPv6 not working very well on my network. httplib (and therefore Requests) seems to prefer IPv6 if it's available, but if IPv6 is not working properly you can have a long wait while the IPv6 connection attempt times out. Setting a timeout makes it fall back to IPv4 once the timeout expires, and that attempt then succeeds. Disabling IPv6 on my network has fixed this (as, I assume, would fixing IPv6).
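A quick way to check whether this is what is happening is to time a plain TCP connect over each address family. This is a diagnostic sketch using only the standard library; time_connect and compare_families are hypothetical helper names, and the host and port in the example are just the ones from the question.

```python
import socket
import time


def time_connect(host, port, family, timeout=5.0):
    """Return the seconds taken to connect over `family`, or None on failure."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return None  # the host has no address of this family
    af, socktype, proto, _canonname, sockaddr = infos[0]
    started = time.monotonic()
    try:
        with socket.socket(af, socktype, proto) as sock:
            sock.settimeout(timeout)
            sock.connect(sockaddr)
    except OSError:
        return None  # refused, unreachable, or timed out
    return time.monotonic() - started


def compare_families(host, port=80, timeout=5.0):
    """Time a connect to host over IPv4 and IPv6 separately."""
    return {
        "ipv4": time_connect(host, port, socket.AF_INET, timeout),
        "ipv6": time_connect(host, port, socket.AF_INET6, timeout),
    }


if __name__ == "__main__":
    print(compare_families("en.wikipedia.org"))
```

If the ipv4 entry comes back quickly while the ipv6 entry is None after the full timeout, broken IPv6 connectivity is the likely cause.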
I am playing around with the Gmail API to create an app that sends automated emails from my server. Running my application as a regular user gives the following result:
pankgeorg#snf-25181:~/tomotech/gmailer$ python mailer.py
Traceback (most recent call last):
File "mailer.py", line 36, in <module>
gmail_service = build('gmail', 'v1', http=http)
File "/usr/local/lib/python2.7/dist-packages/oauth2client-1.4.5-py2.7.egg/oauth2client/util.py", line 135, in positional_wrapper
File "/usr/local/lib/python2.7/dist-packages/google_api_python_client-1.3.1-py2.7.egg/googleapiclient/discovery.py", line 198, in build
File "/usr/local/lib/python2.7/dist-packages/oauth2client-1.4.5-py2.7.egg/oauth2client/util.py", line 135, in positional_wrapper
File "/usr/local/lib/python2.7/dist-packages/oauth2client-1.4.5-py2.7.egg/oauth2client/client.py", line 547, in new_request
File "/usr/local/lib/python2.7/dist-packages/httplib2-0.9-py2.7.egg/httplib2/__init__.py", line 1593, in request
(response, content) = self._request(conn, authority, uri, request_uri, method, body, headers, redirections, cachekey)
File "/usr/local/lib/python2.7/dist-packages/httplib2-0.9-py2.7.egg/httplib2/__init__.py", line 1335, in _request
(response, content) = self._conn_request(conn, request_uri, method, body, headers)
File "/usr/local/lib/python2.7/dist-packages/httplib2-0.9-py2.7.egg/httplib2/__init__.py", line 1257, in _conn_request
conn.connect()
File "/usr/local/lib/python2.7/dist-packages/httplib2-0.9-py2.7.egg/httplib2/__init__.py", line 1021, in connect
self.disable_ssl_certificate_validation, self.ca_certs)
File "/usr/local/lib/python2.7/dist-packages/httplib2-0.9-py2.7.egg/httplib2/__init__.py", line 80, in _ssl_wrap_socket
cert_reqs=cert_reqs, ca_certs=ca_certs)
File "/usr/lib/python2.7/ssl.py", line 886, in wrap_socket
ciphers=ciphers)
File "/usr/lib/python2.7/ssl.py", line 496, in __init__
self._context.load_verify_locations(ca_certs)
IOError: [Errno 13] Permission denied
On the other hand, running with sudo works perfectly:
pankgeorg#snf-25181:~/tomotech/gmailer$ sudo python mailer.py
Message Id: 14ad0aea05e*****
To be completely honest, in order to authenticate using --noauth_local_webserver, I ran the command with sudo, authenticated, and then chowned gmail.storage back to myself.
Also, I installed using easy_install because pip install was giving me the following error:
pankgeorg#snf-25181:~/tomotech/gmailer$ sudo pip install --upgrade google_api_python_client
Cleaning up...
Exception:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/pip/basecommand.py", line 122, in main
status = self.run(options, args)
File "/usr/local/lib/python2.7/dist-packages/pip/commands/install.py", line 278, in run
requirement_set.prepare_files(finder, force_root_egg_info=self.bundle, bundle=self.bundle)
File "/usr/local/lib/python2.7/dist-packages/pip/req.py", line 1096, in prepare_files
req_to_install, self.upgrade)
File "/usr/local/lib/python2.7/dist-packages/pip/index.py", line 194, in find_requirement
page = self._get_page(main_index_url, req)
File "/usr/local/lib/python2.7/dist-packages/pip/index.py", line 568, in _get_page
session=self.session,
File "/usr/local/lib/python2.7/dist-packages/pip/index.py", line 670, in get_page
resp = session.get(url, headers={"Accept": "text/html"})
File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/sessions.py", line 395, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pip/download.py", line 237, in request
return super(PipSession, self).request(method, url, *args, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/sessions.py", line 383, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/sessions.py", line 506, in send
history = [resp for resp in gen] if allow_redirects else []
File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/sessions.py", line 168, in resolve_redirects
allow_redirects=False,
File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/sessions.py", line 486, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/adapters.py", line 330, in send
timeout=timeout
File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/connectionpool.py", line 480, in urlopen
body=body, headers=headers)
File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/connectionpool.py", line 285, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python2.7/httplib.py", line 1001, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 1035, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 997, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 850, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 826, in send
self.sock.sendall(data)
File "/usr/local/lib/python2.7/dist-packages/pip/_vendor/requests/packages/urllib3/contrib/pyopenssl.py", line 323, in sendall
return self.connection.sendall(data)
File "/usr/lib/python2.7/dist-packages/OpenSSL/SSL.py", line 969, in sendall
raise TypeError("buf must be a byte string")
TypeError: buf must be a byte string
Storing debug log for failure in /root/.pip/pip.log
It is my understanding that the root of the problem is the same in both cases.
I should also note that on my laptop (where I can authenticate normally in a web browser instead of using --noauth_local_webserver) it works just fine, even though the installation was done the same way (the problem with pip appears there too).
Thanks in advance and sorry for the long post!
Tutorials I used:
parse arg
code for sending mails
The application body is pretty much the quickstart for the Gmail API.
The httplib2 installer sets incorrect permissions for its httplib2/cacerts.txt file. One solution is to simply make its files readable by anyone by running:
chmod o+r -R /usr/local/lib/python2.7/dist-packages/httplib2-0.9-py2.7.egg
However, it may be better to uninstall the version installed by pip, and use your operating system's package manager instead, which probably has the correct permissions for all files. On Debian, this could be accomplished with
pip uninstall httplib2
apt-get install python-httplib2
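To confirm that permissions are the culprit before changing anything, you can list the files in the package directory that other users cannot read. This is a small sketch; unreadable_by_others is a hypothetical helper, and the path in the example is the egg directory from the traceback above.

```python
import os
import stat


def unreadable_by_others(root):
    """Return files under `root` that lack the world-readable permission bit."""
    missing = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            if not mode & stat.S_IROTH:
                missing.append(path)
    return sorted(missing)


if __name__ == "__main__":
    egg = "/usr/local/lib/python2.7/dist-packages/httplib2-0.9-py2.7.egg"
    for path in unreadable_by_others(egg):
        print(path)
```

If cacerts.txt shows up in the output, that matches the Permission denied raised from load_verify_locations. Note that this only checks files; the directories leading to them also need to be traversable by the user.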
I tried the following three methods to control Tor:
using TorCtl/urllib2: Python script Exception with Tor
using socks/httplib: http://www.youtube.com/watch?v=KDsmVH7eJCs
using socks/urllib2: Python urllib over TOR?
Each of them fails with the same error (I tried to make the traceback as clear as possible):
Traceback (most recent call last):
File "tor.py", line 26, in <module>
print(urllib2.urlopen("http://www.ifconfig.me/ip").read())
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/urllib2.py"
line 127, in urlopen
return _opener.open(url, data, timeout)
line 404, in open
response = self._open(req, data)
line 422, in _open
'_open', req)
line 382, in _call_chain
result = func(*args)
line 1214, in http_open
return self.do_open(httplib.HTTPConnection, req)
line 1181, in do_open
h.request(req.get_method(), req.get_selector(), req.data, headers)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/httplib.py"
line 973, in request
self._send_request(method, url, body, headers)
line 1007, in _send_request
self.endheaders(body)
line 969, in endheaders
self._send_output(message_body)
line 829, in _send_output
self.send(msg)
line 791, in send
self.connect()
line 772, in connect
self.timeout, self.source_address)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/socket.py"
line 562, in create_connection
sock.connect(sa)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/SocksiPy_branch-1.01-py2.7.egg/socks.py"
line 392, in connect
self.__negotiatesocks5(destpair[0],destpair[1])
line 199, in __negotiatesocks5
self.sendall("\x05\x01\x00")
line 165, in sendall
socket.socket.sendall(self, bytes)
... last error repeating a lot of times and then ...
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/SocksiPy_branch-1.01-py2.7.egg/socks.py", line 163, in sendall
if 'encode' in dir(bytes):
RuntimeError: maximum recursion depth exceeded while calling a Python object
Does anyone understand where it comes from?
For me it actually failed with socksipy-branch installed with pip.
However, it worked ok after I downloaded socks.py directly from http://socksipy.sourceforge.net/ to my working directory.
It looks like you are using the SocksiPy library. I had the same problem after first installing it via pip, and got it fixed by installing the library again directly from https://code.google.com/p/socksipy-branch/.
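As for where the recursion comes from, one plausible reading of the traceback (an assumption, not something confirmed by the answers here) is this: the pip-installed socks.py overrides sendall and delegates with socket.socket.sendall(self, bytes). If the script has replaced socket.socket with the SOCKS socket class, which is the usual way these recipes route traffic through Tor, that lookup now resolves to the override itself and it calls itself until the recursion limit is hit. The sketch below reproduces the mechanism with stand-in classes; fake_socket, PlainSocket and ProxySocket are hypothetical names, not the real modules.

```python
import types

# Stand-in for the `socket` module (hypothetical; not the real module).
fake_socket = types.SimpleNamespace()


class PlainSocket:
    def sendall(self, data):
        return len(data)


fake_socket.socket = PlainSocket


class ProxySocket(PlainSocket):
    def sendall(self, data):
        # Resolved at call time through the module attribute, like
        # `socket.socket.sendall(self, bytes)` in the traceback.
        return fake_socket.socket.sendall(self, data)


def demo():
    sock = ProxySocket()
    before = sock.sendall(b"abc")      # delegates to PlainSocket.sendall
    fake_socket.socket = ProxySocket   # the monkeypatch: module now points at the subclass
    try:
        sock.sendall(b"abc")           # ProxySocket.sendall calls itself
        after = "ok"
    except RecursionError:
        after = "recursion"
    finally:
        fake_socket.socket = PlainSocket
    return before, after


if __name__ == "__main__":
    print(demo())
```

If that is indeed the cause, it would also explain why a different copy of socks.py, one that does not override sendall this way, makes the error go away.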