How can I stream big data to Google Cloud Storage? - python

I am working on a system that analyzes data of any size and format, which users stream to my private cloud built on Google Cloud Storage. How can I let them stream large data? At the moment I use a Django API and upload like this:
def upload_blob(source_file_name, destination_blob_name):
    blob = bucket.blob(destination_blob_name)
    blob.upload_from_filename(source_file_name)
    print('File {} uploaded to {}.'.format(
        source_file_name,
        destination_blob_name))
It works correctly with small files; however, when I send, for example, a large movie, I get the error shown below. I am aware that this is not the optimal solution, but I have no idea how to solve this. As you can see, at the moment users send me requests in blob format, but that does not work with very large files. How can I solve my problem and send user data of any size to Google Cloud Storage?
Internal Server Error: /cloud/
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1252, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1298, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1247, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1065, in _send_output
self.send(chunk)
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 987, in send
self.sock.sendall(data)
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1034, in sendall
v = self.send(byte_view[count:])
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1003, in send
return self._sslobj.write(data)
socket.timeout: The write operation timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 449, in send
timeout=timeout
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 720, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "/usr/local/lib/python3.7/site-packages/urllib3/util/retry.py", line 400, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/usr/local/lib/python3.7/site-packages/urllib3/packages/six.py", line 734, in reraise
raise value.with_traceback(tb)
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 672, in urlopen
chunked=chunked,
File "/usr/local/lib/python3.7/site-packages/urllib3/connectionpool.py", line 387, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1252, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1298, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1247, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 1065, in _send_output
self.send(chunk)
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/http/client.py", line 987, in send
self.sock.sendall(data)
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1034, in sendall
v = self.send(byte_view[count:])
File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/ssl.py", line 1003, in send
return self._sslobj.write(data)
urllib3.exceptions.ProtocolError: ('Connection aborted.', timeout('The write operation timed out'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/exception.py", line 34, in inner
response = get_response(request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/base.py", line 115, in _get_response
response = self.process_exception_by_middleware(e, request)
File "/usr/local/lib/python3.7/site-packages/django/core/handlers/base.py", line 113, in _get_response
response = wrapped_callback(request, *callback_args, **callback_kwargs)
File "/usr/local/lib/python3.7/site-packages/django/views/decorators/csrf.py", line 54, in wrapped_view
return view_func(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/django/views/generic/base.py", line 71, in view
return self.dispatch(request, *args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py", line 505, in dispatch
response = self.handle_exception(exc)
File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py", line 465, in handle_exception
self.raise_uncaught_exception(exc)
File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py", line 476, in raise_uncaught_exception
raise exc
File "/usr/local/lib/python3.7/site-packages/rest_framework/views.py", line 502, in dispatch
response = handler(request, *args, **kwargs)
File "/mypath/backend/views.py", line 635, in post
'user/' + str(user_name) + '/' + str(file))
File "/mypath/backend/views.py", line 214, in upload_blob
blob.upload_from_filename(source_file_name)
File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1318, in upload_from_filename
predefined_acl=predefined_acl,
File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1263, in upload_from_file
client, file_obj, content_type, size, num_retries, predefined_acl
File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1173, in _do_upload
client, stream, content_type, size, num_retries, predefined_acl
File "/usr/local/lib/python3.7/site-packages/google/cloud/storage/blob.py", line 1120, in _do_resumable_upload
response = upload.transmit_next_chunk(transport)
File "/usr/local/lib/python3.7/site-packages/google/resumable_media/requests/upload.py", line 425, in transmit_next_chunk
retry_strategy=self._retry_strategy,
File "/usr/local/lib/python3.7/site-packages/google/resumable_media/requests/_helpers.py", line 136, in http_request
return _helpers.wait_and_retry(func, RequestsMixin._get_status_code, retry_strategy)
File "/usr/local/lib/python3.7/site-packages/google/resumable_media/_helpers.py", line 150, in wait_and_retry
response = func()
File "/usr/local/lib/python3.7/site-packages/google/auth/transport/requests.py", line 216, in request
method, url, data=data, headers=request_headers, **kwargs
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 533, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/sessions.py", line 646, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.7/site-packages/requests/adapters.py", line 498, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', timeout('The write operation timed out'))
[24/Mar/2020 19:17:26] "POST /cloud/ HTTP/1.1" 500 20879

Have a look at resumable uploads.
This option provides a resumable data transfer feature that lets you resume an upload after a communication failure has interrupted the flow of data.
It is especially useful when you are transferring large files, because the likelihood of a network interruption or some other transmission failure is high. After a failure, you do not have to restart a large file upload from the beginning.
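As a minimal sketch of that advice, assuming the google-cloud-storage client used in the question: passing chunk_size when creating the blob asks the library to send the file in pieces of that size rather than as one large request body. The function name upload_blob_in_chunks and the 8 MiB value are illustrative choices, not something from the question, and the exact upload path the library picks may depend on its version.

```python
# Chunk sizes must be a multiple of 256 KiB, per the google-cloud-storage docs.
CHUNK_MULTIPLE = 256 * 1024
CHUNK_SIZE = 8 * 1024 * 1024  # 8 MiB, an illustrative value


def upload_blob_in_chunks(bucket, source_file_name, destination_blob_name,
                          chunk_size=CHUNK_SIZE):
    """Upload a local file to `bucket`, sending it in `chunk_size` pieces.

    `bucket` is expected to be a google.cloud.storage.Bucket, for example
    storage.Client().bucket("my-bucket") (the bucket name is hypothetical).
    """
    if chunk_size <= 0 or chunk_size % CHUNK_MULTIPLE != 0:
        raise ValueError("chunk_size must be a positive multiple of 256 KiB")
    # Setting chunk_size on the blob makes uploads go out chunk by chunk.
    blob = bucket.blob(destination_blob_name, chunk_size=chunk_size)
    blob.upload_from_filename(source_file_name)
    return blob
```

Smaller chunks mean each individual write is shorter, so a slow connection is less likely to hit a write timeout partway through a single request.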

Related

How to upload a large file to a Flask server with the POST method?

I want to upload some files to a Flask server through Python's requests module. It works well when the file is small, but errors occur when the file is large.
The Flask server code:
from flask import Flask
app = Flask(__name__)
@app.route('/test', methods=['GET', 'POST'])
def upload():
    return "Success"
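The request code:
import requests
url = "http://localhost:5000/test"
# Open the file in a context manager so the handle is closed after the request
with open('1.mhd', 'rb') as f:
    response = requests.post(url, files={'file': f})
print(response.text)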
When the file is small, the response is Success, but when the file is large (about 100 MB), the error is:
Traceback (most recent call last):
File "D:\software\anaconda\envs\torch13\lib\site-packages\urllib3\connectionpool.py", line 710, in urlopen
chunked=chunked,
File "D:\software\anaconda\envs\torch13\lib\site-packages\urllib3\connectionpool.py", line 398, in _make_request
conn.request(method, url, **httplib_request_kw)
File "D:\software\anaconda\envs\torch13\lib\site-packages\urllib3\connection.py", line 239, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "D:\software\anaconda\envs\torch13\lib\http\client.py", line 1281, in request
self._send_request(method, url, body, headers, encode_chunked)
File "D:\software\anaconda\envs\torch13\lib\http\client.py", line 1327, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "D:\software\anaconda\envs\torch13\lib\http\client.py", line 1276, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "D:\software\anaconda\envs\torch13\lib\http\client.py", line 1075, in _send_output
self.send(chunk)
File "D:\software\anaconda\envs\torch13\lib\http\client.py", line 997, in send
self.sock.sendall(data)
ConnectionAbortedError: [WinError 10053] An established connection was aborted by the software in your host machine.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\software\anaconda\envs\torch13\lib\site-packages\requests\adapters.py", line 499, in send
timeout=timeout,
File "D:\software\anaconda\envs\torch13\lib\site-packages\urllib3\connectionpool.py", line 788, in urlopen
method, url, error=e, _pool=self, _stacktrace=sys.exc_info()[2]
File "D:\software\anaconda\envs\torch13\lib\site-packages\urllib3\util\retry.py", line 550, in increment
raise six.reraise(type(error), error, _stacktrace)
File "D:\software\anaconda\envs\torch13\lib\site-packages\urllib3\packages\six.py", line 769, in reraise
raise value.with_traceback(tb)
File "D:\software\anaconda\envs\torch13\lib\site-packages\urllib3\connectionpool.py", line 710, in urlopen
chunked=chunked,
File "D:\software\anaconda\envs\torch13\lib\site-packages\urllib3\connectionpool.py", line 398, in _make_request
conn.request(method, url, **httplib_request_kw)
File "D:\software\anaconda\envs\torch13\lib\site-packages\urllib3\connection.py", line 239, in request
super(HTTPConnection, self).request(method, url, body=body, headers=headers)
File "D:\software\anaconda\envs\torch13\lib\http\client.py", line 1281, in request
self._send_request(method, url, body, headers, encode_chunked)
File "D:\software\anaconda\envs\torch13\lib\http\client.py", line 1327, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "D:\software\anaconda\envs\torch13\lib\http\client.py", line 1276, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "D:\software\anaconda\envs\torch13\lib\http\client.py", line 1075, in _send_output
self.send(chunk)
File "D:\software\anaconda\envs\torch13\lib\http\client.py", line 997, in send
self.sock.sendall(data)
urllib3.exceptions.ProtocolError: ('Connection aborted.', ConnectionAbortedError(10053, 'An established connection was aborted by the software in your host machine.', None, 10053, None))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "E:/py_code/flask/test.py", line 53, in <module>
resp = session.post(url, headers=headers, data=form)
File "D:\software\anaconda\envs\torch13\lib\site-packages\requests\sessions.py", line 635, in post
return self.request("POST", url, data=data, json=json, **kwargs)
File "D:\software\anaconda\envs\torch13\lib\site-packages\requests\sessions.py", line 587, in request
resp = self.send(prep, **send_kwargs)
File "D:\software\anaconda\envs\torch13\lib\site-packages\requests\sessions.py", line 701, in send
r = adapter.send(request, **kwargs)
File "D:\software\anaconda\envs\torch13\lib\site-packages\requests\adapters.py", line 547, in send
raise ConnectionError(err, request=request)
requests.exceptions.ConnectionError: ('Connection aborted.', ConnectionAbortedError(10053, 'An established connection was aborted by the software in your host machine.', None, 10053, None))
Process finished with exit code 1

Web scraping in HTML using py-script

<body class="white-vertion black-bg">
<!-- Start Loader -->
<p>
<py-script>
import ssl
from urllib.request import urlopen
from bs4 import BeautifulSoup
context = ssl._create_unverified_context()
result = urlopen("https://blog.naver.com/PostList.naver?blogId=woong3164&categoryNo=0&from=postList", context=context)
bsObj = BeautifulSoup(result.read(), "html.parser")
</py-script>
</p>
I used py-script in HTML to do web scraping. However, this error occurred.
'JsException(PythonError: Traceback (most recent call last):
File "/lib/python3.10/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/lib/python3.10/http/client.py", line 1282, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/lib/python3.10/http/client.py", line 1328, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/lib/python3.10/http/client.py", line 1277, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/lib/python3.10/http/client.py", line 1037, in _send_output
self.send(msg)
File "/lib/python3.10/http/client.py", line 975, in send
self.connect()
File "/lib/python3.10/http/client.py", line 1447, in connect
super().connect()
File "/lib/python3.10/http/client.py", line 941, in connect
self.sock = self._create_connection(
File "/lib/python3.10/socket.py", line 845, in create_connection
raise err
File "/lib/python3.10/socket.py", line 833, in create_connection
sock.connect(sa)
BlockingIOError: [Errno 26] Operation in progress
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/lib/python3.10/site-packages/_pyodide/_base.py", line 429, in eval_code
.run(globals, locals)
File "/lib/python3.10/site-packages/_pyodide/_base.py", line 300, in run
coroutine = eval(self.code, globals, locals)
File "", line 6, in
File "/lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "/lib/python3.10/urllib/request.py", line 519, in open
response = self._open(req, data)
File "/lib/python3.10/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/lib/python3.10/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 26] Operation in progress>)'
I think this error was caused by ssl.
How can I solve this error?
Your problem is caused by using unsupported Python packages. The urllib package uses APIs (TCP sockets) that do not exist in the browser. This is not a limitation of PyScript; no browser application can use socket-based APIs.
The solution is to use supported APIs such as fetch or pyfetch.
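A minimal sketch of the pyfetch route, assuming the code runs under PyScript/Pyodide where pyodide.http.pyfetch is available. The function name fetch_html and its fetch parameter are hypothetical; the parameter exists only so the function can be tried outside the browser with a stand-in.

```python
async def fetch_html(url, fetch=None):
    """Return the body of `url` as text using the browser's fetch API.

    `fetch` is a hypothetical hook for running outside the browser; inside
    PyScript/Pyodide it defaults to pyodide.http.pyfetch.
    """
    if fetch is None:
        # Only importable when running under Pyodide (for example in PyScript).
        from pyodide.http import pyfetch
        fetch = pyfetch
    response = await fetch(url)
    # FetchResponse.string() resolves to the response body as a str.
    return await response.string()


# Inside a <py-script> block, where top-level await is supported:
#   html = await fetch_html("https://blog.naver.com/PostList.naver?blogId=woong3164&categoryNo=0&from=postList")
#   bsObj = BeautifulSoup(html, "html.parser")
```

Because the request goes through the browser, the target site must allow cross-origin requests (CORS); otherwise the fetch can still fail even though no sockets are involved.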

Slack: unable to connect to send a message

I am using the Python slack module to send a message to a Slack channel, and I have installed all the required modules (slack, slackClient, openssl), but I am getting an SSL violation. I am not sure whether it is related to a proxy. Any help would be appreciated.
code:
import slack
import ssl
SLACK_API_TOKEN = "xoxb-xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
client = slack.WebClient(token=SLACK_API_TOKEN)
response = client.chat_postMessage(
    channel='#channelName',
    text="testing")
if (response["ok"]):
    print("Notification sent to Slack")
Error:
Traceback (most recent call last):
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 1346, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1285, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1331, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1280, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1040, in _send_output
self.send(msg)
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 980, in send
self.connect()
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/http/client.py", line 1454, in connect
self.sock = self._context.wrap_socket(self.sock,
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/ssl.py", line 500, in wrap_socket
return self.sslsocket_class._create(
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/ssl.py", line 1040, in _create
self.do_handshake()
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/ssl.py", line 1309, in do_handshake
self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:1129)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/Users/sai/Downloads/Gen3/POC_TEST.py", line 19, in <module>
response = client.chat_postMessage(
File "/usr/local/lib/python3.9/site-packages/slack_sdk/web/legacy_client.py", line 2091, in chat_postMessage
return self.api_call("chat.postMessage", json=kwargs)
File "/usr/local/lib/python3.9/site-packages/slack_sdk/web/legacy_base_client.py", line 167, in api_call
return self._sync_send(api_url=api_url, req_args=req_args)
File "/usr/local/lib/python3.9/site-packages/slack_sdk/web/legacy_base_client.py", line 258, in _sync_send
return self._urllib_api_call(
File "/usr/local/lib/python3.9/site-packages/slack_sdk/web/legacy_base_client.py", line 370, in _urllib_api_call
response = self._perform_urllib_http_request(url=url, args=request_args)
File "/usr/local/lib/python3.9/site-packages/slack_sdk/web/legacy_base_client.py", line 535, in _perform_urllib_http_request
raise err
File "/usr/local/lib/python3.9/site-packages/slack_sdk/web/legacy_base_client.py", line 496, in _perform_urllib_http_request
resp = opener.open(req, timeout=self.timeout) # skipcq: BAN-B310
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 517, in open
response = self._open(req, data)
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 534, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 494, in _call_chain
result = func(*args)
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 1389, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/usr/local/Cellar/python@3.9/3.9.10/Frameworks/Python.framework/Versions/3.9/lib/python3.9/urllib/request.py", line 1349, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error EOF occurred in violation of protocol (_ssl.c:1129)>
Thanks

can't upload > ~2GB to Google Cloud Storage

Trace below.
The relevant Python snippet:
bucket = _get_bucket(location['bucket'])
blob = bucket.blob(location['path'])
blob.upload_from_filename(source_path)
Which ultimately triggers (from the ssl library):
OverflowError: string longer than 2147483647 bytes
I assume there is some special configuration option I'm missing?
This is possibly related to this ~1.5yr old apparently still-open issue: https://github.com/googledatalab/datalab/issues/784.
Help appreciated!
Full trace:
[File "/usr/src/app/gcloud/download_data.py", line 109, in *******
blob.upload_from_filename(source_path)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 992, in upload_from_filename
size=total_bytes)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 946, in upload_from_file
client, file_obj, content_type, size, num_retries)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 867, in _do_upload
client, stream, content_type, size, num_retries)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 700, in _do_multipart_upload
transport, data, object_metadata, content_type)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/upload.py", line 97, in transmit
retry_strategy=self._retry_strategy)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request
func, RequestsMixin._get_status_code, retry_strategy)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
response = func()
File "/usr/local/lib/python3.5/dist-packages/google/auth/transport/requests.py", line 186, in request
method, url, data=data, headers=request_headers, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 440, in send
timeout=timeout
File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 357, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.5/http/client.py", line 1106, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
self.endheaders(body)
File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
self._send_output(message_body)
File "/usr/lib/python3.5/http/client.py", line 936, in _send_output
self.send(message_body)
File "/usr/lib/python3.5/http/client.py", line 908, in send
self.sock.sendall(data)
File "/usr/lib/python3.5/ssl.py", line 891, in sendall
v = self.send(data[count:])
File "/usr/lib/python3.5/ssl.py", line 861, in send
return self._sslobj.write(data)
File "/usr/lib/python3.5/ssl.py", line 586, in write
return self._sslobj.write(data)
OverflowError: string longer than 2147483647 bytes
The issue is that the library attempts to read the entire file into memory. Following the call chain from upload_from_filename shows that it stats the file and then passes that size in as the upload size, so the whole file is sent as a single upload part.
Instead, specifying a chunk_size when creating the blob object will trigger it to upload in multiple parts:
# Must be a multiple of 256KB per docstring
CHUNK_SIZE = 10485760 # 10MB
blob = bucket.blob(location['path'], chunk_size=CHUNK_SIZE)
Happy Hacking!
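To make the numbers concrete, here is a small arithmetic sketch (not part of the library; the helper name plan_chunks and the 3 GiB file size are hypothetical). It checks that a chunk size is a multiple of 256 KiB and shows how many chunks a file larger than the 2147483647-byte limit from the OverflowError would be split into.

```python
import math

CHUNK_MULTIPLE = 256 * 1024      # chunk sizes must be a multiple of 256 KiB
MAX_SINGLE_WRITE = 2**31 - 1     # 2147483647, the limit in the OverflowError


def plan_chunks(file_size, chunk_size):
    """Return how many chunks a file of `file_size` bytes is split into."""
    if chunk_size <= 0 or chunk_size % CHUNK_MULTIPLE != 0:
        raise ValueError("chunk_size must be a positive multiple of 256 KiB")
    return math.ceil(file_size / chunk_size)


file_size = 3 * 1024**3          # a hypothetical 3 GiB file
chunk_size = 10485760            # the 10 MB value from the answer

print(file_size > MAX_SINGLE_WRITE)        # True: too large for a single write
print(chunk_size % CHUNK_MULTIPLE == 0)    # True: 10485760 = 40 * 262144
print(plan_chunks(file_size, chunk_size))  # 308
```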

Google Cloud Storage - BrokenPipeError when uploading using Python library

I have a long-running Python script that uploads documents from MongoDB to Google Cloud Storage. Documents are first exported to a local CSV file, and that CSV file is then uploaded to Google Cloud Storage.
Before the error, the script was running for about 10 hours with no problems. What caused this error?
Code where I use GCS:
import logging
from google.cloud import storage
client = storage.Client.from_service_account_json(KEY_JSON)
def upload_file(filepath):
    bucket = client.get_bucket(BUCKET_NAME)
    blob = bucket.blob(filepath)
    blob.upload_from_filename(filepath)
    # logging uses %-style placeholders, not str.format braces
    logging.info("Uploaded file %s to GCS.", filepath)
Stack trace:
Traceback (most recent call last):
File "/home/leonz/.local/lib/python3.5/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/home/leonz/.local/lib/python3.5/site-packages/urllib3/connectionpool.py", line 357, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.5/http/client.py", line 1107, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.5/http/client.py", line 1152, in _send_request
self.endheaders(body)
File "/usr/lib/python3.5/http/client.py", line 1103, in endheaders
self._send_output(message_body)
File "/usr/lib/python3.5/http/client.py", line 936, in _send_output
self.send(message_body)
File "/usr/lib/python3.5/http/client.py", line 908, in send
self.sock.sendall(data)
File "/usr/lib/python3.5/ssl.py", line 899, in sendall
v = self.send(data[count:])
File "/usr/lib/python3.5/ssl.py", line 869, in send
return self._sslobj.write(data)
File "/usr/lib/python3.5/ssl.py", line 594, in write
return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe
