Google Cloud Storage - BrokenPipeError when uploading using Python library - python

I have a long-running Python script that uploads documents from MongoDB to GC Storage. Documents are first exported to local csv file and that csv file is uploaded to GC Storage.
Before the error, the script was running for about 10 hours with no problems. What caused this error?
Code where I use GCS:
client = storage.Client.from_service_account_json(KEY_JSON)
def upload_file(filepath):
bucket = client.get_bucket(BUCKET_NAME)
blob = bucket.blob(filepath)
blob.upload_from_filename(filepath)
logging.info("Uploaded file {} to GCS.", str(filepath))
Stack trace:
Traceback (most recent call last):
File "/home/leonz/.local/lib/python3.5/site-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/home/leonz/.local/lib/python3.5/site-packages/urllib3/connectionpool.py", line 357, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.5/http/client.py", line 1107, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.5/http/client.py", line 1152, in _send_request
self.endheaders(body)
File "/usr/lib/python3.5/http/client.py", line 1103, in endheaders
self._send_output(message_body)
File "/usr/lib/python3.5/http/client.py", line 936, in _send_output
self.send(message_body)
File "/usr/lib/python3.5/http/client.py", line 908, in send
self.sock.sendall(data)
File "/usr/lib/python3.5/ssl.py", line 899, in sendall
v = self.send(data[count:])
File "/usr/lib/python3.5/ssl.py", line 869, in send
return self._sslobj.write(data)
File "/usr/lib/python3.5/ssl.py", line 594, in write
return self._sslobj.write(data)
BrokenPipeError: [Errno 32] Broken pipe

Related

Web Scraping in HTML to using py-script

<body class="white-vertion black-bg">
<!-- Start Loader -->
<p>
<py-script>
import ssl
from urllib.request import urlopen
from bs4 import BeautifulSoup
context = ssl._create_unverified_context()
result = urlopen("https://blog.naver.com/PostList.naver?blogId=woong3164&categoryNo=0&from=postList", context=context)
bsObj = BeautifulSoup(result.read(), "html.parser")
</py-script>
</p>
I used py-script in HTML to do web scraping. However, this error occurred.
'JsException(PythonError: Traceback (most recent call last):
File "/lib/python3.10/urllib/request.py", line 1348, in do_open
h.request(req.get_method(), req.selector, req.data, headers,
File "/lib/python3.10/http/client.py", line 1282, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/lib/python3.10/http/client.py", line 1328, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/lib/python3.10/http/client.py", line 1277, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/lib/python3.10/http/client.py", line 1037, in _send_output
self.send(msg)
File "/lib/python3.10/http/client.py", line 975, in send
self.connect()
File "/lib/python3.10/http/client.py", line 1447, in connect
super().connect()
File "/lib/python3.10/http/client.py", line 941, in connect
self.sock = self._create_connection(
File "/lib/python3.10/socket.py", line 845, in create_connection
raise err
File "/lib/python3.10/socket.py", line 833, in create_connection
sock.connect(sa)
BlockingIOError: [Errno 26] Operation in progress
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/lib/python3.10/site-packages/_pyodide/_base.py", line 429, in eval_code
.run(globals, locals)
File "/lib/python3.10/site-packages/_pyodide/_base.py", line 300, in run
coroutine = eval(self.code, globals, locals)
File "", line 6, in
File "/lib/python3.10/urllib/request.py", line 216, in urlopen
return opener.open(url, data, timeout)
File "/lib/python3.10/urllib/request.py", line 519, in open
response = self._open(req, data)
File "/lib/python3.10/urllib/request.py", line 536, in _open
result = self._call_chain(self.handle_open, protocol, protocol +
File "/lib/python3.10/urllib/request.py", line 496, in _call_chain
result = func(*args)
File "/lib/python3.10/urllib/request.py", line 1391, in https_open
return self.do_open(http.client.HTTPSConnection, req,
File "/lib/python3.10/urllib/request.py", line 1351, in do_open
raise URLError(err) urllib.error.URLError: )'
I think this error was caused by ssl.
How can I solve this error?
Your problem is caused by using unsupported Python packages. The package urllib uses APIs (TCP Sockets) that do not exist in the browser. This is not a limitation of PyScript, no browser application can use socket-based APIs.
The solution is to use supported APIs such as fetch or pyfetch.

HTTPS request with Python standard library

UPDATE: I managed to do a request with urllib2, but I'm still wondering what is happening here.
I would like to do a HTTPS request with Python.
This works fine with the requests module, but I don't want to use external dependencies, so I'd like to use the standard library.
httplib
When I follow this example I don't get a response. I get a timeout instead. I'm out of ideas as to what would cause this.
Code:
import requests
print requests.get('https://python.org')
from httplib import HTTPSConnection
conn = HTTPSConnection('www.python.org')
conn.request('GET', '/index.html')
print conn.getresponse()
Output:
<Response [200]>
Traceback (most recent call last):
File "test.py", line 6, in <module>
conn.request('GET', '/index.html')
File "C:\Python27\lib\httplib.py", line 1069, in request
self._send_request(method, url, body, headers)
File "C:\Python27\lib\httplib.py", line 1109, in _send_request
self.endheaders(body)
File "C:\Python27\lib\httplib.py", line 1065, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 892, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 854, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 1282, in connect
HTTPConnection.connect(self)
File "C:\Python27\lib\httplib.py", line 831, in connect
self.timeout, self.source_address)
File "C:\Python27\lib\socket.py", line 575, in create_connection
raise err
socket.error: [Errno 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
urllib
This fails for a different (but possibly related) reason. Code:
import urllib
print urllib.urlopen("https://python.org")
Output:
Traceback (most recent call last):
File "test.py", line 10, in <module>
print urllib.urlopen("https://python.org")
File "C:\Python27\lib\urllib.py", line 87, in urlopen
return opener.open(url)
File "C:\Python27\lib\urllib.py", line 215, in open
return getattr(self, name)(url)
File "C:\Python27\lib\urllib.py", line 445, in open_https
h.endheaders(data)
File "C:\Python27\lib\httplib.py", line 1065, in endheaders
self._send_output(message_body)
File "C:\Python27\lib\httplib.py", line 892, in _send_output
self.send(msg)
File "C:\Python27\lib\httplib.py", line 854, in send
self.connect()
File "C:\Python27\lib\httplib.py", line 1290, in connect
server_hostname=server_hostname)
File "C:\Python27\lib\ssl.py", line 369, in wrap_socket
_context=self)
File "C:\Python27\lib\ssl.py", line 599, in __init__
self.do_handshake()
File "C:\Python27\lib\ssl.py", line 828, in do_handshake
self._sslobj.do_handshake()
IOError: [Errno socket error] [SSL: UNKNOWN_PROTOCOL] unknown protocol (_ssl.c:727)
What is requests doing that makes it succeed where both of these libraries fail?
requests.get without timeout parameter mean no timeout at all.
httplib.HTTPSConnection accept parameter timeout in Python 2.6 and newer according to httplib docs. If your problem was caused by timeout, setting high enough timeout should help. Please try replacing:
conn = HTTPSConnection('www.python.org')
with:
conn = HTTPSConnection('www.python.org', timeout=300)
which will give 300 seconds (5 minutes) for processing.

Getting UrlError While running Python code for extracting Url from in Ubuntu

Below is the stack trace at the terminal end in Ubuntu
Even my anacinda is taking too much time to open (around 20 minutes)
Traceback (most recent call last):
File "/home/narendra/anaconda3/lib/python3.7/urllib/request.py", line 1317, in do_open
encode_chunked=req.has_header('Transfer-encoding'))
File "/home/narendra/anaconda3/lib/python3.7/http/client.py", line 1244, in request
self._send_request(method, url, body, headers, encode_chunked)
File "/home/narendra/anaconda3/lib/python3.7/http/client.py", line 1290, in _send_request
self.endheaders(body, encode_chunked=encode_chunked)
File "/home/narendra/anaconda3/lib/python3.7/http/client.py", line 1239, in endheaders
self._send_output(message_body, encode_chunked=encode_chunked)
File "/home/narendra/anaconda3/lib/python3.7/http/client.py", line 1026, in _send_output
self.send(msg)
File "/home/narendra/anaconda3/lib/python3.7/http/client.py", line 966, in send
self.connect()
File "/home/narendra/anaconda3/lib/python3.7/http/client.py", line 1406, in connect
super().connect()
File "/home/narendra/anaconda3/lib/python3.7/http/client.py", line 938, in connect
(self.host,self.port), self.timeout, self.source_address)
File "/home/narendra/anaconda3/lib/python3.7/socket.py", line 727, in create_connection
raise err
File "/home/narendra/anaconda3/lib/python3.7/socket.py", line 716, in create_connection
sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "TASK_1.py", line 23, in <module>
response = urllib.request.urlopen(line,context=gcontext)
File "/home/narendra/anaconda3/lib/python3.7/urllib/request.py", line 222, in urlopen
return opener.open(url, data, timeout)
File "/home/narendra/anaconda3/lib/python3.7/urllib/request.py", line 525, in open
response = self._open(req, data)
File "/home/narendra/anaconda3/lib/python3.7/urllib/request.py", line 543, in _open
'_open', req)
File "/home/narendra/anaconda3/lib/python3.7/urllib/request.py", line 503, in _call_chain
result = func(*args)
File "/home/narendra/anaconda3/lib/python3.7/urllib/request.py", line 1360, in https_open
context=self._context, check_hostname=self._check_hostname)
File "/home/narendra/anaconda3/lib/python3.7/urllib/request.py", line 1319, in do_open
raise URLError(err)
urllib.error.URLError: <urlopen error [Errno 110] Connection timed out>
below is my code.
this code is extracting the Url data into the file
it will one by one pick URLs for the url.txt
then it will extract the all page data from that particular URL.
import urllib.request, urllib.error, urllib.parse
import io
import ssl
#localhost, 127.0.0.0/8, ::1, 10.0.0.0/8
# using readline() that reads file line by line.
file1 = open("url.txt", "r")
count = 0
gcontext = ssl.SSLContext()`
for i in range(18):
count += 1
# Getting the next line from file
line = file1.readline()
# if line is empty
# end of file is reached
if not line:
break
response = urllib.request.urlopen(line,context=gcontext)
webContent = response.read()
with io.open("file_" + str(i) + ".txt", 'w', encoding='utf-8') as f:
f.write(webContent)
f.close()

can't upload > ~2GB to Google Cloud Storage

Trace below.
The relevant Python snippet:
bucket = _get_bucket(location['bucket'])
blob = bucket.blob(location['path'])
blob.upload_from_filename(source_path)
Which ultimately triggers (from the ssl library):
OverflowError: string longer than 2147483647 bytes
I assume there is some special configuration option I'm missing?
This is possibly related to this ~1.5yr old apparently still-open issue: https://github.com/googledatalab/datalab/issues/784.
Help appreciated!
Full trace:
[File "/usr/src/app/gcloud/download_data.py", line 109, in *******
blob.upload_from_filename(source_path)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 992, in upload_from_filename
size=total_bytes)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 946, in upload_from_file
client, file_obj, content_type, size, num_retries)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 867, in _do_upload
client, stream, content_type, size, num_retries)
File "/usr/local/lib/python3.5/dist-packages/google/cloud/storage/blob.py", line 700, in _do_multipart_upload
transport, data, object_metadata, content_type)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/upload.py", line 97, in transmit
retry_strategy=self._retry_strategy)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/requests/_helpers.py", line 101, in http_request
func, RequestsMixin._get_status_code, retry_strategy)
File "/usr/local/lib/python3.5/dist-packages/google/resumable_media/_helpers.py", line 146, in wait_and_retry
response = func()
File "/usr/local/lib/python3.5/dist-packages/google/auth/transport/requests.py", line 186, in request
method, url, data=data, headers=request_headers, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/requests/adapters.py", line 440, in send
timeout=timeout
File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 601, in urlopen
chunked=chunked)
File "/usr/local/lib/python3.5/dist-packages/urllib3/connectionpool.py", line 357, in _make_request
conn.request(method, url, **httplib_request_kw)
File "/usr/lib/python3.5/http/client.py", line 1106, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python3.5/http/client.py", line 1151, in _send_request
self.endheaders(body)
File "/usr/lib/python3.5/http/client.py", line 1102, in endheaders
self._send_output(message_body)
File "/usr/lib/python3.5/http/client.py", line 936, in _send_output
self.send(message_body)
File "/usr/lib/python3.5/http/client.py", line 908, in send
self.sock.sendall(data)
File "/usr/lib/python3.5/ssl.py", line 891, in sendall
v = self.send(data[count:])
File "/usr/lib/python3.5/ssl.py", line 861, in send
return self._sslobj.write(data)
File "/usr/lib/python3.5/ssl.py", line 586, in write
return self._sslobj.write(data)
OverflowError: string longer than 2147483647 bytes
The issue is it is attempting to read the entire file into memory. Following the chain from upload_from_filename shows that it stats the file and then passes that in as the upload size as a single upload part.
Instead, specifying a chunk_size when creating the object will trigger it to upload in multiple parts:
# Must be a multiple of 256KB per docstring
CHUNK_SIZE = 10485760 # 10MB
blob = bucket.blob(location['path'], chunk_size=CHUNK_SIZE)
Happy Hacking!

why yowsup can not able to send request to whatsapp server?

Yowsup was running flawlessly yesterday...and today it's giving me an error.
I just download Yowsup from https://github.com/tgalal/yowsup and edit config.example file for change number and country code then fire below command in terminal
python ./yowsup-cli -c config.example -d --requestcode sms
To send request code. Please guess what could be problem?
erp#erp-desktop:~/Downloads/yowsup-master/src$ python ./yowsup-cli -c config.example -d --requestcode sms
{'Accept': 'text/json', 'User-Agent': 'WhatsApp/2.11.1 S40Version/14.26 Device/Nokia302'}
cc=91&in=9033308076&lc=US&lg=en&mcc=000&mnc=000&method=sms&id=f1b708bba17f1ce948dc979f4d7092bc&token=6ee7d82a64bb51c6f8742f30a245c7ef
Opening connection to v.whatsapp.net
Sending GET request to /v2/code?cc=91&in=9033308076&lc=US&lg=en&mcc=000&mnc=000&method=sms&id=f1b708bba17f1ce948dc979f4d7092bc&token=6ee7d82a64bb51c6f8742f30a245c7ef
Traceback (most recent call last):
File "./yowsup-cli", line 275, in <module>
result = wc.send()
File "/home/erp/Downloads/yowsup-master/src/Yowsup/Registration/v2/coderequest.py", line 60, in send
res = super(WACodeRequest, self).send(parser)
File "/home/erp/Downloads/yowsup-master/src/Yowsup/Common/Http/warequest.py", line 100, in send
return self.sendGetRequest(parser)
File "/home/erp/Downloads/yowsup-master/src/Yowsup/Common/Http/warequest.py", line 137, in sendGetRequest
self.response = WARequest.sendRequest(host, port, path, headers, params, "GET")
File "/home/erp/Downloads/yowsup-master/src/Yowsup/Common/Http/warequest.py", line 194, in sendRequest
conn.request(reqType, path, params, headers);
File "/usr/lib/python2.7/httplib.py", line 958, in request
self._send_request(method, url, body, headers)
File "/usr/lib/python2.7/httplib.py", line 992, in _send_request
self.endheaders(body)
File "/usr/lib/python2.7/httplib.py", line 954, in endheaders
self._send_output(message_body)
File "/usr/lib/python2.7/httplib.py", line 814, in _send_output
self.send(msg)
File "/usr/lib/python2.7/httplib.py", line 776, in send
self.connect()
File "/usr/lib/python2.7/httplib.py", line 1161, in connect
self.sock = ssl.wrap_socket(sock, self.key_file, self.cert_file)
File "/usr/lib/python2.7/ssl.py", line 381, in wrap_socket
ciphers=ciphers)
File "/usr/lib/python2.7/ssl.py", line 143, in __init__
self.do_handshake()
File "/usr/lib/python2.7/ssl.py", line 305, in do_handshake
self._sslobj.do_handshake()
socket.error: [Errno 104] Connection reset by peer

Categories