I have been using the gTTS module successfully for a while to get audio from Google Translate. I use it sparingly (I must have made about 25 requests in total), so I don't believe I could have hit any limit that would cause my address to be blocked from the service.
However, today, after not using it for 1-2 months, the following program raised an error:
from gtts import gTTS
tts = gTTS('hallo', lang='de')
tts.save('hallo.mp3')
I tracked down the problem and found that even this simple program:
import requests
response = requests.get("https://translate.google.com/")
causes the following error:
Traceback (most recent call last):
File "C:\...\lib\site-packages\urllib3\connectionpool.py", line 601, in urlopen
chunked=chunked)
File "C:\...\lib\site-packages\urllib3\connectionpool.py", line 346, in _make_request
self._validate_conn(conn)
File "C:\...\lib\site-packages\urllib3\connectionpool.py", line 850, in _validate_conn
conn.connect()
File "C:\...\lib\site-packages\urllib3\connection.py", line 326, in connect
ssl_context=context)
File "C:\...\lib\site-packages\urllib3\util\ssl_.py", line 329, in ssl_wrap_socket
return context.wrap_socket(sock, server_hostname=server_hostname)
File "C:\...\lib\ssl.py", line 407, in wrap_socket
_context=self, _session=session)
File "C:\...\lib\ssl.py", line 814, in __init__
self.do_handshake()
File "C:\...\lib\ssl.py", line 1068, in do_handshake
self._sslobj.do_handshake()
File "C:\...\lib\ssl.py", line 689, in do_handshake
self._sslobj.do_handshake()
ssl.SSLEOFError: EOF occurred in violation of protocol (_ssl.c:777)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "C:\...\lib\site-packages\requests\adapters.py", line 440, in send
timeout=timeout
File "C:\...\lib\site-packages\urllib3\connectionpool.py", line 639, in urlopen
_stacktrace=sys.exc_info()[2])
File "C:\...\lib\site-packages\urllib3\util\retry.py", line 388, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='translate.google.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:777)'),))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "main2.py", line 2, in <module>
response = requests.get("https://translate.google.com/")
File "C:\...\lib\site-packages\requests\api.py", line 72, in get
return request('get', url, params=params, **kwargs)
File "C:\...\lib\site-packages\requests\api.py", line 58, in request
return session.request(method=method, url=url, **kwargs)
File "C:\...\lib\site-packages\requests\sessions.py", line 508, in request
resp = self.send(prep, **send_kwargs)
File "C:\...\lib\site-packages\requests\sessions.py", line 618, in send
r = adapter.send(request, **kwargs)
File "C:\...\lib\site-packages\requests\adapters.py", line 506, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='translate.google.com', port=443): Max retries exceeded with url: / (Caused by SSLError(SSLEOFError(8, 'EOF occurred in violation of protocol (_ssl.c:777)'),))
I would like to know if anyone has an idea what the issue could be. I can get on the Google Translate website without any problems from my browser, and have no issues using the audio either.
The accepted answer did not work for me because the gTTS code has changed. What worked for me was adding verify=False to the request in gtts_token.py instead:
response = requests.get("https://translate.google.com/", verify=False)
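For context, here is a rough standard-library sketch of what switching verification off amounts to at the ssl level. The helper name make_unverified_context is made up for illustration, and this is not how requests implements verify=False internally; it only shows which checks are skipped, and why this is a workaround rather than a fix.

```python
import ssl


def make_unverified_context():
    """Return an SSLContext that skips certificate and hostname checks."""
    ctx = ssl.create_default_context()
    # check_hostname has to be switched off before verify_mode can be CERT_NONE.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


default_ctx = ssl.create_default_context()
unverified_ctx = make_unverified_context()

# The default context validates the server certificate and its hostname;
# the unverified one accepts any certificate, including a forged one.
print(default_ctx.verify_mode, default_ctx.check_hostname)
print(unverified_ctx.verify_mode, unverified_ctx.check_hostname)
```

With verification off, the connection is still encrypted, but nothing proves that the other end is really translate.google.com.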
This looks like an error related to your proxy settings, especially if you are on a work PC. I had the same issue but with a different error message, for example:
gTTSError: Connection error during token calculation:
HTTPSConnectionPool(host='translate.google.com', port=443): Max
retries exceeded with url: / (Caused by SSLError(SSLError("bad
handshake: Error([('SSL routines', 'ssl3_get_server_certificate',
'certificate verify failed')],)",),))
To investigate further, you can run gtts-cli from the command line with debugging enabled:
(base) c:\gtts-cli "sample text to debug" --debug --output test.mp3
You should see output like this:
ProxyError('Cannot connect to proxy.', OSError('Tunnel connection failed: 407 Proxy Authentication Required',)))
Solution:
As far as I can tell from the gTTS documentation, there is no way to pass your proxy settings to the API. The workaround is to skip SSL verification, which gTTS does not expose either, so the only way to do it is to edit the following gTTS files.
In tts.py, line 208, change the request call to add verify=False:
r = requests.get(self.GOOGLE_TTS_URL,
params=payload,
headers=self.GOOGLE_TTS_HEADERS,
proxies=urllib.request.getproxies(),
verify=False)
In lang.py, line 56:
page = requests.get(URL_BASE, verify=False)
Then run the debug command again. The file should now be written:
(base) c:\gtts-cli "sample text to debug" --debug --output test.mp3
gtts.tts - DEBUG - status-0: 200
gtts.tts - DEBUG - part-0 written to <_io.BufferedWriter name='test.mp3'>
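Before patching library files, it may help to check which proxy settings Python actually picks up, since the 407 response suggests the proxy is reached but rejects the credentials. The sketch below uses only the standard library; describe_proxies is a hypothetical helper that masks any credentials embedded in the proxy URL before printing.

```python
import urllib.request


def detect_proxies():
    """Return the proxy settings urllib finds, as a {scheme: url} dict."""
    # getproxies() reads the *_proxy environment variables and, on Windows
    # and macOS, falls back to the system proxy configuration.
    return urllib.request.getproxies()


def describe_proxies(proxies):
    """Format a proxy dict for logging, masking any 'user:password@' part."""
    lines = []
    for scheme, url in sorted(proxies.items()):
        if "@" in url:
            prefix, _, host = url.rpartition("@")
            # Keep the URL scheme (if any) but drop the credentials.
            scheme_part = prefix.split("://", 1)[0] + "://" if "://" in prefix else ""
            url = scheme_part + "***@" + host
        lines.append(f"{scheme}: {url}")
    return lines


for line in describe_proxies(detect_proxies()):
    print(line)
```

If nothing is printed, Python sees no proxy at all; if the proxy is listed without credentials, that would be consistent with the 407 error shown above.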
I am trying to write a script that scrapes a presentation from a SlideShare link and downloads it as a PDF.
The script works fine as long as the presentation has fewer than 20 slides. Is there an alternative to requests in Python that can do the job?
Here is the script:
import requests
from bs4 import BeautifulSoup
from PIL import Image
import io
URL_LESS = "https://www.slideshare.net/angelucmex/global-warming-2373190?qid=8f04572c-48df-4f53-b2b0-0eb71021931c&v=&b=&from_search=1"
URL="https://www.slideshare.net/tusharpanda88/python-basics-59573634?qid=03cb80ee-36f0-4241-a516-454ad64808a8&v=&b=&from_search=5"
r = requests.get(URL_LESS)
soup = BeautifulSoup(r.content, "html5lib")
imgs = soup.find_all('img', class_="slide-image")
imgSRC = [x.get("srcset").split(',')[0].strip().split(' ')[0].split('?')[0] for x in imgs]
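# Download each slide image and open it with Pillow
imagesJPG = []
for img in imgSRC:
    im = requests.get(img)
    f = io.BytesIO(im.content)
    imgJPG = Image.open(f)
    imagesJPG.append(imgJPG)
# Save the first image as a PDF and append the remaining slides as pages
imagesJPG[0].save(f"{soup.title.string}.pdf", save_all=True, append_images=imagesJPG[1:])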
Try changing URL_LESS to URL and you will see the problem.
Here is the traceback:
Traceback (most recent call last):
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connection.py", line 174, in _new_conn
conn = connection.create_connection(
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\util\connection.py", line 95, in create_connection
raise err
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\util\connection.py", line 85, in create_connection
sock.connect(sa)
TimeoutError: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connectionpool.py", line 703, in urlopen
httplib_response = self._make_request(
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connectionpool.py", line 386, in _make_request
self._validate_conn(conn)
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connectionpool.py", line 1040, in _validate_conn
conn.connect()
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connection.py", line 358, in connect
conn = self._new_conn()
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connection.py", line 186, in _new_conn
raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <urllib3.connection.HTTPSConnection object at 0x00000259643FF820>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\adapters.py", line 440, in send
resp = conn.urlopen(
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\connectionpool.py", line 785, in urlopen
retries = retries.increment(
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\urllib3\util\retry.py", line 592, in increment
raise MaxRetryError(_pool, url, error or ResponseError(cause))
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='image.slidesharecdn.com', port=443): Max retries exceeded with url: /pythonbasics-160315100530/85/python-basics-8-320.jpg (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x00000259643FF820>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "d:\Work\py\scrapingScripts\slideshare\main.py", line 16, in <module>
im = requests.get(img)
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\api.py", line 75, in get
return request('get', url, params=params, **kwargs)
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\api.py", line 61, in request
return session.request(method=method, url=url, **kwargs)
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\sessions.py", line 529, in request
resp = self.send(prep, **send_kwargs)
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\sessions.py", line 645, in send
r = adapter.send(request, **kwargs)
File "D:\Work\py\scrapingScripts\tkinter\env\lib\site-packages\requests\adapters.py", line 519, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='image.slidesharecdn.com', port=443): Max retries exceeded with url: /pythonbasics-160315100530/85/python-basics-8-320.jpg (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x00000259643FF820>: Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
The script worked perfectly for me with both URL and URL_LESS, so your internet connection might be the culprit here.
My guesses are:
You have a slow or inconsistent internet connection.
SlideShare is blacklisting your IP or user agent, maybe for DDoS protection (unlikely).
You are using IPv6, which has been the culprit in this kind of case for me. Try switching your network to IPv4 only.
As for requests, I have personally used it to scrape a fairly large amount of data over a fairly long time, so I can say it's an amazing library to use.
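If the connection is merely flaky, retrying each download with an increasing delay may be enough to get through a long slide deck. Below is a minimal sketch of that idea; retry_with_backoff is a made-up helper, not part of requests, and the attempt count and delays are arbitrary assumptions.

```python
import time


def retry_with_backoff(func, attempts=4, base_delay=1.0,
                       exceptions=(OSError,), sleep=time.sleep):
    """Call func() until it succeeds, doubling the wait after each failure."""
    for attempt in range(attempts):
        try:
            return func()
        except exceptions:
            if attempt == attempts - 1:
                raise  # out of attempts: let the last error propagate
            # Wait base_delay, then 2x, 4x, ... before the next try.
            sleep(base_delay * 2 ** attempt)
```

In the script above, the download line could then become something like im = retry_with_backoff(lambda: requests.get(img, timeout=10)). The exceptions raised by requests derive from OSError, so the default exceptions tuple should cover connection errors and timeouts.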
I have written some Python code for an Azure Function, and everything works fine when I run it locally in VS Code through Azure Functions Core Tools. The code calls a REST API.
When I deploy it to Azure, it fails with the following error. Any idea how to debug this?
Result: Failure
Exception: SSLError: HTTPSConnectionPool(host='api.myurl.net', port=443): Max retries exceeded with url: /payments/123456 (Caused by SSLError(SSLError(1, '[SSL: TLSV1_ALERT_PROTOCOL_VERSION] tlsv1 alert protocol version (_ssl.c:1125)')))
Stack:
File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/dispatcher.py", line 355, in _handle__invocation_request
call_result = await self._loop.run_in_executor(
File "/usr/local/lib/python3.8/concurrent/futures/thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "/azure-functions-host/workers/python/3.8/LINUX/X64/azure_functions_worker/dispatcher.py", line 542, in __run_sync_func
return func(**params)
File "/home/site/wwwroot/HttpTrigger1/__init__.py", line 15, in main
status, body, headers = client.get('/payments/234368493',raw=True)
File "/home/site/wwwroot/.python_packages/lib/site-packages/quickpay_api_client/api.py", line 80, in perform
response = self.fulfill(method, url,
File "/home/site/wwwroot/.python_packages/lib/site-packages/quickpay_api_client/api.py", line 44, in fulfill
return getattr(self.session, method)(*args, **kwargs)
File "/home/site/wwwroot/.python_packages/lib/site-packages/requests/sessions.py", line 555, in get
return self.request('GET', url, **kwargs)
File "/home/site/wwwroot/.python_packages/lib/site-packages/requests/sessions.py", line 542, in request
resp = self.send(prep, **send_kwargs)
File "/home/site/wwwroot/.python_packages/lib/site-packages/requests/sessions.py", line 655, in send
r = adapter.send(request, **kwargs)
File "/home/site/wwwroot/.python_packages/lib/site-packages/requests/adapters.py", line 514, in send
raise SSLError(e, request=request)
The problem was in the code of one of the installed python modules. The module used the poolmanager and I suspect the problem was in that. I rewrote the code and now it's working.
Your problem is caused by a certificate that is not trusted. You can try the following ways to solve it.
1. Refer to "SSLError (bad handshake) when using Azure CLI" to get the trusted certificate; you can download it in a browser from the URL mentioned in the error message.
2. Try disabling certificate verification; see "SSL handshake error with some Azure CLI commands":
set ADAL_PYTHON_SSL_NO_VERIFY=1
set AZURE_CLI_DISABLE_CONNECTION_VERIFICATION=1
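Since the error in the question is TLSV1_ALERT_PROTOCOL_VERSION, which a peer sends when it does not accept the TLS version it was offered, it may also be worth comparing what the local and the deployed Python runtimes support. This is only a diagnostic sketch using the standard ssl module; describe_tls_support is a hypothetical helper name.

```python
import ssl


def describe_tls_support():
    """Summarise the TLS capabilities of the running Python interpreter."""
    ctx = ssl.create_default_context()
    return {
        "openssl": ssl.OPENSSL_VERSION,
        "has_tls1_2": ssl.HAS_TLSv1_2,
        "has_tls1_3": ssl.HAS_TLSv1_3,
        # Lowest and highest protocol versions the default context will offer.
        "min_version": ctx.minimum_version.name,
        "max_version": ctx.maximum_version.name,
    }


for key, value in describe_tls_support().items():
    print(f"{key}: {value}")
```

Logging this once locally and once from inside the deployed function would show whether the two environments differ. If they match, the difference is more likely in how the installed module configures its connection, as the first answer suspected.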
I am using the following config.hcl for my HashiCorp Vault server:
disable_mlock = true
storage "file" {
  path = "/etc/secrets"
}
listener "tcp" {
  address = "10.xx.xx.xx:8200"
  tls_cert_file = "/etc/certs/selfsigned.crt"
  tls_key_file = "/etc/certs/selfsigned.key"
}
It works fine when I perform Vault operations, but when I try to reach it using the hvac Python library I get an SSL error.
The code I use to connect to the Vault server from Python is:
import hvac
client = hvac.Client(url='https://10.xx.xx.xx:8200', cert=('/etc/certs/selfsigned.crt', '/etc/certs/selfsigned.key'))
client.token = 'd460cb82-08aa-4b97-8655-19b6593b262d'
client.is_authenticated()
The full traceback is as follows:
Traceback (most recent call last):
File "", line 1, in
File "/usr/local/lib/python2.7/dist-packages/hvac/v1/__init__.py", line 552, in is_authenticated
self.lookup_token()
File "/usr/local/lib/python2.7/dist-packages/hvac/v1/__init__.py", line 460, in lookup_token
return self._get('/v1/auth/token/lookup-self', wrap_ttl=wrap_ttl).json()
File "/usr/local/lib/python2.7/dist-packages/hvac/v1/__init__.py", line 1236, in _get
return self.__request('get', url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/hvac/v1/__init__.py", line 1264, in __request
allow_redirects=False, **_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 512, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 622, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 511, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: HTTPSConnectionPool(host='10.xx.xx.xx', port=8200): Max retries exceeded with url: /v1/auth/token/lookup-self (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')],)",),))
According to the hvac documentation section "Using TLS with client-side certificate authentication", you need to pass the server's CA certificate path via the verify parameter (verify=server_cert_path).
I tested it as below and got the expected result. By the way, it ran successfully with or without the token parameter.
import hvac
client = hvac.Client(url='https://127.0.0.1:8200',
                     token='xxxxxxxx',
                     cert=('server.crt', 'server.key'),
                     verify='ca.crt')
res = client.is_authenticated()
print("res:", res)
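A wrong or missing file path tends to surface as a confusing SSL error, so it can help to check the three files before building the client. The following is a small sketch; build_tls_kwargs is a hypothetical helper, and the only thing taken from hvac is that Client accepts the cert and verify keyword arguments used above.

```python
import os


def build_tls_kwargs(client_cert, client_key, ca_bundle):
    """Return cert/verify keyword arguments after checking the files exist."""
    for label, path in (("client certificate", client_cert),
                        ("client key", client_key),
                        ("CA bundle", ca_bundle)):
        if not os.path.isfile(path):
            raise FileNotFoundError(f"{label} not found: {path}")
    # 'cert' is the client's own key pair; 'verify' is the CA file the
    # client uses to check the *server's* certificate.
    return {"cert": (client_cert, client_key), "verify": ca_bundle}
```

The result can be unpacked into the constructor, for example hvac.Client(url='https://127.0.0.1:8200', **build_tls_kwargs('server.crt', 'server.key', 'ca.crt')). With a self-signed server certificate, the certificate itself typically serves as the CA file.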
I am getting an SSL "bad handshake" error when using a python script with requests to log in to https://selfserve.publicmobile.ca/Overview/
Here's the error:
Traceback (most recent call last):
File "/home/pi/Documents/repos/private-repos/public-mobile-usage-scraping/pm_usage_scraping_to_db.py", line 55, in <module>
r = s.get(URL)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 526, in get
return self.request('GET', url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 513, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 623, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 514, in send
raise SSLError(e, request=request)
SSLError: ("bad handshake: Error([('SSL routines', 'SSL3_GET_SERVER_CERTIFICATE', 'certificate verify failed')],)",)
I know that a workaround is setting verify=False, but that would leave me vulnerable, especially since I want to send my username and password over this connection.
From my research, it seems like the website is missing an intermediate certificate (see https://www.ssllabs.com/ssltest/analyze.html?d=selfserve.publicmobile.ca).
Is that true? Where/how do I find the intermediate certificate to add it to my code?
Alternatively, how bad is it if I make a non-SSL request to that site with my account credentials, considering I am only doing that from my home network?
I am new to Python and web scraping, and have only a rudimentary understanding of SSL and web security. Thank you so much for your help!
You can try s.get(***, verify=False), which turns off certificate verification.
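If the server really does omit an intermediate certificate, a safer alternative to switching verification off is to give requests a CA bundle that includes it, since verify also accepts a path to a bundle file. The sketch below only covers building that file; build_ca_bundle and the file names in the usage note are assumptions, and the intermediate certificate itself has to be downloaded separately in PEM format from the issuing CA.

```python
def build_ca_bundle(base_bundle_path, intermediate_pem_path, out_path):
    """Write a new bundle: the base CA bundle plus one intermediate cert."""
    with open(base_bundle_path, "rb") as f:
        base = f.read()
    with open(intermediate_pem_path, "rb") as f:
        intermediate = f.read()
    if b"-----BEGIN CERTIFICATE-----" not in intermediate:
        raise ValueError("intermediate certificate must be PEM-encoded")
    with open(out_path, "wb") as f:
        # Keep exactly one newline between the two PEM sections.
        f.write(base.rstrip(b"\n") + b"\n" + intermediate.strip() + b"\n")
    return out_path
```

A call might then look like s.get(URL, verify=build_ca_bundle(certifi.where(), 'intermediate.pem', 'custom-bundle.pem')), assuming the certifi package is installed and provides the base bundle. The connection stays verified, so the credentials are not exposed to a forged certificate.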
I'm trying to setup a WebDAV connection using easywebdav in Python. (Using 2.7.8 for now)
import csv, easywebdav
webdav = easywebdav.connect('https://sakai.rutgers.edu/dav/restoftheurl', username="", password="")
print webdav.ls()
When I run this, I get the following error message. My guess is that it has something to do with the URL using HTTPS?
Traceback (most recent call last):
File "/home/willkara/Development/SakaiStuff/WorkProjects/sakai-manager/file.py", line 4, in <module>
print webdav.ls()
File "build/bdist.linux-x86_64/egg/easywebdav/client.py", line 176, in ls
File "build/bdist.linux-x86_64/egg/easywebdav/client.py", line 97, in _send
File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 456, in request
resp = self.send(prep, **send_kwargs)
File "/usr/lib/python2.7/dist-packages/requests/sessions.py", line 559, in send
r = adapter.send(request, **kwargs)
File "/usr/lib/python2.7/dist-packages/requests/adapters.py", line 375, in send
raise ConnectionError(e, request=request)
requests.exceptions.ConnectionError: HTTPConnectionPool(host='https', port=80): Max retries exceeded with url: //sakai.rutgers.edu/dav/url:80/. (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)
[Finished in 0.1s with exit code 1]
I find it strange that you combine HTTPS protocol and port 80. HTTPS uses port 443.
Though the error message "Name or service not known" would rather indicate that the hostname sakai.rutgers.edu is not recognized on your system. Try to ping the host.
I noticed that you shouldn't put http:// or https:// at the beginning of the address, only the host name. You select the protocol with protocol='https'. Also, I couldn't get it to work when I added the path to the URL; I had to pass it as an argument to the operations, like webdav.ls('/dav/restoftheurl') or webdav.cd('/dav/restoftheurl').
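If you start from a full URL, the pieces described above can be separated with the standard library before calling easywebdav. This is a minimal sketch; split_webdav_url is a made-up helper and the dictionary keys are my own naming.

```python
from urllib.parse import urlsplit


def split_webdav_url(url):
    """Split a full URL into host, protocol, port and path."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        raise ValueError(f"expected an http(s) URL, got {url!r}")
    return {
        "host": parts.hostname,
        "protocol": parts.scheme,
        "port": parts.port,  # None when the URL has no explicit port
        "path": parts.path or "/",
    }


info = split_webdav_url("https://sakai.rutgers.edu/dav/restoftheurl")
print(info)
```

The host and protocol would then go to easywebdav.connect(info["host"], protocol=info["protocol"], username=..., password=...), and the path to the individual operations such as webdav.ls(info["path"]).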