SSL and NewConnectionError - python

I want to crawl a given list by the Top-1-Million from Alexa, to check which website still offers acces via http:// an do not redirect to https://.
If the webpage does not redirect to a https:// Domain, it should be written into a csv file.
The Problem occurs, when I am adding a bunch of multiple URLs. Than I get two errors:
ssl.SSLError: [SSL: SSLV3_ALERT_HANDSHAKE_FAILURE] sslv3 alert handshake failure (_ssl.c:1056
or
requests.exceptions.ConnectionError: HTTPConnectionPool(host='17ok.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 11001] getaddrinfo failed')
I have tried the opportunities mentioned in the following threads and documentation:
https://2.python-requests.org//en/latest/user/advanced/#ssl-cert-verification
Edit: the sample url: https://requestb.in raises a 404 error actually, probably does not exist even more (?)
Python Requests throwing SSLError
Python Requests: NewConnectionError
requests.exceptions.SSLError: HTTPSConnectionPool: (Caused by SSLError(SSLError(336445449, '[SSL] PEM lib (_ssl.c:3816)')))
and some other delivered solutions.
The option to set verify=False helps, when using it for few URLs, but not when using a List > 10 URLs, the program brakes. I tried my program on a Win10 machine as well as on Ubuntu 16.04.
As expected, its the same issue. I also tried the option using Sessions and installed the certificate library as sugested.
If I am just calling three pages like 'http://www.example.com', 'https://www.github.com' and 'http://www.python.org', its not a big deal and the delivered solutions. The Headache starts, when using a bunch of URLs from the Alexa List.
Here is my code, which is working, when using it for only 3-4 urls:
import requests
from requests.utils import urlparse
urls = ['http://www.example.com',
'http://bloomberg.com',
'http://github.com',
'https://requestbin.fullcontact.com/']
with open('G:\\Request_HEADER_Suite/dummy/http.csv', 'w') as f:
for url in urls:
r = requests.get(url, stream=True, verify=False)
parsed_url = urlparse(r.url)
print("URL: ", url)
print("Redirected to: ", r.url)
print("Status Code: ", r.status_code)
print("Scheme: ", parsed_url.scheme)
if parsed_url.scheme == 'http':
f.write(url + '\n')
I expect to crawl at least a list with 100 URLs. The code should write URLs which are accessible by http:// and do not redirect to https:// into a csv file or complementary database and ignore all URLs with https://.
Because it is working for few URLs, I would expectd a stable opportunity for a larger scan.
But 2 errors araise and break the program. Is it worthy to try a workaround using pytest? Any other suggestions? Thanks in advance.
EDIT:
This is a list, which will raise errors. Only for clarification, this list from a study based on the Alexa-Top-1-Million.
urls = ['http://www.example.com',
'http://bloomberg.com',
'http://github.com',
'https://requestbin.fullcontact.com/',
'http://51sole.com',
'http://58.com',
'http://9gag.com',
'http://abs-cbn.com',
'http://academia.edu',
'http://accuweather.com',
'http://addroplet.com',
'http://addthis.com',
'http://adf.ly',
'http://adhoc2.net',
'http://adobe.com',
'http://1688.com',
'http://17ok.com',
'http://17track.net',
'http://1and1.com',
'http://1tv.ru',
'http://2ch.net',
'http://360.cn',
'http://39.net',
'http://4chan.org',
'http://4pda.ru']
I double checked, the last time the errors starts with the url 17.ok.com. But I have also tried different lists with urls. Thanks for your support.

Related

Python requests: NewConnectionError, urllib3, using cert and verify attributes

So the program i am developing involves posting documents in bank DMS server. They have provided me server certificate in .cer format which i have inserted in my verify variable in code. They also provided client id and password which i have to embed in the header itself. I generated self signed client certificate and private key and gave them the client certificate in cer format and public key. Also in code i gave path of client certificate and private key in cert tuple.
Upon executing code, i am getting this error:
HTTPSConnectionPool(host='apimuat.xxxbank.com', port=9095): Max retries exceeded with url: /doc-mgmt/v1/uploadDoc (Caused by NewConnectionError('<urllib3.connection.HTTPSConnection object at 0x7fb01bd8a160>: Failed to establish a new connection: [Errno 60] Operation timed out'))
File "/Users/fpl_mayank/Documents/FPL/python-virtual-env/uploadDocApi/server.py", line 164, in main
result = requests.post(url,
File "/Users/fpl_mayank/Documents/FPL/python-virtual-env/uploadDocApi/server.py", line 189, in <module>
main()
I have tested it with 'https://postman-echo.com/post' without mentioning cert and verify just to check if my request is going through or not. it is working fine there.
This is my code snippet where i am using request functions.
url='https://apimuat.xxxbank.com:9095/doc-mgmt/v1/uploadDoc'
headers = {"Content-Type": "application/json", "client_id":"af197b22539647fba4db8b971b43e38",
"client_secret":"c1AA406e24074d8887954472C78a924"}
data = req
result = requests.post(url,
data=data,
headers=headers,
cert=('/Users/fpl_mayank/Documents/FPL/python-virtual-
env/uploadDocApi/keystore/dms_csr_certificate_self.cer','/Users/fpl_mayank/Documents/FPL/python-virtual-env/uploadDocApi/keystore/dms_private_key.key'),
verify='/Users/fpl_mayank/Documents/FPL/python-virtual-env/uploadDocApi/truststore/APIM-UAT.cer'
)
res = result.json()
In apidoc it was mentioned, 2-way SSL authentication will be implemented bw client and server. Also i have made virtual-env for this program for that matter. Please help. I am the first one to write an API using python in my company so only way to get my issue resolve is through good ol stackoverflow.
So i solved this. idk exactly what solved it but make sure when working on api's, get the endpoint's ip whitelisted from your network, as per requirement and same goes from their side too. Also i was sending formatted json request having identation and spaces so make sure to keep json in one line.

python SSL error when using request

I need to write a simple test script for rest get using python. What I have is:
import request
url = 'http://myurl......net'
headers ={'content-type':'application/xml'}
r= requests.get(url,headers=headers)
so this give me the following SSL error:
[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed(_ssl.c:590)
So, i did some research and add " verify = False" to the end of my last line of code, but not I am stuck with: : InsecureRequestsWarning, Unverified request is been made. Addig certificate verification is strongly advised."
What to do to get rid of this message?

Max retries exceeded with URL in requests

I'm trying to get the content of App Store > Business:
import requests
from lxml import html
page = requests.get("https://itunes.apple.com/in/genre/ios-business/id6000?mt=8")
tree = html.fromstring(page.text)
flist = []
plist = []
for i in range(0, 100):
app = tree.xpath("//div[#class='column first']/ul/li/a/#href")
ap = app[0]
page1 = requests.get(ap)
When I try the range with (0,2) it works, but when I put the range in 100s it shows this error:
Traceback (most recent call last):
File "/home/preetham/Desktop/eg.py", line 17, in <module>
page1 = requests.get(ap)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 55, in get
return request('get', url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 44, in request
return session.request(method=method, url=url, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 383, in request
resp = self.send(prep, **send_kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 486, in send
r = adapter.send(request, **kwargs)
File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 378, in send
raise ConnectionError(e)
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='itunes.apple.com', port=443): Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8 (Caused by <class 'socket.gaierror'>: [Errno -2] Name or service not known)
Just use requests features:
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry
session = requests.Session()
retry = Retry(connect=3, backoff_factor=0.5)
adapter = HTTPAdapter(max_retries=retry)
session.mount('http://', adapter)
session.mount('https://', adapter)
session.get(url)
This will GET the URL and retry 3 times in case of requests.exceptions.ConnectionError. backoff_factor will help to apply delays between attempts to avoid failing again in case of periodic request quota.
Take a look at urllib3.util.retry.Retry, it has many options to simplify retries.
What happened here is that itunes server refuses your connection (you're sending too many requests from same ip address in short period of time)
Max retries exceeded with url: /in/app/adobe-reader/id469337564?mt=8
error trace is misleading it should be something like "No connection could be made because the target machine actively refused it".
There is an issue at about python.requests lib at Github, check it out here
To overcome this issue (not so much an issue as it is misleading debug trace) you should catch connection related exceptions like so:
try:
page1 = requests.get(ap)
except requests.exceptions.ConnectionError:
r.status_code = "Connection refused"
Another way to overcome this problem is if you use enough time gap to send requests to server this can be achieved by sleep(timeinsec) function in python (don't forget to import sleep)
from time import sleep
All in all requests is awesome python lib, hope that solves your problem.
Just do this,
Paste the following code in place of page = requests.get(url):
import time
page = ''
while page == '':
try:
page = requests.get(url)
break
except:
print("Connection refused by the server..")
print("Let me sleep for 5 seconds")
print("ZZzzzz...")
time.sleep(5)
print("Was a nice sleep, now let me continue...")
continue
You're welcome :)
I got similar problem but the following code worked for me.
url = <some REST url>
page = requests.get(url, verify=False)
"verify=False" disables SSL verification. Try and catch can be added as usual.
pip install pyopenssl seemed to solve it for me.
https://github.com/requests/requests/issues/4246
Specifying the proxy in a corporate environment solved it for me.
page = requests.get("http://www.google.com:80", proxies={"http": "http://111.233.225.166:1234"})
The full error is:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='www.google.com', port=80): Max retries exceeded with url: / (Caused by NewConnectionError(': Failed to establish a new connection: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond'))
It is always good to implement exception handling. It does not only help to avoid unexpected exit of script but can also help to log errors and info notification. When using Python requests I prefer to catch exceptions like this:
try:
res = requests.get(adress,timeout=30)
except requests.ConnectionError as e:
print("OOPS!! Connection Error. Make sure you are connected to Internet. Technical Details given below.\n")
print(str(e))
renewIPadress()
continue
except requests.Timeout as e:
print("OOPS!! Timeout Error")
print(str(e))
renewIPadress()
continue
except requests.RequestException as e:
print("OOPS!! General Error")
print(str(e))
renewIPadress()
continue
except KeyboardInterrupt:
print("Someone closed the program")
Here renewIPadress() is a user define function which can change the IP address if it get blocked. You can go without this function.
Adding my own experience for those who are experiencing this in the future. My specific error was
Failed to establish a new connection: [Errno 8] nodename nor servname provided, or not known'
It turns out that this was actually because I had reach the maximum number of open files on my system. It had nothing to do with failed connections, or even a DNS error as indicated.
When I was writing a selenium browser test script, I encountered this error when calling driver.quit() before a usage of a JS api call.Remember that quiting webdriver is last thing to do!
i wasn't able to make it work on windows even after installing pyopenssl and trying various python versions (while it worked fine on mac), so i switched to urllib and it works on python 3.6 (from python .org) and 3.7 (anaconda)
import urllib
from urllib.request import urlopen
html = urlopen("http://pythonscraping.com/pages/page1.html")
contents = html.read()
print(contents)
just import time
and add :
time.sleep(6)
somewhere in the for loop, to avoid sending too many request to the server in a short time.
the number 6 means: 6 seconds.
keep testing numbers starting from 1, until you reach the minimum seconds that will help to avoid the problem.
It could be network config issue also. So, for that u need to re-config ur network confgurations.
for Ubuntu :
sudo vim /etc/network/interfaces
add 8.8.8.8 in dns-nameserver and save it.
reset ur network : /etc/init.d/networking restart
Now try..
Adding my own experience :
r = requests.get(download_url)
when I tried to download a file specified in the url.
The error was
HTTPSConnectionPool(host, port=443): Max retries exceeded with url (Caused by SSLError(SSLError("bad handshake: Error([('SSL routines', 'tls_process_server_certificate', 'certificate verify failed')])")))
I corrected it by adding verify = False in the function as follows :
r = requests.get(download_url + filename)
open(filename, 'wb').write(r.content)
Check your network connection. I had this and the VM did not have a proper network connection.
I had the same error when I run the route in the browser, but in postman, it works fine. It issue with mine was that, there was no / after the route before the query string.
127.0.0.1:5000/api/v1/search/?location=Madina raise the error and removing / after the search worked for me.
This happens when you send too many requests to the public IP address of https://itunes.apple.com. It as you can see caused due to some reason which does not allow/block access to the public IP address mapping with https://itunes.apple.com. One better solution is the following python script which calculates the public IP address of any domain and creates that mapping to the /etc/hosts file.
import re
import socket
import subprocess
from typing import Tuple
ENDPOINT = 'https://anydomainname.example.com/'
ENDPOINT = 'https://itunes.apple.com/'
def get_public_ip() -> Tuple[str, str, str]:
"""
Command to get public_ip address of host machine and endpoint domain
Returns
-------
my_public_ip : str
Ip address string of host machine.
end_point_ip_address : str
Ip address of endpoint domain host.
end_point_domain : str
domain name of endpoint.
"""
# bash_command = """host myip.opendns.com resolver1.opendns.com | \
# grep "myip.opendns.com has" | awk '{print $4}'"""
# bash_command = """curl ifconfig.co"""
# bash_command = """curl ifconfig.me"""
bash_command = """ curl icanhazip.com"""
my_public_ip = subprocess.getoutput(bash_command)
my_public_ip = re.compile("[0-9.]{4,}").findall(my_public_ip)[0]
end_point_domain = (
ENDPOINT.replace("https://", "")
.replace("http://", "")
.replace("/", "")
)
end_point_ip_address = socket.gethostbyname(end_point_domain)
return my_public_ip, end_point_ip_address, end_point_domain
def set_etc_host(ip_address: str, domain: str) -> str:
"""
A function to write mapping of ip_address and domain name in /etc/hosts.
Ref: https://stackoverflow.com/questions/38302867/how-to-update-etc-hosts-file-in-docker-image-during-docker-build
Parameters
----------
ip_address : str
IP address of the domain.
domain : str
domain name of endpoint.
Returns
-------
str
Message to identify success or failure of the operation.
"""
bash_command = """echo "{} {}" >> /etc/hosts""".format(ip_address, domain)
output = subprocess.getoutput(bash_command)
return output
if __name__ == "__main__":
my_public_ip, end_point_ip_address, end_point_domain = get_public_ip()
output = set_etc_host(ip_address=end_point_ip_address, domain=end_point_domain)
print("My public IP address:", my_public_ip)
print("ENDPOINT public IP address:", end_point_ip_address)
print("ENDPOINT Domain Name:", end_point_domain )
print("Command output:", output)
You can call the above script before running your desired function :)
My situation is rather special. I tried the answers above, none of them worked. I suddenly thought whether it has something to do with my Internet proxy? You know, I'm in mainland China, and I can't access sites like google without an internet proxy. Then I turned off my Internet proxy and the problem was solved.
In my case, I am deploying some docker containers inside the python script and then calling one of the deployed services. Error is fixed when I add some delay before calling the service. I think it needs time to get ready to accept connections.
from time import sleep
#deploy containers
#get URL of the container
sleep(5)
response = requests.get(url,verify=False)
print(response.json())
First I ran the run.py file and then I ran the unit_test.py file, it works for me
Add headers for this request.
headers={
'Referer': 'https://itunes.apple.com',
'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/75.0.3770.142 Safari/537.36'
}
requests.get(ap, headers=headers)
I am coding a test with Gauge and I encountered this error as well, it was because I was trying to request an internal URL without activating VPN.

Python requests chokes on a UTF-8 filename

I'm trying to upload a file, using the requests library to submit a POST.
This works fine:
theFile = { 'LUuploadFile': ("linea.ipa", open(path_to_file, 'rb'), 'application/octet-stream') }
request = requests.post(url, files=theFile)
This throws an error:
theFile = { 'LUuploadFile': ("línea.ipa", open(path_to_file, 'rb'), 'application/octet-stream') }
request = requests.post(url, files=theFile)
The error is very odd:
( <class 'requests.exceptions.ConnectionError'>,
ConnectionError(MaxRetryError("HTTPSConnectionPool(host='fupload.apperian.com', port=443):
Max retries exceeded with url: /upload?transactionID=...
(Caused by <class 'socket.error'>: [Errno 32] Broken pipe)",),),
<traceback object at 0x100a8e3f8>)
It's not the server, it accepts the filename if I use curl:
curl --form "LUuploadFile=#línea.ipa" http://...
This means that something in the particular server doesn't implement the parsing of Content-Disposition correctly (according to RFC 5987). I can't be more specific than that since there any many "moving parts" to a web application server (for example you might be using nginx + fastcgi + PHP) and any one (or all :)) of those might be broken. You might find this SO thread as well as this page useful, which approaches the issue from the other side (downloading the file with an UTF-8 name), but boils down to the same issue (parsing the "Content-Disposition" header).
For what it's worth requests is doing the "correct" thing (according to the standard), but there isn't really much it can do if some component on the server doesn't follow the standard (or it might not even be on the server - for example there might be a proxy you're passing trough that is causing the issue).

Python requests: sending file via POST returns ConnectionError

I'm trying to use the Python requests library to send an android .apk file to a API service. I've successfully used requests and this file type to submit to another service but I keep getting a:
ConnectionError(MaxRetryError("HTTPSConnectionPool(host='REDACTED', port=443): Max retries exceeded with url: /upload/app (Caused by : [WinError 10054] An existing connection was forcibly closed by the remote host)",),)
This is the code responsible:
url = "https://website"
files = {'file': open(app, 'rb')}
headers = {'user':'value', 'pass':'value'}
try:
response = requests.post(url, files=files, headers=headers)
jsonResponse = json.loads(response.text)
if 'error' in jsonResponse:
logger.error(jsonResponse['error'])
except Exception as e:
logger.error("Exception when trying to upload app to host")
The response line is throwing the above mentioned exception. I've used these exact same parameters using the Chrome Postman extension to replicate the POST request and it works perfectly. I've used the exact same format of file to upload to another RESTful service as well. The only difference between this request and the one that works is that this one has custom headers attached in order to verify the POST. The API doesn't stipulate this as authentication in the sense of needing to be encoded and the examples both in HTTP and cURL define these values as headers or -H.
Any help would be most appreciated!
So this was indeed a certificates issue. In my case I was able to stay internal to my company and connect to another URL, but the requests library, which is quite amazing, has information on certs at: http://docs.python-requests.org/en/latest/user/advanced/?highlight=certs
For all intents and purposes this is answered but perhaps it will be useful to someone in posterity.

Categories