Python SSLError: VERSION_TOO_LOW

I'm having some trouble using urllib to fetch some web content on my Debian server. I use the following code to get the contents of most websites without problems:
import urllib.request as request
url = 'https://www.metal-archives.com/'
req = request.Request(url, headers={'User-Agent': "foobar"})
response = request.urlopen(req)
response.read()
However, if the website is using an older encryption protocol, the urlopen function will throw the following error:
ssl.SSLError: [SSL: VERSION_TOO_LOW] version too low (_ssl.c:748)
I have found a workaround: create an SSL context and pass it as an argument to the urlopen function, so the previous code would be modified as follows:
...
context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
response = request.urlopen(req, context=context)
...
This works, provided the specified protocol matches the one used by the website I'm trying to access. However, it does not seem like the best solution, since:
If the site owners ever update their cryptography methods, the code will stop working
The code above will only work for this site, and I would have to create special cases for every website I visit in the entire program, since everyone could be using a different version of the protocol. That would lead to pretty messy code
The first solution I posted (the one without the ssl context) oddly seems to work on my ArchLinux machine, even though they both have the same versions of everything
Does anyone know about a generic solution that would work for every TLS version? Am I missing something here?
PS: For completeness, I will add that I'm using Debian 9, python v3.6.2, openssl v1.1.0f and urllib3 v1.22

In the end, I've opted to wrap the method call inside a try-except, so I can use the older SSL version as fallback. The final code is this:
import ssl
import urllib.request as request
from urllib.error import URLError

url = 'https://www.metal-archives.com'
req = request.Request(url, headers={"User-Agent": "foobar"})
try:
    response = request.urlopen(req)
except (ssl.SSLError, URLError):
    # Try the older TLSv1 protocol to see if that fixes the problem
    context = ssl.SSLContext(ssl.PROTOCOL_TLSv1)
    response = request.urlopen(req, context=context)
I have only tested this code on a dozen websites and it seems to work so far, but I'm not sure it will work every time. This solution also seems inefficient, since the fallback path needs two HTTP requests, which can be slow.
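A possibly cleaner variant (just a sketch, and it may not help if a system-wide OpenSSL configuration is what disables the old protocols) is to build a single, more permissive SSL context up front and reuse it for every request, so each URL needs only one request:

import ssl
import urllib.request as request

# One context for all requests: create_default_context() keeps certificate
# verification on, and clearing the OP_NO_TLSv1* options re-enables the
# older protocol versions in case the defaults disable them.
context = ssl.create_default_context()
context.options &= ~(ssl.OP_NO_TLSv1 | ssl.OP_NO_TLSv1_1)

req = request.Request('https://www.metal-archives.com/',
                      headers={'User-Agent': 'foobar'})
response = request.urlopen(req, context=context)
data = response.read()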
Improvements are still welcome :)

Related

Debugging a python requests module 400 error

I'm making a PUT request using Python and the Confluence REST API in order to update Confluence pages via a script.
I ran into a problem which caused me to receive a 400 error in response to a
requests.put(url, data = jsonData, auth = (username, passwd), headers = {'Content-Type' : 'application/json'})
It took me some time to discover that the reason was that I was not supplying an incremented version when updating the content. I have managed to make my script work, but that is not the point of this question.
During my attempts to make this work, I swapped from requests to an http.client connection. Using this module, I get a lot more information regarding my error:
b'{"statusCode":400,"data":{"authorized":false,"valid":true,"allowedInReadOnlyMode":true,"errors":[],"successful":false},"message":"Must supply an incremented version when updating Content. No version supplied.","reason":"Bad Request"}'
Is there a way for me to get the same feedback information while using requests? I've turned on logging, but this kind of info is never shown.
You're looking for the .json() method on the response object:
response.json()
It returns everything the server sent back in the response body, decoded as a dictionary.
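For example, reusing the hypothetical call from the question:

import requests

response = requests.put(url, data=jsonData, auth=(username, passwd),
                        headers={'Content-Type': 'application/json'})
print(response.status_code)  # 400 in the failing case
print(response.json())       # the decoded JSON error body, as a dict
# response.text gives the raw body if it is not valid JSON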

Python 3.6 Requests too long

I am trying to use requests to pull information from the NPI API but it is taking on average over 20 seconds to pull the information. If I try and access it via my web browser it takes less than a second. I'm rather new to this and any help would be greatly appreciated. Here is my code.
import json
import sys
import requests
url = "https://npiregistry.cms.hhs.gov/api/?number=&enumeration_type=&taxonomy_description=&first_name=&last_name=&organization_name=&address_purpose=&city=&state=&postal_code=10017&country_code=&limit=&skip="
htmlfile = requests.get(url)
data = htmlfile.json()
for i in data["results"]:
    print(i)
This might be due to the response being incorrectly formatted, or due to requests taking longer than necessary to set up the request. To solve these issues, read on:
Server response formatted incorrectly
A possible issue might be that the response parsing is actually the offending part. You can check this by not reading the response you receive from the server. If the code is still slow, this is not your problem, but if that fixes it, the problem probably lies with parsing the response.
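For instance, a quick way to separate the two (a rough sketch, reusing the url from the question) is to defer the body download with stream=True and time each part:

import time
import requests

start = time.time()
r = requests.get(url, stream=True)  # connection + headers only
print("headers received after", time.time() - start, "seconds")

start = time.time()
data = r.json()                     # downloads and parses the body
print("body read and parsed after", time.time() - start, "seconds")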
If some headers are set incorrectly, this can lead to parsing errors that prevent chunked transfer (source).
In other cases, setting the encoding manually might resolve parsing problems (source).
To fix those, try:
r = requests.get(url)
r.raw.chunked = True  # Fix issue 1
r.encoding = 'utf-8'  # Fix issue 2
print(r.text)
Setting up the request takes long
This is mainly applicable if you're sending multiple requests in a row. To prevent requests having to set up the connection each time, you can utilize a requests.Session. This makes sure the connection to the server stays open and configured and also persists cookies as a nice benefit. Try this (source):
import requests
session = requests.Session()
for _ in range(10):
    session.get(url)
Didn't solve your issue?
If that did not solve your issue, I have collected some other possible solutions here.

Python Requests throws SSL Error on certain site

EDIT - FIXED. tl;dr: the fairly old version of Python I installed a couple of years ago shipped an ssl package that had not been updated to handle newer SSL certificates. After updating Python and making sure the ssl package was up to date, everything worked.
I'm new to web scraping, and wanted to scrape a certain site, but for some reason I'm getting errors when using the Python's Requests package on this particular site.
I am working on secure login to scrape data from my user profile. The login address can be found here: https://secure.funorb.com/m=weblogin/loginform.ws?mod=hiscore_fo&ssl=0&expired=0&dest=
I'm just trying to perform simple tasks at this point, like printing the text from a get request. The following is my code.
import requests
req = requests.get('https://secure.funorb.com/m=weblogin/loginform.ws?mod=hiscore_fo&ssl=0&expired=0&dest=',verify=False)
print req.text
When I run this, an error is thrown:
File "/Library/Python/2.7/site-packages/requests/adapters.py", line 512, in send
raise SSLError(e, request=request)
requests.exceptions.SSLError: EOF occurred in violation of protocol (_ssl.c:590)
I've looked in this file to see what's going on. It seems the culprit is
except (_SSLError, _HTTPError) as e:
    if isinstance(e, _SSLError):
        raise SSLError(e, request=request)
    elif isinstance(e, ReadTimeoutError):
        raise ReadTimeout(e, request=request)
    else:
        raise
I'm not really sure how to avoid this unfortunately, I'm kind of at my debugging limit here.
My code works just fine on other secure sites, such as https://bitbucket.org/account/signin/. I've looked at a ton of solutions on Stack Exchange and around the net, and a lot of people claimed that adding the optional argument "verify=False" should fix these types of SSL errors (albeit it's not the most secure way to do it). But as you can see from my code snippet, this isn't helping me.
If anyone can get this working/give advice on where to go it would be much appreciated.
... lot of people claimed adding in the optional argument "verify=False" should fix these types of SSL errors
Adding verify=False helps against errors when validating the certificate, but not against an EOF from the server, handshake errors or similar.
As can be seen from SSLLabs, this specific server simply closes the connection (hence "EOF occurred in violation of protocol") for clients which don't support TLS 1.2 with modern ciphers. While you don't specify which SSL version you use, I expect it to be older than OpenSSL 1.0.1, the first version of OpenSSL to support TLS 1.2.
Please check ssl.OPENSSL_VERSION for the version used by your code. If I'm correct, your only fix is to upgrade the version of OpenSSL used by Python. How this is done depends on your platform, but there are existing posts about it, like Updating openssl in python 2.7.
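A quick check, which works on both Python 2 and 3:

import ssl
print(ssl.OPENSSL_VERSION)
# TLS 1.2 requires at least OpenSSL 1.0.1; anything older would explain
# the server closing the connection during the handshake.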
I've seen this somewhere else. What if you try using sessions, like this:
import requests
sess = requests.Session()
adapter = requests.adapters.HTTPAdapter(max_retries=20)
sess.mount('http://', adapter)
sess.mount('https://', adapter)  # the URL in question is https, so mount the adapter there too
Then replace requests.get() with sess.get().
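For example, continuing the snippet above with the URL from the question:

req = sess.get('https://secure.funorb.com/m=weblogin/loginform.ws?mod=hiscore_fo&ssl=0&expired=0&dest=', verify=False)
print(req.text)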
If you want to keep working with requests, you may need to install the ndg-httpsclient package.

Using certifi module with urllib2?

I'm having trouble downloading https pages with the urllib2 module, which seems to result from urllib2's inability to access the system's certificate store.
To get around this issue, one possible solution is to download https web pages with pycurl, by using the certifi module. The following is an example of doing so:
def download_web_page_with_curl(url_website):
    from pycurl import Curl, CAINFO, URL
    from certifi import where
    from cStringIO import StringIO
    response = StringIO()
    curl = Curl()
    curl.setopt(CAINFO, where())
    curl.setopt(URL, url_website)
    curl.setopt(curl.WRITEFUNCTION, response.write)
    curl.perform()
    curl.close()
    return response.getvalue()
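Example usage of that helper (any HTTPS URL will do):

html = download_web_page_with_curl('https://www.google.com/')
print(len(html))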
Is there a way to use certifi with urllib2 (in a fashion comparable to the pycurl example above), which will permit me to download https sites? Alternatively, is there another feasible urllib2-based workaround which will remedy the permissions issue, without compromising security?
Would recommend using requests per my other answer. However, to answer the original question of how to do this with urllib2:
import urllib2
import certifi

def download_web_page_with_urllib2(url_website):
    t = urllib2.urlopen(url_website, cafile=certifi.where())
    return t.read()

text = download_web_page_with_urllib2('https://www.google.com/')
The same recommendations about error checking apply.
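As a sketch of that error checking with urllib2 (these are standard urllib2 exceptions, nothing specific to this snippet):

import urllib2

try:
    text = download_web_page_with_urllib2('https://www.google.com/')
except urllib2.HTTPError as e:
    # the server answered, but with a non-2xx status code
    print('HTTP error: %s' % e.code)
except urllib2.URLError as e:
    # DNS failure, refused connection, certificate problems, ...
    print('failed to reach the server: %s' % e.reason)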
Expanding on the comment to use requests (which is built on urllib3):
def download_web_page_with_requests(url_website):
    import requests
    r = requests.get(url_website)
    return r.text
This is so much easier than anything else and properly handles SSL verification independent of the platform's own cert lists. If certifi is found, requests will automatically use it. If not, it silently falls back to a more limited, possibly older set of built-in root certs. If ensuring that certifi is used matters to you, you can do this:
r = requests.get(url_website, verify=certifi.where())
Note that the above code does not do the error checking that you should probably do. So I'll point out that requests.get() can throw a number of exceptions for invalid URLs, unreachable sites, communication errors, and failed certificate validation, so you should be prepared to catch and deal with those. If it does successfully talk to a server, but the server returns a non-OK status code (such as for a non-existent page), then an exception won't be thrown, so you'd also want to check that r.status_code == 200.
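A minimal sketch of that checking (the function name is just for illustration):

import requests

def fetch(url_website):
    try:
        r = requests.get(url_website, timeout=10)
    except requests.exceptions.RequestException as e:
        # covers SSL errors, connection errors, timeouts, invalid URLs, ...
        print('request failed: %s' % e)
        return None
    if r.status_code != 200:
        # the server answered, but not with OK
        print('unexpected status: %s' % r.status_code)
        return None
    return r.text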

Interfacing to the LinkPoint API with Python - Sending XML Over SSL with Authentication

I'm trying to make a successful connection to the LinkPoint gateway using Python. For those of you unfamiliar with their API you get a .pem file you use for authentication purposes.
I'm having trouble using this file and creating a secure connection over SSL.
According to their API documentation (which leaves a lot to be desired, btw) I believe the configuration should look similar to below:
HOST = 'secure.linkpt.net'
API_URL = 'https://secure.linkpt.net/lpc/servlet/lppay'
PORT = 1129
cert_key = 'my_cert_key.pem'
Using this information and a valid XML string how can I create this connection?
I'm pretty new to HTTP connections in Python. I've successfully implemented connections with other APIs using a POST with urllib2. Naturally, my first attempt started with a similar approach hoping I could stumble on to a solution.
Something like:
headers = {'User-Agent': 'Rico',
           'Content-type': 'text/xml; charset="UTF-8"',
           'Content-length': len(self.xml_string),
           }
# POST to First Data (Link Point)
req = urllib2.Request(API_URL, self.xml_string, headers)
response = urllib2.urlopen(req)
self.handleResponse(response.read())
I had little hope this would work, as I didn't provide anything about the cert_key or the PORT.
After this attempt I tried a similar approach to one I found in a solution from another Stack Overflow post. Unfortunately I wasn't able to get far with this, as I don't have ca_certs or cert files (that I know of).
I've tried to use Requests but can't find the documentation/examples for me to make sense of it.
I've also tried to use Twisted, and I really hoped I could do something with this but this feels like trying to open a door with a wrecking ball. It just feels like overkill to me. I just need a simple connection/request/response...this seems overly complicated for that.
My next attempt was going to be PycURL, but have confronted enough despair during this process I thought I'd come here to see if someone had some good suggestions before diving into this.
If you think I should re-visit one of these tools please let me know. I didn't spend a great deal of time with any of these - just enough to get my feet wet. If you could also point me to a good example or detailed documentation that would be fantastic.
Also, I'd prefer not to use the standard SSL library to build the connection myself - I don't want to reinvent the wheel if I don't have to.
The solution I was able to use to get a valid connection was using httplib as follows:
import httplib
HOST = 'staging.linkpt.net'
API_URL = '/lpc/servlet/lppay'
PORT = 1129
CERTFILE = 'my_cert_file.pem'
headers = {'User-Agent': 'Rico',
           'Content-type': 'text/xml; charset="UTF-8"',
           'Content-length': str(len(xml_str)),
           }

conn = httplib.HTTPSConnection(HOST, PORT, cert_file=CERTFILE)
conn.putrequest("POST", API_URL)
# putheader() expects a name/value pair, so send the dict one entry at a time
for name, value in headers.items():
    conn.putheader(name, value)
conn.endheaders()
conn.send(xml_str)
response = conn.getresponse()
I have not yet been able to generate a valid request. Apparently I interpreted the API documentation incorrectly and keep getting a "Malformed or unrecognized request" error, but at least I'm making the connection.
I'll update this answer if I'm able to determine more useful information regarding this subject.
UPDATE: A Link Point customer service employee told me I was using old API documentation. I've since tried with the newer version and still cannot connect. I can't even get a response from their server. This is no longer a possible solution to this problem.
UPDATE 2: I was able to solve this problem in another post SSL Connection Using .pem Certificate With Python
Enjoy!
