Python proxy with authentication (requests, pytrends...) - python

Hello, I was previously using older packages, and on our corporate network I was able to reach outside through the proxy with Python like this:
username = "Platform\\PersonalID"
proxies = {
    "http": "http://username:password@address:port",
    "https": "http://username:password@address:port",
}
used with, for example:
requests.get(url, proxies=proxies)
However, when I try this with newer packages I get a problem connecting to the proxy and a timeout error from urllib3:
"Connection to Platform timed out"
I was able to track it down to some changes in urllib3 when comparing the old and new packages, but I can't figure out how to feed the proxy settings to requests, pytrends and the other packages (in fact, to the new urllib3 that requests and the other libraries are built on).
I have searched the web for several ways to get the proxy working, but each failed in one way or another.
For example:
If I use the proxies without the password and try to pass the password as auth=("username", "password"), I get error 407 Proxy Authentication Required, even though the password and ID are correct (note that auth= supplies credentials to the target server, not to the proxy, which may explain the 407).
Example:
import requests
proxies = {"http":"http://Platform\\userID:password#adress:port",
"https": "http://Platform\\userID:password#adress:port"
}
url="https://www.google.com/"
requests.get(url, proxies=proxies).content
works with:
urllib3 up to 1.24
doesn't work with:
urllib3 1.25 (unable to parse the proxy address)
urllib3 1.26.13 (TimeoutError; trying to connect to a badly parsed proxy address is my guess)
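One avenue worth trying (an assumption on my part, not a confirmed fix): urllib3 1.25 moved to stricter RFC 3986 URL parsing, so characters such as the backslash in "Platform\userID" (and any special characters in the password) likely need to be percent-encoded before going into the proxy URL. A minimal sketch, with address:port still standing in for the real proxy:
from urllib.parse import quote
import requests

# Percent-encode the domain-style username and the password so the proxy
# URL stays parseable by urllib3 1.25+ ("\" becomes %5C, "@" becomes %40).
username = quote("Platform\\userID", safe="")
password = quote("password", safe="")

proxy = "http://{}:{}@address:port".format(username, password)
proxies = {"http": proxy, "https": proxy}

print(requests.get("https://www.google.com/", proxies=proxies).content)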

Related

Getting a 401 response while using Requests package

I am trying to access a server over my internal network under https://prodserver.de/info.
I have the code structure as below:
import requests
from requests.auth import *
username = 'User'
password = 'Hello#123'
resp = requests.get('https://prodserver.de/info/', auth=HTTPBasicAuth(username,password))
print(resp.status_code)
While trying to access this server via browser, it works perfectly fine.
What am I doing wrong?
By default, the requests library verifies the SSL certificate for HTTPS requests; if verification fails, it raises an SSLError. You can check whether this is the issue by disabling certificate verification, passing verify=False as an argument to the get method.
import requests
from requests.auth import *
username = 'User'
password = 'Hello#123'
resp = requests.get('https://prodserver.de/info/', auth=HTTPBasicAuth(username,password), verify=False)
print(resp.status_code)
try using requests' generic auth, like this:
resp = requests.get('https://prodserver.de/info/', auth=(username, password))
What am I doing wrong?
I cannot be sure without investigating your server, but I suggest checking the assumption you have made that the server uses Basic authentication; various authentication schemes exist, and it is also possible that your server uses a cookie-based solution rather than a header-based one.
While trying to access this server via browser, it works perfectly fine.
You might then use the browser's developer tools to see what is actually sent with the request that succeeds.
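For example, if the developer tools show the server answering with a WWW-Authenticate: Digest challenge rather than Basic, requests has a matching helper; a minimal sketch, reusing the credentials from the question:
import requests
# Sketch: assumes the server turns out to use Digest auth instead of Basic
from requests.auth import HTTPDigestAuth

resp = requests.get('https://prodserver.de/info/',
                    auth=HTTPDigestAuth('User', 'Hello#123'))
print(resp.status_code)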

SSL context for older Python version

I have code as below:
import ssl
import httplib

headers = {'content-type': 'ContentType.APPLICATION_XML'}
uri = "www.client.url.com/hit-here/"
clientCert = "path/to/cert/abc.crt"
clientKey = "path/to/key/abc.key"
PROTOCOL = ssl.PROTOCOL_TLSv1
context = ssl.SSLContext(PROTOCOL)
context.load_default_certs()
context.load_cert_chain(clientCert, clientKey)
conn = httplib.HTTPSConnection(uri, some_port, context=context)
I am not really a network programmer, so I did some googling on handshake connections and found ssl.SSLContext(PROTOCOL) as the needed function; the code works fine.
Then I hit a roadblock: my local machine has version 2.7.10, but all the production boxes have 2.7.3, so SSLContext is not supported, and upgrading the Python version is not an option / not in my control.
I tried reading ssl — SSL wrapper for socket objects, but couldn't make sense of it.
What I tried (in vain):
s_ = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s = ssl.wrap_socket(s_, keyfile=clientKey, certfile=clientCert, cert_reqs=ssl.CERT_REQUIRED)
new_conn = s.connect((uri, some_port))
but returns :
SSLError(1, u'[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:590)')
Question: how do I create an SSL context on the older version so as to have a secure HTTPS connection?
You have to specify the ca_certs file (which should point to your trust store).
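A sketch of what that looks like against the Python 2 ssl module; the trust-store path and port are assumptions, the key/cert paths are taken from the question:
import socket
import ssl

clientCert = "path/to/cert/abc.crt"
clientKey = "path/to/key/abc.key"

s_ = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s = ssl.wrap_socket(s_,
                    keyfile=clientKey,
                    certfile=clientCert,
                    cert_reqs=ssl.CERT_REQUIRED,
                    ca_certs="/path/to/ca-bundle.crt")  # placeholder trust store
s.connect(("www.client.url.com", 443))  # port assumed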
I've got the perfect solution using the requests library. The requests library has got to be my favorite library I've ever used, because it takes something in Python that is inherently difficult to do -- SSL and REST requests -- and makes it unbelievably simple. I checked their version support, and Python 2.6+ is supported.
Here is an example of how to use their library.
>>> requests.get(uri)
And that is all you have to do. The requests library takes care of establishing an SSL connection.
Taking this one step further: if you need to persist cookies between requests, you can do so like this.
>>> sess = requests.Session()
>>> credentials = {"username": "user",
...                "password": "pass"}
>>> sess.post("https://some-website/login", data=credentials)
<Response [200]>
>>> sess.get("https://some-website/a-backend-page").text
<html> the backend page... </html>
Edit: if you need to, you can also pass in the path to the certificate and the key, like so: requests.get(uri, cert=('path/to/cert/abc.crt', 'path/to/key/abc.key'))
Now hopefully you can convince them to install the requests library on the production boxes, because it would be well worth it. Let me know if this works out for you.

How to read JSON from URL in Python?

I am trying to use Python to get a JSON file from the Web. If I open the URL in my browser (Mozilla or Chromium) I do see the JSON. But when I do the following with Python:
import json
import urllib2

response = urllib2.urlopen(url)
data = json.loads(response.read())
I get an error message that says the following (translated into English): Errno 10060: the connection attempt failed because the server did not respond within a certain time period, or the connection was faulty, or the host did not react.
ADDED
It looks like there are many people who have faced the described problem. There are also some answers to similar (or the same) questions. For example, here we can see the following solution:
import requests
r = requests.get("http://www.google.com", proxies={"http": "http://61.233.25.166:80"})
print(r.text)
It is already a step forward for me (I think it is very likely that the proxy is the reason for the problem). However, I still have not got it working, since I do not know the URL of my proxy, and I will probably need a user name and password. How can I find them? How is it that my browsers have them and I do not?
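One way to see which proxy the system is configured with (browsers typically pick up the same system-wide settings, which would explain why they work) is the standard-library getproxies() helper; a small sketch:
try:
    from urllib.request import getproxies  # Python 3
except ImportError:
    from urllib import getproxies          # Python 2

# Reads proxy settings from the environment (and the registry on Windows).
print(getproxies())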
ADDED 2
I think I am now one step further. I have used this site to find out what my proxy is: http://www.whatismyproxy.com/
Then I have used the following code:
proxies = {'http': 'my_proxy.blabla.com/'}
r = requests.get(url, proxies=proxies)
print r
As a result I get
<Response [404]>
That does not look so good, but at least I think my proxy is correct, because when I randomly change the address of the proxy I get another error:
Cannot connect to proxy
So I can connect to the proxy, but something is not found.
I think there might be something wrong when you're trying to get the JSON from the online source (URL). Just to make things clear, here is a small code snippet:
#!/usr/bin/env python
try:
    # For Python 3+
    from urllib.request import urlopen
except ImportError:
    # For Python 2
    from urllib2 import urlopen

import json

def get_jsonparsed_data(url):
    response = urlopen(url)
    data = response.read().decode("utf-8")
    return json.loads(data)
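Called like this (the URL is a hypothetical placeholder):
url = "https://api.example.com/data.json"  # placeholder
print(get_jsonparsed_data(url))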
If you still get a connection error, you can try a couple of steps:
Try to urlopen() a random site from the interpreter (interactive mode). If you are able to grab the source code, you're good; if not, check your internet conditions or try the requests module. Check here
Check and see if the JSON at the URL is in correct syntax. For sample JSON syntax check here
Try the simplejson module.
Edit 1:
If you want to access websites using a system-wide proxy, you will have to use a proxy handler that connects to that proxy via the loopback address (localhost). Sample code is shown below.
import urllib2

proxy = urllib2.ProxyHandler({
    'http': '127.0.0.1',
    'https': '127.0.0.1'
})
opener = urllib2.build_opener(proxy)
urllib2.install_opener(opener)
# this way you can send both http and https requests using proxies
urllib2.urlopen('http://www.google.com')
urllib2.urlopen('https://www.google.com')
I have not worked much with ProxyHandler; I just know the theory and the code. I am sure there are better ways to access websites through proxies, ones that do not involve installing the opener every time you run the program, but hopefully this will point you in the right direction.
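For a proxy that also wants credentials (as in the original question at the top of this page), urllib2 has a dedicated handler; a sketch, with the host, port and credentials as placeholders:
import urllib2

proxy_url = 'http://proxy.example.com:8080'  # placeholder
password_mgr = urllib2.HTTPPasswordMgrWithDefaultRealm()
password_mgr.add_password(None, proxy_url, 'username', 'password')

opener = urllib2.build_opener(
    urllib2.ProxyHandler({'http': proxy_url, 'https': proxy_url}),
    urllib2.ProxyBasicAuthHandler(password_mgr))
urllib2.install_opener(opener)
print urllib2.urlopen('http://www.google.com').read()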

Access HTTPS / :443 through VPN in this Python urllib.request library, when blocked

I am using gh-issues-import to migrate issues between GitHub and a GitHub Enterprise server. The problem I have is that our GHE requires going through a VPN proxy, while GitHub's API requires an HTTPS route. I can get one or the other, but I'm having a hell of a time finding a way to access both from the same Python project using urllib.request. Here is a scaled-down script using the library that is failing in gh-issues-import...
import urllib.request
# works through VPN (notice we are able to use http), requires VPN
GitHubEnterpriseurl = "http://xxxxx/api/v3/"
req = urllib.request.Request(GitHubEnterpriseurl)
response = urllib.request.urlopen(req)
json_data = response.read()
print(json_data)
# does not work on VPN due to https path, but fine outside of VPN
req = urllib.request.Request("https://api.github.com")
response = urllib.request.urlopen(req)
json_data = response.read()
print(json_data)
I have tried other HTTP libraries, and it comes down to the VPN blocking access to https://api.github.com. What are some solutions for this? Can I create a script on another server that my VPN has access to, and simply clone the requests and route the data?
* I am able to connect to https://api.github.com using the VPN through the browser (Chrome / Firefox), but running any command-line tools or this script to access it fails.
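Since the browsers can reach https://api.github.com over the VPN, they are presumably using a proxy the command line is not; if you can find that proxy's address (see the getproxies() note earlier on this page), one sketch of wiring it into urllib.request, with the proxy address as a placeholder, is:
import urllib.request

# Placeholder address: the HTTPS-capable proxy the browsers use on the VPN.
proxy = urllib.request.ProxyHandler({'https': 'http://proxy.example.com:8080'})
opener = urllib.request.build_opener(proxy)

req = urllib.request.Request("https://api.github.com")
response = opener.open(req)
print(response.read())
Using opener.open() rather than install_opener() keeps the proxy scoped to these calls, so the GHE requests elsewhere in the script continue to go direct.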

Does urllib2 in Python 2.6.1 support proxy via https

Does urllib2 in Python 2.6.1 support proxy via https?
I've found the following at http://www.voidspace.org.uk/python/articles/urllib2.shtml:
NOTE
Currently urllib2 does not support fetching of https locations through a proxy. This can be a problem.
I'm trying to automate login to a web site and download a document; I have a valid username/password.
proxy_info = {
    'host': "axxx",  # commented out the real data
    'port': "1234"   # commented out the real data
}
proxy_handler = urllib2.ProxyHandler(
    {"http": "http://%(host)s:%(port)s" % proxy_info})
opener = urllib2.build_opener(proxy_handler,
                              urllib2.HTTPHandler(debuglevel=1),
                              urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)
fullurl = 'https://correct.url.to.login.page.com/user=a&pswd=b'  # example
req1 = urllib2.Request(url=fullurl, headers=headers)
response = urllib2.urlopen(req1)
I've had it working for similar pages, but not using HTTPS, and I suspect it does not get through the proxy: it just gets stuck in the same way as when I did not specify a proxy. I need to go out through the proxy.
I need to authenticate, but not using basic authentication. Will urllib2 figure out authentication when going to an https site (I supply the username/password to the site via the URL)?
EDIT:
Nope, I tested with
proxies = {
    "http": "http://%(host)s:%(port)s" % proxy_info,
    "https": "https://%(host)s:%(port)s" % proxy_info
}
proxy_handler = urllib2.ProxyHandler(proxies)
And I get the error:
urllib2.URLError: urlopen error [Errno 8] _ssl.c:480: EOF occurred in violation of protocol
Fixed in Python 2.6.3 and several other branches:
http://bugs.python.org/issue1424152
http://www.python.org/download/releases/2.6.3/NEWS.txt
Issue #1424152: Fix for httplib, urllib2 to support SSL while working through
proxy. Original patch by Christopher Li, changes made by Senthil Kumaran.
I'm not sure the article by Michael Foord that you quote is updated to Python 2.6.1 -- why not give it a try? Instead of telling ProxyHandler that the proxy is only good for http, as you're doing now, register it for https too (of course, you should format it into a variable just once before you call ProxyHandler, and repeatedly use that variable in the dict): that may or may not work, but you're not even trying, and that's sure not to work!-)
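In code, that suggestion looks roughly like this (host and port as in the question):
import urllib2

proxy_info = {'host': "axxx", 'port': "1234"}
proxy_url = "http://%(host)s:%(port)s" % proxy_info  # format once...
proxy_handler = urllib2.ProxyHandler({"http": proxy_url,    # ...and reuse
                                      "https": proxy_url})  # for both schemes
opener = urllib2.build_opener(proxy_handler, urllib2.HTTPCookieProcessor())
urllib2.install_opener(opener)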
In case anyone else has this issue in the future, I'd like to point out that it does support HTTPS proxying now; make sure the proxy supports it too, or you risk running into a bug that puts the Python library into an infinite loop (this happened to me).
See the unit test in the Python source that tests HTTPS proxying support for further information:
http://svn.python.org/view/python/branches/release26-maint/Lib/test/test_urllib2.py?r1=74203&r2=74202&pathrev=74203
