urllib2: https to target via http proxy - python

I am using a proxy server to connect to several target servers. Some of the target servers expect http and others expect https. My http requests work swimmingly, but urllib2 ignores the proxy handler on the https requests and sends the requests directly to the target server.
I've tried a number of different things but here is one reasonably concise attempt:
import urllib2
import cookielib

cookie_handler = urllib2.HTTPCookieProcessor(cookielib.LWPCookieJar())
proxies = {'http': 'http://123.45.78.9/',
           'https': 'http://123.45.78.9/'}
proxy_handler = urllib2.ProxyHandler(proxies)
url_opener = urllib2.build_opener(proxy_handler, cookie_handler)
request = urllib2.Request('https://example.com')
response = url_opener.open(request)
I understand that urllib2 has had the ability to send https requests to a proxy server since Python 2.6.3, but I can't seem to get it to work. I'm using 2.7.3.
Thanks for any advice you can offer.
UPDATE: The code above does work. I'm not certain why it wasn't working when I asked this question. Most likely, I had a typo in the https proxy URL.
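For reference, a roughly equivalent setup under Python 3, where urllib2 was split into urllib.request and cookielib became http.cookiejar, might look like the following sketch; the proxy address is a placeholder:
import urllib.request
import http.cookiejar

# Placeholder proxy address; the same proxy is used for both schemes.
proxies = {'http': 'http://123.45.78.9/',
           'https': 'http://123.45.78.9/'}
cookie_handler = urllib.request.HTTPCookieProcessor(http.cookiejar.LWPCookieJar())
proxy_handler = urllib.request.ProxyHandler(proxies)
url_opener = urllib.request.build_opener(proxy_handler, cookie_handler)
response = url_opener.open('https://example.com')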

Related

HTTPX | Requests Proxies Setup

I am trying to use proxies for my web scraping project, which I built with HTTPX.
However, when I set up my proxies I still got blocked, so I tried to check whether they actually work/get used. I bought the proxies from a professional seller, so they should work just fine.
I found a website that returns the IP from which I am making the request.
I tried to test the use of proxies like this:
import httpx
import requests
#Username:PW:Hostname
proxies = {"http://": "http://username:pw.io:hostname"}
#response = requests.get('http://ipinfo.io/json',proxies=proxies)
response = httpx.get('http://ipinfo.io/json',proxies=proxies)
print(response.text)
Neither requests nor httpx works for me, as the response always returns my real IP. How do I need to set up my proxies? Keep in mind that I actually want to use HTTPX and only used requests for debugging.
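One thing to check is the proxy URL format itself: the credentials go before the host, separated by @, as in http://username:password@host:port. A minimal sketch of how that might look in HTTPX follows, with placeholder credentials and host; note that older httpx releases take a proxies mapping keyed by URL pattern, while newer releases (0.26+) use a single proxy argument instead:
import httpx

# Placeholder credentials and host -- substitute your own.
proxy_url = "http://username:password@proxy.example.com:8080"

# Older httpx releases: a mapping keyed by URL pattern routes matching
# requests through the given proxy.
proxies = {"http://": proxy_url, "https://": proxy_url}

with httpx.Client(proxies=proxies) as client:
    response = client.get("http://ipinfo.io/json")
    print(response.text)  # should now report the proxy's IP, not yours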

Python Requests Returning 401 code on 'get' method

I'm working on a web-scraping function that's going to be pulling HTML data from internal (non-public) servers. I have a connection through a VPN and proxy server, so when I ping any public site I get code 200 no problem, but our internal sites return 401.
Here's my code:
import requests
from requests.auth import HTTPBasicAuth

http_str = f'http://{username}:{password}@proxy.yourorg.com:80'
# https_str is built the same way (see the edit note below)
proxyDict = {
    'http': http_str,
    'https': https_str,
    'ftp': https_str
}
html_text = requests.get(url, verify=True, proxies=proxyDict, auth=HTTPBasicAuth(user, pwd))
I've tried flushing my DNS server, using different certificate chains (that had a whole new list of problems). I'm using urllib3 on version 1.23 because that seemed to help with SSL errors. I've considered using a requests session but I'm not sure what that would change.
Also, the URLs we're trying to access DO NOT require a login. I'm not sure why it's throwing 401 errors, but the auth is for the proxy server, I think. Any help or ideas are appreciated, along with questions, as at this point I'm not even sure what to ask to move this along.
Edit: the proxyDict has a string with the user and pwd passed in for each type: https, http, ftp, etc.
To use HTTP Basic Auth with your proxy, use the http://user:password@host/ syntax in any of the proxy configuration entries. See the API docs.
import requests
proxyDict = {
    "http": "http://username:password@proxy.yourorg.com:80",
    "https": "http://username:password@proxy.yourorg.com:80"
}
url = 'http://myorg.com/example'
response = requests.get(url, proxies=proxyDict)
If, however, you are accessing internal URLs via VPN (i.e., internal to your organization on your intranet) then you should NOT need the proxy to access them.
Try:
import requests
url = 'http://myorg.com/example'
response = requests.get(url, verify=False)
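If environment-level proxy settings are still being picked up for those internal hosts, one option (a sketch, assuming myorg.com stands in for your internal domain) is to exclude them via NO_PROXY, or to disable environment proxies for a whole session:
import os
import requests

# Exclude the internal domain from any proxy configured in the environment.
os.environ['NO_PROXY'] = 'myorg.com'
response = requests.get('http://myorg.com/example')

# Alternatively, ignore environment proxy settings for an entire session.
session = requests.Session()
session.trust_env = False
response = session.get('http://myorg.com/example')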

Python requests find proxy latency

I am trying to test the latency of a proxy by pinging a site while using a proxy with a login. I know requests easily supports proxies and was wondering if there was a way to ping/test latency to a site through this. I am open to other methods as well, as long as they support a proxy with a login. Here is an example of my proxy integration with requests
import requests
proxy = {'https': 'https://USER:PASS@IP:PORT'}
requests.get('https://www.google.com/', proxies=proxy)
How can I make a program to test the latency of a proxy with a login to a site?
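One simple approach (a sketch with placeholder proxy credentials) is to time the request itself; requests exposes response.elapsed, the time between sending the request and finishing parsing the response headers, which serves as a rough measurement of latency through the proxy:
import requests

proxy = {'https': 'https://USER:PASS@IP:PORT'}  # placeholder credentials

response = requests.get('https://www.google.com/', proxies=proxy, timeout=10)

# elapsed covers send-to-headers-parsed, i.e. roughly the round trip
# through the proxy to the target site.
print(response.elapsed.total_seconds(), 'seconds')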

HTTP GET and POST request through proxy in Python

import requests
proxies = {'http': '203.92.33.87:80'}
# Creating the session and setting up the proxies.
s = requests.Session()
s.proxies = proxies
# Making the HTTP request through the created session.
r = s.get('https://www.trackip.net/ip')
# Check if the proxy was indeed used (the text should contain the proxy IP).
print(r.text)
In the above code I am expecting that print will print 203.92.33.87.
But it is printing my real public IP.
In your proxies dictionary, you only specify a proxy for protocol http. But in your s.get(), you specify protocol https. Since there is no https key in your dictionary, no proxy is used.
If 203.92.33.87:80 is, in fact, an https proxy, then change the proxies dictionary to reflect that. On the other hand, if it is an http proxy, then change s.get() to s.get('http://...').
Also, I believe you've incorrectly specified the proxy URL. According to the documentation:
Note that proxy URLs must include the scheme
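Assuming 203.92.33.87:80 is a plain HTTP proxy, a corrected version of the session setup might look like this sketch, with the scheme included in the proxy URL and both protocols routed through it:
import requests

proxies = {
    'http': 'http://203.92.33.87:80',
    'https': 'http://203.92.33.87:80',
}

s = requests.Session()
s.proxies = proxies

r = s.get('https://www.trackip.net/ip')
print(r.text)  # should now show the proxy's IP rather than your own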

Python urllib proxy

I'm trying to fetch some URLs via urllib and mechanize through my proxy.
With mechanize I try the following:
from mechanize import Browser
import re
br = Browser()
br.set_proxies({"http": "MYUSERNAME:*******@itmalsproxy.italy.local:8080"})
br.open("http://www.example.com/")
I get the following error:
httperror_seek_wrapper: HTTP Error 407: Proxy Authentication Required ( The ISA Server requires authorization to fulfill the request. Access to the Web Proxy service is denied. )
Since the proxy, the username and the password are all correct, what could be the problem?
Maybe the proxy is using NTLM authentication?
If that is the case, you can try using the NTLM Authorization Proxy Server (see also this answer).
You might get more info from the response headers:
print br.response().info()
When your web browser uses a proxy server to surf the Web from within your local
network, you may be required to authenticate yourself to use the proxy. Google ntlmaps.
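If you go the ntlmaps route, the idea is to run it locally and point mechanize at it instead of the corporate proxy; a sketch, assuming ntlmaps has been configured with your domain credentials and is listening on a hypothetical local port 5865:
from mechanize import Browser

br = Browser()
# ntlmaps handles the NTLM handshake with the upstream proxy,
# so no credentials are needed here.
br.set_proxies({"http": "127.0.0.1:5865"})
br.open("http://www.example.com/")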
