Why does the proxy server use my private IP? - python

I am trying to scrape using proxies (the proxy server is a free one from the internet); in particular I would like to use the proxy's IP, not my private one. To test my script I access "http://whatismyipaddress.com/" to see which IP that site reports. As it turns out, it sees my private IP. Can somebody tell me what's wrong here?
import requests
from fake_useragent import UserAgent

def getMyIP(proxyServer, myPrivateIP):
    scrape_website = "http://whatismyipaddress.com/"
    ua = UserAgent()
    headers = {'User-Agent': ua.random}
    try:
        response = requests.get(scrape_website, headers=headers, proxies={"https": proxyServer})
    except:
        faultString = proxyServer + " did not work; " + "\n"
        print(faultString)
        return
    if myPrivateIP in str(response.content):
        print("They found my private IP.")

proxyServer = "http://103.250.158.23:61219"
myPrivateIP = "xxx.xxx.xxx.xxx"
getMyIP(proxyServer, myPrivateIP)

Two things:
You set an {'https': ...} proxy configuration. This means the proxy is only used for HTTPS requests. You're requesting an HTTP URL, however, so that proxy isn't getting used. Configure an 'http' proxy instead or in addition.
If the proxy forwards your IP in an HTTP header (e.g. X-Forwarded-For), and the target server heeds that header, that's tough luck and there's nothing you can do about it besides using a different proxy which doesn't forward your IP. I think point 1 is the more likely issue though.
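A minimal sketch of point 1, using the proxy address and echo URL from the question (the free proxy itself may or may not still work); the same proxy is registered for both schemes so it is used regardless of whether the URL is http:// or https://:
import requests

proxyServer = "http://103.250.158.23:61219"  # free proxy from the question; may be dead

# register the proxy for both plain-HTTP and HTTPS requests
proxies = {"http": proxyServer, "https": proxyServer}

response = requests.get("http://whatismyipaddress.com/", proxies=proxies, timeout=10)
print(response.status_code)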

Related

How to resolve blockchain dns using Python Requests

There are certain blockchain domains that are resolved only by blockchain DNS resolvers.
For example: http://Jstash.bazar
If you try to open this link in a browser, it won't get resolved.
But if you install the browser plugin from https://blockchain-dns.info/
and then try to open the site again, it opens up smoothly.
I want to scrape some data from this site using Python Requests (browserless) and have no idea how to resolve such blockchain domains.
Any help would be highly appreciated.
You could use one of their publicly available APIs to resolve the domain and obtain an IP. You'll find a list of API URLs in the Firefox or Chrome addon script, in common.js.
A Python example:
import requests
from random import choice

def domain_ip(domain):
    '''Uses bdns api to resolve domain names'''
    domain = domain.split('/')[2] if '://' in domain else domain
    apis = ['https://bdns.co/r/', 'https://bdns.us/r/', 'https://bdns.bz/r/']
    api = choice(apis)
    r = requests.get(api + domain)
    if r.status_code == 200:
        ip = r.text.splitlines()[0]
        print("Domain: {} IP: {}".format(domain, ip))
        return ip
    else:
        print('HTTP Error: {}'.format(r.status_code))

ip = domain_ip('http://jstash.bazar')
if ip:
    r = requests.get('http://' + ip)
Output:
Domain: jstash.bazar IP: 190.115.24.114
Update, 10/20/21
Bdns is offline and I don't know if they'll be back. I searched for similar public HTTP APIs but couldn't find one that works well enough. Instead, we can use dnspython to query an OpenNIC server.
import dns.resolver
import requests

def domain_to_ip(domain, dns_server='159.89.120.99'):
    '''Uses an OpenNIC server to resolve blockchain domains

    :param domain: str Domain or URL
    :param dns_server: str Optional, OpenNIC server
    :raises dns.resolver.NXDOMAIN: if `dns_server` fails to resolve `domain`
    '''
    if '://' in domain:
        domain = domain.split('/')[2]
    res = dns.resolver.Resolver()
    res.nameservers = [dns_server]
    answers = res.resolve(domain)
    return [rdata.address for rdata in answers]

ips = domain_to_ip('http://track2.bazar')
if ips:
    r = requests.get('https://' + ips[0], verify=False)
    print(r)
This requires:
dnspython, https://www.dnspython.org/
an OpenNIC dns_server, https://servers.opennicproject.org/
and no SSL verification (verify=False).
Many thanks to @VincentAlex for noticing the issue and proposing a solution.

Python urllib2 proxy setting issue

I've looked through some of the other posts on this and I hope I'm not duplicating, but I'm stuck on a real headscratcher with setting a proxy server for urllib2. I'm running the below:
file, site = argv
uri = 'https://' + site

http_proxy_server = "http://newyork.wonderproxy.com"
http_proxy_port = "11001"
http_proxy_user = "user"
http_proxy_passwd = "password"

http_proxy_full_auth_string = "http://%s:%s@%s:%s" % (http_proxy_user,
                                                      http_proxy_passwd,
                                                      http_proxy_server,
                                                      http_proxy_port)

proxy_handler = urllib2.ProxyHandler({"http": http_proxy_full_auth_string})
opener = urllib2.build_opener(proxy_handler)
urllib2.install_opener(opener)

html = opener.open(uri).read()
print html, 'it opened!'
I'm running this against an IP info site, but try as I might the response always comes out with my non-proxy IP address. When I manually set my proxy through system settings I do get a different response, so I've confirmed it's not an issue with the proxy criteria itself.
Any help that could be offered would be much appreciated!
Well this is a bit silly, but I tried a different example and my connection is working fine now.
import urllib2

proxlist = ['minneapolis.wonderproxy.com', 'newyork.wonderproxy.com']
ports = [0, 1, 2, 3]

for prox in proxlist:
    for port in ports:
        proxy = urllib2.ProxyHandler({'http': 'http://user:password@%s:1100%s' % (prox, port)})
        auth = urllib2.HTTPBasicAuthHandler()
        opener = urllib2.build_opener(proxy, auth, urllib2.HTTPHandler)
        urllib2.install_opener(opener)
        try:
            conn = urllib2.urlopen('http://www.howtofindmyipaddress.com/')
            return_str = conn.read()
            str_find = '<span style="font-size: 80px; color: #22BB22; font-family: Calibri,Arial;">'
            strt = return_str.find(str_find) + len(str_find)
            print prox, port, return_str[strt:return_str.find('</span', strt) - 1]
        except urllib2.URLError:
            print prox, port, 'That\'s a no go'
The only difference I can see is that the second one used HTTPHandler in addition to the ProxyHandler. As I have an apparent solution I'm not too worried, but I would still be interested to know why I had this issue in the first place.
Your question sets the proxy URL to
http://user:password@http://newyork.wonderproxy.com:11001
which isn't valid. If you changed http_proxy_server to newyork.wonderproxy.com (without the http:// prefix) then your first solution might work better.
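A minimal sketch of that fix, using the placeholder credentials from the question (untested against the actual proxy); since the original script requests an https:// URL, the proxy is registered for both schemes here:
import urllib2

http_proxy_server = "newyork.wonderproxy.com"  # no scheme here; it is added below
http_proxy_port = "11001"
http_proxy_user = "user"
http_proxy_passwd = "password"

# the scheme now appears exactly once, at the start of the proxy URL
http_proxy_full_auth_string = "http://%s:%s@%s:%s" % (http_proxy_user,
                                                      http_proxy_passwd,
                                                      http_proxy_server,
                                                      http_proxy_port)

proxy_handler = urllib2.ProxyHandler({"http": http_proxy_full_auth_string,
                                      "https": http_proxy_full_auth_string})
opener = urllib2.build_opener(proxy_handler)
html = opener.open('https://example.com').read()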

Python Requests - Navigate a site by the server's IP

I want to crawl a site, however Cloudflare was getting in the way. I was able to get the server's real IP, so Cloudflare won't bother me.
How can I utilize this in the requests library?
For example, I want to go directly to
www.example.com/foo.php, but requests will resolve the domain to the IP on the Cloudflare network instead of the one I want it to use. How can I make it use the one I want it to use?
I would have sent a request to the real IP with the Host header set to www.example.com, but that just gives me the home page. How can I visit other links on the site?
You will have to set a custom Host header with the value example.com, something like:
requests.get('http://127.0.0.1/foo.php', headers={'host': 'example.com'})
should do the trick. If you want to verify that, type in the following command (requires netcat): nc -l -p 80 and then run the above command. It will produce output in the netcat window:
GET /foo.php HTTP/1.1
Host: example.com
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.6.2 CPython/3.4.3 Windows/8
You'd have to tell requests to fake the Host header, and replace the hostname in the URL with the IP address:
requests.get('http://123.45.67.89/foo.php', headers={'Host': 'www.example.com'})
The URL 'patching' can be done with the urlparse library:
parsed = urlparse.urlparse(url)
hostname = parsed.hostname
parsed = parsed._replace(netloc=ipaddress)
ip_url = parsed.geturl()
response = requests.get(ip_url, headers={'Host': hostname})
Demo against Stack Overflow:
>>> import urlparse
>>> import socket
>>> url = 'http://stackoverflow.com/help/privileges'
>>> parsed = urlparse.urlparse(url)
>>> hostname = parsed.hostname
>>> hostname
'stackoverflow.com'
>>> ipaddress = socket.gethostbyname(hostname)
>>> ipaddress
'198.252.206.16'
>>> parsed = parsed._replace(netloc=ipaddress)
>>> ip_url = parsed.geturl()
>>> ip_url
'http://198.252.206.16/help/privileges'
>>> response = requests.get(ip_url, headers={'Host': hostname})
>>> response
<Response [200]>
In this case I looked up the ip address dynamically.
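In Python 3 the same approach would use urllib.parse instead of the Python 2 urlparse module; a sketch of the equivalent code (not part of the original answer):
import socket
import requests
from urllib.parse import urlparse

url = 'http://stackoverflow.com/help/privileges'
parsed = urlparse(url)
hostname = parsed.hostname                    # 'stackoverflow.com'
ipaddress = socket.gethostbyname(hostname)    # looked up dynamically, as above
ip_url = parsed._replace(netloc=ipaddress).geturl()

response = requests.get(ip_url, headers={'Host': hostname})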
Answer for HTTPS/SNI support: Use the HostHeaderSSLAdapter in the requests_toolbelt module:
The above solution works fine with virtual hosts for non-encrypted HTTP connections. For HTTPS you also need to pass SNI (Server Name Indication) in the TLS handshake, as some servers will present a different SSL certificate depending on what is passed in via SNI. Also, the Python ssl libraries by default don't look at the Host: header to match the server connection at connection time.
The above provides a straightforward way of adding a transport adapter to requests that handles this for you.
Example
import requests
from requests_toolbelt.adapters import host_header_ssl
# Create a new requests session
s = requests.Session()
# Mount the adapter for https URLs
s.mount('https://', host_header_ssl.HostHeaderSSLAdapter())
# Send your request
s.get("https://198.51.100.50", headers={"Host": "example.org"})
I think the best way to send https requests to a specific IP is to add a customized resolver that binds the domain name to that IP. In this way, both SNI and the Host header are correctly set, and certificate verification can always succeed, just as in a web browser.
Otherwise, you will see various issues like InsecureRequestWarning, SSLCertVerificationError, and SNI always missing in the Client Hello, even if you try different combinations of headers and verify arguments, e.g.:
requests.get('https://1.2.3.4/foo.php', headers={"host": "example.com"}, verify=True)
In addition, I tried:
requests_toolbelt
pip install requests[security]
forcediphttpsadapter
all solutions mentioned here: Using requests with TLS doesn't give SNI support
None of them set SNI when hitting https://IP directly.
import socket
import requests

# mock /etc/hosts
# lock it in multithreading or use multiprocessing if an endpoint is bound to multiple IPs frequently
etc_hosts = {}

# decorate python built-in resolver
def custom_resolver(builtin_resolver):
    def wrapper(*args, **kwargs):
        try:
            return etc_hosts[args[:2]]
        except KeyError:
            # fall back to builtin_resolver for endpoints not in etc_hosts
            return builtin_resolver(*args, **kwargs)

    return wrapper

# monkey patching
socket.getaddrinfo = custom_resolver(socket.getaddrinfo)

def _bind_ip(domain_name, port, ip):
    '''
    resolve (domain_name, port) to a given ip
    '''
    key = (domain_name, port)
    # (family, type, proto, canonname, sockaddr)
    value = (socket.AddressFamily.AF_INET, socket.SocketKind.SOCK_STREAM, 6, '', (ip, port))
    etc_hosts[key] = [value]

_bind_ip('www.example.com', 443, '1.2.3.4')

# this sends the request to 1.2.3.4, while SNI and the Host header stay www.example.com
response = requests.get('https://www.example.com/foo.php', verify=True)

How can I open a website with urllib via proxy in Python?

I have this program that checks a website, and I want to know how I can check it via a proxy in Python...
This is the code, just for example:
while True:
    try:
        h = urllib.urlopen(website)
        break
    except:
        print '['+time.strftime('%Y/%m/%d %H:%M:%S')+'] '+'ERROR. Trying again in a few seconds...'
        time.sleep(5)
By default, urlopen uses the environment variable http_proxy to determine which HTTP proxy to use:
$ export http_proxy='http://myproxy.example.com:1234'
$ python myscript.py # Using http://myproxy.example.com:1234 as a proxy
If you instead want to specify a proxy inside your application, you can give a proxies argument to urlopen:
proxies = {'http': 'http://myproxy.example.com:1234'}
print("Using HTTP proxy %s" % proxies['http'])
urllib.urlopen("http://www.google.com", proxies=proxies)
Edit: If I understand your comments correctly, you want to try several proxies and print each proxy as you try it. How about something like this?
candidate_proxies = ['http://proxy1.example.com:1234',
                     'http://proxy2.example.com:1234',
                     'http://proxy3.example.com:1234']
for proxy in candidate_proxies:
    print("Trying HTTP proxy %s" % proxy)
    try:
        result = urllib.urlopen("http://www.google.com", proxies={'http': proxy})
        print("Got URL using proxy %s" % proxy)
        break
    except:
        print("Trying next proxy in 5 seconds")
        time.sleep(5)
Python 3 is slightly different here. It will try to auto detect proxy settings but if you need specific or manual proxy settings, think about this kind of code:
#!/usr/bin/env python3
import urllib.request

proxy_support = urllib.request.ProxyHandler({'http' : 'http://user:pass@server:port',
                                             'https': 'https://...'})
opener = urllib.request.build_opener(proxy_support)
urllib.request.install_opener(opener)

with urllib.request.urlopen(url) as response:
    html = response.read()  # ... implement further processing here
Refer also to the relevant section in the Python 3 docs
Here is example code showing how to use urllib to connect via a proxy:
authinfo = urllib.request.HTTPBasicAuthHandler()
proxy_support = urllib.request.ProxyHandler({"http": "http://ahad-haam:3128"})

# build a new opener that adds authentication and caching FTP handlers
opener = urllib.request.build_opener(proxy_support, authinfo,
                                     urllib.request.CacheFTPHandler)
# install it
urllib.request.install_opener(opener)

f = urllib.request.urlopen('http://www.google.com/')
"""
For http and https use:
proxies = {'http':'http://proxy-source-ip:proxy-port',
'https':'https://proxy-source-ip:proxy-port'}
more proxies can be added similarly
proxies = {'http':'http://proxy1-source-ip:proxy-port',
'http':'http://proxy2-source-ip:proxy-port'
...
}
usage
filehandle = urllib.urlopen( external_url , proxies=proxies)
Don't use any proxies (in case of links within network)
filehandle = urllib.urlopen(external_url, proxies={})
Use proxies authentication via username and password
proxies = {'http':'http://username:password#proxy-source-ip:proxy-port',
'https':'https://username:password#proxy-source-ip:proxy-port'}
Note: avoid using special characters such as :,# in username and passwords
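If the credentials do contain such characters, one option (a sketch, not part of the original answer; the username and password shown are hypothetical) is to percent-encode them with urllib.parse.quote before building the proxy URL:
from urllib.parse import quote

username = "user@example.com"   # hypothetical credentials containing '@' and ':'
password = "p:ss@word"

# percent-encode the credentials so the proxy URL stays parseable
proxy_url = "http://%s:%s@proxy-source-ip:proxy-port" % (quote(username, safe=''),
                                                         quote(password, safe=''))
proxies = {'http': proxy_url, 'https': proxy_url}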

Problem with urllib

I wrote this code:
import urllib
proxies = {'http': 'http://112.65.135.54:8080/'}
opener = urllib.FancyURLopener(proxies)
r = opener.open("http://www.python.org/")
print r.read()
and when I execute it, this program works fine and sends me the source code of python.org. But when I use this:
import urllib
proxies = {'http': 'http://80.176.245.196:1080/'}
opener = urllib.FancyURLopener(proxies)
r = opener.open("http://www.python.org/")
print r.read()
this program does not send me the source code of python.org.
What should I do?
I found the answer: I must use "socks" instead of "http":
import urllib
proxies = {'socks': 'http://80.176.245.196:1080/'}
opener = urllib.FancyURLopener(proxies)
r = opener.open("http://www.python.org/")
print r.read()
This code works fine.
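For reference (not part of the original answer), the same proxy could be used with the modern requests library, assuming the SOCKS extra is installed (pip install requests[socks]) and the proxy at 80.176.245.196:1080 actually speaks SOCKS5:
import requests

proxies = {
    'http': 'socks5://80.176.245.196:1080',
    'https': 'socks5://80.176.245.196:1080',
}
r = requests.get("http://www.python.org/", proxies=proxies)
print(r.text[:200])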
Presumably, the first IP address and port point to a working proxy, while the second set does not (they're on private IPs so of course nobody else can check). So, speak with whoever handles your local network, and get the exact specs for the IP and port of the HTTP proxy you're supposed to use!
Edit: aargh, the question had been edited to "mask" the IPs (now they're back and they're definitely not on private networks!), so the answer was based on that. Anyway, no need for digging now, as the OP has already discovered that one is a SOCKS proxy, not an HTTP proxy, and so of course can't be treated as the latter ;-).
