How to resolve blockchain DNS using Python Requests - python

There are certain blockchain domains that are resolved only by blockchain DNS resolvers.
For example: http://Jstash.bazar
If you try to open this link in a browser, it won't get resolved.
But install the browser plugin from https://blockchain-dns.info/
and then try to open the site again, and it opens up smoothly.
I want to scrape some data from this site using Python Requests (browserless) and have no idea how to resolve such blockchain domains.
Any help would be highly appreciated.

You could use one of their publicly available APIs to resolve the domain and obtain an IP. You'll find a list of API URLs in the Firefox or Chrome addon script, in common.js.
A Python example:
import requests
from random import choice

def domain_ip(domain):
    '''Uses a BDNS API to resolve domain names'''
    domain = domain.split('/')[2] if '://' in domain else domain
    apis = ['https://bdns.co/r/', 'https://bdns.us/r/', 'https://bdns.bz/r/']
    api = choice(apis)
    r = requests.get(api + domain)
    if r.status_code == 200:
        ip = r.text.splitlines()[0]
        print("Domain: {} IP: {}".format(domain, ip))
        return ip
    else:
        print('HTTP Error: {}'.format(r.status_code))

ip = domain_ip('http://jstash.bazar')
if ip:
    r = requests.get('http://' + ip)
Domain: jstash.bazar IP: 190.115.24.114
Update, 10/20/21
BDNS is offline and I don't know if they'll be back. I searched for similar public HTTP APIs but couldn't find one that works well enough. Instead, we can use dnspython to query an OpenNIC server.
import dns.resolver
import requests

def domain_to_ip(domain, dns_server='159.89.120.99'):
    '''Uses an OpenNIC server to resolve blockchain domains

    :param domain: str Domain or URL
    :param dns_server: str Optional, OpenNIC server
    :raises dns.resolver.NXDOMAIN: if `dns_server` fails to resolve `domain`
    '''
    if '://' in domain:
        domain = domain.split('/')[2]
    res = dns.resolver.Resolver()
    res.nameservers = [dns_server]
    answers = res.resolve(domain)
    return [rdata.address for rdata in answers]

ips = domain_to_ip('http://track2.bazar')
if ips:
    r = requests.get('https://' + ips[0], verify=False)
    print(r)
Requires:
- dnspython, https://www.dnspython.org/
- an OpenNIC dns_server, https://servers.opennicproject.org/
- no SSL verification (verify=False), since the certificate won't match when you request the bare IP
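If the server behind that IP hosts several sites, you can also keep the original hostname in a Host header when requesting the bare IP over plain HTTP; a small sketch building on domain_to_ip above (the domain is the one from the example):

import requests

ips = domain_to_ip('http://track2.bazar')
if ips:
    # Request the IP directly, but tell the server which virtual
    # host we want via the Host header.
    r = requests.get('http://' + ips[0], headers={'Host': 'track2.bazar'})
    print(r.status_code)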
Many thanks to @VincentAlex for noticing the issue and proposing a solution.

Related

Python and socket - connect to specific path

I need to connect/send a message to http://localhost:8001/path/to/my/service, but I am not able to find out how to do that. I know how to connect if I only have localhost and 8001, but I need this specific path, /path/to/my/service. That is where my service is running.
import socket
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(<full-url-to-my-service>)
s.sendall(bytes('Message', 'utf-8'))
Update
My service is running on localhost:8001/api/v1/namespaces/my_namespace/services/my_service:http/proxy. How can I connect to it with Python?
As @furas said in the comments:
socket is a primitive object and doesn't have a specialized method for this - you have to build the message with the correct data on your own. You have to learn the HTTP protocol and use it to send requests.
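For illustration, here is a minimal hand-rolled HTTP GET over a plain socket, a sketch of what @furas describes (no error handling, and it assumes the service speaks plain HTTP on localhost:8001):

import socket

# Connect to host and port only; the path belongs in the HTTP
# request line, not in connect().
s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(('localhost', 8001))
request = (
    'GET /path/to/my/service HTTP/1.1\r\n'
    'Host: localhost:8001\r\n'
    'Connection: close\r\n'
    '\r\n'
)
s.sendall(request.encode())

# Read the raw HTTP response until the server closes the connection.
chunks = []
while True:
    data = s.recv(4096)
    if not data:
        break
    chunks.append(data)
s.close()
print(b''.join(chunks).decode())

In practice, though, an HTTP library does all of this for you.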
This is a sample snippet to send a GET request in Python using the requests library:
import requests
URL = 'http://localhost:8001/path/to/my/service'
response_text = requests.get(URL).text
print(response_text)
This assumes the Content-Type that the GET URL produces is text. If it is JSON, then a minor change is required:
import requests
URL = 'http://localhost:8001/path/to/my/service'
response_json = requests.get(URL).json()
print(response_json)
There are other ways to achieve the same thing using other libraries, such as urllib.
Here is the documentation of the requests library for reference.
sendall() requires bytes, so a string must be encoded:
s.sendall("foobar".encode())

Why does the proxy server use my private IP?

I am trying to scrape using proxies (this proxy server is a free one from the internet). In particular, I would like the target site to see the proxy's IP, not my private one. To test my script I try to access "http://whatismyipaddress.com/" to see which IP that site sees. As it turns out, it sees my private IP. Can somebody tell me what's wrong here?
import requests
from fake_useragent import UserAgent

def getMyIP(proxyServer, myPrivateIP):
    scrape_website = "http://whatismyipaddress.com/"
    ua = UserAgent()
    headers = {'User-Agent': ua.random}
    try:
        response = requests.get(scrape_website, headers=headers, proxies={"https": proxyServer})
    except:
        faultString = proxyServer + " did not work; " + "\n"
        print(faultString)
        return
    if myPrivateIP in str(response.content):
        print("They found my private IP.")

proxyServer = "http://103.250.158.23:61219"
myPrivateIP = "xxx.xxx.xxx.xxx"
getMyIP(proxyServer, myPrivateIP)
Two things:
You set an {'https': ...} proxy configuration. This means any HTTPS request will go through that proxy. You're requesting an HTTP URL, however, so that proxy isn't being used. Configure an 'http' proxy instead, or in addition.
If the proxy forwards your IP in an HTTP header, and the target server heeds that header, that's tough luck and there's nothing you can do about it besides using a different proxy that doesn't forward your IP. I think point 1 is the more likely issue, though.
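A minimal sketch of the first fix, reusing the asker's proxyServer variable and covering both schemes:

import requests

# Route both plain-HTTP and HTTPS traffic through the same proxy.
proxies = {
    'http': proxyServer,   # used for http:// URLs like whatismyipaddress.com
    'https': proxyServer,  # used for https:// URLs
}
response = requests.get('http://whatismyipaddress.com/', proxies=proxies)
print(response.status_code)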

How to download data from a password protected website

I'm using requests in Python to try to download this file:
http://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/N55W003.SRTMGL1.hgt.zip
There are 14000 such files, hence why I need to automate the process. The other techniques I've found online don't seem to work; I assume the websites they are designed for use a different authentication method. I don't know much about web development, so I can't work out how this authentication works.
Edit
This is the code:
import json
import requests
from requests.auth import HTTPBasicAuth

file = open("srtm30m_bounding_boxes.json", 'r')
strjson = file.read()
x = json.loads(strjson)
filenamelist = []
url = "http://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/N55W003.SRTMGL1.hgt.zip"
for i in range(14295):
    filenamelist.append(x['features'][i]['properties']['dataFile'])
    filenamelist[i] = "http://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/" + filenamelist[i]
jar = requests.cookies.RequestsCookieJar()
jar.set('urs_user_already_logged', 'yes')
jar.set('_urs-gui_session', '8b972449036e60e3d83a6a819b93124d')
r = requests.get(url, cookies=jar)
And this is the error I get when I run the code:
ConnectionError: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))
The simplest thing is to provide the username and password in the URL before the host, e.g.:
requests.get('http://{username}:{password}@e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/{subpath}'.format(username=username, password=password, subpath=filenamelist[i]))
You can also supply the username/password as the auth parameter to get:
requests.get('http://e4ftl01.cr.usgs.gov/MEASURES/SRTMGL1.003/2000.02.11/{subpath}'.format(subpath=filenamelist[i]), auth=(username, password))
totalhack is right that HTTPS is more secure, and it seems to work on this site. This form of authentication transmits the username and password as plaintext, so anyone who can observe the HTTP request would also be able to steal your login. HTTPS encrypts the username/password since it encrypts the entire request.
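To sketch the whole download loop under these assumptions (basic auth over HTTPS is enough, and username/password and filenamelist are defined as above; some data portals instead redirect to a separate login host, which this does not handle):

import os
import requests

session = requests.Session()
session.auth = (username, password)  # credentials assumed to be defined

for file_url in filenamelist:
    # Switch to https and stream the body so large zips aren't held in memory.
    https_url = file_url.replace('http://', 'https://', 1)
    with session.get(https_url, stream=True) as r:
        r.raise_for_status()
        with open(os.path.basename(https_url), 'wb') as f:
            for chunk in r.iter_content(chunk_size=65536):
                f.write(chunk)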

Python Requests - Navigate a site by the server's IP

I want to crawl a site, but Cloudflare was getting in the way. I was able to get the server's real IP, so Cloudflare won't bother me.
How can I utilize this in the requests library?
For example, I want to go directly to
www.example.com/foo.php, but requests will resolve the domain to an IP on the Cloudflare network instead of the one I want it to use. How can I make it use the IP I want?
I would have sent the request to the real IP with the host set to www.example.com, but that just gives me the home page. How can I visit other links on the site?
You will have to set a custom Host header with the value example.com, something like:
requests.get('http://127.0.0.1/foo.php', headers={'host': 'example.com'})
should do the trick. If you want to verify that, then type the following command (requires netcat): nc -l -p 80 and then run the above snippet. It will produce output in the netcat window:
GET /foo.php HTTP/1.1
Host: example.com
Connection: keep-alive
Accept-Encoding: gzip, deflate
Accept: */*
User-Agent: python-requests/2.6.2 CPython/3.4.3 Windows/8
You'd have to tell requests to fake the Host header, and replace the hostname in the URL with the IP address:
requests.get('http://123.45.67.89/foo.php', headers={'Host': 'www.example.com'})
The URL 'patching' can be done with the urlparse library:
import urlparse  # urllib.parse in Python 3

parsed = urlparse.urlparse(url)
hostname = parsed.hostname
parsed = parsed._replace(netloc=ipaddress)
ip_url = parsed.geturl()
response = requests.get(ip_url, headers={'Host': hostname})
Demo against Stack Overflow:
>>> import urlparse
>>> import socket
>>> url = 'http://stackoverflow.com/help/privileges'
>>> parsed = urlparse.urlparse(url)
>>> hostname = parsed.hostname
>>> hostname
'stackoverflow.com'
>>> ipaddress = socket.gethostbyname(hostname)
>>> ipaddress
'198.252.206.16'
>>> parsed = parsed._replace(netloc=ipaddress)
>>> ip_url = parsed.geturl()
>>> ip_url
'http://198.252.206.16/help/privileges'
>>> response = requests.get(ip_url, headers={'Host': hostname})
>>> response
<Response [200]>
In this case I looked up the IP address dynamically.
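The demo uses the Python 2 urlparse module; a minimal Python 3 equivalent of the same idea:

import socket
from urllib.parse import urlparse

import requests

url = 'http://stackoverflow.com/help/privileges'
parsed = urlparse(url)
hostname = parsed.hostname
ipaddress = socket.gethostbyname(hostname)  # resolve dynamically, as above
ip_url = parsed._replace(netloc=ipaddress).geturl()
response = requests.get(ip_url, headers={'Host': hostname})
print(response)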
Answer for HTTPS/SNI support: use the HostHeaderSSLAdapter from the requests_toolbelt module:
The above solution works fine with virtual hosts for non-encrypted HTTP connections. For HTTPS you also need to pass the SNI (Server Name Indication) in the TLS handshake, as some servers will present a different SSL certificate depending on what is passed in via SNI. Also, the Python ssl libraries by default don't look at the Host: header to match the server at connection time.
The requests_toolbelt adapter provides a straightforward way of handling this for you.
Example
import requests
from requests_toolbelt.adapters import host_header_ssl
# Create a new requests session
s = requests.Session()
# Mount the adapter for https URLs
s.mount('https://', host_header_ssl.HostHeaderSSLAdapter())
# Send your request
s.get("https://198.51.100.50", headers={"Host": "example.org"})
I think the best way to send HTTPS requests to a specific IP is to add a customized resolver that binds the domain name to that IP. That way both the SNI and the Host header are set correctly, and certificate verification can succeed just as in a web browser.
Otherwise you will see various issues like InsecureRequestWarning and SSLCertVerificationError, and the SNI is always missing from the Client Hello, even if you try different combinations of headers and verify arguments, e.g.:
requests.get('https://1.2.3.4/foo.php', headers={'Host': 'example.com'}, verify=True)
In addition, I tried:
- requests_toolbelt
- pip install requests[security]
- forcediphttpsadapter
- all solutions mentioned here: using requests with TLS doesn't give SNI support
None of them set SNI when hitting https://IP directly.
import socket

import requests

# mock /etc/hosts
# lock it in multithreading or use multiprocessing if an endpoint is bound to multiple IPs frequently
etc_hosts = {}

# decorate python built-in resolver
def custom_resolver(builtin_resolver):
    def wrapper(*args, **kwargs):
        try:
            return etc_hosts[args[:2]]
        except KeyError:
            # fall back to builtin_resolver for endpoints not in etc_hosts
            return builtin_resolver(*args, **kwargs)
    return wrapper

# monkey patching; note this overrides name resolution process-wide,
# for every library that calls socket.getaddrinfo
socket.getaddrinfo = custom_resolver(socket.getaddrinfo)

def _bind_ip(domain_name, port, ip):
    '''
    resolve (domain_name, port) to a given ip
    '''
    key = (domain_name, port)
    # (family, type, proto, canonname, sockaddr)
    value = (socket.AddressFamily.AF_INET, socket.SocketKind.SOCK_STREAM, 6, '', (ip, port))
    etc_hosts[key] = [value]

_bind_ip('www.example.com', 443, '1.2.3.4')
# this sends the request to 1.2.3.4 while SNI, the Host header, and
# certificate verification all still use www.example.com
response = requests.get('https://www.example.com/foo.php', verify=True)

urllib2 error 'Not found on Accelerator'

I have a Python program that periodically checks the weather from weather.yahooapis.com, but it always throws the error urllib2.HTTPError: HTTP Error 404: Not Found on Accelerator. I have tried it on two different computers with no luck, as well as changing my DNS settings. I continue to get the error. Here is my code:
#!/usr/bin/python
import time
#from Adafruit_CharLCDPlate import Adafruit_CharLCDPlate
from xml.dom import minidom
import urllib2

#towns, as woeids
towns = [2365345, 2366030, 2452373]

val = 1
while val == 1:
    time.sleep(2)
    for i in towns:
        mdata = urllib2.urlopen('http://206.190.43.214/forecastrss?w='+str(i)+'&u=f')
        sdata = minidom.parseString(mdata)
        atm = sdata.getElementsByTagName('yweather:atmosphere')[0]
        current = sdata.getElementsByTagName('yweather:condition')[0]
        humid = atm.attributes['humidity'].value
        tempf = current.attributes['temp'].value
        print(tempf)
        time.sleep(8)
I can successfully access the output of the API through a web browser on the same computers that give me the error.
The problem is that you're using the IP address 206.190.43.214 rather than the hostname weather.yahooapis.com.
Even though they resolve to the same host (206.190.43.214, obviously), the name that's actually in the URL ends up as the Host: header in the HTTP request. And you can tell that this makes the difference here:
$ curl 'http://206.190.43.214/forecastrss?w=2365345&u=f'
<404 error>
$ curl 'http://weather.yahooapis.com/forecastrss?w=2365345&u=f'
<correct rss>
$ curl 'http://206.190.43.214/forecastrss?w=2365345&u=f' -H 'Host: weather.yahooapis.com'
<correct rss>
If you test the two URLs in your browser, you will see the same thing.
So, in your code, you have two choices. You can use the DNS name instead of the IP address:
mdata = urllib2.urlopen('http://weather.yahooapis.com/forecastrss?w='+str(i)+'&u=f')
… or you can use the IP address and add the Host header manually:
req = urllib2.Request('http://206.190.43.214/forecastrss?w='+str(i)+'&u=f')
req.add_header('Host', 'weather.yahooapis.com')
mdata = urllib2.urlopen(req)
There's at least one other problem in your code once you fix this. You can't call minidom.parseString(mdata) when mdata is the object returned by urlopen; you either need to call read() on it, or use parse instead of parseString.
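Putting both fixes together, a minimal Python 2 sketch of the corrected fetch for a single town (the service itself has since been retired, so treat it as illustrative):

import urllib2
from xml.dom import minidom

woeid = 2365345  # one of the asker's towns
mdata = urllib2.urlopen('http://weather.yahooapis.com/forecastrss?w=' + str(woeid) + '&u=f')
# parse() accepts the file-like object returned by urlopen;
# parseString() would require mdata.read() instead.
sdata = minidom.parse(mdata)
current = sdata.getElementsByTagName('yweather:condition')[0]
print(current.attributes['temp'].value)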
