I am trying to implement a script which tests a tracking URL is reachable to the app on the app store or not.
For this, I need to get the IP address of country in question and using that IP I need to request the tracking URL. So it will be like request went through that country and we will come to know if URL is reachable to the app store for that country or not.
Question: How do I request the URL as if its requested from IP I provide?
Example:
def check_url_is_valid(url, ip_address):
# Trying to request url using ip_address
# return True or False
PS: Not expecting to complete this code but any direction or guidance is appreciated.
There is no way to get a "country's IP address" since there is no such thing. There are ranges of IP addresses corresponding to different ISP's in different locations. You should pass each request through a proxy that you know is located where you want it to be.
For that you will need to create your own proxy list for each country and pass your requests through proxy every time for each country. You should explore some possible free or payed proxies for that.
Anyway, once you do, sending a request through a proxy can be done like this:
proxyDict = {
"http" : "http://1.1.1.1:123",
"https" : "https://1.1.1.1:456",
"ftp" : "ftp://1.1.1.1:789"
}
r = requests.get(url, proxies=proxyDict)
Of course you should replace the fake addresses above with real proxies that are good for what you want.
By the way, i'm sure there are off the shelf solutions for that, so maybe you should seek them out first instead of "reinventing the wheel". For example: https://www.uptrends.com/tools/uptime
You can use web proxies that allow hotlinking or APIs, or you can use proxychains if you are on linux, or if you want to go for manual effort go for VPNs.
You need to use 3rd party service which manually checks the URL by country/region, e.g. asm.ca.com I guess there's no way you can do it for specific IP. So you should determine the country by IP first.
Related
I have a website developed in flask running on an apache2 server that responds on port 80 to two URLs
Url-1 http://www.example.com
Url-2 http://oer.example.com
I want to detect which of the two urls the user is coming in from and adjust what the server does and store the variable in a config variable
app.config['SITE'] = 'OER'
or
app.config['SITE'] = 'WWW'
Looking around on the internet I can find lots of examples using urllib2 the issue is that you need to pass it the url you want to slice and I cant find a way to pull that out as it may change between the two with each request.
I could fork the code and put up two different versions but that's as ugly as a box of frogs.
Thoughts welcome.
Use the Flask request object (from flask import request) and one of the following in your request handler:
hostname = request.environ.get('HTTP_HOST', '')
url = urlparse(request.url)
hostname = url.netloc
This will get e.g. oer.example.com or www.example.com. If there is a port number that will be included too. Keep in mind that this ultimately comes from the client request so "bad" requests might have it set wrong, although hopefully apache wouldn't route those to your app.
I just deployed a Flask app on Webfaction and I've noticed that request.remote_addr is always 127.0.0.1. which is of course isn't of much use.
How can I get the real IP address of the user in Flask on Webfaction?
Thanks!
If there is a proxy in front of Flask, then something like this will get the real IP in Flask:
if request.headers.getlist("X-Forwarded-For"):
ip = request.headers.getlist("X-Forwarded-For")[0]
else:
ip = request.remote_addr
Update: Very good point mentioned by Eli in his comment. There could be some security issues if you just simply use this. Read Eli's post to get more details.
Werkzeug middleware
Flask's documentation is pretty specific about recommended reverse proxy server setup:
If you deploy your application using one of these [WSGI] servers behind an HTTP [reverse] proxy you will need to rewrite a few headers in order for the application to work [properly]. The two problematic values in the WSGI environment usually are REMOTE_ADDR and HTTP_HOST... Werkzeug ships a fixer that will solve some common setups, but you might want to write your own WSGI middleware for specific setups.
And also about security consideration:
Please keep in mind that it is a security issue to use such a middleware in a non-proxy setup because it will blindly trust the incoming headers which might be forged by malicious clients.
The suggested code (that installs the middleware) that will make request.remote_addr return client IP address is:
from werkzeug.contrib.fixers import ProxyFix
app.wsgi_app = ProxyFix(app.wsgi_app, num_proxies=1)
Note num_proxies which is 1 by default. It's the number of proxy servers in front of the app.
The actual code is as follows (lastest werkzeug==0.14.1 at the time of writing):
def get_remote_addr(self, forwarded_for):
if len(forwarded_for) >= self.num_proxies:
return forwarded_for[-self.num_proxies]
Webfaction
Webfaction's documentation about Accessing REMOTE_ADDR says:
...the IP address is available as the first IP address in the comma separated list in the HTTP_X_FORWARDED_FOR header.
They don't say what they do when a client request already contains X-Forwarded-For header, but following common sense I would assume they replace it. Thus for Webfaction num_proxies should be set to 0.
Nginx
Nginx is more explicit about it's $proxy_add_x_forwarded_for:
the “X-Forwarded-For” client request header field with the $remote_addr variable appended to it, separated by a comma. If the “X-Forwarded-For” field is not present in the client request header, the $proxy_add_x_forwarded_for variable is equal to the $remote_addr variable.
For Nginx in front of the app num_proxies should be left at default 1.
Rewriting the Ignas's answer:
headers_list = request.headers.getlist("X-Forwarded-For")
user_ip = headers_list[0] if headers_list else request.remote_addr
Remember to read Eli's post about spoofing considerations.
You can use request.access_route to access list of ip :
if len(request.access_route) > 1:
return request.access_route[-1]
else:
return request.access_route[0]
Update:
You can just write this:
return request.access_route[-1]
The problem is there's probably some kind of proxy in front of Flask. In this case the "real" IP address can often be found in request.headers['X-Forwarded-For'].
I am working on web-crawler [using python].
Situation is, for example, I am behind server-1 and I use proxy setting to connect to the Outside world. So in Python, using proxy-handler I can fetch the urls.
Now thing is, I am building a crawler so I cannot use only one IP [otherwise I will be blocked]. To solve this, I have bunch of Proxies, I want to shuffle through.
My question is: This is two level proxy, one to connect to main server-1, I use proxy and then after to shuffle through proxies, I want to use proxy. How can I achieve this?
Update Sounds like you're looking to connect to proxy A and from there initiate HTTP connections via proxies B, C, D which are outside of A. You might look into the proxychains project which says it can "tunnel any protocol via a user-defined chain of TOR, SOCKS 4/5, and HTTP proxies".
Version 3.1 is available as a package in Ubuntu Lucid. If it doesn't work directly for you, the proxychains source code may provide some insight into how this capability could be implemented for your app.
Orig answer:
Check out the urllib2.ProxyHandler. Here is an example of how you can use several different proxies to open urls:
import random
import urllib2
# put the urls for all of your proxies in a list
proxies = ['http://localhost:8080/']
# construct your list of url openers which each use a different proxy
openers = []
for proxy in proxies:
opener = urllib2.build_opener(urllib2.ProxyHandler({'http': proxy}))
openers.append(opener)
# select a url opener randomly, round-robin, or with some other scheme
opener = random.choice(openers)
req = urllib2.Request(url)
res = opener.open(req)
I recommend you take a look at CherryProxy. It lets you send a proxy request to an intermediate server (where CherryProxy is running) and then forward your HTTP request to a proxy on a second level machine (e.g. squid proxy on another server) for processing. Viola! A two-level proxy chain.
http://www.decalage.info/python/cherryproxy
I've got mechanize setup and working with python. I am adding support for using a proxy, but how do I check that I am actually using the proxy?
Here is some code I am using:
ip = 'some proxy ip address'
br.set_proxies({"http://": ip} )
I started to wonder if it was working because just to do some testing I typed in:
ip = 'asdfasdf'
and it didn't throw an error. So how do I go about checking if it is really using the ip address for the proxy that I pass in or the ip address of my computer? Is there a way to return info on your ip in mechanize?
maybe like this ?
br = mechanize.Browser()
br.set_proxies({"http": '127.0.0.1:80'})
you need to debug for more information
br.set_debug_http(True)
br.set_debug_redirects(True)
I am not sure how to handle this issue with mechanize, but you could read the next link that explains how to do it without mechanize (but still in python):
Proxy Check in python
The simple solution provided at the above-mentioned link could be easily adapted to your needs.
Thus, instead of the line:
print "Connection error! (Check proxy)"
you could replace by
SucceededYesNo="NO"
and instead of
print "All was fine"
just replace by
SucceededYesNo="YES"
Now, you have a variable available for further processing.
I am however afraid this will not cover the cases when the target web page is down because the same error might occur out of two causes (so one would not know whether a NO outcome is coming from a not working proxy server or from a bad web page), but still could be a solution: what about to check with the above-mentioned code a working web page? i.e. www.google.com? In this way, you could eliminate one cause and it remains the other.
I'm trying to use TOR as a generic proxy but it fails
Right now I'm trying with python but I'm pretty sure it would be the same with any other language. I can connect to other proxies with python so I get how it "should" be done.
I found a list of TOR entry nodes
h = httplib.HTTPConnection("one entry node", 80)
h.connect()
h.request("GET", "www.google.com")
resp = h.getresponse()
page = resp.read()
unfortunately that doesnt work, i get redirected to a 404 message.
I'm just not sure of what I'm doing wrong. Probably the list of entry nodes cannot be connected just like that. I'm searching on how to do it properly but i dont get any documentation about how to program applications with tor
edit :
ditch the tor proxy list, i don't know why i should want to know about it.
the "entry node" is yourself, after you've installed the (windows) vidalia client and privoxy (all bundled as one)
httplib.HTTPConnection("one entry node", 80)
becomes
httplib.HTTPConnection("127.0.0.1", 8118)
and voilà, everything is routed through TOR
First, make sure you are using the correct node location and port. Most proxies use ports other than 80. Second, specify the protocol to use with the correct URL on your request string.
Under normal circumstances, your code should work if it looks something like this one:
h = httplib.HTTPConnection("138.45.68.134", 8080)
h.connect()
h.request("GET", "http://www.google.com")
resp = h.getresponse()
page = resp.read()
h.close();
You can also use socket as an alternative but that's another issue and it's even more complicated than the one above.
Hope that helps! :-)