I have multiple network interfaces (tun0, tun1, ...) and want to open several Firefox browser instances from Python such that each one goes through a specific interface.
I can obtain the IP address of each interface with netifaces, but I haven't found any way to 'attach' one to browser = webdriver.Firefox(...). There is plenty of documentation on using webdriver.DesiredCapabilities and proxies, but that isn't what I'd like to achieve.
Ideally I'd like to make this work at the Python level rather than the OS level, since the interfaces/IP addresses will change and this is driven by the Python code.
Using FreeBSD 11.1 and Python 3.6.
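For context, a minimal sketch of the lookup step that already works, using netifaces (tun0/tun1 as in the question):

# List each tunnel interface's IPv4 address with netifaces.
import netifaces

for iface in ("tun0", "tun1"):
    for addr in netifaces.ifaddresses(iface).get(netifaces.AF_INET, []):
        print(iface, addr["addr"])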
I am not sure if it works, but you can download the Selenium standalone server and run it on another network interface, as in this answer. By assigning different ports (you can do this on the command line when starting each server: java -jar selenium-server-standalone-version.jar -port 4545), you can connect to each of them individually. I don't know whether the network-interface method works for browsers, because the driver launches a new process, but I think it's worth a try and may suggest other approaches.
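A sketch of the connection side of that idea, assuming two standalone servers are already running (ports 4545 and 4646 are hypothetical, one per interface, and this uses the Selenium 3-era remote API):

# Connect a separate Remote session to each standalone server; each server
# is assumed to have been started with -port ... and bound/routed through
# a different interface (untested).
from selenium import webdriver
from selenium.webdriver import DesiredCapabilities

drivers = []
for port in (4545, 4646):  # one hypothetical server per interface
    drivers.append(webdriver.Remote(
        command_executor="http://127.0.0.1:%d/wd/hub" % port,
        desired_capabilities=DesiredCapabilities.FIREFOX))

for d in drivers:
    d.get("http://www.whatismyip.com/")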
Related
I want to intercept all destinations so I can reroute them, kind of like a virtual LAN. How would I intercept a destination packet and find its hostname?
I've searched the web but haven't found anything. I would like it to work like a device driver: it starts, waits for web browsers to request a specific IP or domain name, and reroutes the request to a different IP or domain name.
You do that using a (local) "proxy" process. There are several existing solutions for setting up such a web proxy, and you can even write one in a few lines of Python that captures HTTP traffic.
However, since most web traffic is nowadays protected by SSL/TLS (HTTPS), you probably can't inspect the plain-text details of the traffic without resorting to specific techniques (e.g. a man-in-the-middle proxy with a locally trusted certificate).
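For illustration, a minimal sketch of such a local proxy using only the standard library; the REROUTE table and addresses are hypothetical, only plain-HTTP GET is handled, and HTTPS (CONNECT) is deliberately left out, per the caveat above:

# Minimal HTTP forward proxy that can reroute hostnames before forwarding.
from http.server import BaseHTTPRequestHandler, HTTPServer
import urllib.request

REROUTE = {"old.example.com": "new.example.com"}  # hypothetical mapping

class ProxyHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # When the browser uses us as a proxy, self.path is the full URL.
        url = self.path
        for src, dst in REROUTE.items():
            url = url.replace(src, dst, 1)
        try:
            with urllib.request.urlopen(url) as upstream:
                body = upstream.read()
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        except Exception as exc:
            self.send_error(502, str(exc))

# Point the browser's HTTP proxy setting at 127.0.0.1:8080.
HTTPServer(("127.0.0.1", 8080), ProxyHandler).serve_forever()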
I'm developing for the web and have a prototype of my website. For local testing I usually run
python3 -m http.server
and then go to localhost:8000 in my browser. This works great!
Then I read on the internet that I could find my private IPv4 address (10.0.0.101, per ipconfig) and go to
10.0.0.101:8000
in my phone's browser to access my website.
The thing is, when I do this I get a "This site can't be reached (ERR_CONNECTION_TIMED_OUT)" error.
My network config
Between my ISP modem and my devices I have a router (D-Link DIR-815). An Ethernet cable connects the router to my desktop computer, and my phone is connected via Wi-Fi.
What I've tried
I've tried all of the items below and many combinations of them (though likely not every combination, because there are too many):
Using Chrome inside the BlueStacks emulator
Forwarding port 8000 on my router and using <my_public_ip>:8000
Adding a firewall exception for port 8000 (both TCP and UDP)
Using 10.0.0.101:8000 on my desktop browser
I did this just for testing, and it came as a surprise to me that it didn't work!
Using 127.0.0.1:8000 on my desktop browser (just for testing)
That didn't work either, which came as an even bigger surprise.
The Question
What am I missing here? Why can't I access my localhost from my phone?
I've read many questions, including this one, which contains many answers and was the top result on Google. But the thing is, both for the simplicity of using only Python and for my own education, I want to know how I can do this without installing a more complex solution like XAMPP.
PS:
Also, I know this is possible because it is shown in this video.
I double-checked my private IPv4 address on my router (dlinkrouter.local).
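For what it's worth, here is the stdlib equivalent of python3 -m http.server with the bind address made explicit (a minimal sketch; http.server already listens on all interfaces by default, so this mainly rules that out as the culprit):

# Serve the current directory on all interfaces, so both 127.0.0.1:8000
# and 10.0.0.101:8000 should reach it (assuming no firewall in the way).
from http.server import HTTPServer, SimpleHTTPRequestHandler

HTTPServer(("0.0.0.0", 8000), SimpleHTTPRequestHandler).serve_forever()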
I have a VPS which has 3 different IP addresses, and on it I have a Python script which crawls a specific website every hour.
For example, this is my Python request:
my_request = requests.get('https://example.com/timeline.json')
The only thing I want is for my Python traffic to leave the server from one of those IP addresses, chosen at random each time.
So after long deliberation in the comments, I can point you to some resources that hopefully will lead you to an answer. It's difficult to give a definitive answer to this question, because I'm unaware what kind of virtual infrastructure you are using or what the network looks like.
Here is a thread asking for something similar, which was ultimately left unanswered. However, it is clear that if you are using AWS you will need to use a VPC, and you would have to configure your ENIs in a special way to alternate between your public IP addresses; to learn more about ENIs in a VPC, see here.
This article details exactly what you are trying to accomplish, but using a Vyatta router with a special NAT configuration. There is also mention of being able to accomplish this with advanced iptables rules, which might be worth looking into.
Regardless, you cannot deliver traffic from various public IP addresses purely from inside the script that is doing the crawling. To get this effect you will need to modify the host network configuration in a special way (advanced iptables rules or periodically changing default routes), change the configuration of your virtual router (special NAT/routing rules), or use methods specific to your virtual hosting platform (Amazon VPC).
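One narrow caveat, offered as a sketch rather than a definitive answer: if the three addresses turn out to be configured directly on the VPS's network interface (common on traditional VPS plans, as opposed to NAT'd cloud addresses), the source address can be chosen per connection by binding the socket. requests supports this through a custom transport adapter (requests-toolbelt ships a ready-made SourceAddressAdapter); the addresses below are hypothetical.

# Bind each session's outgoing connections to a randomly chosen local IP.
# This only changes the public source address if the IPs are actually
# bound on the interface, not NAT'd by the platform.
import random
import requests
from requests.adapters import HTTPAdapter
from urllib3.poolmanager import PoolManager

class SourceAddressAdapter(HTTPAdapter):
    def __init__(self, source_ip, **kwargs):
        self.source_ip = source_ip
        super().__init__(**kwargs)

    def init_poolmanager(self, connections, maxsize, block=False, **kwargs):
        # urllib3 hands source_address down to socket.create_connection().
        self.poolmanager = PoolManager(
            num_pools=connections, maxsize=maxsize, block=block,
            source_address=(self.source_ip, 0), **kwargs)

MY_IPS = ["203.0.113.10", "203.0.113.11", "203.0.113.12"]  # hypothetical

session = requests.Session()
adapter = SourceAddressAdapter(random.choice(MY_IPS))  # new choice per run
session.mount("http://", adapter)
session.mount("https://", adapter)
my_request = session.get('https://example.com/timeline.json')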
I'm attempting to use a proxy, via Python, in order to log into a site from a different, specific IP address. It seems that certain websites, however, can detect the original (host) IP address. I've investigated the issue a bit, and here's what I found.
There are four proxy methods I've tried:
Firefox with a proxy setting.
Python with mechanize.set_proxies (a sketch follows this list).
Firefox in a virtual machine using an internal network, along with another virtual machine acting as a router (having two adapters: a NAT, and that internal network), set up such that the internal network traffic is routed through a proxy.
TorBrowser (which uses Firefox as the actual browser).
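For reference, a sketch of the mechanize variant from option 2; the proxy address is a placeholder:

# set_proxies() takes a scheme -> "host:port" mapping.
import mechanize

br = mechanize.Browser()
br.set_proxies({"http": "203.0.113.5:8080", "https": "203.0.113.5:8080"})
response = br.open("http://www.whatismyip.com/")
print(response.read()[:200])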
For the first three I used the same proxy. The Tor option was just for additional testing, not via my own proxy. The following behaviors I've noticed are as expected:
With all of these, if I go to http://www.whatismyip.com/, it gives the correct IP address (the IP address of the proxy, not the host computer).
whatismyip.com says "No Proxy Detected" for all of these.
Indeed, it seems like the websites I visit do think my IP is that of the proxy. However, there have been a few weird cases which makes me think that some sites can somehow detect my original IP address.
In one situation, when visiting a non-US site via Firefox with a non-US proxy, the site was literally able to print my originating (US) IP address and deny me access. Shouldn't this be impossible? When visiting the site via the virtual machine with that same non-US proxy, or via the TorBrowser with a non-US exit node, though, the site was unable to do so.
In a similar situation, I was visiting another non-US site from a non-US proxy. If I logged into the site from Firefox within the virtual machine, or from the TorBrowser with a non-US exit node, the site would work properly. However, if I attempted to log in via Firefox with a proxy (the same proxy the virtual machine uses), or with mechanize, it would fail to log in with an unrelated error message.
In a third situation, using the mechanize.set_proxies option, I overloaded a site with too many requests so it decided to block access (it would purposefully time out whenever I logged in). I thought it might have blocked the proxy's IP address. However, when I ran the code from a different host machine, but with the same proxy, it worked again, for a short while, until they blocked it again. (No worries, I won't be harassing the site any further - I re-ran the program as I thought it might have been a glitch on my end, not a block from their end.) Visiting that site with the Firefox+proxy solution from one of the blocked hosts also resulted in the purposeful timeout.
It seems to me that all of these sites, in the Firefox + proxy and mechanize cases, were able to find out something about the host machine's IP address, whereas in the TorBrowser and virtual machine cases, they weren't.
How are the sites able to gather this information? What is different about the TorBrowser and virtual machine cases that prevents the sites from gathering this information? And, how would I implement my python script so that the sites I'm visiting via the proxy can't detect the host/host's IP address?
It's possible that the proxy is reporting your real IP address in the X-Forwarded-For HTTP header, although if so, I'm surprised that the WhatIsMyIP site didn't tell you about it.
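One quick way to check this is to request a header-echo service through the proxy and see what actually arrives; a sketch using requests (httpbin.org reflects back the request headers it received, and the proxy address is a placeholder):

# See whether the proxy adds X-Forwarded-For or Via on the way through.
import requests

proxies = {"http": "http://203.0.113.5:8080"}
received = requests.get("http://httpbin.org/headers", proxies=proxies).json()["headers"]
print(received.get("X-Forwarded-For", "no X-Forwarded-For header"))
print(received.get("Via", "no Via header"))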
If you first visited the non-US site directly, and then later again using the proxy, it's also possible that the site might have set cookies in your browser on your first visit that let the site identify you even after your IP address changes. This could account for the differences you've observed between browser instances.
(I've noticed that academic journal sites like to do that. If I try to access a paywalled article from home and get blocked because I wasn't using my university's proxy server, I'll typically have to clear cookies after enabling the proxy to be allowed access.)
I am trying to use urllib.urlopen to open a web page served from the same host and port as the page making the request, and it just hangs.
For example, I have a page at http://mydevserver.com:8001/readpage.html with the following code in it:
data = urllib.urlopen("http://mydevserver.com:8001/testpage.html")
When I try to load the page, it just hangs. However, if I move testpage.html to a different port on the same host, it works fine, e.g.:
data = urllib.urlopen("http://mydevserver.com:8002/testpage.html")
Does anyone know why this might be and how I can solve the problem?
A firewall, perhaps? Try opening the page from the command line with wget/curl (assuming you're on Linux) or in the browser, on both ports. You could also try a packet sniffer to find out what's going on and where the connection gets stuck. And if testpage.html is dynamically generated, check whether it is actually hit: see if the request shows up in the web server logs.
Maybe something is already running on port 8001. Does the page open properly with a browser?
You seem to be implying that you are accessing a web page that is scripted in Python. That implies the Python script is handling the incoming connections, which could mean that, since it is already busy handling the request that made the urllib call, it is not available to handle the nested connection that results from it.
Show the code (or tell us what software) you're using to serve these Python scripts.
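A minimal sketch of this failure mode, assuming a single-threaded stdlib server (not necessarily the actual setup here):

# HTTPServer handles one request at a time, so the nested urlopen() below
# blocks forever: the only thread that could answer it is the one waiting
# on it. A threading server (or moving testpage.html to another port, as
# observed) avoids the deadlock.
from http.server import HTTPServer, BaseHTTPRequestHandler
from socketserver import ThreadingMixIn
import urllib.request

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/testpage.html":
            body = b"inner page\n"
        else:  # e.g. /readpage.html
            body = urllib.request.urlopen(
                "http://127.0.0.1:8001/testpage.html").read()
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

class ThreadingServer(ThreadingMixIn, HTTPServer):
    pass

# Swap HTTPServer for ThreadingServer here to watch the hang disappear.
HTTPServer(("127.0.0.1", 8001), Handler).serve_forever()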