How do I intercept web browser traffic in Python?

I want to intercept all destinations so I can reroute them, kind of like a virtual LAN. How would I intercept a destination packet and find its hostname?
I've searched the web but haven't found anything. I would like it to work like a device driver: it starts, waits for a web browser to request a specific IP or domain name, and reroutes the request to a different IP or domain name.

You do that using a (local) "proxy" process. There are several ready-made solutions for setting up such a web proxy, and you can even write one in a few lines of Python to capture HTTP traffic.
However, since most web traffic is nowadays protected by SSL/TLS (HTTPS), you can't inspect the plaintext details of that traffic without resorting to specific techniques, such as having clients trust an interception certificate, which is how tools like mitmproxy work.
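As a sketch of the parsing such a proxy performs: when a browser is configured to use a proxy, plain-HTTP requests arrive with an absolute URI, while HTTPS tunnels arrive as CONNECT requests, so the destination hostname can be read straight from the first request line and looked up in a rerouting table. This is an illustration of that one step, not a full proxy; the mapping is made up:

```python
from urllib.parse import urlsplit

def destination_host(request_line: str) -> str:
    """Extract the destination hostname from an HTTP proxy request line.

    Plain HTTP arrives as   "GET http://example.com/path HTTP/1.1",
    HTTPS tunnels arrive as "CONNECT example.com:443 HTTP/1.1".
    """
    method, target, _ = request_line.split(maxsplit=2)
    if method.upper() == "CONNECT":
        return target.rsplit(":", 1)[0]  # strip the :port suffix
    return urlsplit(target).hostname or ""

# A rerouting table like the question describes: hostname -> replacement.
REROUTE = {"example.com": "198.51.100.7"}  # hypothetical mapping

host = destination_host("CONNECT example.com:443 HTTP/1.1")
print(REROUTE.get(host, host))  # 198.51.100.7
```

A real proxy would then open a connection to the rerouted address and relay bytes in both directions; for HTTPS it can only see the hostname, not the request contents.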

Related

Locally hosted Django project for long-term use in local network

I am currently implementing a Django web application which will be used only locally, but long-term. I already managed to start the Django server on my local machine using python manage.py runserver 0.0.0.0:myport, and I am able to connect from any mobile device using MyLocalIPv4:myport.
In the best case I only want to start the Django server once, establish a connection between a mobile device and the web app, and let the web app run for an indefinitely long time on that mobile device.
Now my assumption is that MyLocalIPv4 will change over time, as it is a dynamic IP address, which will force the user (or even worse, myself) to look up the new IP address and re-establish the connection.
My questions are: Do you know any mechanism to avoid this behaviour, using another (maybe static) way to refer to the web app? And what do you think about this web application in terms of security?
DNS is the way to go. What you want is an (internal) domain name that maps to your computer's IP address.
There are many ways you can achieve that but I suggest going with whatever tools you have available. I assume that for your home network you're using some sort of a consumer-grade home router with wireless access point. Very often this type of hardware offers some way to "map" the hostname of a machine to its internal-network IP address.
For example, at home I'm using a RT-AC1200G+ router, which runs an internal DNS server and maps hostnames of clients of my network to their IP:
$ dig +short @192.168.1.2 samu-pc
192.168.1.70
$ ifconfig | grep 192.168.1.70
inet 192.168.1.70 netmask 255.255.255.0 broadcast 192.168.1.255
Alternatively, one of the easier solutions would be to ensure your IP does not change: assign a static IP to your Django-server machine, OR, if you want to continue using DHCP, use your router's DHCP-reservation function to always hand out the same IP address to your network card's MAC address.
Disclaimer: there are other, more "professional" ways of solving service discovery within a network, but I would consider them overkill for your home network setup. Also, if you care about security, you should consider running the Django app behind a reverse proxy with HTTPS in front, just to ensure nobody on your internal network is trying to do something nasty.
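Once the router's DNS maps the name, any client (including the Django app itself) can resolve it with Python's standard library. A minimal check, using localhost as a stand-in since a name like samu-pc is specific to the network above:

```python
import socket

# Resolve a hostname through whatever DNS the host is configured to use.
# "localhost" stands in for an internal name such as "samu-pc".
ip = socket.gethostbyname("localhost")
print(ip)  # typically 127.0.0.1
```

On the home network described above, gethostbyname("samu-pc") would go through the router's internal DNS and return 192.168.1.70.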

switch to another IP on server when python requests goes out

I have a VPS which has 3 different IP addresses,
and a Python script on it which crawls a specific website every hour.
For example, this is my Python request:
my_request = requests.get('https://example.com/timeline.json')
The only thing I want is that every time my Python traffic goes out from the server, it uses one of those IP addresses at random.
So after long deliberation in the comments, I can point you to some resources that hopefully will lead you to an answer. It's difficult to give a definitive answer to this question, because I'm unaware what kind of virtual infrastructure you are using or what the network looks like.
Here is a thread with a similar goal, which was ultimately left unanswered. However, it is clear that if you are using AWS you will need to use a VPC, and you would have to configure your ENIs in a special way to intermittently use your public IP addresses; to learn more about ENIs in a VPC, see here.
This article details exactly what you are trying to accomplish, but using a Vyatta router with a special NAT configuration. There is also mention of being able to accomplish this using advanced iptables rules, which might be worth looking into.
Regardless, you cannot select among public IP addresses from inside the script that is doing the crawling. To get this effect you will need to modify the host network configuration (advanced iptables rules or periodically changing default routes), change the configuration of your virtual router (special NAT/routing rules), or use methods specific to your virtual hosting platform (Amazon VPC).
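As an illustration of the iptables route only (the interface name and addresses are made up, and the exact rule depends on how the VPS's addresses are actually configured), an SNAT rule with an address range lets the kernel spread outbound connections across the addresses:

```shell
# Hypothetical: eth0 carries 203.0.113.10, .11 and .12.
# SNAT with an address range distributes new outbound connections
# across the range, so the crawler's requests leave from varying IPs.
iptables -t nat -A POSTROUTING -o eth0 -p tcp --dport 443 \
  -j SNAT --to-source 203.0.113.10-203.0.113.12
```

Note this rewrites the source address for all matching traffic from the host, not just the crawler, so you may want to narrow the match (e.g. by destination or by owner UID).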

How to listen to/forward all ports on an interface, in Python or otherwise

I am writing an application, currently in Python + Twisted, which serves as a first port of call for DNS requests – if the requests meet certain patterns, e.g. Namecoin .bit addresses or OpenNIC TLDs, they are passed to different DNS resolvers, otherwise the default one is used.
Some addresses however I need redirected through special routes, e.g. Tor .onion addresses which don't resolve to traditional IPv4 addresses, or certain websites that require tunneling through a VPN for geolocation reasons. So when DNS requests for such sites come in, I want the application to create a new loopback interface/alias and return the IP of this interface. I then need to be able to tunnel all TCP and UDP traffic coming through this interface through the proxy/VPN or whatever to the endpoint it was set up for.
The question is, how can I do this? Listening on specific ports (e.g. 80) will be fine for most purposes, but as a perfectionist I would like to know how to accept connections/messages sent to ALL ports, without having to set up tens of thousands of listeners and potentially crashing the system.
Note: while everything is currently in Python, I don't mind adding components in C++ or another language, or playing around with network configurations to get this working.
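One way to avoid tens of thousands of listeners, assuming Linux: redirect every port on the special interface to a single listener with a netfilter rule such as iptables -t nat -A PREROUTING -i <alias> -p tcp -j REDIRECT --to-ports 9999, then recover where the client was actually trying to connect via the SO_ORIGINAL_DST socket option. The sketch below only shows the decoding step; the constant is Linux-specific and the traffic is simulated:

```python
import socket
import struct

SO_ORIGINAL_DST = 80  # Linux netfilter constant, not exposed by the socket module

def parse_original_dst(raw: bytes) -> tuple:
    """Decode the sockaddr_in that getsockopt(SOL_IP, SO_ORIGINAL_DST) returns.

    Layout: 2-byte family, 2-byte port (network order), 4-byte IPv4 address.
    """
    port, packed_ip = struct.unpack_from("!2xH4s", raw)
    return socket.inet_ntoa(packed_ip), port

# In the real listener you would call, on each accepted connection:
#   raw = conn.getsockopt(socket.SOL_IP, SO_ORIGINAL_DST, 16)
raw = struct.pack("!2xH4s", 443, socket.inet_aton("10.0.0.5"))  # simulated
print(parse_original_dst(raw))  # ('10.0.0.5', 443)
```

With the original destination in hand, the listener can open the matching outbound connection through the proxy/VPN for that interface and relay bytes both ways. UDP needs the separate TPROXY mechanism rather than REDIRECT.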

How do I get the client port number in a Django project?

I am using django to build my web server, other people connect to me as clients. Now I need to know the clients' port number to distinguish them. If their browser opens two 'Tabs' of the same link, i.e. two pages but the same link, I also have to distinguish them.
I know I can use request.META['REMOTE_ADDR'] to get the client's IP in my Django view function, but this really is not enough for me.
Then I studied some TCP/IP basics and learned that the client's port number is carried in the TCP header. But how can I access it in Django?
Additional info:
I'm using python 2.6 and django 1.4
I know every tab of a browser can be allocated a random, unique source port when it connects to my Django web server's port -- see this link: 'The web server opens port 80, but the browser has a different, randomly-assigned port.' I really need to distinguish them, so my intuitive thought is to use that port number. If you have any other suggestion, it is also welcome.
I have found a similar question here, but I am not using Apache, and Apache may be hard for me to configure, which could turn this simple question into a more complex one.
While debugging Django, I found this:
request.environ["wsgi.input"].raw._sock.getpeername()
Maybe it can work.
Yes, after days of struggling I can answer it, with a working but ugly solution for getting the client port in Django.
In your python26/Lib/SocketServer.py, find def process_request_thread and add:
global gClientPort; gClientPort = client_address
Then use this global value in your project. Its format is, for example, ('12.34.56.78', 55437); 55437 is the port number.
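A less invasive route than patching the standard library, assuming your WSGI server sets REMOTE_PORT (many do, though PEP 3333 does not require it): read it from the environ, which Django exposes as request.META['REMOTE_PORT']. A plain-WSGI sketch with a simulated environ:

```python
from wsgiref.util import setup_testing_defaults

def app(environ, start_response):
    # REMOTE_ADDR is standard; REMOTE_PORT is set by many servers but optional.
    client = "%s:%s" % (environ.get("REMOTE_ADDR", "?"),
                        environ.get("REMOTE_PORT", "?"))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [client.encode()]

# Simulated environ, the way a real server would fill it in per request.
environ = {"REMOTE_ADDR": "12.34.56.78", "REMOTE_PORT": "55437"}
setup_testing_defaults(environ)  # adds the remaining required WSGI keys
body = b"".join(app(environ, lambda status, headers: None))
print(body)  # b'12.34.56.78:55437'
```

Even where REMOTE_PORT is available, note the caveat in the other answer: the port identifies a TCP connection, not a user or a tab.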
Your assumption that 'every user connection opens a connection at a unique port' is not something you can rely on: all clients connect to the same server port, and a client's ephemeral source port changes as connections are opened, reused and closed, so it cannot identify a user or a tab.
To distinguish users, Django (like almost every other framework) uses sessions. Every user gets a cookie with a unique session ID, and this cookie is sent back to the server on every request, so the application can tell users apart.
Here is documentation on sessions:
https://docs.djangoproject.com/en/1.8/topics/http/sessions/

how to use proxies without the remote site being able to detect the host/host IP?

I'm attempting to use a proxy, via python, in order to log into a site from a different, specific IP address. It seems that certain websites, however, can detect the original (host) IP address. I've investigated the issue a bit and here's what I found.
There are four proxy methods I've tried:
Firefox with a proxy setting.
Python with mechanize.set_proxies.
Firefox in a virtual machine using an internal network, along with another virtual machine acting as a router (having two adapters: a NAT, and that internal network), set up such that the internal network traffic is routed through a proxy.
TorBrowser (which uses Firefox as the actual browser).
For the first three I used the same proxy. The Tor option was just for additional testing, not via my own proxy. The following things are behaviors I've noticed that are expected:
With all of these, if I go to http://www.whatismyip.com/, it gives the correct IP address (the IP address of the proxy, not the host computer).
whatismyip.com says "No Proxy Detected" for all of these.
Indeed, it seems like the websites I visit do think my IP is that of the proxy. However, there have been a few weird cases which makes me think that some sites can somehow detect my original IP address.
In one situation, visiting a non-US site via Firefox with a non-US proxy, the site literally was able to print my originating IP address (from the US) and deny me access. Shouldn't this be impossible? Visiting the site via the virtual machine with that same non-US proxy, or the TorBrowser with a non-US exit node, though, the site was unable to do so.
In a similar situation, I was visiting another non-US site from a non-US proxy. If I logged into the site from Firefox within the virtual machine, or from the TorBrowser with a non-US exit node, the site would work properly. However, if I attempted to log in via Firefox with a proxy (the same proxy the virtual machine uses), or with mechanize, it would fail to log in with an unrelated error message.
In a third situation, using the mechanize.set_proxies option, I overloaded a site with too many requests so it decided to block access (it would purposefully time out whenever I logged in). I thought it might have blocked the proxy's IP address. However, when I ran the code from a different host machine, but with the same proxy, it worked again, for a short while, until they blocked it again. (No worries, I won't be harassing the site any further - I re-ran the program as I thought it might have been a glitch on my end, not a block from their end.) Visiting that site with the Firefox+proxy solution from one of the blocked hosts also resulted in the purposeful timeout.
It seems to me that all of these sites, in the Firefox + proxy and mechanize cases, were able to find out something about the host machine's IP address, whereas in the TorBrowser and virtual machine cases, they weren't.
How are the sites able to gather this information? What is different about the TorBrowser and virtual machine cases that prevents the sites from gathering this information? And, how would I implement my python script so that the sites I'm visiting via the proxy can't detect the host/host's IP address?
It's possible that the proxy is reporting your real IP address in the X-Forwarded-For HTTP header, although if so, I'm surprised that the WhatIsMyIP site didn't tell you about it.
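One way to check for exactly this kind of leak, as a sketch: request a header-echo service (httpbin.org/headers, for instance) through the proxy and scan the headers the site reports seeing for your real address. The helper below does only the scanning step, and the header names are the usual suspects rather than an exhaustive list:

```python
def leaks_origin(echoed_headers: dict, origin_ip: str) -> bool:
    """Return True if a common forwarding header reveals the origin IP.

    `echoed_headers` is what a header-echo service reports, i.e. the
    headers your proxy actually forwarded to the remote site.
    """
    suspects = ("X-Forwarded-For", "Forwarded", "Via", "X-Real-Ip")
    return any(origin_ip in echoed_headers.get(name, "") for name in suspects)

# A transparent (non-anonymizing) proxy typically produces something like:
seen = {"X-Forwarded-For": "203.0.113.9", "Via": "1.1 proxy.example"}
print(leaks_origin(seen, "203.0.113.9"))        # True
print(leaks_origin({"Host": "example.com"}, "203.0.113.9"))  # False
```

If the check comes back True, the proxy itself is announcing you, and no client-side change (Firefox, mechanize, or otherwise) will hide it.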
If you first visited the non-US site directly, and then later again using the proxy, it's also possible that the site might have set cookies in your browser on your first visit that let the site identify you even after your IP address changes. This could account for the differences you've observed between browser instances.
(I've noticed that academic journal sites like to do that. If I try to access a paywalled article from home and get blocked because I wasn't using my university's proxy server, I'll typically have to clear cookies after enabling the proxy to be allowed access.)
