Get requests from website - python

I'm trying to intercept all the requests a website makes when it loads, to get a certain file (like what you can see in Firefox's network monitor). Can I do that in Python? Sorry for being so vague. I'd like to get all the URLs that the website requests: the favicon, JS files, XML files, etc.

You probably need a packet sniffer like tcpdump. The best Python sniffer I know is scapy. Here is an example of how HTTP may be sniffed with it:
http://www.r00tsec.com/2013/12/simple-sniffer-http-request-and-http.html
Note that this trick won't work with HTTPS, since the payloads are encrypted. Also, packet sniffing usually requires root privileges on the host system.
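For a rough idea, here is a minimal sketch using scapy's built-in HTTP layer (available since scapy 2.4.3); it needs root and only sees plain HTTP on port 80:

from scapy.all import sniff
from scapy.layers.http import HTTPRequest

def show_request(pkt):
    # only packets carrying an HTTP request layer are of interest
    if pkt.haslayer(HTTPRequest):
        req = pkt[HTTPRequest]
        # Host + Path together give the URL the client asked for
        print(req.Host.decode(), req.Path.decode())

sniff(filter="tcp port 80", prn=show_request, store=False)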

Related

Python Requests - Get Server IP

I'm making a small tool that tests CDN performance, and I would like to check where the response comes from. I thought of getting the host's IP and then using one of the geolocation APIs on GitHub to check the country.
I've tried doing so with
import socket
...
raw._fp.fp._sock.getpeername()
...however, that only works when I use stream=True for the request, and that in turn breaks the tool's functionality.
Is there any other option to get the server ip with requests or in a completely different way?
The socket.gethostbyname() function from Python's socket library should solve your problem. You can check it out in the Python socket docs.
Here is an example of how to use it:
import socket
url="cdnjs.cloudflare.com"
print("IP:",socket.gethostbyname(url))
All you need to do is pass the hostname to socket.gethostbyname() and it will do the rest. Just make sure to strip the http:// scheme off the URL first, because that will trip it up.
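If you start from a full URL, a small sketch of stripping the scheme first with the stdlib's urllib.parse (the URL below is just a made-up example):

import socket
from urllib.parse import urlparse

url = "https://cdnjs.cloudflare.com/some/path/file.js"
host = urlparse(url).hostname  # -> "cdnjs.cloudflare.com"
print("IP:", socket.gethostbyname(host))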
I could not get Akilan's solution to give the IP address of the host I was using; socket.gethostbyname() and getpeername() were not working for me, and were not even available in my case. His solution did open the door, though.
However, navigating the socket object, I did find this:
socket.getaddrinfo('host name', 443)[0][4][0]
I wrapped this in a try/except block.
Maybe there is a prettier way.
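For reference, a small sketch of that approach wrapped in try/except, as described above (the hostname and port are just examples):

import socket

def server_ip(host, port=443):
    try:
        # getaddrinfo returns (family, type, proto, canonname, sockaddr) tuples;
        # sockaddr[0] is the IP address
        return socket.getaddrinfo(host, port)[0][4][0]
    except socket.gaierror:
        return None  # name could not be resolved

print(server_ip("cdnjs.cloudflare.com"))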

Is it possible to recreate a request from the packets programmatically?

For a script I am making, I need to be able to see the parameters that are sent with a request.
This is possible through Fiddler, but I am trying to automate the process.
In Fiddler, I can see the URL of a request and the parameters sent with that request.
I tried to do some packet sniffing with scapy, using the code below, to see if I could get a similar result. Basically, I can get the source and destination of a packet as IP addresses, but the packets themselves are just bytes.
import time
from scapy.all import AsyncSniffer

def sniffer():
    t = AsyncSniffer(prn=lambda x: x.summary(), count=10)
    t.start()
    time.sleep(8)
    results = t.results
    print(len(results))
    print(results)
    print(results[0])
From my understanding, after we establish a TCP connection, the request is broken down into several IP packets and then sent over to the destination. I would like to replicate the functionality of Fiddler, where I can see the URL of the request and the values of the parameters being sent over.
Would it be feasible to recreate the information of a request through only the information gathered from the packets?
Or does this difference arise because the sniffing is done at Layer 2, while Fiddler perhaps operates at a higher layer, before/after the translation into IP packets is done, so it actually sees the content of the original request itself rather than the individual packets? If my understanding is wrong, please correct me.
Basically, my question boils down to: "Is there a python module I can use to replicate the features of Fiddler to identify the destination url of a request and the parameters sent along with that request?"
The sniffed traffic is HTTPS traffic - therefore just by sniffing you won't see any details on the HTTP request/response because it is encrypted via SSL/TLS.
Fiddler is a proxy with HTTPS interception, which is something totally different from sniffing traffic at the network level. For the client application Fiddler "mimics" the server, and for the server Fiddler mimics the client. This allows Fiddler to decrypt the requests/responses and show them to you.
If you want to perform request interception at the Python level, I would recommend using mitmproxy instead of Fiddler. This proxy can also perform HTTPS interception, but it is written in Python and is therefore much easier to integrate into your Python environment.
Alternatively if you just want to see the request/response details of a Python program it may be easier to do so by setting the log-level in an appropriate way. See for example this question: Log all requests from the python-requests module
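To illustrate, a minimal mitmproxy addon that logs the destination URL and the parameters of each intercepted request might look like this (a sketch; the client has to be pointed at the proxy and trust mitmproxy's CA for HTTPS to work):

from mitmproxy import http

def request(flow: http.HTTPFlow) -> None:
    # called once per intercepted request; run with: mitmproxy -s thisfile.py
    print(flow.request.pretty_url)             # destination URL
    print(dict(flow.request.query))            # query-string parameters
    print(dict(flow.request.urlencoded_form))  # form-encoded body parameters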

sniffing http packets using scapy

I'm using Python 2.7.15, scapy and scapy-http on Windows.
I want to sniff all the HTTP packets and extract the HTML pages that were sent.
This is the code I'm using:
from scapy.all import *
import scapy_http.http

def printPacket(packet):
    if packet.haslayer('HTTP'):
        print '=' * 50
        print packet.show()

sniff(prn=printPacket)
but for some reason it only captures some of the HTTP packets (when I use the browser I don't see any packets), and I don't see any HTML code in the ones that it does print.
I think that's because some of the traffic sent is HTTPS (= HTTP + TLS). Your function expects an HTTP application layer, but that layer is encapsulated and encrypted inside a TLS layer, and therefore it is not matched.
To sniff HTTPS, you can use this: https://github.com/tintinweb/scapy-ssl_tls (I haven't tried it yet).
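To see the difference for yourself, here is a quick sketch (Python 3 scapy) that sniffs port 443 and dumps the raw TCP payloads; all you get is opaque TLS ciphertext, not HTML:

from scapy.all import sniff, TCP, Raw

def show_tls(pkt):
    if pkt.haslayer(TCP) and pkt.haslayer(Raw):
        # encrypted TLS record bytes, unreadable without the session keys
        print(repr(pkt[Raw].load[:40]))

sniff(filter="tcp port 443", prn=show_tls, store=False, count=10)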

Python - Socket error

My code:
import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("www.python.org" , 80))
s.sendall(b"GET https://www.python.org HTTP/1.0\n\n")
print(s.recv(4096))
s.close()
Why does the output show me this?
b'HTTP/1.1 500 Domain Not Found\r\nServer: Varnish\r\nRetry-After: 0\r\ncontent-type: text/html\r\nCache-Control: private, no-cache\r\nconnection: keep-alive\r\nContent-Length: 179\r\nAccept-Ranges: bytes\r\nDate: Tue, 11 Jul 2017 15:23:55 GMT\r\nVia: 1.1 varnish\r\nConnection: close\r\n\r\n\n\n\nFastly error: unknown domain \n\n\nFastly error: unknown domain: . Please check that this domain has been added to a service.'
How can I fix it?
This is wrong on multiple levels:
- To access an HTTPS resource you need to create a TLS connection (i.e., an SSL/TLS wrap on top of the existing TCP connection, with proper certificate checking etc.) and then send the HTTP request. And of course the TCP connection in this case should go to port 443 (HTTPS), not 80 (HTTP).
- The HTTP request should only contain the path, not the full URL.
- The line ending must be \r\n, not \n.
- You'd better send a Host header too, since many servers require it.
And that's only the request; properly handling the response is a different topic. A corrected raw-socket sketch follows below.
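Putting those points together, a minimal sketch of a correct raw-socket request (Python 3, stdlib ssl module, error handling omitted):

import socket
import ssl

host = "www.python.org"
ctx = ssl.create_default_context()  # proper certificate checking by default

with socket.create_connection((host, 443)) as tcp:            # port 443, not 80
    with ctx.wrap_socket(tcp, server_hostname=host) as tls:   # TLS on top of TCP
        # path only, \r\n line endings, and a Host header
        tls.sendall(b"GET / HTTP/1.0\r\nHost: www.python.org\r\n\r\n")
        print(tls.recv(4096))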
Still, I really recommend using an existing library like requests. HTTP(S) is considerably more complex than most people think who have only looked at a few traffic captures.
import requests
x = requests.get('https://www.python.org')
print x.text
With the requests library, HTTPS requests are very simple! If you're doing this with raw sockets, you have to do a lot more work to negotiate a cipher, etc. Try the above code (Python 2.7).
I would also note that, in my experience, Python is excellent for doing things quickly. If you are learning about networking and cryptography, try writing an HTTPS client on your own using sockets. If you want to automate something quickly, use the tools that are available to you. I almost always use requests for this type of task. As an additional note, if you're interested in parsing HTML content, check out the PyQuery library. I've used it to automate interaction with many web services.
Requests
PyQuery

Change/add payload of existing TCP packets?

I want to change the payload of all outgoing packets, so that in every packet containing "wordA" it gets changed to "wordB"; this will be done with a regex match.
I tried Python's scapy, but I don't know how to get it working.
PS: There won't be any WiFi involved here. Options that require port forwarding are deprecated.
import re

# note the raw string: without it, "\b" is a backspace character, not a word boundary
text = re.sub(r"\bwordA\b", "wordB", text)
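That substitution alone won't touch anything on the wire, though. One way to apply it to live outgoing traffic on Linux is to divert packets into userspace with an iptables NFQUEUE rule and rewrite them with scapy. A rough, untested sketch, assuming the netfilterqueue package is installed and a rule like iptables -A OUTPUT -p tcp -j NFQUEUE --queue-num 1 is active:

import re
from netfilterqueue import NetfilterQueue
from scapy.all import IP, TCP, Raw

def rewrite(pkt):
    spkt = IP(pkt.get_payload())
    if spkt.haslayer(TCP) and spkt.haslayer(Raw):
        old = spkt[Raw].load
        new = re.sub(rb"\bwordA\b", b"wordB", old)
        # keeping the length identical avoids breaking TCP sequence numbers
        if new != old and len(new) == len(old):
            spkt[Raw].load = new
            del spkt[IP].chksum   # force scapy to recompute the checksums
            del spkt[TCP].chksum
            pkt.set_payload(bytes(spkt))
    pkt.accept()

nfq = NetfilterQueue()
nfq.bind(1, rewrite)
try:
    nfq.run()
finally:
    nfq.unbind()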
