When is the content of a GET request received in python? - python

I am fairly new to computer networking and want to use the python requests library for downloading large files from an external FTP server. I have a conceptual question as to when the content of a large file is received and how the client tells the server when to send over the content.
My code looks somewhat like
import requests
...
response = requests.get(url_to_very_large_file, stream=True)
...
with open(save_path, "wb") as file:
    for chunk in response.iter_content(chunk_size):
        file.write(chunk)
Now response arrives back from the server very quickly (less than a second), but the content of the file (say 2 GB heavy for the sake of argument) surely cannot arrive that fast. I'm also confused that response already has a content attribute. What happens under the hood?
More precisely:
What is in response.content?
Does the server now bombard my client with the 2 GB content right away, or is another request sent to the server when I ask for response.iter_content or response.content? At which point does the server start sending over the 2 GB of content?
Does the server know in which chunk_size I am reading/expecting the file?
Where are the chunks stored in the meantime, if they are received by the client but not read into memory?

The response.content attribute contains the bytes returned by the remote server. It is a property, so if you sent the request with stream=True, it does not hold the content when the response object is created; the body is pulled from the server only at the moment you first access it.
When you send a request to a server, you establish a connection through which the server sends data. This doesn't have to happen all at once: if your client is not pulling data into its RAM, the server will wait for you for a while. No second request is sent. By iterating with .iter_content you pull the data from the server one chunk at a time.
It doesn't, and given how a TCP connection works it doesn't need to.
The server doesn't send us more data than we have room for. Bytes that have arrived but have not been read yet sit in the operating system's socket receive buffer, and once that buffer is full the server has to pause until we read.
If you have already learnt other languages like Java, you can think of a property as a getter/setter, but more tightly integrated into the language. Check the post I linked above for better explanations.
It might be helpful to learn how TCP connections and sockets work, since they are what does all the work under the hood.
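To see the "headers first, body on demand" behaviour without a 2 GB file, here is a minimal sketch using only the standard library. It uses http.client instead of requests so that it runs anywhere, and the local test server, the 1 MB PAYLOAD and the function name download_in_chunks are all made up for illustration. The point is that getresponse() comes back as soon as the status line and headers are parsed, and the body is only pulled off the socket by the read() calls.

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

PAYLOAD = b"x" * 1_000_000  # hypothetical stand-in for the large file


class FileHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(PAYLOAD)))
        self.end_headers()
        self.wfile.write(PAYLOAD)

    def log_message(self, *args):
        pass  # keep the demo quiet


def download_in_chunks(chunk_size=64 * 1024):
    server = HTTPServer(("127.0.0.1", 0), FileHandler)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
        conn.request("GET", "/")
        # Returns once the status line and headers have been parsed.
        response = conn.getresponse()
        announced = int(response.getheader("Content-Length"))
        received = 0
        chunks = 0
        while True:
            chunk = response.read(chunk_size)  # pulls body bytes from the socket
            if not chunk:
                break
            received += len(chunk)
            chunks += 1
        conn.close()
        return announced, received, chunks
    finally:
        server.shutdown()
        server.server_close()


announced, received, chunks = download_in_chunks()
print(announced, received, chunks)
```

The server never learns the chunk size: it just writes bytes into the connection, and the client decides how many to take out per read.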

Related

Connection reset when server doesn't fully consume request body

I'm now familiar with the general cause of this problem from another SO answer and from the uWSGI documentation, which states:
If an HTTP request has a body (like a POST request generated by a
form), you have to read (consume) it in your application. If you do
not do this, the communication socket with your webserver may be
clobbered.
However, I don't understand what exactly is happening at the TCP level for this problem to occur. Not knowing the details of this process, I would assume the server can simply discard what remains in the stream, but that's obviously not the case.
If I consume only part of the request body in my application and ultimately return a 200 response, a web browser will report a connection reset error. Who reset the connection? The webserver or the client? It seems like all the data has been sent by the client already, but the application has just not exhausted the stream. Is there something that happens when the stream is exhausted in the application that triggers the webserver to indicate it has finished reading?
My application is Python/Flask, but I've seen questions about this from several languages and frameworks. For example, this fails if exhaust() is not called on the request stream:
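from flask import Flask, jsonify, request
import pandas

app = Flask(__name__)

@app.route('/upload', methods=['POST'])
def handle_upload():
    file = request.stream
    pandas.read_csv(file, nrows=100)
    response = ...  # Do stuff
    file.exhaust()  # read and discard whatever is left of the request body
    return jsonify(response)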
While there is some buffering throughout the chain, large file transfers are not going to complete until the receiver has consumed them. The buffers will fill up, and packets will be dropped until the buffers are drained. Eventually, the browser will give up trying to send the file and drop the connection.

AutobahnPython Server HTML5 Front end

I have this AutobahnPython server up and running fine.
https://github.com/tavendo/AutobahnPython/blob/master/examples/websocket/streaming/streaming_server.py
I want to attach a HTML5 Front end for capture of web cam video and audio.
How do I get the HTML5 Blob to send through the socket I just created in HTML5 to the python socket server I also have running?
Is it sendMessage?
https://autobahnpython.readthedocs.org/en/latest/websocketbase.html#autobahn.websocket.WebSocketProtocol.sendMessage
Be prepared: doing what you want, and doing it right (which means flow control), is an advanced topic. I'll try to give you a couple of hints. You might also be interested in reading this.
WebSocketProtocol.sendMessage is part of the AutobahnPython API. To be precise, it is part of the message-based basic API. While the streaming server above uses the advanced API for receiving, it uses the basic API for sending (since the data sent is small, there is no need for flow control).
Now, in your case, the web cam is the "mass data" producer. You will want to flow-control the sending from the JS to the server: if you just send out WebSocket messages from JS as fast as you get data from the cam, your upstream connection might not keep up, and the browser's memory use will just run away. Read about bufferedAmount, which is part of the JS WebSocket API.
If you just want to consume data as it flows into your server, the AutobahnPython streaming server example above is a good starting point, since you can process WebSocket data as it comes in. Other WebSocket frameworks will first buffer up a complete message before they hand it to you.
If you want to redistribute the data received by your server to other connected clients, you will want flow control on the server's outgoing leg as well, and then you will need the advanced API for sending too. See the reference or the streaming (producer) client examples; you can adjust the code to run inside your server.
Now, if none of the above makes sense to you: it's a non-trivial thing. Try reading the first link to the Autobahn forum, and more about flow control. It is also non-trivial because the JS WebSocket API has only limited machinery for this kind of flow control, short of inventing your own scheme at the application level. Anyway, hope that helps a little.
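As a rough analogy for what flow control on the sending side looks like, here is a small sketch in Python. Note that it does not use AutobahnPython or WebSockets at all: it uses plain asyncio streams, where awaiting writer.drain() plays a role similar to checking bufferedAmount before sending more from JS. The chunk size, chunk count and function names are invented for the example.

```python
import asyncio

CHUNK = b"v" * 16_384  # hypothetical "video frame" payload
N_CHUNKS = 200


async def run_demo():
    received = 0
    done = asyncio.Event()

    async def consume(reader, writer):
        # Server side: process data as it comes in, piece by piece.
        nonlocal received
        while True:
            data = await reader.read(4096)
            if not data:
                break
            received += len(data)
        writer.close()
        done.set()

    server = await asyncio.start_server(consume, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]

    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    for _ in range(N_CHUNKS):
        writer.write(CHUNK)
        # Producer side: wait here whenever the outgoing buffer is too full,
        # instead of queueing data in memory without bound.
        await writer.drain()
    writer.close()
    await writer.wait_closed()

    await asyncio.wait_for(done.wait(), timeout=10)
    server.close()
    return received


total = asyncio.run(run_demo())
print(total)
```

Without the drain() call the producer would still work in this small demo, but with a fast source and a slow link the unsent data would pile up in the sender's memory, which is exactly the runaway described above.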

Persistent HTTP connections with httplib

I'm trying to write an application where I send an initial HTTP post message to server and leave the connection open. The application then sits around until the server sends data back. Once the server sends data back I want to read it and write it to a file (easy enough).
The part I'm having trouble with is actua
Basically I do this:
h = httplib.HTTPConnection(server, port, timeout=timeout)
h.putrequest('POST', selector)
h.putheader(...)
h.endheaders()
h.send(body)
buffering = False
while 1:
    r = h.getresponse(buffering)
    f = open(unique_filename, 'w')
    f.write(r.read())
    f.close()
What I expect is that the app should block in the loop and when data arrives it gets written to the file. I suspect I'm using read the wrong way, but looking at the httplib source didn't help.
Also, the python documentation site mentions a httplib.fileno() that returns the socket httplib uses. I'm using 2.7.0 and website doc is for 2.7.2, I can't find the fileno() method. I suspect taking the socket over httplib and calling recv myself is the best way to go, is that a good idea?
Any help is appreciated with one exception: please don't tell me to use some other library.

How can i ignore server response to save bandwidth?

I am using a server to send some piece of information to another server every second. The problem is that the other server response is few kilobytes and this consumes the bandwidth on the first server ( about 2 GB in an hour ). I would like to send the request and ignore the return ( not even receive it to save bandwidth ) ..
I use a small python script for this task using (urllib). I don't mind using any other tool or even any other language if this is going to make the request only.
A 5K reply is small and is probably below the standard TCP window size of your OS. This means that even if you close your network connection right after sending the request and checking just the very first bytes of the reply (to be sure that the request has really been received), the server has probably already sent you the whole answer, and the packets are already on the wire or on your computer.
If you cannot control (i.e. trim down) the server's reply to your notification, the only alternative I can think of is to add another server on the remote machine that waits for a simple command, performs the real request locally, and sends back just the result code. This can be done very easily, maybe even with just bash/perl/python, using for example netcat/wget locally.
By the way there is something strange in your math as Glenn Maynard correctly wrote in a comment.
For HTTP, you can send a HEAD request instead of GET or POST:
import urllib2
request = urllib2.Request('https://stackoverflow.com/q/5049244/')
request.get_method = lambda: 'HEAD' # override get_method
response = urllib2.urlopen(request) # make request
print response.code, response.url
Output
200 https://stackoverflow.com/questions/5049244/how-can-i-ignore-server-response-to-save-bandwidth
See How do you send a HEAD HTTP request in Python?
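For Python 3, where urllib2 became urllib.request, a sketch of the same idea could look like the following. To keep it runnable offline it talks to a throwaway local server instead of a real site; the handler, the 5000-byte BODY and the fetch helper are assumptions made up for the demo. It compares a HEAD and a GET against the same URL so you can see that HEAD gets the headers but no body.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

BODY = b"k" * 5000  # hypothetical stand-in for the few-kilobyte reply


class ReplyHandler(BaseHTTPRequestHandler):
    def _send_headers(self):
        self.send_response(200)
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()

    def do_GET(self):
        self._send_headers()
        self.wfile.write(BODY)

    def do_HEAD(self):
        self._send_headers()  # headers only, no body

    def log_message(self, *args):
        pass  # keep the demo quiet


def fetch(method):
    server = HTTPServer(("127.0.0.1", 0), ReplyHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        url = "http://127.0.0.1:%d/" % server.server_port
        request = urllib.request.Request(url, method=method)
        # Ignore any proxy configured in the environment; the server is local.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(request, timeout=5) as response:
            body_length = len(response.read())
            return response.status, response.headers["Content-Length"], body_length
    finally:
        server.shutdown()
        server.server_close()


head_result = fetch("HEAD")
get_result = fetch("GET")
print(head_result, get_result)
```

Keep in mind that this only helps if the remote application does its work on a HEAD request too; many handlers only act on GET or POST.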
Sorry, but this does not make much sense and is likely a violation of the HTTP protocol. I consider such an idea weird and broken by design. Either make the remote server shut up, or have your application (or whatever is running on the remote server) talk at a different protocol level, using a smarter protocol with less bandwidth usage. Anything else can hardly be considered anything but nonsense.

When does urllib2 actually download a file from a url?

url = "http://example.com/file.xml"
data = urllib2.urlopen(url)
data.read()
The question is, when exactly will the file be downloaded from the internet? When i do urlopen or .read()? On my network interface I see high traffic both times.
Without looking at the code, I'd expect the following to happen:
urlopen() opens the connection, and sends the query. Then the server starts feeding the reply. At this point, the data accumulates in buffers until they are full and the operating system tells the server to hold on for a while.
Then data.read() empties the buffer, so the operating system tells the server to go on, and the rest of the reply gets downloaded.
Naturally, if the reply is short enough, or if the .read() happens quickly enough, then the buffers do not have time to fill up and the download happens in one go.
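The "buffers fill up and the server is told to hold on" part can be observed directly with two plain TCP sockets on the loopback interface. This is only a sketch of the mechanism, not of what urllib2 does internally: one socket plays the server and sends as fast as it can, the other plays the client and at first reads nothing. The names and the 1 GiB safety cap are made up for the demo.

```python
import socket

CHUNK = b"z" * 65536
LIMIT = 1 << 30  # safety cap so the loop always terminates


def fill_until_blocked():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())  # plays the downloader
    server, _ = listener.accept()  # plays the web server
    listener.close()

    # The server pushes data while the client reads nothing at all.
    server.setblocking(False)
    sent = 0
    stalled = False
    while sent < LIMIT:
        try:
            sent += server.send(CHUNK)
        except BlockingIOError:
            stalled = True  # the kernel will not accept more data for now
            break

    # Now the client drains the connection, as data.read() would.
    server.setblocking(True)
    server.shutdown(socket.SHUT_WR)  # signal end of data to the client
    received = 0
    while True:
        data = client.recv(65536)
        if not data:
            break
        received += len(data)

    client.close()
    server.close()
    return sent, stalled, received


sent, stalled, received = fill_until_blocked()
print(sent, stalled, received)
```

How many bytes fit before the sender stalls depends on the operating system's buffer settings, so the first number will differ from machine to machine; what matters is that it is bounded and that nothing is lost once the client starts reading.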
I agree with ddaa. However, if you want to understand this sort of thing, you can set up a dummy server using something like nc (in *nix) and then open the URL in the interactive Python interpreter.
In one terminal, run nc -l 1234 which will open a socket and listen for connections on port 1234 of the local machine. nc will accept an incoming connection and display whatever it reads from the socket. Anything you type into nc will be sent over the socket to the remote connection, in this case Python's urlopen().
Run Python in another terminal and enter your code, i.e.
data = urllib2.urlopen('http://127.0.0.1:1234')
data.read()
The call to urlopen() will establish the connection to the server, send the request and then block waiting for a response. You will see that nc prints the HTTP request into its terminal.
Now type something into the terminal that is running nc. The call to urlopen() will still block until you press ENTER in nc, that is, until it receives a new line character. So urlopen() will not return until it has read at least one new line character. (For those concerned about possible buffering by nc, this is not an issue. urlopen() will block until it sees the first new line character.)
So it should be noted that urlopen() will block until the first new line character is received, after which data can be read from the connection. In practice, HTTP responses are short multiline responses, so urlopen() should return quite quickly.
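If you don't have nc at hand, the experiment can be approximated in Python itself. The sketch below is for Python 3 (urllib.request rather than urllib2), where urlopen() returns once the status line and the header block have been read. The tiny hand-written server deliberately holds the body back until the client signals that urlopen() has returned; the names serve_once, run_demo and BODY are invented for the example.

```python
import socket
import threading
import urllib.request

BODY = b"hello from the body\n"


def serve_once(listener, urlopen_returned, outcome):
    conn, _ = listener.accept()
    conn.recv(65536)  # read (and ignore) the HTTP request
    head = (
        "HTTP/1.0 200 OK\r\n"
        "Content-Length: %d\r\n"
        "\r\n" % len(BODY)
    ).encode("ascii")
    conn.sendall(head)
    # Hold the body back until the client reports that urlopen() returned.
    # wait() gives False if that never happens within the timeout.
    outcome["body_held_back"] = urlopen_returned.wait(timeout=5)
    conn.sendall(BODY)
    conn.close()


def run_demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)
    port = listener.getsockname()[1]

    urlopen_returned = threading.Event()
    outcome = {}
    thread = threading.Thread(
        target=serve_once, args=(listener, urlopen_returned, outcome)
    )
    thread.start()

    # Ignore any proxy configured in the environment; the server is local.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    response = opener.open("http://127.0.0.1:%d/" % port, timeout=5)
    urlopen_returned.set()  # headers are in, the body has not been sent yet
    body = response.read()  # only now does the body arrive
    response.close()
    thread.join()
    listener.close()
    return outcome["body_held_back"], body


held_back, body = run_demo()
print(held_back, body)
```

If urlopen() needed the body before returning, the event would never be set in time and held_back would come out False.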
