url = "http://example.com/file.xml"
data = urllib2.urlopen(url)
data.read()
The question is: when exactly will the file be downloaded from the internet? When I call urlopen(), or when I call .read()? On my network interface I see high traffic both times.
Without looking at the code, I'd expect that the following happens:
urlopen() opens the connection, and sends the query. Then the server starts feeding the reply. At this point, the data accumulates in buffers until they are full and the operating system tells the server to hold on for a while.
Then data.read() empties the buffer, so the operating system tells the server to go on, and the rest of the reply gets downloaded.
Naturally, if the reply is short enough, or if the .read() happens quickly enough, then the buffers do not have time to fill up and the download happens in one go.
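A rough way to watch the two phases yourself (a sketch; the printed timings are only indicative and depend on the network and file size):

import time
import urllib2

t0 = time.time()
data = urllib2.urlopen("http://example.com/file.xml")   # connection made, headers read
print("urlopen() returned after %.2f s" % (time.time() - t0))

t0 = time.time()
body = data.read()                                      # the body is drained here
print("read() returned after %.2f s" % (time.time() - t0))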
I agree with ddaa. However, if you want to understand this sort of thing, you can set up a dummy server using something like nc (in *nix) and then open the URL in the interactive Python interpreter.
In one terminal, run nc -l 1234 which will open a socket and listen for connections on port 1234 of the local machine. nc will accept an incoming connection and display whatever it reads from the socket. Anything you type into nc will be sent over the socket to the remote connection, in this case Python's urlopen().
Run Python in another terminal and enter your code, i.e.
data = urllib2.urlopen('http://127.0.0.1:1234')
data.read()
The call to urlopen() will establish the connection to the server, send the request, and then block waiting for a response. You will see that nc prints the HTTP request in its terminal.
Now type something into the terminal that is running nc. The call to urlopen() will still block until you press ENTER in nc, that is, until it receives a newline character. So urlopen() will not return until it has read at least one newline character. (For those concerned about possible buffering by nc, this is not an issue: urlopen() will block until it sees the first newline character.)
So it should be noted that urlopen() will block until the first newline character is received, after which data can be read from the connection. In practice, HTTP responses are short, multi-line responses, so urlopen() should return quite quickly.
I am fairly new to computer networking and want to use the Python requests library for downloading large files from an external FTP server. I have a conceptual question about when the content of a large file is received, and how the client tells the server when to send over the content.
My code looks somewhat like
import requests
...
response = requests.get(url_to_very_large_file, stream=True)
...
with open(save_path, "wb") as file:
    for chunk in response.iter_content(chunk_size):
        file.write(chunk)
Now response arrives back from the server very quickly (in less than a second), but the content of the file (say, 2 GB for the sake of argument) surely cannot arrive that fast. I'm also confused that response already has a content attribute. What happens under the hood?
More precisely:
What is in response.content?
Does the server now bombard my client with the 2 GB of content right away, or is another request sent to the server when I ask for response.iter_content() or response.content? At which point does the server start sending over the 2 GB of content?
Does the server know in which chunk_size I am reading/expecting the file?
Where are the chunks stored in the meantime, if they have been received by the client but not yet read into memory?
The response.content attribute contains the bytes returned by the remote server. This attribute is a property, so if you sent the request with the stream=True option, it won't contain the content upon creation; the data is pulled from the server the moment you first access it.
When you send a request to a server, you're establishing a connection through which the server will send data. This doesn't have to happen all at once, and if your client is not pulling the data into its RAM, the server will wait for you for a while. By using the .iter_content() method you're slowly pulling data from the server a few bytes at a time.
It doesn't, and considering how TCP connections work, it isn't necessary either.
The server doesn't send us data until we have room for it, so the chunks aren't on our machine unless they're in our memory (or in the socket's receive buffer).
If you have already learnt other languages like Java, you could think of a property as a getter/setter, but in a more integrated way. Check the post I linked above for a better explanation.
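As a rough illustration of the pattern (a made-up sketch, not the actual requests source; _pull_from_server is a placeholder):

class LazyResponse(object):
    def __init__(self):
        self._content = None

    @property
    def content(self):
        # the body is fetched on first access, then cached
        if self._content is None:
            self._content = self._pull_from_server()
        return self._content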
It might be helpful to learn how TCP connections and sockets work, since those are what do all the work under the hood.
I am trying to create a web application with Flask.
The problem is that I have been stuck on one problem for two weeks.
I would like to run a Python command that launches a server, capture its standard output, and display it in real time on the website.
I do not know how to do this at all, because if I use render_template I do not see how to update the website with the values printed to the console.
I use Python 2.7. Thank you very much.
It's gonna take a lot of work to get this done, probably more than you think, but I'll still try to help you out.
To get any real-time updates to the browser you're going to need something like a socket connection: something that allows the server to send messages at any time, not just when the browser requests them.
With a regular HTTP connection you can only send a response once; the moment you call return, the connection is closed, and you cannot call return again to send another message.
So with a regular HTTP request you can deliver the log messages once, but once the log changes you cannot push those changes to the client again, since the connection has ended.
A socket connection fixes this. It opens a connection between the user and the server, and both can send messages at any time, as long as the connection is open. The connection stays open until you explicitly close it.
Check this answer for ways you could get real-time updates with Flask. If you want to do it with sockets (which is what I suggest), use the WebSocket interface instead.
There are options like socketio for Python which allow you to write websocket applications.
Overall this is gonna be split into 5 parts:
Start a websocket server when the Flask application starts
Create a JavaScript file (one that the browser loads) that connects to the websocket server
Find the function that gets triggered whenever Flask logging occurs
Send a socket message with the log inside of it
Make the browser display the log whenever it receives a websocket message
Here's a sample application written in Flask and socketio which should give you an idea of how to use socketio.
There's a lot to it, and there are parts you might be new to, like websockets, but don't let that stop you from doing what you want to do.
I hope this helps. If any part confuses you, feel free to respond.
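As a taste of what that looks like on the server side, here is a minimal sketch using the flask-socketio package (the 'log_line' event name and the handler class are made up for illustration):

import logging

from flask import Flask
from flask_socketio import SocketIO

app = Flask(__name__)
socketio = SocketIO(app)

class SocketIOHandler(logging.Handler):
    # forward every log record to all connected browsers
    def emit(self, record):
        socketio.emit('log_line', {'line': self.format(record)})

app.logger.addHandler(SocketIOHandler())

@app.route('/')
def index():
    app.logger.warning('page requested')   # pushed to the browser over the socket
    return 'listen for log_line events in the browser'

if __name__ == '__main__':
    socketio.run(app)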
The simple part: server side, you could redirect the stdout and stderr of the server to a file:
import sys

print("output will be redirected")
save_stdout = sys.stdout   # keep a reference so stdout can be restored later
fh = open("output.txt", "w")
sys.stdout = fh
The server itself would then read that file within a subprocess:
import select
import subprocess

f = subprocess.Popen(['tail', '-F', 'output.txt', '-n1'],
                     stdout=subprocess.PIPE, stderr=subprocess.PIPE)
p = select.poll()
p.register(f.stdout)
and run the following in a thread:
output = ""
while True:
    if p.poll(1):
        output += f.stdout.readline()
You can also use the tailhead or tailer libraries instead of the system tail.
Now, the problem is that the standard output is a kind of active pipe and the output is going to grow forever, so you'll need to keep only a window of that output buffer.
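One simple way to keep only a recent window is a bounded deque replacing the string buffer in the loop above (a sketch; the 200-line limit is arbitrary):

from collections import deque

output = deque(maxlen=200)   # old lines fall off the front automatically
while True:
    if p.poll(1):
        output.append(f.stdout.readline())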
If you had only one user that could connect to that window, the problem would be different, as you could flush the output as soon as it is sent to that single client. See the difference between a terminal window and a multiplexed, remote terminal window?
I don't know Flask, but on the client side you only need some JavaScript to poll the server every second with an Ajax request asking for the complete log (or, in the single-client case, just the buffer to be appended to the DOM). You could also use websockets, but they're not an absolute necessity.
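The Flask side of that polling can be tiny (a sketch; the /log route name is made up and output is the buffer kept above):

from flask import Flask, jsonify

app = Flask(__name__)

@app.route('/log')
def log():
    # the browser polls this every second and re-renders the result
    return jsonify(lines=list(output))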
A compromise between the two is possible (infinite log with real-time appends, multiplexed at different rates), and it requires keeping a separate output buffer for each client.
I created a Python server on port 8000 using python -m SimpleHTTPServer.
When I visit this URL from my web browser it shows the content below.
Now, I want to get the above content using Python. So, this is what I did:
>>> import socket
>>> s = socket.socket(
... socket.AF_INET, socket.SOCK_STREAM)
>>> s.connect(("localhost", 8000))
>>> s.recv(1024)
But after s.recv(1024) nothing happens; it just waits there and prints nothing.
So, my question is: how do I get the above directory content using Python? Also, can someone suggest a tutorial on socket programming with Python? I didn't like the official tutorial that much.
I also observed a strange thing: while I am trying to receive content using Python and nothing is happening, I cannot access localhost:8000 from my web browser, but as soon as I kill my Python program I can access it again.
Arguably the simplest way to get content over HTTP in Python is to use the urllib2 module. For example:
from urllib2 import urlopen

f = urlopen('http://localhost:8000')
for line in f:
    print line
This will print out the file hosted by SimpleHTTPServer.
"But after s.recv(1024) nothing happens; it just waits there and prints nothing."
You simply open a socket and wait for data, but that's not how the HTTP protocol works. You have to send a request first if you want to receive a response (basically, you have to tell the server which directory you want to list or which file to download). If you really want to, you can send the request using raw sockets to train your skills, but a proper library is highly recommended (see Matthew Adams' response and the urllib2 example).
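If you really do want the raw-socket version for practice, it looks roughly like this (a sketch; HTTP/1.0 is used so that the server closes the connection when the reply is complete):

import socket

s = socket.create_connection(("localhost", 8000))
s.sendall("GET / HTTP/1.0\r\nHost: localhost\r\n\r\n")
chunks = []
while True:
    data = s.recv(1024)
    if not data:   # an empty string means the server closed the connection
        break
    chunks.append(data)
s.close()
print("".join(chunks))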
"I also observed a strange thing: while I am trying to receive content using Python, I cannot access localhost:8000 from my web browser, but as soon as I kill my Python program I can access it again."
This is because SimpleHTTPServer is single-threaded and doesn't support multiple simultaneous connections. If you would like to fix that, take a look at the answers here: BasicHTTPServer, SimpleHTTPServer and concurrency.
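For instance, a threaded variant can be assembled from the standard library (a sketch, using the Python 2 module names from the question):

import SimpleHTTPServer
import SocketServer

class ThreadedHTTPServer(SocketServer.ThreadingMixIn, SocketServer.TCPServer):
    pass   # each request is handled in its own thread

server = ThreadedHTTPServer(("", 8000), SimpleHTTPServer.SimpleHTTPRequestHandler)
server.serve_forever()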
I'm trying to write an application where I send an initial HTTP POST message to a server and leave the connection open. The application then sits around until the server sends data back. Once the server sends data back I want to read it and write it to a file (easy enough).
The part I'm having trouble with is actually reading the data when the server sends it back.
Basically I do this:
import httplib

h = httplib.HTTPConnection(server, port, timeout)
h.putrequest('POST', selector)
h.putheader(...)   # one putheader() call per header
h.endheaders()
h.send(body)

buffering = False
while 1:
    r = h.getresponse(buffering)
    f = open(unique_filename, 'w')
    f.write(r.read())
    f.close()
What I expect is that the app blocks in the loop, and when data arrives it gets written to the file. I suspect I'm using read() the wrong way, but looking at the httplib source didn't help.
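For what it's worth, one way to make the file grow as data arrives is to read the response in fixed-size chunks rather than in one big read() (a sketch; the 4096-byte chunk size is arbitrary):

r = h.getresponse()
f = open(unique_filename, 'wb')
while True:
    chunk = r.read(4096)   # blocks until the chunk fills or the server closes
    if not chunk:          # an empty string means the response is complete
        break
    f.write(chunk)
f.close()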
Also, the Python documentation site mentions an httplib fileno() that returns the socket httplib uses. I'm using 2.7.0 and the website docs are for 2.7.2, and I can't find the fileno() method. I suspect taking the socket over from httplib and calling recv() myself is the best way to go; is that a good idea?
Any help is appreciated with one exception: please don't tell me to use some other library.
I am using a server to send some piece of information to another server every second. The problem is that the other server's response is a few kilobytes, and this consumes the bandwidth of the first server (about 2 GB in an hour). I would like to send the request and ignore the response (not even receive it, to save bandwidth).
I use a small Python script for this task (using urllib). I don't mind using any other tool, or even another language, if it will make the request only.
A 5K reply is small stuff and is probably below your OS's standard TCP window size. This means that even if you close your network connection just after sending the request and checking only the very first bytes of the reply (to be sure the request was really received), the server has probably already sent you the whole answer, and the packets are already on the wire or on your computer.
If you cannot control (i.e. trim down) what the server sends in reply to your notification, the only alternative I can think of is to add another server on the remote machine that waits for a simple command, performs the real request locally, and sends back just the result code. This can be done very easily, maybe even with bash/perl/python using, for example, netcat or wget locally.
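A sketch of that relay idea in Python (the port and line-based protocol are made up for illustration; the helper runs on the remote machine, performs the real request locally, and returns only the status code):

import SocketServer
import urllib2

class RelayHandler(SocketServer.StreamRequestHandler):
    def handle(self):
        url = self.rfile.readline().strip()
        code = urllib2.urlopen(url).getcode()   # the full reply never leaves that machine
        self.wfile.write("%d\n" % code)

SocketServer.TCPServer(("", 9999), RelayHandler).serve_forever()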
By the way there is something strange in your math as Glenn Maynard correctly wrote in a comment.
For HTTP, you can send a HEAD request instead of GET or POST:
import urllib2
request = urllib2.Request('https://stackoverflow.com/q/5049244/')
request.get_method = lambda: 'HEAD' # override get_method
response = urllib2.urlopen(request) # make request
print response.code, response.url
Output:
200 https://stackoverflow.com/questions/5049244/how-can-i-ignore-server-response-to-save-bandwidth
See How do you send a HEAD HTTP request in Python?
Sorry, but this does not make much sense and is likely a violation of the HTTP protocol. I consider such an idea weird and broken by design. Either make the remote server shut up, or configure your application (or whatever is running on the remote server) at a different protocol level, using a smarter protocol with less bandwidth usage. Everything else has to be considered nonsense.