Stream a file to the HTTP response in Pylons - python

I have a Pylons controller action that needs to return a file to the client. (The file is outside the web root, so I can't just link directly to it.) The simplest way is, of course, this:
with open(filepath, 'rb') as f:
    response.write(f.read())
That works, but it's obviously inefficient for large files. What's the best way to do this? I haven't been able to find any convenient methods in Pylons to stream the contents of the file. Do I really have to write the code to read a chunk at a time myself from scratch?

The correct tool to use is shutil.copyfileobj, which copies from one file-like object to another a chunk at a time.
Example usage:
import shutil
# open in binary mode so the bytes are sent unchanged
with open(filepath, 'rb') as f:
    shutil.copyfileobj(f, response)
This will not result in very large memory usage, and does not require implementing the code yourself.
The usual care with exceptions should be taken - if you handle signals (such as SIGCHLD) you have to handle EINTR because the writes to response could be interrupted, and IOError/OSError can occur for various reasons when doing I/O.
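As a minimal sketch of the error handling mentioned above, the copy can be wrapped in a small helper. The name `copy_to_stream` and the 64 KiB chunk size are assumptions for illustration, and `dest` stands for any object with a `write()` method, such as the response:

```python
import shutil


def copy_to_stream(filepath, dest, chunk_size=64 * 1024):
    """Copy a file to a writable binary stream one chunk at a time.

    Returns True on success and False if an I/O error occurred.
    """
    try:
        with open(filepath, 'rb') as src:
            # The third argument caps how much is held in memory at once.
            shutil.copyfileobj(src, dest, chunk_size)
    except OSError:
        # On Python 3, IOError is an alias of OSError.
        return False
    return True
```

Whether to swallow the error or re-raise it depends on how the controller reports failures; returning a flag is only one option.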

I finally got it to work using the FileApp class, thanks to Chris AtLee and THC4k (from this answer). This method also allowed me to set the Content-Length header, something Pylons has a lot of trouble with, which enables the browser to show an estimate of the time remaining.
Here's the complete code:
import os
from paste.fileapp import FileApp
def _send_file_response(self, filepath):
    # Build the download name from the last two path components.
    user_filename = '_'.join(filepath.split('/')[-2:])
    file_size = os.path.getsize(filepath)
    headers = [('Content-Disposition', 'attachment; filename="' + user_filename + '"'),
               ('Content-Type', 'text/plain'),
               ('Content-Length', str(file_size))]
    fapp = FileApp(filepath, headers=headers)
    return fapp(request.environ, self.start_response)

The key here is that WSGI, and pylons by extension, work with iterable responses. So you should be able to write some code like (warning, untested code below!):
def file_streamer():
    with open(filepath, 'rb') as f:
        while True:
            block = f.read(4096)
            if not block:
                break
            yield block
response.app_iter = file_streamer()
Also, paste.fileapp.FileApp is designed to be able to return file data for you, so you can also try:
return FileApp(filepath)
in your controller method.
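To illustrate the point about iterable responses outside of Pylons, here is a rough sketch of a plain WSGI application that streams a file block by block. The factory name `make_file_app`, the content type, and the block size are assumptions for illustration, not part of Pylons or Paste:

```python
import os


def make_file_app(filepath, block_size=4096):
    """Return a WSGI application that streams `filepath` in blocks."""

    def app(environ, start_response):
        headers = [
            ('Content-Type', 'application/octet-stream'),
            ('Content-Length', str(os.path.getsize(filepath))),
        ]
        start_response('200 OK', headers)

        def body():
            # The server pulls one block at a time from this generator.
            with open(filepath, 'rb') as f:
                while True:
                    block = f.read(block_size)
                    if not block:
                        break
                    yield block

        return body()

    return app
```

The server iterates over the returned generator, so only one block is held in memory at any time.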

Related

Python doesn't release file after it is closed

What I need to do is write some messages to a .txt file, close it, and send it to a server. This happens in an infinite loop, so the code should look more or less like this:
import time
import requests
from requests_toolbelt.multipart.encoder import MultipartEncoder
num = 0
while True:
    num += 1
    filename = f"example{num}.txt"
    with open(filename, "w") as f:
        f.write("Hello")
    f.close()  # redundant: the with block has already closed the file
    mp_encoder = MultipartEncoder(
        fields={
            'file': ("file", open(filename, 'rb'), 'text/plain')
        }
    )
    r = requests.post("my_url/save_file", data=mp_encoder, headers=my_headers)
    time.sleep(10)
The post works if the file is created manually inside my working directory, but if I try to create it and write on it through code, I receive this response message:
500 - Internal Server Error
System.IO.IOException: Unexpected end of Stream, the content may have already been read by another component.
I don't see the file appearing in the project window of PyCharm. I even used time.sleep(10) because at first I thought it could be a timing problem, but that didn't solve it. In fact, the file appears in my working directory only when I stop the code, so it seems the file is held by the program even after I explicitly call f.close(). I know the with statement should take care of closing files, but it didn't look that way, so I added a close() to see whether that was the problem (spoiler: it was not).
I solved the problem by using another file
with open(filename, "r") as firstfile, open("new.txt", "a+") as secondfile:
    secondfile.write(firstfile.read())
with open(filename, 'w'):
    pass
r = requests.post("my_url/save_file", data=mp_encoder, headers=my_headers)
if r.status_code == requests.codes.ok:
    os.remove("new.txt")
else:
    print("File not saved")
I make a copy of the file, empty the original file to save space and send the copy to the server (and then delete the copy). Looks like the problem was that the original file was held open by the Python logging module
Firstly, can you change open(f, 'rb') to open("example.txt", 'rb'). In open, you should be passing file name not a closed file pointer.
Also, you can use os.path.abspath to show the location to know where file is written.
import os
os.path.abspath('.')
Third, when you use a with context manager to open a file, you don't need to close it yourself. The context manager does that for you.
with open("example.txt", "w") as f:
    f.write("Hello")
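A small sketch can confirm that the file really is closed once the with block ends and that its contents can be read back right away. The helper name `write_then_read` is an assumption for illustration:

```python
def write_then_read(path, text):
    """Write text inside a with block, then reopen the file and read it back."""
    with open(path, 'w') as f:
        f.write(text)
    # Leaving the with block has flushed and closed the file.
    was_closed = f.closed
    with open(path, 'rb') as f:
        data = f.read()
    return was_closed, data
```

If this returns `True` and the expected bytes, the writer is not what keeps the file open, and the cause lies elsewhere in the program.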

Immediate write JSON API response to file with Python Requests

I am trying to retrieve data from an API and immediately write the JSON response directly to a file without storing any part of the response in memory. The reason for this requirement is that I'm executing this script on an AWS Linux EC2 instance that only has 2GB of memory, and if I try to hold everything in memory and then write the responses to a file, the process will fail for lack of memory.
I've tried using f.write() as well as sys.stdout.write(), but both of these approaches seemed to only write the file after all the queries were executed. While this worked with my small example, it didn't work when dealing with my actual data.
The problem with both approaches below is that the file doesn't populate until the loop is complete. This will not work with my actual process, as the machine doesn't have enough memory to hold all the responses.
How can I adapt either of the approaches below, or come up with something new, to write data received from the API immediately to a file without saving anything in memory?
Note: I'm using Python 3.7 but happy to update if there is something that would make this easier.
My Approach 1
# script1.py
import requests
import json
with open('data.json', 'w') as f:
    for i in range(0, 100):
        r = requests.get("https://httpbin.org/uuid")
        data = r.json()
        f.write(json.dumps(data) + "\n")
# no f.close() needed: the with block closes the file
My Approach 2
# script2.py
import requests
import json
import sys
for i in range(0, 100):
    r = requests.get("https://httpbin.org/uuid")
    data = r.json()
    sys.stdout.write(json.dumps(data))
    sys.stdout.write("\n")
With approach 2, I tried using the > to redirect the output to a file:
script2.py > data.json
You can use response.iter_content to download the content in chunks. For example:
import requests
url = 'https://httpbin.org/uuid'
with requests.get(url, stream=True) as r:
    r.raise_for_status()
    with open('data.json', 'wb') as f_out:
        for chunk in r.iter_content(chunk_size=8192):
            f_out.write(chunk)
Saves data.json with content:
{
"uuid": "991a5843-35ca-47b3-81d3-258a6d4ce582"
}
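The file staying empty until the loop ends is most likely an effect of output buffering rather than of the responses being held in memory. As a sketch under that assumption, flushing after each write makes every record visible in the file immediately. The helper name `append_json_lines` is hypothetical, and `records` stands for any iterable of JSON-serializable objects, such as the parsed API responses:

```python
import json
import os


def append_json_lines(path, records):
    """Write each record as one JSON line, flushing after every write."""
    count = 0
    with open(path, 'w') as f:
        for record in records:
            f.write(json.dumps(record) + '\n')
            f.flush()             # push Python's buffer to the operating system
            os.fsync(f.fileno())  # optional: ask the OS to write it to disk
            count += 1
    return count
```

Passing a generator as `records` keeps only one response in memory at a time.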

Delete file when file download is complete on Python x Django [duplicate]

I'm using the following django/python code to stream a file to the browser:
wrapper = FileWrapper(file(path))
response = HttpResponse(wrapper, content_type='text/plain')
response['Content-Length'] = os.path.getsize(path)
return response
Is there a way to delete the file after the response is returned? Using a callback function or something?
I could just make a cron to delete all tmp files, but it would be neater if I could stream files and delete them as well from the same request.
You can use a NamedTemporaryFile:
from django.core.files.temp import NamedTemporaryFile
def send_file(request):
    newfile = NamedTemporaryFile(suffix='.txt')
    # save your data to newfile.name
    wrapper = FileWrapper(newfile)
    # mime_type and modelfile come from surrounding code that is not shown here
    response = HttpResponse(wrapper, content_type=mime_type)
    response['Content-Disposition'] = 'attachment; filename=%s' % os.path.basename(modelfile.name)
    response['Content-Length'] = os.path.getsize(modelfile.name)
    return response
The temporary file is deleted as soon as the newfile object is closed, which also happens when it is garbage-collected.
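The delete-on-close behaviour can be checked with the standard library's tempfile.NamedTemporaryFile, which is assumed here to behave like the Django wrapper on most platforms. The helper name `temp_file_lifecycle` is hypothetical:

```python
import os
import tempfile


def temp_file_lifecycle(payload):
    """Write payload to a named temporary file and report when it disappears."""
    f = tempfile.NamedTemporaryFile(suffix='.txt')
    f.write(payload)
    f.flush()
    name = f.name
    existed_while_open = os.path.exists(name)
    f.close()  # with the default delete=True, closing removes the file
    exists_after_close = os.path.exists(name)
    return existed_while_open, exists_after_close
```

The file exists while the object is open and is gone after `close()`, so the response has to finish reading before the object is closed.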
For future references:
I just had the case in which I couldn't use temp files for downloads.
But I still needed to delete them afterwards, so here is how I did it (I really didn't want to rely on cron jobs or celery or wossnames; it's a very small system and I wanted it to stay that way).
def plug_cleaning_into_stream(stream, filename):
    try:
        closer = getattr(stream, 'close')
        #define a new function that still uses the old one
        def new_closer():
            closer()
            os.remove(filename)
            #any cleaning you need added as well
        #substitute it to the old close() function
        setattr(stream, 'close', new_closer)
    except:
        raise
and then I just took the stream used for the response and plugged into it.
def send_file(request, filename):
    with io.open(filename, 'rb') as ready_file:
        plug_cleaning_into_stream(ready_file, filename)
        response = HttpResponse(ready_file.read(), content_type='application/force-download')
        # here all the rest of the header settings
        # ...
        return response
I know this is quick and dirty but it works. I doubt it would be productive for a server with thousands of requests a second, but that's not my case here (max a few dozens a minute).
EDIT: I forgot to mention that I was dealing with very, very big files that could not fit in memory during the download. That is why I am using a BufferedReader (which is what io.open() returns for binary reads).
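An alternative that avoids patching close() is a generator that yields the file in chunks and removes it in a finally clause. This is a sketch and not the method used above; the name `stream_then_delete` and the chunk size are assumptions:

```python
import os


def stream_then_delete(path, chunk_size=8192):
    """Yield the file in chunks, then delete it once reading has finished."""
    try:
        with open(path, 'rb') as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                yield chunk
    finally:
        # Runs when a started generator is exhausted or closed early.
        os.remove(path)
```

Such a generator could be handed to a streaming response class, for example Django's StreamingHttpResponse, so the whole file never has to sit in memory. Note that the cleanup only runs if iteration has started.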
Mostly, we use periodic cron jobs for this.
Django already has one cron job to clean up lost sessions. And you're already running it, right?
See http://docs.djangoproject.com/en/dev/topics/http/sessions/#clearing-the-session-table
You want another command just like this one, in your application, that cleans up old files.
See this http://docs.djangoproject.com/en/dev/howto/custom-management-commands/
Also, you may not really be sending this file from Django. Sometimes you can get better performance by creating the file in a directory used by Apache and redirecting to a URL so the file can be served by Apache for you. Sometimes this is faster. It doesn't handle the cleanup any better, however.
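The body of such a periodic cleanup command could look roughly like the sketch below. The function name `remove_old_files` and its parameters are assumptions for illustration; wiring it into a management command or cron entry is left out:

```python
import os
import time


def remove_old_files(directory, max_age_seconds, now=None):
    """Delete regular files in `directory` last modified more than max_age_seconds ago."""
    now = time.time() if now is None else now
    removed = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and now - os.path.getmtime(path) > max_age_seconds:
            os.remove(path)
            removed.append(name)
    return sorted(removed)
```

Choosing a maximum age well above the longest expected download time avoids deleting a file that is still being served.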
One way would be to add a view to delete this file and call it from the client side using an asynchronous call (XMLHttpRequest). A variant of this would involve reporting back from the client on success so that the server can mark this file for deletion and have a periodic job clean it up.
This is just using the regular python approach (very simple example):
# something generates a file at filepath
import os
from django.http import HttpResponse
# open file
with open(filepath, "rb") as fid:
    filedata = fid.read()
# remove the file (os.remove avoids spawning a shell and handles paths with spaces)
os.remove(filepath)
# make response
response = HttpResponse(filedata, content_type="text/plain")
return response
Python 3.7 , Django 2.2.5
from tempfile import NamedTemporaryFile
from django.http import HttpResponse
with NamedTemporaryFile(suffix='.csv', mode='r+', encoding='utf8') as f:
    f.write('\uFEFF') # BOM
    f.write('sth you want')
    # ref: https://docs.python.org/3/library/tempfile.html#examples
    f.seek(0)
    data = f.read()
response = HttpResponse(data, content_type="text/plain")
response['Content-Disposition'] = 'inline; filename=export.csv'

Sending back a file stream from GRPC Python Server

I have a service that needs to return a filestream to the calling client so I have created this proto file.
service Sample {
    rpc getSomething(Request) returns (stream Response){}
}
message Request {
}
message Response {
    bytes data = 1;
}
When the server receives this, it needs to read some source.txt file and then write it back to the client as a byte stream. Is this the proper way to do this in a Python GRPC server?
file_name = "source.txt"
with open(file_name, 'r') as content_file:
    content = content_file.read()
    response.data = content.encode()
    yield response
I cannot find any examples related to this.
That looks mostly correct, though it's hard to be sure since you haven't shared with us all of your service-side code. A few tweaks I'd suggest would be (1) reading the file as binary content in the first place, (2) exiting the with statement as early as possible, (3) constructing the response message only after you've constructed the value of its data field, and (4) making a module-scope module-private constant out of the file name. Something like:
with open(_CONTENT_FILE_NAME, 'rb') as content_file:
    content = content_file.read()
yield my_generated_module_pb2.Response(data=content)
What do you think?
One option would be to lazily read in the binary and yield each chunk. Note, this is untested code:
def read_bytes(file_, num_bytes):
    while True:
        chunk = file_.read(num_bytes)
        # stop only at end of file, so the final (possibly shorter) chunk is still yielded
        if not chunk:
            break
        yield chunk
class ResponseStreamer(Sample_pb2_grpc.SampleServicer):
    def getSomething(self, request, context):
        with open('test.bin', 'rb') as f:
            for rec in read_bytes(f, 4):
                yield Sample_pb2.Response(data=rec)
Downside is that you'll have the file opened while the stream is open.
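A chunked reader has to deliver the last chunk even when it is shorter than the requested size. One compact way to get that behaviour, shown here as a sketch independent of gRPC, is the two-argument form of iter(). The name `iter_chunks` is an assumption:

```python
import io
from functools import partial


def iter_chunks(file_, num_bytes):
    """Return an iterator over chunks of at most num_bytes until end of file."""
    # iter(callable, sentinel) calls file_.read(num_bytes) until it returns b''.
    return iter(partial(file_.read, num_bytes), b'')


chunks = list(iter_chunks(io.BytesIO(b'abcdefghij'), 4))
print(chunks)  # [b'abcd', b'efgh', b'ij']
```

Each chunk could then be wrapped in a Response message inside the servicer method.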

How to print '<!DOCTYPE html>'?

My client requests a page from a server written in python 3.
The server returns an HTML page that the client displays.
So I made a dummy.html page, and when the client asks for it, my Python code reads it and returns it to the client:
filename = "dummy.html"
fh = open(filename, 'rt')
line = fh.readline()
while line:
    print(line)
    line = fh.readline()
fh.close()
However, this code does not read the <!DOCTYPE html> that is placed in the top of my dummy.html file (and thus, things like bootstrap don't work for me...).
I also tried printing it manually print('<!DOCTYPE html>') but that also does not work.
print('<!DOCTYPE html>')  # <---- IT IS PRINTED TO STDOUT, BUT THE PAGE THE CLIENT RECEIVES DOES NOT HAVE THIS LINE
filename = CURRENTPATH+"\\..\\su.html"
fh = open(filename, 'rt')
line = fh.readline()
print('hello')
print('<'+'!'+'DOCTYPE html>')
while line:
    print(line)
    line = fh.readline()
fh.close()
How can I fix it?
It looks like you're trying to reimplement a web server in Python. Please consider using an existing web framework, such as Django (https://www.djangoproject.com/), Flask (http://flask.pocoo.org/) or Pyramid (http://www.pylonsproject.org/), which will do most of the work for you (including built-in support for a wide variety of HTML templating libraries, and actual performance).
As for your actual question, a bare print statement prints to stdout, as expected. You need, instead, to write to the file-like object whose contents will be sent to the client (is it a socket? a file? who knows? stop reinventing the wheel).
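As a rough sketch of writing to a file-like object instead of printing, the helper below copies the HTML file to whatever writable text stream is passed in. The name `send_html` and the `out` parameter are assumptions, since the question does not show how the server reaches the client:

```python
def send_html(path, out):
    """Copy an HTML file to a writable text stream, line by line."""
    with open(path, 'rt', encoding='utf-8') as fh:
        for line in fh:
            # Each line already ends with its newline, so nothing extra is added
            # (print(line) would insert a second newline after every line).
            out.write(line)
```

With a socket, `out` could be the object returned by the socket's `makefile('w')` method; in a test it can simply be an `io.StringIO`.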
