Wow, it was hard to encapsulate my issue here into a succinct headline. I hope I managed.
I've got a simple thumbnail feature that is causing me issues when I try to retrieve a URL from Amazon S3, then convert it using ImageMagick. I would normally use PIL to read in an image file and convert it, but PIL doesn't read in PDF formats, so I'm resorting to convert through a subprocess call.
Here's some code from a django views.py. The idea here is that I get a file url from S3, open it with convert, process it into a PNG, send it to stdout, and then use the outputted buffer to load up a StringIO object, which then gets passed back to default_storage to save the thumbnail file back to S3. Quite a faff for such a simple job, but there you go.
Please note: I cannot reliably save a file to disk using convert on my production set-up with Heroku; otherwise, I'd be doing that already.
def _large_thumbnail_s3(p):
    # get the URL from S3, trimming off the expiry info etc. So far so good.
    url = default_storage.url(p+'.pdf').split('?')
    url = url[0]
    # this opens the PDF file fine, and proceeds to convert and send
    # the new PNG to the buffer via standard output.
    from subprocess import call
    call("convert -thumbnail '400x600>' -density 96 -quality 85 "
         + url
         + " png:-", shell=True)
    from StringIO import StringIO
    # here's where the problem starts. I'm clearly not reading
    # in the stdout correctly, as I get an IOError: File not open for reading
    # from this next line of code:
    completeStdin = sys.stdout.read()
    im = StringIO(completeStdin)
    # now take the StringIO PNG object and save it back to S3 (this
    # should not be an issue).
    im = default_storage.open(p+'_lg.png', 'w+b')
    im.close()
Can anyone tell me a) where I might be going wrong with regard to sending the output back to the thumbnail function, and b) whether you can suggest any more robust alternatives to what seems a pretty hacky way of doing this!
TIA
You need to use subprocess.check_output, not subprocess.call:
from subprocess import check_output
from StringIO import StringIO
out = check_output("convert -thumbnail '400x600>' -density 96 -quality 85 "
                   + url
                   + " png:-", shell=True)
buffer = StringIO(out)
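If you then need to push that PNG back to S3 without touching the disk, a minimal sketch (assuming Django's ContentFile wrapper and the same p prefix used in your function) would be:
from django.core.files.base import ContentFile

# wrap the raw PNG bytes so default_storage can save them straight to S3
default_storage.save(p + '_lg.png', ContentFile(out))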
I'm not sure that this is possible, but I'm trying to generate a number of thumbnails from PDFs in an automated way and then store them within Elasticsearch. Basically I would like to convert the PDF to a series of JPGs (or PNGs, or anything similar) and then index them as binaries. Currently I'm producing these JPGs like this:
import subprocess
params = ['convert', 'pdf_file', 'thumb.jpg']
subprocess.check_call(params)
which works well, but it just writes the JPGs out to the filesystem. I would like to have these files as strings without writing them out to the local file system at all. I've tried using the stdout options of subprocess, but I'm fairly new to using subprocesses, so I wasn't able to figure this one out.
I'm using ImageMagick for this conversion, but I am open to switching to any other tool so long as I can achieve this goal.
Any ideas?
You can have it send the data to stdout instead...
import subprocess
params = ['convert', 'pdf_file', 'jpg:-']
image_data = subprocess.check_output(params)
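Since the goal is to index these in Elasticsearch, note that its binary field type expects base64-encoded text, so a hedged follow-up (the field name page_thumbnail is made up here) might be:
import base64

# Elasticsearch binary fields store base64 text, so encode before indexing
doc = {'page_thumbnail': base64.b64encode(image_data)}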
You can use ImageMagick's Python API, for example something like:
import PythonMagick
img = PythonMagick.Image("file.pdf")
img.depth = 8
img.magick = "RGB"
data = img.data
or use wand:
from wand.image import Image
with Image(filename='file.pdf') as img:
    data = img.make_blob('png')
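For a multi-page PDF, wand can also hand back one blob per page; a sketch assuming its sequence API (each page is cloned into its own Image before export):
from wand.image import Image

# one PNG blob per page of the PDF
with Image(filename='file.pdf') as img:
    pages = [Image(image=page).make_blob('png') for page in img.sequence]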
I would like to have these files as strings without writing them out to the local file system at all.
The way to do this is to tell the command to write its data to stdout instead of a file, then just read it from proc.stdout.
Not every command has a way to tell it to do this, but in many cases, just passing - as the output filename will do it, and that's true for ImageMagick's convert. Of course you'll also need to give it a format, because it can no longer guess it from the extension of thumb.jpg. The easiest way to do this in convert is to prefix the type to the - pseudo-filename. (Don't try that with anything other than ImageMagick.)
So:
import subprocess
params = ['convert', 'pdf_file', 'jpg:-']
converted = subprocess.check_output(params)
However, this is going to get you one giant string. If you were trying to get a bunch of separate images, you'll need to split the one giant string into separate images, which will presumably require some knowledge of the JPEG/JFIF format.
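If splitting the blob is too fiddly, an alternative sketch (assuming the page count, called num_pages here, is known beforehand) is to have convert emit one page per call using ImageMagick's [index] selector:
import subprocess

# 'pdf_file[0]' means "page 0 of pdf_file" to ImageMagick, so each call
# returns exactly one JPEG blob on stdout
pages = []
for i in range(num_pages):  # num_pages is assumed, not computed here
    params = ['convert', 'pdf_file[%d]' % i, 'jpg:-']
    pages.append(subprocess.check_output(params))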
I have an issue with Python.
My case: I have a gzipped file from a partner platform (i.e. h..p//....namesite.../xxx).
If I click the link in my browser, it downloads a file (e.g. namefile.xml.gz).
So... if I read this file locally with Python, I can decompress and read it.
Code:
content = gzip.open('namefile.xml.gz', 'rb')
print content.read()
But I can't do the same when I read the file from the remote source.
From the remote file I can only read the compressed string; I can't decode it.
Code:
response = urllib2.urlopen(url)
encoded =response.read()
print encoded
With this code I can read the compressed string... but I can't decode it with gzip or lzip.
Any advice?
Thanks a lot
Unfortunately the method @Aya suggests does not work, since GzipFile makes extensive use of the seek method of the file object (which response objects do not support).
So you have basically two options:
Read the contents of the remote file into io.BytesIO and pass that object to gzip.GzipFile (if the file is small)
Download the file into a temporary file on disk and use gzip.open
There is another option (which requires some coding): implement your own reader using the zlib module. It is rather easy, but you will need to know about a magic constant (see How can I decompress a gzip stream with zlib?); a sketch follows.
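A minimal sketch of that zlib approach, assuming Python 2 as in the rest of this question (16 + MAX_WBITS is the magic constant that makes zlib expect a gzip header):
import urllib2
import zlib

# decompress the response incrementally; 16 + MAX_WBITS tells zlib to
# expect (and skip) a gzip header rather than a raw zlib stream
response = urllib2.urlopen(url)
decompressor = zlib.decompressobj(16 + zlib.MAX_WBITS)
content = ''
chunk = response.read(8192)
while chunk:
    content += decompressor.decompress(chunk)
    chunk = response.read(8192)
print content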
If you use Python 3.2 or later, the bug in GzipFile (requiring tell support) is fixed, but apparently the fix isn't going to be backported to Python 2.x.
For Python v3.2 or later, you can use the gzip.GzipFile class to wrap the file object returned by urllib2.urlopen(), with something like this...
import urllib2
import gzip
response = urllib2.urlopen(url)
gunzip_response = gzip.GzipFile(fileobj=response)
content = gunzip_response.read()
print content
...which will transparently decompress the response stream as you read it.
I have a program that generates pictures and either saves them to a file or prints the raw image data to standard output. I am using the Python subprocess module to call the external program, catch its stdout data and create a Python image object from the data. I keep getting a "Cannot identify image file" error, though. I am new to this part of Python. Can you please help if you know how to achieve this? Here is my code:
import subprocess
import StringIO
from PIL import Image

p = subprocess.Popen(cmd, stdout=subprocess.PIPE)
raw = p.stdout.read()
buff = StringIO.StringIO()
buff.write(raw)
buff.seek(0)
im = Image.open(buff)
im.show()
The code looks fine to me. Try adding the lines...
assert len(raw) >= 4
assert raw.startswith('\x89PNG')
...directly after the line...
raw = p.stdout.read()
...just to ensure you're getting even vaguely valid data back.
Update
Try this...
from subprocess import check_output
from cStringIO import StringIO
from PIL import Image
raw = check_output(cmd)
buff = StringIO(raw)
im = Image.open(buff)
im.show()
Update #2
The problem is there's a bug in qrcode.exe whereby, when it's writing to stdout, it converts UNIX line endings to DOS line endings, which corrupts the binary PNG data.
Looks like the bug may have gotten fixed in a later version. Try: https://code.google.com/p/qrencode-win32/downloads/detail?name=qrcodegui_setup-3.3.1b.exe&can=1&q=qrencode-win32
I am trying to read only one file from a tar.gz file. All operations on the tarfile object work fine, but when I read a concrete member, a StreamError is always raised; check this code:
import tarfile
fd = tarfile.open('file.tar.gz', 'r|gz')
for member in fd.getmembers():
    if not member.isfile():
        continue
    cfile = fd.extractfile(member)
    print cfile.read()
    cfile.close()
fd.close()
cfile.read() always causes "tarfile.StreamError: seeking backwards is not allowed"
I need to read the contents into memory, not dump them to a file (extractall works fine).
Thank you!
The problem is this line:
fd = tarfile.open('file.tar.gz', 'r|gz')
You don't want 'r|gz', you want 'r:gz'.
If I run your code on a trivial tarball, I can even print out the member and see test/foo, and then I get the same error on read that you get.
If I fix it to use 'r:gz', it works.
From the docs:
mode has to be a string of the form 'filemode[:compression]'
...
For special purposes, there is a second format for mode: 'filemode|[compression]'. tarfile.open() will return a TarFile object that processes its data as a stream of blocks. No random seeking will be done on the file… Use this variant in combination with e.g. sys.stdin, a socket file object or a tape device. However, such a TarFile object is limited in that it does not allow to be accessed randomly, see Examples.
'r|gz' is meant for when you have a non-seekable stream, and it only provides a subset of the operations. Unfortunately, it doesn't seem to document exactly which operations are allowed—and the link to Examples doesn't help, because none of the examples use this feature. So, you have to either read the source, or figure it out through trial and error.
But, since you have a normal, seekable file, you don't have to worry about that; just use 'r:gz'.
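For reference, here is the loop from the question with only the mode changed (a sketch; everything else untouched):
import tarfile

# ':' gives a seekable TarFile, so extractfile(member).read() works
fd = tarfile.open('file.tar.gz', 'r:gz')
for member in fd.getmembers():
    if not member.isfile():
        continue
    cfile = fd.extractfile(member)
    print cfile.read()
    cfile.close()
fd.close()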
In addition to the file mode issue, I was attempting to seek on a network stream.
I had the same error when trying to requests.get the file, so I extracted everything to a tmp directory:
import os
import tarfile
from lzma import LZMAFile

# stream == requests.get
inputs = [tarfile.open(fileobj=LZMAFile(stream), mode='r|')]
t = "/tmp"
for tarfileobj in inputs:
    tarfileobj.extractall(path=t, members=None)
for fn in os.listdir(t):
    with open(os.path.join(t, fn)) as payload:
        print(payload.read())
I came up with the following problem: CODE A works right now. I am saving a PNG file called chart.png locally, and then I am loading it into the proprietary function (to which I do not have access).
However, in CODE B, I am trying to use cStringIO.StringIO() so that I do not have to write the file "chart.png" to disk. But I cannot find a way to pass it to the proprietaryfunction, because it is expecting a real filename like "chart.png" (it looks like it even uses the split function to identify the extension).
CODE A (code running right now):
file = "chart.png"
pylab.savefig(file, format='png')
a = proprietaryfunction.add(file)
CODE B (what I am trying to do - and does not work):
file = cStringIO.StringIO()
pylab.savefig(file, format='png')
a = proprietaryfunction.add(file)
How can I make the use of cStringIO.StringIO() transparent to the proprietary function? Is there any way that I can emulate a virtual filesystem in memory for this?
Probably not, but there's always tempfile if you need a "clean" workaround...
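A minimal sketch of that tempfile workaround, assuming the proprietary function only needs a real path ending in .png (as in CODE A):
import os
import tempfile

import pylab

# create a real *.png path on disk that proprietaryfunction can split
# and open like any other file, then remove it when done
fd, path = tempfile.mkstemp(suffix='.png')
os.close(fd)  # pylab reopens the path itself when saving
try:
    pylab.savefig(path, format='png')
    a = proprietaryfunction.add(path)
finally:
    os.unlink(path)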