I store uploaded images in gridfs (mongodb). Therefore, the image data is never saved on the normal filesystem. This works by using the following code:
import pymongo
import gridfs
conn = pymongo.Connection()
db = conn.my_gridfs_db
fs = gridfs.GridFS(db)
...
with fs.new_file(
    filename = 'my-filename-1.png',
) as fp:
    fp.write(image_data_as_string)
I also want to store thumbnails of that image. I do not care which library I use: PIL, Pillow, sorl-thumbnail, or whatever fits best will work for me.
I want to know if there is a way to generate thumbnails without temporarily saving the file in the filesystem. That would be much cleaner and involve less overhead. Is there an in-memory thumbnail generator?
Update
My solution to save the thumbnail:
import cStringIO
import mimetypes
from PIL import Image, ImageOps
# Load the uploaded bytes (icon) into an in-memory file and open it
content = cStringIO.StringIO()
content.write(icon)
content.seek(0)
image = Image.open(content)
# Write the thumbnail into a second in-memory file
temp_content = cStringIO.StringIO()
thumb = ImageOps.fit(image, (width, height), Image.ANTIALIAS)
thumb.save(temp_content, format='png')
temp_content.seek(0)
gridfs_image_data = temp_content.getvalue()
with fs.new_file(
    content_type = mimetypes.guess_type(filename)[0],
    filename = filename,
    size = size,
    width = width,
    height = height,
) as fp:
    fp.write(gridfs_image_data)
The file is then served via nginx-gridfs.
You can save it to a StringIO object instead of a file (use the cStringIO module, if possible):
from StringIO import StringIO
fake_file = StringIO()
thing.save(fake_file) # Acts like a file handle
contents = fake_file.getvalue()
fake_file.close()
Or if you like context managers:
import contextlib
from StringIO import StringIO
with contextlib.closing(StringIO()) as handle:
    thing.save(handle)
    contents = handle.getvalue()
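On Python 3 the same idea uses io.BytesIO, since image data is bytes. A minimal sketch, assuming Pillow is installed; the function name make_thumbnail and the generated test image are hypothetical, and Image.LANCZOS stands in for the Image.ANTIALIAS filter used in the question, which newer Pillow releases no longer provide:

```python
from io import BytesIO

from PIL import Image, ImageOps


def make_thumbnail(image_bytes, width, height):
    """Return PNG-encoded thumbnail bytes; nothing touches the filesystem."""
    source = Image.open(BytesIO(image_bytes))
    # Crop and resize to exactly (width, height), as ImageOps.fit does in the question.
    thumb = ImageOps.fit(source, (width, height), Image.LANCZOS)
    out = BytesIO()
    thumb.save(out, format='png')
    return out.getvalue()


# Build a small in-memory image to stand in for an uploaded file (hypothetical data).
upload = BytesIO()
Image.new('RGB', (400, 300), (0, 128, 255)).save(upload, format='png')
thumbnail_bytes = make_thumbnail(upload.getvalue(), 64, 64)
```

The returned bytes can be passed straight to fp.write() inside the fs.new_file(...) block.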
Related
I am using pypng and pyqrcode for QR code image generation in a Django app.
import os
from pyqrcode import create
import png # pypng
import base64
def embed_QR(url_input, name):
    embedded_qr = create(url_input)
    embedded_qr.png(name, scale=7)
def getQrWithURL(url):
    name = 'url.png'
    embed_QR(url, name)
    with open(name, "rb") as image_file:
        image_data = base64.b64encode(image_file.read()).decode('utf-8')
    return image_data
When I call getQrWithURL with a URL, it writes a file url.png to my directory. Is there a way to get only the image data without producing a file?
Thanks for your help.
Use a BytesIO as a writable stream:
import io
# Make a writeable stream
buffer = io.BytesIO()
# Create QR and write to buffer
embedded_qr = create(url_input)
embedded_qr.png(buffer,scale=7)
# Extract buffer contents - this is what you would get by reading a PNG disk file but without creating it
PNG = buffer.getvalue()
Your variable PNG now contains this:
b'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\xe7\x00\x00\x00\xe7\x01\x00\x00\x00\x00\xcd\x8f|\x8d\x00\x00\x01CIDATx\x9c\xed\x971\x92\x830\x0cE\xc5PPr\x04\x8e\x92\xa3\xc1\xd1|\x14\x8e\x90\x92\x82\xb1V_2\t\xde\xb0;i?c\x15q\xecG\xf3\x91\xfc%D\xff\x89,\x8d6\xda\xe8\x97\xf4)\x88AW\xfb\xed\xb7Q\x93\xef\'fj\x7ft\x1f\x9e\xf2Xt\xb7\xe34\x97Cb\x9a\xa43\x8a\xe3XL\xfd-\xa8\xc8\x1c\x19\xbc\x11-\xbb\x1b\xd0\xa3&m\x11\xe8\xbd\xaeX"\x1a\xbe\x01\xa1Q\x9aW\xaeBE\x8fXe\xee\xf4\x1c\xc44\xcd\x19\n\x93dX\xfcc\xb1\xcd\xc8L\x91A\x08\xb5c\xd5\xcd\x1f\xea\xb7*\xbfl\xd4\xf4\x9ao\xb8\xdeB\xbb\xd3-c\xa4h\xbc\x9dn\xa3\xd5\xa4\t\x1dW\xdf\t5\x9dt\xb1b,N\x88\xc8}\xe5\x93t\xd4\x8aQ\xa1Pp\xbd\x06\xe43\x9f\x9d\x90\x90\xfa-\xdb\xa3\xe3b\xa2\xc0Rw+6\n\',U\x18S\xdfo\xdf\xa0\xa3\x18\xf70J\xac\xf2\xea\xc6\xf5\xdb\xa0\xa3%\xdc>"\x91R\xbd\r>z\xccHq\xcb|T\xba\x98\xfa\xa8\xe8G1J\xcfN\xfd[\xc3/[x\xbbQ\x04=\x8d\xfek\x16\xafn\xf1\xfc\x14m\x18BK\xef\x9a\xa8\xa9\xd7\xa4\'\xed\xfd\x902\xd3\xf0\r\x8c\x12z\xb4\xe1\x0fW\xa1\xa2\x7fF\xa3\x8d6\xfa\x15\xfd\x01\xb9MCH#\xc3\xa2\x96\x00\x00\x00\x00IEND\xaeB`\x82'
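To get the base64 string that getQrWithURL returned, the buffer contents can be encoded directly. A sketch using only the standard library; stream_to_base64 and fake_png_writer are hypothetical names, and the fake writer merely stands in for embedded_qr.png(buffer, scale=7) so the example runs without pyqrcode:

```python
import base64
import io


def stream_to_base64(write_func):
    """Call write_func(buffer) and return whatever it wrote as base64 text."""
    buffer = io.BytesIO()
    write_func(buffer)
    return base64.b64encode(buffer.getvalue()).decode('utf-8')


def fake_png_writer(stream):
    # Stand-in for embedded_qr.png(stream, scale=7); writes only the PNG signature.
    stream.write(b'\x89PNG\r\n\x1a\n')


encoded = stream_to_base64(fake_png_writer)
```

With pyqrcode, the call would presumably look like stream_to_base64(lambda buf: create(url).png(buf, scale=7)).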
I would like to extract text from scanned PDFs.
My "test" code is as follows:
from pdf2image import convert_from_path
from pytesseract import image_to_string
from PIL import Image
converted_scan = convert_from_path('test.pdf', 500)
for i in converted_scan:
    i.save('scan_image.png', 'png')
text = image_to_string(Image.open('scan_image.png'))
with open('scan_text_output.txt', 'w') as outfile:
    outfile.write(text.replace('\n\n', '\n'))
I would like to know if there is a way to extract the content of the image directly from the object converted_scan, without saving the scan as a new "physical" image file on the disk?
Basically, I would like to skip this part:
for i in converted_scan:
    i.save('scan_image.png', 'png')
I have a few thousand scans to extract text from. Although the generated image files are not particularly heavy, the overhead is not negligible and I find it a bit overkill.
EDIT
Here's a slightly different, more compact approach than Colonder's answer, based on this post. For .pdf files with many pages, it might be worth adding a progress bar to each loop using e.g. the tqdm module.
from wand.image import Image as w_img
from PIL import Image as p_img
import pyocr.builders
import regex, pyocr, io
infile = 'my_file.pdf'
tools = pyocr.get_available_tools()
tool = tools[0]
req_image = []
txt = ''
# to convert pdf to img and extract text
with w_img(filename = infile, resolution = 200) as scan:
    image_png = scan.convert('png')
    for i in image_png.sequence:
        img_page = w_img(image = i)
        req_image.append(img_page.make_blob('png'))
for i in req_image:
    content = tool.image_to_string(
        p_img.open(io.BytesIO(i)),
        lang = tool.get_available_languages()[0],
        builder = pyocr.builders.TextBuilder()
    )
    txt += content
# to save the output as a .txt file
with open(infile[:-4] + '.txt', 'w') as outfile:
    full_txt = regex.sub(r'\n+', '\n', txt)
    outfile.write(full_txt)
UPDATE MAY 2021
I realized that although pdf2image simply calls a subprocess, one doesn't have to save the images in order to OCR them afterwards. You can simply do the following (pytesseract works as the OCR library as well):
from pdf2image import convert_from_path
for img in convert_from_path("some_pdf.pdf", 300):
    txt = tool.image_to_string(img,
                               lang=lang,
                               builder=pyocr.builders.TextBuilder())
EDIT: you can also try the pdftotext library.
pdf2image is a simple wrapper around pdftoppm and pdftocairo. Internally it does nothing more than call subprocess. This script should do what you want, but you need the wand library as well as pyocr (I think this is a matter of preference, so feel free to use any library for text extraction you want).
from PIL import Image as Pimage, ImageDraw
from wand.image import Image as Wimage
import sys
import numpy as np
from io import BytesIO
import pyocr
import pyocr.builders
def _convert_pdf2jpg(in_file_path: str, resolution: int=300) -> Pimage:
    """
    Convert PDF file to JPG
    :param in_file_path: path of pdf file to convert
    :param resolution: resolution with which to read the PDF file
    :return: generator yielding one PIL Image per page
    """
    with Wimage(filename=in_file_path, resolution=resolution).convert("jpg") as all_pages:
        for page in all_pages.sequence:
            with Wimage(page) as single_page_image:
                # transform wand image to bytes in order to transform it into PIL image
                yield Pimage.open(BytesIO(bytearray(single_page_image.make_blob(format="jpeg"))))
tools = pyocr.get_available_tools()
if len(tools) == 0:
    print("No OCR tool found")
    sys.exit(1)
# The tools are returned in the recommended order of usage
tool = tools[0]
print("Will use tool '%s'" % (tool.get_name()))
# Ex: Will use tool 'libtesseract'
langs = tool.get_available_languages()
print("Available languages: %s" % ", ".join(langs))
lang = langs[0]
print("Will use lang '%s'" % (lang))
# Ex: Will use lang 'fra'
# Note that languages are NOT sorted in any way. Please refer
# to the system locale settings for the default language
# to use.
for img in _convert_pdf2jpg("some_pdf.pdf"):
    txt = tool.image_to_string(img,
                               lang=lang,
                               builder=pyocr.builders.TextBuilder())
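The no-temp-file loop from the May 2021 update can be wrapped so that the text of all pages is collected, which is what the question's script wrote to scan_text_output.txt. A minimal sketch; ocr_pdf is a hypothetical helper, and the converter and OCR function are passed in as parameters (for example pdf2image's convert_from_path and pytesseract's image_to_string) so the logic can be tried without either library installed:

```python
import re


def ocr_pdf(pdf_path, convert, ocr, dpi=300):
    """OCR every page of a PDF without writing page images to disk.

    convert(pdf_path, dpi) must return an iterable of in-memory page images;
    ocr(image) must return the recognised text for one page.
    """
    texts = [ocr(page) for page in convert(pdf_path, dpi)]
    # Collapse runs of blank lines, as the question's replace() call intended.
    return re.sub(r'\n{2,}', '\n', '\n'.join(texts))


# Stand-ins so the sketch runs on its own (hypothetical data, not real OCR).
fake_convert = lambda path, dpi: ['page one', 'page two']
fake_ocr = lambda page: page + '\n\n'
result = ocr_pdf('some_pdf.pdf', fake_convert, fake_ocr)
```

With the real libraries the call would be ocr_pdf('test.pdf', convert_from_path, image_to_string).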
I'm fairly new to Python. Currently I'm making a prototype that takes an image, creates a thumbnail out of it and uploads it to an FTP server.
So far I have the parts that get, convert and resize the image ready.
The problem I run into is that the PIL (Pillow) Image library produces an object of a different type than the one storbinary() accepts for uploading.
I already tried some approaches like using StringIO or BufferIO to save the image in memory, but I'm getting errors all the time. Sometimes the image does get uploaded but the file appears to be empty (0 bytes).
Here is the code I'm working with:
import os
import io
import StringIO
import rawpy
import imageio
import Image
import ftplib
# connection part is working
ftp = ftplib.FTP('bananas.com')
ftp.login(user="banana", passwd="bananas")
ftp.cwd("/public_html/upload")
def convert_raw():
    files = os.listdir("/home/pi/Desktop/photos")
    for file in files:
        if file.endswith(".NEF") or file.endswith(".CR2"):
            raw = rawpy.imread(file)
            rgb = raw.postprocess()
            im = Image.fromarray(rgb)
            size = 1000, 1000
            im.thumbnail(size)
            ftp.storbinary('STOR Obama.jpg', img)
            temp.close()
    ftp.quit()
convert_raw()
What I tried:
temp = StringIO.StringIO
im.save(temp, format="png")
img = im.tostring()
temp.seek(0)
imgObj = temp.getvalue()
The error occurs on the line ftp.storbinary('STOR Obama.jpg', img).
Message:
buf = fp.read(blocksize)
AttributeError: 'str' object has no attribute 'read'
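For Python 3.x use BytesIO instead of StringIO, and pass the buffer itself (not its contents) to storbinary:
from io import BytesIO
temp = BytesIO()
im.save(temp, format="png")
temp.seek(0) # rewind so storbinary reads from the start
ftp.storbinary('STOR Obama.jpg', temp)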
Do not pass a string to storbinary. You should pass a file or a file-like object (such as an in-memory file) to it instead. Also, the line temp = StringIO.StringIO is missing its parentheses and should be temp = StringIO.StringIO(). So:
temp = StringIO.StringIO() # this is a file object
im.save(temp, format="png") # save the content to temp
temp.seek(0) # rewind, otherwise nothing is read and an empty file is uploaded
ftp.storbinary('STOR Obama.jpg', temp) # upload temp
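The 0-byte uploads mentioned in the question are consistent with the buffer position: after writing, an in-memory file is positioned at its end, and the traceback shows that storbinary simply calls fp.read(blocksize). A small standard-library sketch of that behaviour; the byte string is a stand-in for what im.save(temp, format="png") would write:

```python
import io

buffer = io.BytesIO()
buffer.write(b'fake image bytes')  # stand-in for im.save(buffer, format="png")

at_end = buffer.read()      # position is at the end, so nothing comes back
buffer.seek(0)              # rewind to the beginning
from_start = buffer.read()  # now the full contents are returned
```

This is why rewinding with seek(0) before handing the buffer to storbinary matters.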
I am writing a script which will get an image from a link. The image will then be resized using the PIL module and uploaded to Imgur using pyimgur. I don't want to save the image on disk; instead I want to manipulate the image in memory and then upload it from memory to Imgur.
The Script:
from pyimgur import Imgur
import cStringIO
import requests
from PIL import Image
LINK = "http://pngimg.com/upload/cat_PNG106.png"
CLIENT_ID = '29619ae5d125ae6'
im = Imgur(CLIENT_ID)
def _upload_image(img, title):
    uploaded_image = im.upload_image(img, title=title)
    return uploaded_image.link
def _resize_image(width, height, link):
    #Retrieve our source image from a URL
    fp = requests.get(link)
    #Load the URL data into an image
    img = cStringIO.StringIO(fp.content)
    im = Image.open(img)
    #Resize the image
    im2 = im.resize((width, height), Image.NEAREST)
    #saving the image into a cStringIO object to avoid writing to disk
    out_im2 = cStringIO.StringIO()
    im2.save(out_im2, 'png')
    return out_im2.getvalue()
When I run this script I get this error: TypeError: file() argument 1 must be encoded string without NULL bytes, not str
Does anyone have a solution in mind?
It looks like the same problem as this, and the solution is to use StringIO.
A common tip for searching such issues is to search using the generic part of the error message/string.
Is it possible to generate an in-memory image for testing purposes?
Here is my current code:
def test_issue_add_post(self):
    url = reverse('issues_issue_add')
    image = 'cover.jpg'
    data = {
        'title': 'Flying Cars',
        'cover': image,
    }
    response = self.client.post(url, data)
    self.assertEqual(response.status_code, 302)
To generate a 200x200 test image of solid red:
import Image
size = (200,200)
color = (255,0,0,255) # opaque red; an alpha of 0 would be fully transparent
img = Image.new("RGBA",size,color)
To convert it to a file-like object, then:
import StringIO
f = StringIO.StringIO(img.tostring())
http://effbot.org/imagingbook/image.htm
Jason's accepted answer is not working for me in Django 1.5
Assuming the generated file is to be saved to a model's ImageField from within a unit test, I needed to take it a step further by creating a ContentFile to get it to work:
from PIL import Image
from StringIO import StringIO
from django.core.files.base import ContentFile
image_file = StringIO()
image = Image.new('RGBA', size=(50,50), color=(255,0,0))
image.save(image_file, 'png')
image_file.seek(0)
django_friendly_file = ContentFile(image_file.read(), 'test.png')
So if client.post is expecting a file-like object, you could create an example image (if you want to visually check the result after the tests) or just make a 1px PNG and read it out from the console:
open('1px.png', 'rb').read()
which in my case dumped out
image_data = '\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x00\x01\x00\x00\x00\x01\x08\x02\x00\x00\x00\x90wS\xde\x00\x00\x00\x01sRGB\x00\xae\xce\x1c\xe9\x00\x00\x00\tpHYs\x00\x00\x0b\x13\x00\x00\x0b\x13\x01\x00\x9a\x9c\x18\x00\x00\x00\x07tIME\x07\xdb\x0c\x17\x020;\xd1\xda\xcf\xd2\x00\x00\x00\x0cIDAT\x08\xd7c\xf8\xff\xff?\x00\x05\xfe\x02\xfe\xdc\xccY\xe7\x00\x00\x00\x00IEND\xaeB`\x82'
Then you can use StringIO, which acts as a file-like object, so in the test above, image would be
from StringIO import StringIO
def test_issue_add_post(self):
    ...
    image = StringIO(image_data)
    ...
and you'll have a file-like object with the image data.
In Python 3:
from io import BytesIO
from PIL import Image
image = Image.new('RGBA', size=(50, 50), color=(155, 0, 0))
file = BytesIO()
image.save(file, 'png') # encode as PNG; tobytes() would give raw pixels only
file.name = 'test.png'
file.seek(0)
# Optionally: django_friendly_file = ContentFile(file.read(), 'test.png') (reported working in 2019 with Django 2.2.1)
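To see why the encoded file matters: tobytes() (called tostring() in old PIL) returns only the raw pixel values, while save() writes a complete PNG file that can be opened again. A minimal sketch, assuming Pillow on Python 3; the 50x50 opaque red image is just an illustration:

```python
from io import BytesIO

from PIL import Image

image = Image.new('RGBA', size=(50, 50), color=(255, 0, 0, 255))

# Raw, uncompressed RGBA values: 4 bytes per pixel and no file header.
raw_pixels = image.tobytes()

# A real PNG file held in memory.
png_file = BytesIO()
image.save(png_file, 'png')
png_file.seek(0)

# Opening works because the buffer holds encoded PNG data.
reopened = Image.open(png_file)
```

A buffer built from raw_pixels would be rejected by anything that tries to decode it as an image, which may explain why some of the earlier snippets did not work for later posters.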
Thanks to help from Eduardo, I was able to get a working solution.
from StringIO import StringIO
import Image
file = StringIO()
image = Image.new("RGBA", size=(50,50), color=(255,0,0))
image.save(file, 'png')
file.name = 'test.png'
file.seek(0)
Have you used the PIL module? It lets you manipulate images, and should allow creating them as well.
In fact, here's a blog entry with some code that does it:
http://bradmontgomery.blogspot.com/2008/07/django-generating-image-with-pil.html
I don't know whether your test machine has an internet connection, but you could also pull down random images from Google to vary the test data.