I am using Python and the requests library. I just want to download an image into, for example, a numpy array. There are multiple questions showing different combinations (OpenCV, PIL, requests, urllib...),
but none of them work in my case. I basically receive this error when I try to download the image:
cannot identify image file <_io.BytesIO object at 0x7f6a9734da98>
A simple example of my code can be:
import requests
from PIL import Image
response = requests.get(url, stream=True)
response.raw.decode_content = True
image = Image.open(response.raw)
image.show()
The main thing that is driving me crazy is that if I download the image to a file (using urllib), the whole process runs without any problem!
import os
import urllib.request
urllib.request.urlretrieve(garment.url, os.path.join(download_folder, garment.get_path()))
What can I be doing wrong?
EDIT:
My mistake turned out to be related to how the URL was formed, not to requests
or the PIL library. My previous code example should work perfectly if the URL is correct.
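Since the cause here was a malformed URL, one way to catch this kind of problem earlier is to look at the first bytes of the response body before handing them to PIL. The following is only a minimal sketch, assuming the body is already available as bytes (for example from response.content). The names sniff_image_format and SIGNATURES are hypothetical, and only three common formats are covered:

```python
# Hypothetical helper: guess the format of downloaded bytes from their leading signature.
SIGNATURES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
    b"GIF87a": "gif",
    b"GIF89a": "gif",
}


def sniff_image_format(data):
    """Return a format name if data starts with a known image signature, else None."""
    for signature, name in SIGNATURES.items():
        if data.startswith(signature):
            return name
    return None


# An HTML error page returned for a wrong URL is not an image.
print(sniff_image_format(b"<!DOCTYPE html><html>Not Found</html>"))  # None
print(sniff_image_format(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))        # png
```

If the helper returns None, the server probably sent something other than an image (an error page, a redirect body), which would be consistent with the "cannot identify image file" message.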
I think you are somehow consuming data from the response.raw object before passing it to Image. The raw object of a requests response is not seekable, so you can read from it only once:
>>> response.raw.seekable()
False
The first open is OK:
>>> response.raw.tell()
0
>>> image = Image.open(response.raw)
The second open throws an error (the stream position is already at the end of the file):
>>> response.raw.tell()
695  # length of this file: https://docs.python.org/3/_static/py.png
>>> image = Image.open(response.raw)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python3/dist-packages/PIL/Image.py", line 2295, in open
% (filename if filename else fp))
OSError: cannot identify image file <_io.BytesIO object at 0x7f11850074c0>
You should store the data from the response in a seekable file-like object (or in a file, of course) if you want to use it several times:
import io
image_data = io.BytesIO(response.raw.read())
Now you can read the image stream and rewind it as many times as needed:
>>> image_data.seekable()
True
image = Image.open(image_data)
image1 = Image.open(image_data)
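The read-once behaviour can be reproduced without any network access. This is a rough sketch only: OneShotStream is a hypothetical stand-in for a non-seekable stream such as response.raw, not the real requests or urllib3 class.

```python
import io


class OneShotStream(io.RawIOBase):
    """Hypothetical stand-in for a non-seekable network stream such as response.raw."""

    def __init__(self, payload):
        super().__init__()
        self._source = io.BytesIO(payload)

    def readable(self):
        return True

    def seekable(self):
        return False

    def readinto(self, buffer):
        # Copy the next piece of the payload into the caller's buffer.
        chunk = self._source.read(len(buffer))
        buffer[:len(chunk)] = chunk
        return len(chunk)


stream = OneShotStream(b"image bytes")
first = stream.read()    # consumes the whole stream
second = stream.read()   # nothing is left, and the stream cannot be rewound

buffered = io.BytesIO(first)   # keep a seekable copy instead
copy_a = buffered.read()
buffered.seek(0)               # rewind before reading again
copy_b = buffered.read()

print(first, second, copy_a == copy_b)
```

The second read of the one-shot stream comes back empty, while the BytesIO copy can be read again after seek(0).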
I need to download a PNG file from a website and save it in a local directory.
The code is as below:
import requests
import pytesseract
from PIL import Image
from pathlib import Path
k = requests.get('https://somewebsite.com/somefile.png', stream=True)
Img = Image.open(k)  # <---- the error is raised here
Img.save("/new.png")
When I run it in a Jupyter notebook,
I always get the error "response object has no attribute seek".
On the other hand, if I change the code to
Img = Image.open(k.raw)
it works fine.
I would like to understand why this is so.
You can save image data from a link using open() and write() functions:
import requests
URL = "https://images.unsplash.com/photo-1574169207511-e21a21c8075a?ixlib=rb-1.2.1&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=880&q=80"
name = "IMG.jpg"  # The name of the image once saved
Picture_request = requests.get(URL)
if Picture_request.status_code == 200:
    with open(name, 'wb') as f:
        f.write(Picture_request.content)
Per the Pillow docs:
:param fp: A filename (string), pathlib.Path object or a file object.
The file object must implement file.read,
file.seek, and file.tell methods,
and be opened in binary mode.
response itself is just the Response object; it is not a file object and has no seek method. response.raw, by contrast, is a file-like object that Image.open can read from, which is why Image.open(k.raw) works.
However, you should use response.content to get the raw bytes of the image. If you want to open it as an image, wrap those bytes in io.BytesIO.
import requests
from PIL import Image
from io import BytesIO
URL = "whatever"
name = "image.jpg"
response = requests.get(URL)
mybytes = BytesIO()
mybytes.write(response.content) # write the bytes into `mybytes`
mybytes.seek(0) # set pointer back to the beginning
img = Image.open(mybytes) # now pillow reads from this io and gets all the bytes we want
# do things to img
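The requirement quoted from the Pillow docs (read, seek and tell) can be checked on any object before passing it to Image.open. This is only an illustrative sketch: missing_file_methods and FakeResponse are hypothetical names, and FakeResponse merely imitates a response object that carries bytes in a content attribute.

```python
import io

REQUIRED_METHODS = ("read", "seek", "tell")


def missing_file_methods(obj):
    """List the methods from REQUIRED_METHODS that obj does not provide."""
    return [name for name in REQUIRED_METHODS
            if not callable(getattr(obj, name, None))]


class FakeResponse:
    """Hypothetical stand-in for an HTTP response: it holds bytes but is not a file."""

    def __init__(self, content):
        self.content = content


response = FakeResponse(b"\x89PNG...")
print(missing_file_methods(response))                      # ['read', 'seek', 'tell']
print(missing_file_methods(io.BytesIO(response.content)))  # []
```

Note that this check only looks for the presence of the methods. A stream can expose seek and still refuse to seek, so wrapping the bytes in io.BytesIO remains the safer route.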
So my program is able to open PNGs but not PDFs, so I made this just to test, and it still isn't able to open even a simple PDF. And I don't know why.
from PIL import Image
with Image.open(r"Adams, K\a.pdf") as file:
    print file
Traceback (most recent call last):
File "C:\Users\Hayden\Desktop\Scans\test4.py", line 3, in <module>
with Image.open(r"Adams, K\a.pdf") as file:
File "C:\Python27\lib\site-packages\PIL\Image.py", line 2590, in open
% (filename if filename else fp))
IOError: cannot identify image file 'Adams, K\\a.pdf'
After trying PyPDF2 as suggested (Thanks for the link by the way), I am getting this error with my code.
import PyPDF2
pdf_file= open(r"Adams, K (6).pdf", "rb")
read_pdf= PyPDF2.PdfFileReader(pdf_file)
number_of_pages = read_pdf.getNumPages()
print number_of_pages
Xref table not zero-indexed. ID numbers for objects will be corrected. [pdf.py:1736]
Following this article: https://www.geeksforgeeks.org/convert-pdf-to-image-using-python/ you can use the pdf2image package to convert the pdf to a PIL object.
This should solve your problem:
from pdf2image import convert_from_path
fname = r"Adams, K\a.pdf"
pil_image_lst = convert_from_path(fname) # This returns a list even for a 1 page pdf
pil_image = pil_image_lst[0]
I just tried this out with a one page pdf.
As pointed out by @Kevin (see comment below), PIL supports writing PDFs but not reading them.
To read a PDF you will need some other library. Here is a tutorial for handling PDFs with PyPDF2:
https://pythonhosted.org/PyPDF2/?utm_source=recordnotfound.com
I am trying to download an original image (PNG format) by URL, convert it on the fly (without saving it to disk) and save it as a JPG.
The code is as follows:
import os
import io
import requests
from PIL import Image
...
r = requests.get(img_url, stream=True)
if r.status_code == 200:
    i = Image.open(io.BytesIO(r.content))
    i.save(os.path.join(out_dir, 'image.jpg'), quality=85)
It works, but when I try to monitor the download progress (for the future progress bar) with r.iter_content() like this:
r = requests.get(img_url, stream=True)
if r.status_code == 200:
    for chunk in r.iter_content():
        print(len(chunk))
    i = Image.open(io.BytesIO(r.content))
    i.save(os.path.join(out_dir, 'image.jpg'), quality=85)
I get this error:
Traceback (most recent call last):
File "E:/GitHub/geoportal/quicklookScrape/temp.py", line 37, in <module>
i = Image.open(io.BytesIO(r.content))
File "C:\Python35\lib\site-packages\requests\models.py", line 736, in content
'The content for this response was already consumed')
RuntimeError: The content for this response was already consumed
So is it possible to monitor the download progress and still get the data itself afterwards?
When using r.iter_content(), you need to buffer the results somewhere, because the chunks are consumed as you iterate and r.content is no longer available afterwards. Unfortunately, I can't find any examples where the contents get appended to an object in memory; usually, iter_content is used when a file can't or shouldn't be loaded entirely in memory at once. However, you can buffer it using a tempfile.SpooledTemporaryFile as described in this answer: https://stackoverflow.com/a/18550652/4527093. This avoids writing the image to disk (unless the image is larger than the specified max_size). Then, you can create the Image from the tempfile.
import os
import io
import requests
from PIL import Image
import tempfile
buffer = tempfile.SpooledTemporaryFile(max_size=1e9)
r = requests.get(img_url, stream=True)
if r.status_code == 200:
    downloaded = 0
    filesize = int(r.headers['content-length'])
    for chunk in r.iter_content(chunk_size=1024):
        downloaded += len(chunk)
        buffer.write(chunk)
        print(downloaded/filesize)
    buffer.seek(0)
    i = Image.open(io.BytesIO(buffer.read()))
    i.save(os.path.join(out_dir, 'image.jpg'), quality=85)
buffer.close()
Edited to include chunk_size, which limits the progress updates to once per 1 KB chunk instead of once per byte.
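The same buffering idea also works with a plain io.BytesIO when the image comfortably fits in memory. The sketch below is an assumption-laden illustration rather than the answer's method: buffer_with_progress is a hypothetical helper, and the chunks are simulated so the snippet runs offline. With requests, the chunks argument would be r.iter_content(chunk_size=1024). The total size is optional because a server may not send a Content-Length header.

```python
import io


def buffer_with_progress(chunks, total_size=None, report=print):
    """Collect byte chunks into a BytesIO, reporting progress after each chunk."""
    buffer = io.BytesIO()
    downloaded = 0
    for chunk in chunks:
        buffer.write(chunk)
        downloaded += len(chunk)
        if total_size:
            report(downloaded / total_size)   # fraction complete
        else:
            report(downloaded)                # byte count when the size is unknown
    buffer.seek(0)  # rewind so the caller can read from the start
    return buffer


# Simulated chunks standing in for r.iter_content(chunk_size=1024).
payload = bytes(range(256)) * 10
fake_chunks = (payload[i:i + 1024] for i in range(0, len(payload), 1024))
progress = []
image_buffer = buffer_with_progress(fake_chunks, total_size=len(payload),
                                    report=progress.append)
print(len(progress), progress[-1], len(image_buffer.getvalue()))
```

The returned buffer is already rewound, so it could be passed straight to Image.open.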
I'm fairly new to Python. Currently I'm making a prototype that takes an image, creates a thumbnail out of it and uploads it to an FTP server.
So far I have the get image, convert and resize parts ready.
The problem I run into is that the PIL (Pillow) Image library produces an object of a different type than the one storbinary() accepts for uploading.
I already tried some approaches like using StringIO or BufferIO to save the image in-memory. But I'm getting errors all the time. Sometimes the image does get uploaded but the file appears to be empty (0 bytes).
Here is the code I'm working with:
import os
import io
import StringIO
import rawpy
import imageio
import Image
import ftplib
# connection part is working
ftp = ftplib.FTP('bananas.com')
ftp.login(user="banana", passwd="bananas")
ftp.cwd("/public_html/upload")
def convert_raw():
    files = os.listdir("/home/pi/Desktop/photos")
    for file in files:
        if file.endswith(".NEF") or file.endswith(".CR2"):
            raw = rawpy.imread(file)
            rgb = raw.postprocess()
            im = Image.fromarray(rgb)
            size = 1000, 1000
            im.thumbnail(size)
            ftp.storbinary('STOR Obama.jpg', img)
            temp.close()
    ftp.quit()
convert_raw()
What I tried:
temp = StringIO.StringIO
im.save(temp, format="png")
img = im.tostring()
temp.seek(0)
imgObj = temp.getvalue()
The error I'm getting lies on the line ftp.storbinary('STOR Obama.jpg', img).
Message:
buf = fp.read(blocksize)
AttributeError: 'str' object has no attribute 'read'
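For Python 3.x, use BytesIO instead of StringIO, and pass the file object itself (rewound to the start) rather than temp.getvalue():
from io import BytesIO
temp = BytesIO()
im.save(temp, format="png")
temp.seek(0)  # rewind so storbinary reads from the beginning
ftp.storbinary('STOR Obama.jpg', temp)  # storbinary calls temp.read(), so it needs a file object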
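Do not pass a string to storbinary. You should pass a file or a file-like object (such as an in-memory buffer) instead. Also, the line that creates the buffer should be temp = StringIO.StringIO(), with parentheses. After saving, rewind the buffer; otherwise storbinary starts reading at the end and uploads 0 bytes. So:
temp = StringIO.StringIO() # this is a file object
im.save(temp, format="png") # save the content to temp
temp.seek(0) # rewind to the beginning before uploading
ftp.storbinary('STOR Obama.jpg', temp) # upload temp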
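The empty (0 byte) uploads mentioned in the question are consistent with an in-memory buffer whose position is still at the end after writing. The traceback shows that the upload reads the file object block by block with fp.read(blocksize), so the effect can be illustrated without an FTP server. This is only a sketch: read_all_in_blocks is a hypothetical stand-in for that read loop, not ftplib code.

```python
import io


def read_all_in_blocks(fp, blocksize=8192):
    """Hypothetical stand-in for an uploader: read fp block by block until it is empty."""
    sent = bytearray()
    while True:
        block = fp.read(blocksize)
        if not block:
            break
        sent.extend(block)
    return bytes(sent)


temp = io.BytesIO()
temp.write(b"thumbnail bytes")   # stands in for im.save(temp, format="png")

without_rewind = read_all_in_blocks(temp)   # position is at the end: nothing is read
temp.seek(0)
with_rewind = read_all_in_blocks(temp)      # position is at the start: everything is read

print(len(without_rewind), len(with_rewind))  # 0 15
```

Passing a str or bytes value instead of the buffer fails earlier, because those types have no read method at all.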
I am getting a zipfile from a url.
Extracted, it has the following format:
parent_folder
my_file.csv
image_folder
my_image.jpg
I want to save this, extracted, to my server. Before saving, I want to 1) alter the name of parent_folder and 2) insert extra text into the .csv file
I have tried various combinations of code trying to figure out what is going on:
from StringIO import StringIO
from zipfile import ZipFile
from urllib import urlopen
from PIL import Image
url = urlopen("path.zip")
z = ZipFile(StringIO(url.read()))
# z= z.extractall() # not sure I need this?
for line in z.open("my_file.csv").readlines():
    print line
# this does print so I could open a new file and write to it. Is that what I do?
img= Image.open(cStringIO.StringIO(z.open("image_folder/my_image.jpg")))
# error: must be convertible to a buffer, not ZipExtFile
# I don't know how to GET the image
Reason: z.open("image_folder/my_image.jpg") returns a ZipExtFile object, while cStringIO.StringIO expects a buffer (a byte string).
Solution: call .read() on the ZipExtFile to get its bytes first:
img= Image.open(cStringIO.StringIO(z.open("image_folder/my_image.jpg").read()))
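For the other two goals in the question (renaming parent_folder and adding text to the CSV), here is a rough Python 3 sketch that works entirely in memory with io.BytesIO and zipfile. The helper name rewrite_zip_members and the folder name renamed_folder are hypothetical, and the archive is built locally so the snippet runs without a download; with a real download, zip_bytes would be the response body.

```python
import io
import zipfile


def rewrite_zip_members(zip_bytes, old_root, new_root, csv_name, extra_text):
    """Return {new_path: bytes} with the top folder renamed and text appended to one CSV."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directories are implied by the file paths
            data = archive.read(info.filename)  # plain bytes, not a ZipExtFile
            path = info.filename
            if path.startswith(old_root + "/"):
                path = new_root + path[len(old_root):]
            if path.endswith("/" + csv_name):
                data += extra_text.encode("utf-8")
            result[path] = data
    return result


# Build a small archive in memory that mirrors the layout in the question.
source = io.BytesIO()
with zipfile.ZipFile(source, "w") as archive:
    archive.writestr("parent_folder/my_file.csv", "a,b\n1,2\n")
    archive.writestr("parent_folder/image_folder/my_image.jpg", b"\xff\xd8\xff fake jpeg")

members = rewrite_zip_members(source.getvalue(), "parent_folder", "renamed_folder",
                              "my_file.csv", "3,4\n")
for path in sorted(members):
    print(path, len(members[path]))
```

Each value in the returned dict is a bytes object, so it can be written to disk under the new path, or wrapped in io.BytesIO and passed to Image.open in the case of the image.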