Downloading image with PIL and requests - python

I am trying to download an original image (PNG format) by URL, convert it on the fly (without saving it to disk), and save it as a JPG.
The code is as follows:
import os
import io
import requests
from PIL import Image
...
r = requests.get(img_url, stream=True)
if r.status_code == 200:
    i = Image.open(io.BytesIO(r.content))
    i.save(os.path.join(out_dir, 'image.jpg'), quality=85)
It works, but when I try to monitor the download progress (for the future progress bar) with r.iter_content() like this:
r = requests.get(img_url, stream=True)
if r.status_code == 200:
    for chunk in r.iter_content():
        print(len(chunk))
    i = Image.open(io.BytesIO(r.content))
    i.save(os.path.join(out_dir, 'image.jpg'), quality=85)
I get this error:
Traceback (most recent call last):
  File "E:/GitHub/geoportal/quicklookScrape/temp.py", line 37, in <module>
    i = Image.open(io.BytesIO(r.content))
  File "C:\Python35\lib\site-packages\requests\models.py", line 736, in content
    'The content for this response was already consumed')
RuntimeError: The content for this response was already consumed
So is it possible to monitor the download progress and still get the data itself afterwards?

When using r.iter_content(), you need to buffer the results somewhere, because the response body can only be consumed once. Unfortunately, I can't find any examples where the contents get appended to an object in memory; usually, iter_content is used when a file can't or shouldn't be loaded entirely into memory at once. However, you can buffer it using a tempfile.SpooledTemporaryFile as described in this answer: https://stackoverflow.com/a/18550652/4527093. This avoids saving the image to disk (unless the image is larger than the specified max_size). Then, you can create the Image from the tempfile.
import os
import io
import requests
from PIL import Image
import tempfile
# Stays in memory up to max_size bytes, then spills to a temporary file on disk
buffer = tempfile.SpooledTemporaryFile(max_size=10**9)
r = requests.get(img_url, stream=True)
if r.status_code == 200:
    downloaded = 0
    # Assumes the server sends a Content-Length header
    filesize = int(r.headers['content-length'])
    for chunk in r.iter_content(chunk_size=1024):
        downloaded += len(chunk)
        buffer.write(chunk)
        print(downloaded / filesize)
    buffer.seek(0)  # rewind before reading the image back
    i = Image.open(io.BytesIO(buffer.read()))
    i.save(os.path.join(out_dir, 'image.jpg'), quality=85)
buffer.close()
Edited to include chunk_size, which limits the progress updates to once per 1 KB chunk instead of once per byte.
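For an image that comfortably fits in memory, a plain io.BytesIO can serve as the buffer instead of a temporary file. The following is a minimal sketch of that idea, not the answer's own method: buffer_chunks and its report parameter are hypothetical names, and the list of fake chunks stands in for r.iter_content(chunk_size=1024) so the example runs without a network connection.

```python
import io


def buffer_chunks(chunks, total=None, report=print):
    """Collect byte chunks in memory, reporting progress after each one.

    `chunks` is any iterable of bytes objects; `total` is the expected size
    in bytes, or None when the size is unknown. (Hypothetical helper.)
    """
    buf = io.BytesIO()
    downloaded = 0
    for chunk in chunks:
        buf.write(chunk)
        downloaded += len(chunk)
        # Report a fraction when the total is known, else the raw byte count.
        report(downloaded / total if total else downloaded)
    buf.seek(0)  # rewind so the caller can read from the start
    return buf


# Stand-in for r.iter_content(chunk_size=1024).
fake_chunks = [b"a" * 1024, b"b" * 1024, b"c" * 512]
progress = []
image_buffer = buffer_chunks(fake_chunks, total=2560, report=progress.append)
```

With a real response, the call would look something like buffer_chunks(r.iter_content(chunk_size=1024), total=int(r.headers.get('content-length', 0)) or None), and the returned buffer could then be passed to Image.open.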

Related

Not able to save Image by downloading in python

I need to download a PNG file from a website and save it to a local directory.
The code is as follows:
import pytesseract
import requests
from PIL import Image
from pathlib import Path
k = requests.get('https://somewebsite.com/somefile.png', stream=True)
Img = Image.open(k)  # <----
Img.save("/new.png")
When I run this in a Jupyter notebook, I always get the error "response object has no attribute seek".
On the other hand, if I change the code to
Img = Image.open(k.raw), it works fine.
I need to understand why this is so.
You can save image data from a link using open() and write() functions:
import requests
URL = "https://images.unsplash.com/photo-1574169207511-e21a21c8075a?ixlib=rb-1.2.1&ixid=MnwxMjA3fDB8MHxwaG90by1wYWdlfHx8fGVufDB8fHx8&auto=format&fit=crop&w=880&q=80"
name = "IMG.jpg" #The name of the image once saved
Picture_request = requests.get(URL)
if Picture_request.status_code == 200:
    with open(name, 'wb') as f:
        f.write(Picture_request.content)
Per the Pillow docs:
:param fp: A filename (string), pathlib.Path object or a file object.
The file object must implement file.read,
file.seek, and file.tell methods,
and be opened in binary mode.
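response itself is just the Response object, which has none of these methods. response.raw is the underlying file-like stream, which is why Image.open(response.raw) works; note that this stream is not seekable, so it can only be consumed once.
However, you should use response.content to get the raw bytes of the image. If you want to open it with Pillow, wrap those bytes in io.BytesIO.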
import requests
from PIL import Image
from io import BytesIO
URL = "whatever"
name = "image.jpg"
response = requests.get(URL)
mybytes = BytesIO()
mybytes.write(response.content) # write the bytes into `mybytes`
mybytes.seek(0) # set pointer back to the beginning
img = Image.open(mybytes) # now pillow reads from this io and gets all the bytes we want
# do things to img
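As a side note, the write-then-seek steps can be collapsed by passing the bytes straight to the io.BytesIO constructor, which starts with the position at the beginning. A small sketch of the equivalence, using placeholder bytes rather than a real image (payload is an assumption made for illustration):

```python
import io

# Placeholder bytes standing in for response.content; not a real image.
payload = b"\x89PNG\r\n\x1a\n placeholder image bytes"

# Two-step form: write the bytes, then rewind to the start.
two_step = io.BytesIO()
two_step.write(payload)
two_step.seek(0)

# One-step form: the constructor takes the initial bytes; position is 0.
one_step = io.BytesIO(payload)
start_position = one_step.tell()

same = two_step.read() == one_step.read()
```

Either object can be handed to Image.open in the same way.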

How to save image which sent via flask send_file

I have this code for server
@app.route('/get', methods=['GET'])
def get():
    return send_file("token.jpg", attachment_filename="token.jpg", mimetype='image/jpeg')
and this code for getting the response:
r = requests.get(url + '/get')
I need to save the file from the response to the hard drive, but I can't use r.files. What should I do in this situation?
Assuming the GET request is valid, you can use Python's built-in function open to open a file in binary mode and write the returned content to disk. Example below.
file_content = requests.get('http://yoururl/get')
save_file = open("sample_image.png", "wb")
save_file.write(file_content.content)
save_file.close()
As you can see, to write the image to disk, we use open, and write the returned content to 'sample_image.png'. Since your server-side code seems to be returning only one file, the example above should work for you.
You can set the stream parameter and extract the filename from the HTTP headers. Then the raw data from the undecoded body can be read and saved chunk by chunk.
import os
import re
import requests
resp = requests.get('http://127.0.0.1:5000/get', stream=True)
name = re.findall('filename=(.+)', resp.headers['Content-Disposition'])[0]
dest = os.path.join(os.path.expanduser('~'), name)
with open(dest, 'wb') as fp:
    while True:
        chunk = resp.raw.read(1024)
        if not chunk:
            break
        fp.write(chunk)
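The regular expression above keeps any quotes around the filename and any parameters that follow it. One possible alternative, sketched here under the assumption that the header follows the usual `attachment; filename=...` shape, is to let the standard library's email.message.Message parse it. The helper name filename_from_disposition and the default value are hypothetical.

```python
import os
from email.message import Message


def filename_from_disposition(header_value, default="download.bin"):
    """Extract the filename parameter from a Content-Disposition value.

    Hypothetical helper: returns `default` when no filename is present.
    """
    msg = Message()
    msg["Content-Disposition"] = header_value
    name = msg.get_filename()  # handles quoting and trailing parameters
    if not name:
        return default
    # Keep only the final path component so the server cannot pick a directory.
    return os.path.basename(name)


plain = filename_from_disposition("attachment; filename=token.jpg")
quoted = filename_from_disposition('attachment; filename="token.jpg"; size=123')
missing = filename_from_disposition("inline")
```

The result could then replace the `name` variable before building `dest`.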

How to get a file from a url and then read it as if it were local?

I have a jpg image that is stored at a url that I need to access and read the binary/byte data from.
I can get the file in Python by using:
import urllib3
http = urllib3.PoolManager()
url = 'link to jpg'
contents = http.request('GET', url)
Simply reading the data from this request with contents.data doesn't seem to give the correct binary, but if I download the file and read it locally, I get the correct binary. However, I cannot read the file contents from the response like this:
with open(contents, "rb") as image:
    f = image.read()
Using the bytes from the request doesn't work either:
with open(contents.data, "rb") as image:
    f = image.read()
How can I treat the jpg from the url as if it were local so that I can read the binary correctly?
The bytes obtained in f when the file is read locally and the bytes in contents.data are exactly the same.
import urllib3
http = urllib3.PoolManager()
url = 'https://tinyjpg.com/images/social/website.jpg'
contents = http.request('GET', url)
with open('website.jpg', "rb") as image:
    f = image.read()
print(f == contents.data)
Download the image from the link in the code first, then run this code. It prints True, which means the data read from the local image file is the same as the data read from the website.
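The open() calls in the question fail because open() expects a file path, not the file's contents. If a file-like object is what is needed, the bytes can be wrapped in io.BytesIO instead. A minimal sketch, where data is a placeholder standing in for contents.data (an assumption, so the example runs without a network request):

```python
import io

# Placeholder standing in for contents.data: a JPEG-style marker plus padding.
data = b"\xff\xd8\xff\xe0" + b"\x00" * 16

# BytesIO gives the bytes a file-like interface (read, seek, tell).
with io.BytesIO(data) as image:
    header = image.read(4)  # read the first four bytes
    image.seek(0)           # rewind, just like a local file
    f = image.read()        # read everything
```

Here f holds the same bytes that reading a local copy of the file in "rb" mode would return.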

Read image from URL and keep it in memory

I am using Python and the requests library. I just want to download an image into, for example, a NumPy array. There are multiple questions offering different combinations (OpenCV, PIL, requests, urllib...), but
none of them work for my case. I basically receive this error when I try to download the image:
cannot identify image file <_io.BytesIO object at 0x7f6a9734da98>
A simple example of my code can be:
import requests
from PIL import Image
response = requests.get(url, stream=True)
response.raw.decode_content = True
image = Image.open(response.raw)
image.show()
The main thing that is driving me crazy is that, if I download the image to a file (using urllib), the whole process runs without any problem!
import os
import urllib.request
urllib.request.urlretrieve(garment.url, os.path.join(download_folder, garment.get_path()))
What can I be doing wrong?
EDIT:
My mistake turned out to be related to how the URL was formed, not to the requests or PIL libraries. My previous code example should work perfectly if the URL is correct.
I think you are somehow reading data from the response.raw object before passing it to Image. The raw object of a requests response is not seekable, so you can read from it only once:
>>> response.raw.seekable()
False
First open is ok:
>>> response.raw.tell()
0
>>> image = Image.open(response.raw)
The second open throws an error (the stream position is already at the end of the file):
>>> response.raw.tell()
695 # this file length https://docs.python.org/3/_static/py.png
>>> image = Image.open(response.raw)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3/dist-packages/PIL/Image.py", line 2295, in open
    % (filename if filename else fp))
OSError: cannot identify image file <_io.BytesIO object at 0x7f11850074c0>
You should save the data from the requests response in a file-like object (or a file, of course) if you want to use it several times:
import io
image_data = io.BytesIO(response.raw.read())
Now you can read the image stream and rewind it as many times as needed:
>>> image_data.seekable()
True
>>> image = Image.open(image_data)
>>> image1 = Image.open(image_data)
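The read-once behaviour can be reproduced without a network connection. The sketch below uses a hypothetical OneShotStream class as a stand-in for response.raw (it is not part of requests or urllib3) to show that a second read of a non-seekable stream returns nothing, while a BytesIO copy can be rewound freely.

```python
import io


class OneShotStream(io.RawIOBase):
    """Hypothetical stand-in for a non-seekable network stream."""

    def __init__(self, payload):
        super().__init__()
        self._inner = io.BytesIO(payload)

    def readable(self):
        return True

    def seekable(self):
        return False  # like response.raw, this stream cannot be rewound

    def readinto(self, b):
        return self._inner.readinto(b)


stream = OneShotStream(b"placeholder image bytes")
first = stream.read()   # consumes the whole stream
second = stream.read()  # nothing left to read

# Buffering the first read makes the data reusable.
image_data = io.BytesIO(first)
pass_one = image_data.read()
image_data.seek(0)
pass_two = image_data.read()
```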

BadZipFile while downloading from Kaggle

I am trying to download and unzip a Kaggle dataset with a Python script (Python 3.5), but I get an error.
import io
from zipfile import ZipFile
import csv
import urllib.request
url = 'https://www.kaggle.com/c/quora-question-pairs/download/test.csv.zip'
response = urllib.request.urlopen(url)
c=ZipFile(io.BytesIO(response.read()))
After running this code, I get the following error.
BadZipFile: File is not a zip file
How can I get rid of this error? What's the cause?
Using the requests module, with some minor fixes to the approach in http://ramhiser.com/2012/11/23/how-to-download-kaggle-data-with-python-and-requests-dot-py/, the solution is:
import io
from zipfile import ZipFile
import csv
import requests
# The direct link to the Kaggle data set
data_url = 'https://www.kaggle.com/c/quora-question-pairs/download/test.csv.zip'
# The local path where the data set is saved.
local_filename = "test.csv.zip"
# Kaggle Username and Password
kaggle_info = {'UserName': "my_username", 'Password': "my_password"}
# Attempts to download the CSV file. Gets rejected because we are not logged in.
r = requests.get(data_url)
# Login to Kaggle and retrieve the data.
r = requests.post(r.url, data = kaggle_info)
# Writes the data to a local file one chunk at a time.
with open(local_filename, 'wb') as f:
    for chunk in r.iter_content(chunk_size=512 * 1024):  # Reads 512KB at a time into memory
        if chunk:  # filter out keep-alive new chunks
            f.write(chunk)
c = ZipFile(local_filename)
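As the comments in the code suggest, the unauthenticated request gets rejected, so a likely cause of BadZipFile is that the downloaded bytes are a login page rather than an archive. A quick way to check before unzipping is zipfile.is_zipfile. The sketch below builds its own tiny archive and a fake HTML payload in memory; both are placeholders, and looks_like_zip is a hypothetical helper name.

```python
import io
import zipfile


def looks_like_zip(payload):
    """Return True if the given bytes appear to be a zip archive."""
    return zipfile.is_zipfile(io.BytesIO(payload))


# Build a small real archive in memory to stand in for a good download.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("test.csv", "id,question\n1,hello\n")
zip_bytes = archive.getvalue()

# Placeholder for what a server might send back when not logged in.
html_bytes = b"<!DOCTYPE html><html><body>Please sign in</body></html>"

zip_ok = looks_like_zip(zip_bytes)
html_ok = looks_like_zip(html_bytes)
```

Running the same check on response.read() (or r.content) and printing the first few bytes when it fails usually shows quickly whether HTML came back instead of the dataset.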
