Save image from byte data with python gdal - python

I'm retrieving a Sentinel-2 image with metadata from an ArcGIS REST API server. It comes as a byte string. I currently first write the data to a tif-file with f.write(). Then I open it with imageio and later save it as a geoTIFF with gdal.
I would like to save the image directly as a geoTIFF with gdal. But I haven't figured out how to extract the image as a numpy array directly from the byte data, or how to use gdal to write the image directly from the byte data. I use Python 3.8 and Windows 10.
import requests
from imageio import imread
from osgeo import gdal
response = requests.get(im_url, auth=auth) # call image url
im_binary = response.content # the image in binary format
im_newpath = 'testimage.tif'
with open(im_newpath, 'wb') as f:
    f.write(im_binary)
print(type(im_binary))
print(im_binary[0:50])
returns:
<class 'bytes'>
b'II*\x00\x08\x00\x00\x00\x15\x00\x00\x01\x03\x00\x01\x00\x00\x00X\x02\x00\x00\x01\x01\x03\x00\x01\x00\x00\x00\x90\x01\x00\x00\x02\x01\x03\x00\r\x00\x00\x00\n\x01\x00\x00\x03\x01\x03\x00'
My attempts to decode the data have so far given bad results.
decoded = im_binary.decode('utf_16')
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 364-365: illegal UTF-16 surrogate
Ignoring or replacing the errors gives various non-numeric characters. Any advice?
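The printed prefix is not text: `II*\x00` is the signature of a little-endian TIFF file, which is why a text codec fails on it. A minimal sketch, using only the standard library, of reading the 50 bytes shown above as a TIFF header. The function name parse_tiff_header is hypothetical, and only inline SHORT values are decoded; everything else is reported as the raw 4-byte value or offset.

```python
import struct

# The first 50 bytes of the response, copied from the printed output above.
HEADER = (b'II*\x00\x08\x00\x00\x00\x15\x00\x00\x01\x03\x00\x01\x00\x00\x00X\x02'
          b'\x00\x00\x01\x01\x03\x00\x01\x00\x00\x00\x90\x01\x00\x00\x02\x01\x03'
          b'\x00\r\x00\x00\x00\n\x01\x00\x00\x03\x01\x03\x00')


def parse_tiff_header(data: bytes) -> dict:
    """Read the byte order, magic number and the complete IFD entries in data."""
    order = {b'II': '<', b'MM': '>'}.get(data[:2])
    if order is None:
        raise ValueError('missing TIFF byte-order mark')
    magic, ifd_offset = struct.unpack(order + 'HI', data[2:8])
    if magic != 42:
        raise ValueError('not a classic TIFF file')
    (n_entries,) = struct.unpack(order + 'H', data[ifd_offset:ifd_offset + 2])

    entries = []
    pos = ifd_offset + 2
    for _ in range(n_entries):
        if pos + 12 > len(data):
            break  # only part of the directory is available in this excerpt
        tag, dtype, count = struct.unpack(order + 'HHI', data[pos:pos + 8])
        raw = data[pos + 8:pos + 12]
        if dtype == 3 and count == 1:
            # A single SHORT is stored inline, left-justified in the field.
            (value,) = struct.unpack(order + 'H', raw[:2])
        else:
            # Otherwise treat the field as a 4-byte value or file offset.
            (value,) = struct.unpack(order + 'I', raw)
        entries.append((tag, dtype, count, value))
        pos += 12
    return {'n_entries': n_entries, 'entries': entries}


info = parse_tiff_header(HEADER)
print(info['n_entries'])
for entry in info['entries']:
    print(entry)  # (tag, type, count, value or offset)
```

In the TIFF 6.0 tag numbering, tags 256, 257 and 258 are ImageWidth, ImageLength and BitsPerSample, so this excerpt describes a 600 x 400 image with 13 samples per pixel.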

To summarize my comments and insights from answers:
Nothing seems wrong with the approach. Since images consist of pure binary data, there is no use of UTF-decoding routines.
I don't understand the necessity for any other decoding. What is received looks like fine TIFF data.
While writing to a temporary file just to be able to pass the file to another library seems clumsy, it is a robust and clean way to proceed. The physical file can probably be avoided by using an io.BytesIO instance, but since this requires the full image data to be held in memory, it may not work for huge images. (I'm not sure what the URL passed to imread() would have to look like in that case, which seems one more argument for using a file instead.)
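One way to avoid the physical file with GDAL itself is its in-memory filesystem, /vsimem/. The following is a minimal sketch, not a tested solution for this particular server: the function name tiff_bytes_to_array, the in-memory file name and the optional out_path parameter are all hypothetical, and it assumes GDAL was built with NumPy support so that ReadAsArray() is available.

```python
from typing import Optional


def tiff_bytes_to_array(im_binary: bytes, out_path: Optional[str] = None):
    """Open TIFF bytes through GDAL's /vsimem/ filesystem and return the pixels.

    If out_path is given, the dataset is also written there as a GeoTIFF.
    """
    # Imported here so that this sketch can be loaded where GDAL is missing.
    from osgeo import gdal

    mem_path = '/vsimem/in_memory_image.tif'  # hypothetical name inside /vsimem/
    gdal.FileFromMemBuffer(mem_path, im_binary)
    try:
        ds = gdal.Open(mem_path)
        if ds is None:
            raise ValueError('GDAL could not open the byte data')
        array = ds.ReadAsArray()  # (bands, rows, cols) for multi-band images
        if out_path is not None:
            out_ds = gdal.Translate(out_path, ds, format='GTiff')
            out_ds = None  # drop the reference so GDAL flushes the file
        ds = None  # close the in-memory dataset before unlinking it
    finally:
        gdal.Unlink(mem_path)
    return array
```

With the variables from the question, a call would look like `array = tiff_bytes_to_array(response.content, 'testimage.tif')`. The whole image is still held in memory, so the caveat about huge images applies here too.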

Related

Save JPEG comment using Pillow

I need to save an image in Python (created as a NumPy array) as a JPEG file, while including a "comment" in the file with some specific metadata. This metadata will be used by another (third-party) application and is a simple ASCII string. I have a sample image including such a "comment", which I can read out using Pillow (PIL), via the image.info['comment'] or the image.app['COM'] property. However, when I try a simple round-trip, i.e. loading my sample image and saving it again under a different file name, the comment is no longer preserved. Equally, I found no way to include a comment in a newly created image.
I am aware that EXIF tags are the preferred way to save metadata in JPEG images, but as mentioned, the third-party application only accepts this data as a "comment", not as EXIF, which I cannot change. After reading this question, I looked into the binary structure of my sample file and found the comment at the start of the file, after a few bytes of some other (meta)data. I do however not know a lot about binary file manipulation, and also I was wondering if there is a more elegant way, other than messing with the binary...
EDIT: minimum example:
from PIL import Image
img = Image.open(path) # where path is the path to the sample image
# this prints the desired metadata if it is correctly saved in loaded image
print(img.info["comment"])
img.save(new_path) # save with different file name
img.close()
# now open to see if it has been saved correctly
new_img = Image.open(new_path)
print(new_img.info['comment']) # now results in KeyError
I also tried img.save(new_path, info=img.info), but this does not seem to have an effect. Since img.info['comment'] appears identical to img.app['COM'], I tried img.save(new_path, app=img.app), again does not work.
Just been having a play with this and I couldn't see anything directly in Pillow to support this. I've found that the save() method supports a parameter called extra that can be used to pass arbitrary bytes to the output file.
We then just need a simple method to turn a comment into a valid JPEG segment, for example:
import struct
from PIL import Image
def make_jpeg_variable_segment(marker: int, payload: bytes) -> bytes:
    "make a JPEG segment from the given payload"
    return struct.pack('>HH', marker, 2 + len(payload)) + payload

def make_jpeg_comment_segment(comment: bytes) -> bytes:
    "make a JPEG comment/COM segment"
    return make_jpeg_variable_segment(0xFFFE, comment)

# open source image
with Image.open("foo.jpeg") as im:
    # save out with new JPEG comment
    im.save('bar.jpeg', extra=make_jpeg_comment_segment("hello world".encode()))

# read file back in to ensure comment round-trips
with Image.open('bar.jpeg') as im:
    print(im.app['COM'])
    print(im.info['comment'])
Note that in my initial attempts I tried appending the comment segment at the end of the file, but Pillow wouldn't load this comment even after calling the .load() method to force it to load the entire JPEG file.
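That observation fits the JPEG layout, in which COM segments sit among the header segments that precede the compressed scan data. Below is a minimal sketch, standard library only, of placing a comment there by hand and reading it back. The helper names add_jpeg_comment and read_jpeg_comments are hypothetical, and inserting after any leading APPn segments is a choice made here to keep a JFIF or Exif header first; it is not a description of what Pillow does internally.

```python
import struct

SOI = b'\xff\xd8'   # start-of-image marker
COM = 0xFFFE        # comment segment marker
SOS = 0xFFDA        # start-of-scan marker (compressed data follows)


def add_jpeg_comment(jpeg: bytes, comment: bytes) -> bytes:
    """Return a copy of jpeg with a COM segment placed after leading APPn segments."""
    if not jpeg.startswith(SOI):
        raise ValueError('not a JPEG stream')
    if len(comment) > 0xFFFF - 2:
        raise ValueError('comment too long for a single segment')
    pos = 2
    # Skip APP0..APP15 segments so that a JFIF/Exif header stays in front.
    while pos + 4 <= len(jpeg):
        marker, length = struct.unpack('>HH', jpeg[pos:pos + 4])
        if not 0xFFE0 <= marker <= 0xFFEF:
            break
        pos += 2 + length
    segment = struct.pack('>HH', COM, 2 + len(comment)) + comment
    return jpeg[:pos] + segment + jpeg[pos:]


def read_jpeg_comments(jpeg: bytes) -> list:
    """Collect COM payloads from the segments that precede the scan data."""
    comments = []
    pos = 2
    while pos + 4 <= len(jpeg):
        marker, length = struct.unpack('>HH', jpeg[pos:pos + 4])
        if marker == SOS or marker >> 8 != 0xFF:
            break
        if marker == COM:
            comments.append(jpeg[pos + 4:pos + 2 + length])
        pos += 2 + length
    return comments
```

Because the functions work on bytes, they can be applied to a file read with open(path, 'rb').read(), without re-encoding the image.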
Update: Pillow 9.4.0 adds support for this through a comment parameter when saving, e.g.:
with Image.open("foo.jpeg") as im:
    im.save('bar.jpeg', comment="hello world")
Hopefully that makes things easier!

How do I stop creating a broken png when converting from base64 using Python

I've tried this a number of ways and have searched high and low, but no matter what I try (including all posts I could find here on the subject) I can't manage to convert my base64 string of an HTML document / canvas containing JavaScript into a PNG image.
I'm not getting the incorrect padding error which is quite common (I have ensured 'data:text/html;base64,' is not included at the start of the base64 string.)
I have also checked the base64 string both by checking and running the original .html file, which renders in browser with no issue, and decoding the string with an online decoder.
I know I must be missing something very simple here, but after several hours I'm ready to pull my hair out.
My encoding step is as follows:
htmlSource = bytes(htmlSource,'UTF-8')
fullBase64 = base64.b64encode(htmlSource)
The resultant base64 string is included in my attempts below, which should generate a turquoise oval with shadow on a dirty white background in 4k.
The following attempts all create a png file, only 1kb in size, which cannot be opened - 'It may be damaged or use a file format that Preview doesn’t recognise.':
import base64
img_data = b'PCFET0NUWVBFIGh0bWw+CjxodG1sPgogIDxoZWFkPgogICAgPG1ldGEgbmFtZT0ndmlld3BvcnQnIGNvbnRlbnQ9J3dpZHRoPWRldmljZS13aWR0aCwgaW5pdGlhbC1zY2FsZT0xLjAnPgogIDwvaGVhZD4KPGJvZHk+CjxzdHlsZT4KICAgIGJvZHksIGh0bWwgewogICAgICBwYWRkaW5nOiAwICFpbXBvcnRhbnQ7CiAgICAgIG1hcmdpbjogMCAhaW1wb3J0YW50OwogICAgICBtYXJnaW46IDA7CiAgICB9CiAgICAqIHsKICAgICAgcGFkZGluZzogMDsKICAgICAgbWFyZ2luOiAwOwogICAgfQo8L3N0eWxlPgoKPGNhbnZhcyBpZD0nbXlDYW52YXMnIHN0eWxlPSdvYmplY3QtZml0OiBjb250YWluOyB3aWR0aDogOTl2dzsgaGVpZ2h0OiA5OXZoOyc+CllvdXIgYnJvd3NlciBkb2VzIG5vdCBzdXBwb3J0IHRoZSBIVE1MNSBjYW52YXMgdGFnLjwvY2FudmFzPgoKPHNjcmlwdD4KdmFyIGNhbnZhcyA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKCdteUNhbnZhcycpOwpjYW52YXMud2lkdGggPSA0MDk2OwpjYW52YXMuaGVpZ2h0ID0gNDA5NjsKY2FudmFzLnN0eWxlLndpZHRoID0gJzk5dncnOwpjYW52YXMuc3R5bGUuaGVpZ2h0ID0gJzk5dmgnOwp2YXIgY3R4ID0gY2FudmFzLmdldENvbnRleHQoJzJkJyk7CnZhciBjYW52YXNXID0gY3R4LmNhbnZhcy53aWR0aDsKdmFyIGNhbnZhc0ggPSBjdHguY2FudmFzLmhlaWdodDsKCmN0eC5maWxsU3R5bGUgPSAncmdiYSgyMDAsIDE5NywgMTc3LCAxKSc7CmN0eC5maWxsUmVjdCgwLCAwLCBjYW52YXNXLCBjYW52YXNIKTsKCmN0eC5zaGFkb3dCbHVyID0gY2FudmFzVzsKY3R4LnNoYWRvd0NvbG9yID0gJ3JnYmEoMCwgMCwgMCwgMC4zKSc7CmN0eC5iZWdpblBhdGgoKTsKY3R4LmZpbGxTdHlsZSA9ICdyZ2JhKDUxLCAyMjAsIDE5MSwgMSknOwpjdHguZWxsaXBzZShjYW52YXNXIC8gMiwgY2FudmFzSCAvIDIgLCBjYW52YXNXICogLjQsIGNhbnZhc0ggKiAuNDUsIDAsIDAsIDIgKiBNYXRoLlBJKTsKY3R4LmZpbGwoKTsKCgoKPC9zY3JpcHQ+Cgo8L2JvZHk+CjwvaHRtbD4='
with open("turquoise egg.png", "wb") as fh:
    fh.write(base64.decodebytes(img_data))
Version 2
from binascii import a2b_base64
data = 'PCFET0NUWVBFIGh0bWw+CjxodG1sPgogIDxoZWFkPgogICAgPG1ldGEgbmFtZT0ndmlld3BvcnQnIGNvbnRlbnQ9J3dpZHRoPWRldmljZS13aWR0aCwgaW5pdGlhbC1zY2FsZT0xLjAnPgogIDwvaGVhZD4KPGJvZHk+CjxzdHlsZT4KICAgIGJvZHksIGh0bWwgewogICAgICBwYWRkaW5nOiAwICFpbXBvcnRhbnQ7CiAgICAgIG1hcmdpbjogMCAhaW1wb3J0YW50OwogICAgICBtYXJnaW46IDA7CiAgICB9CiAgICAqIHsKICAgICAgcGFkZGluZzogMDsKICAgICAgbWFyZ2luOiAwOwogICAgfQo8L3N0eWxlPgoKPGNhbnZhcyBpZD0nbXlDYW52YXMnIHN0eWxlPSdvYmplY3QtZml0OiBjb250YWluOyB3aWR0aDogOTl2dzsgaGVpZ2h0OiA5OXZoOyc+CllvdXIgYnJvd3NlciBkb2VzIG5vdCBzdXBwb3J0IHRoZSBIVE1MNSBjYW52YXMgdGFnLjwvY2FudmFzPgoKPHNjcmlwdD4KdmFyIGNhbnZhcyA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKCdteUNhbnZhcycpOwpjYW52YXMud2lkdGggPSA0MDk2OwpjYW52YXMuaGVpZ2h0ID0gNDA5NjsKY2FudmFzLnN0eWxlLndpZHRoID0gJzk5dncnOwpjYW52YXMuc3R5bGUuaGVpZ2h0ID0gJzk5dmgnOwp2YXIgY3R4ID0gY2FudmFzLmdldENvbnRleHQoJzJkJyk7CnZhciBjYW52YXNXID0gY3R4LmNhbnZhcy53aWR0aDsKdmFyIGNhbnZhc0ggPSBjdHguY2FudmFzLmhlaWdodDsKCmN0eC5maWxsU3R5bGUgPSAncmdiYSgyMDAsIDE5NywgMTc3LCAxKSc7CmN0eC5maWxsUmVjdCgwLCAwLCBjYW52YXNXLCBjYW52YXNIKTsKCmN0eC5zaGFkb3dCbHVyID0gY2FudmFzVzsKY3R4LnNoYWRvd0NvbG9yID0gJ3JnYmEoMCwgMCwgMCwgMC4zKSc7CmN0eC5iZWdpblBhdGgoKTsKY3R4LmZpbGxTdHlsZSA9ICdyZ2JhKDUxLCAyMjAsIDE5MSwgMSknOwpjdHguZWxsaXBzZShjYW52YXNXIC8gMiwgY2FudmFzSCAvIDIgLCBjYW52YXNXICogLjQsIGNhbnZhc0ggKiAuNDUsIDAsIDAsIDIgKiBNYXRoLlBJKTsKY3R4LmZpbGwoKTsKCgoKPC9zY3JpcHQ+Cgo8L2JvZHk+CjwvaHRtbD4='
binary_data = a2b_base64(data)
fd = open('turquoise egg.png', 'wb')
fd.write(binary_data)
fd.close()
Version 3
import base64
fileString = 'PCFET0NUWVBFIGh0bWw+CjxodG1sPgogIDxoZWFkPgogICAgPG1ldGEgbmFtZT0ndmlld3BvcnQnIGNvbnRlbnQ9J3dpZHRoPWRldmljZS13aWR0aCwgaW5pdGlhbC1zY2FsZT0xLjAnPgogIDwvaGVhZD4KPGJvZHk+CjxzdHlsZT4KICAgIGJvZHksIGh0bWwgewogICAgICBwYWRkaW5nOiAwICFpbXBvcnRhbnQ7CiAgICAgIG1hcmdpbjogMCAhaW1wb3J0YW50OwogICAgICBtYXJnaW46IDA7CiAgICB9CiAgICAqIHsKICAgICAgcGFkZGluZzogMDsKICAgICAgbWFyZ2luOiAwOwogICAgfQo8L3N0eWxlPgoKPGNhbnZhcyBpZD0nbXlDYW52YXMnIHN0eWxlPSdvYmplY3QtZml0OiBjb250YWluOyB3aWR0aDogOTl2dzsgaGVpZ2h0OiA5OXZoOyc+CllvdXIgYnJvd3NlciBkb2VzIG5vdCBzdXBwb3J0IHRoZSBIVE1MNSBjYW52YXMgdGFnLjwvY2FudmFzPgoKPHNjcmlwdD4KdmFyIGNhbnZhcyA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKCdteUNhbnZhcycpOwpjYW52YXMud2lkdGggPSA0MDk2OwpjYW52YXMuaGVpZ2h0ID0gNDA5NjsKY2FudmFzLnN0eWxlLndpZHRoID0gJzk5dncnOwpjYW52YXMuc3R5bGUuaGVpZ2h0ID0gJzk5dmgnOwp2YXIgY3R4ID0gY2FudmFzLmdldENvbnRleHQoJzJkJyk7CnZhciBjYW52YXNXID0gY3R4LmNhbnZhcy53aWR0aDsKdmFyIGNhbnZhc0ggPSBjdHguY2FudmFzLmhlaWdodDsKCmN0eC5maWxsU3R5bGUgPSAncmdiYSgyMDAsIDE5NywgMTc3LCAxKSc7CmN0eC5maWxsUmVjdCgwLCAwLCBjYW52YXNXLCBjYW52YXNIKTsKCmN0eC5zaGFkb3dCbHVyID0gY2FudmFzVzsKY3R4LnNoYWRvd0NvbG9yID0gJ3JnYmEoMCwgMCwgMCwgMC4zKSc7CmN0eC5iZWdpblBhdGgoKTsKY3R4LmZpbGxTdHlsZSA9ICdyZ2JhKDUxLCAyMjAsIDE5MSwgMSknOwpjdHguZWxsaXBzZShjYW52YXNXIC8gMiwgY2FudmFzSCAvIDIgLCBjYW52YXNXICogLjQsIGNhbnZhc0ggKiAuNDUsIDAsIDAsIDIgKiBNYXRoLlBJKTsKY3R4LmZpbGwoKTsKCgoKPC9zY3JpcHQ+Cgo8L2JvZHk+CjwvaHRtbD4='
decodeit = open('turquoise egg.png', 'wb')
decodeit.write(base64.b64decode((fileString)))
decodeit.close()
FWIW I originally used the following code to create a png from the HTML without using base64, but it would only ever save the first element of JavaScript generated on the canvas (ie the background) and since I require the information in base64 anyway, thought I would approach it this way in order to capture the complete image
import imgkit
with open('html.html', 'r') as file:
    imgkit.from_file(file, 'png.png')
Html2Image has provided the solution I was looking for.
Whilst imgkit wasn't saving the fully rendered canvas, taking a screenshot with Html2Image does. Documentation is here and I implemented it as follows:
from html2image import Html2Image
hti = Html2Image()
# imageW and imageH are the desired output width and height in pixels
hti.screenshot(
    html_file='html.html',
    size=(imageW, imageH),
    save_as='png.png'
)
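As background on why the three decoding attempts in the question produce an unreadable file: base64 decoding only reverses the encoding step, so it returns the original HTML source, and writing that to a file named .png does not render it. A minimal sketch of checking what the decoded bytes actually are; the helper name looks_like_png and the short HTML sample are hypothetical.

```python
import base64

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'  # the first eight bytes of every PNG file


def looks_like_png(data: bytes) -> bool:
    """Return True if data starts with the PNG file signature."""
    return data.startswith(PNG_SIGNATURE)


html_source = b'<!DOCTYPE html>\n<html><body><canvas></canvas></body></html>'
encoded = base64.b64encode(html_source)
decoded = base64.b64decode(encoded)

print(decoded == html_source)   # True: the round trip gives back the HTML text
print(looks_like_png(decoded))  # False: nothing has rendered the canvas
```

Producing pixels requires something that executes the page, which is the role the headless browser behind Html2Image plays in the answer above.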

Adding custom extratags with tifffile

I'm trying to write a script to simplify my everyday life in the lab. I operate one ThermoFisher / FEI scanning electron microscope and I save all my pictures in the TIFF format.
The microscope software is adding an extensive custom TiffTag (code 34682) containing all the microscope / image parameters.
In my script, I would like to open an image, perform some manipulations and then save the data in a new file, including the original FEI metadata. To do so, I would like to use a python script using the tifffile module.
I can open the image file and perform the needed manipulations without problems. Retrieving the FEI metadata from the input file is also working fine.
I was thinking to use the imwrite function to save the output file and using the extratags optional argument to transfer to the output file the original FEI metadata.
This is an extract of the tifffile documentation about the extratags:
extratags : sequence of tuples
    Additional tags as [(code, dtype, count, value, writeonce)].
    code : int
        The TIFF tag Id.
    dtype : int or str
        Data type of items in 'value'. One of TIFF.DATATYPES.
    count : int
        Number of data values. Not used for string or bytes values.
    value : sequence
        'Count' values compatible with 'dtype'.
        Bytes must contain count values of dtype packed as binary data.
    writeonce : bool
        If True, the tag is written to the first page of a series only.
Here is a snippet of my code.
my_extratags = [(input_tags['FEI_HELIOS'].code,
                 input_tags['FEI_HELIOS'].dtype,
                 input_tags['FEI_HELIOS'].count,
                 input_tags['FEI_HELIOS'].value, True)]
tifffile.imwrite('output.tif', data, extratags=my_extratags)
This code does not work: it complains that the value of the extra tag should be 7-bit ASCII encoded. This already looks very strange to me because I haven't touched the metadata and I am just copying it to the output file.
If I convert the metadata tag value in a string as below:
my_extratags = [(input_tags['FEI_HELIOS'].code,
                 input_tags['FEI_HELIOS'].dtype,
                 input_tags['FEI_HELIOS'].count,
                 str(input_tags['FEI_HELIOS'].value), True)]
tifffile.imwrite('output.tif', data, extratags=my_extratags)
the code works and the image is saved. The metadata tag corresponding to 'FEI_HELIOS' is created, but it is empty!
Can you help me find out what I am doing wrong?
I don't need to use tifffile, but I would prefer to use python rather than ImageJ because I have already several other python scripts and I would like to integrate this new one with the others.
Thanks a lot in advance!
toto
ps. I'm a frequent user of stackoverflow, but this is actually my first question!
In principle the approach is correct. However, tifffile parses the raw values of certain tags, including FEI_HELIOS, to dictionaries or other Python types. To get the raw tag value for rewriting, it needs to be read from file again. In these cases, use the internal TiffTag._astuple function to get an extratag compatible tuple of the tag, e.g.:
import tifffile

with tifffile.TiffFile('FEI_SEM.tif') as tif:
    assert tif.is_fei
    page = tif.pages[0]
    image = page.asarray()
    ...  # process image
    with tifffile.TiffWriter('copy1.tif') as out:
        out.write(
            image,
            photometric=page.photometric,
            compression=page.compression,
            planarconfig=page.planarconfig,
            rowsperstrip=page.rowsperstrip,
            resolution=(
                page.tags['XResolution'].value,
                page.tags['YResolution'].value,
                page.tags['ResolutionUnit'].value,
            ),
            extratags=[page.tags['FEI_HELIOS']._astuple()],
        )
This approach does not preserve Exif metadata, which tifffile cannot write.
Another approach, since FEI files seem to be written uncompressed, is to directly memory map the image data in the file to a numpy array and manipulate that array:
import shutil
import tifffile
shutil.copyfile('FEI_SEM.tif', 'copy2.tif')
image = tifffile.memmap('copy2.tif')
... # process image
image.flush()
Finally, consider tifftools for rewriting TIFF files where tifffile is currently failing, e.g. Exif metadata.

Error decoding Data Matrix that is base256 encoded with pylibdmtx

I'm trying to successfully decode Data Matrix barcodes that are base256 encoded using pylibdmtx. When the barcode contains a 0x00 byte the library seems to treat it as a string terminator (null) and ignores the rest of the data in the barcode.
Here is a snippet of code that will create a barcode and decode it:
from pylibdmtx.pylibdmtx import encode, decode
from PIL import Image
message = b'\x16\x05abc\x64\x00\x65\x66g'
print('message:',message)
barcode = encode(message)
img = Image.frombytes('RGB', (barcode.width, barcode.height), barcode.pixels)
# uncomment if you want to save the barcode to a file
#img.save('barcode.png')
decoded = decode(img)
print('decoded:',decoded)
print(' length:',len(decoded[0].data))
Here is the result:
message: b'\x16\x05abcd\x00efg'
decoded: [Decoded(data=b'\x16\x05abcd', rect=Rect(left=9, top=10, width=80, height=79))]
length: 6
The created barcode reads properly with other online tools and dmtxread invoked from a command line.
Is there a limitation with the python wrappers for libdmtx or something I am doing wrong?
Other Info:
This is the simplest example I could come up with to illustrate the problem. The barcodes are much larger and already in production systems.
I did try the python wrappers for the ZXing library and it did not even recognize many of the barcodes. I am open to using other libraries.
Thank you.
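The truncated result is exactly what appears when a byte buffer returned by a C library is read as a NUL-terminated string instead of with its explicit length. Whether pylibdmtx reads the libdmtx output this way is an assumption here, not something the question confirms. A minimal sketch with ctypes alone, using the message from the example above, shows the difference between the two ways of reading the same memory:

```python
import ctypes

message = b'\x16\x05abc\x64\x00\x65\x66g'

# Copy the message into a C buffer, as a C library would hold its output.
buffer = ctypes.create_string_buffer(message)
address = ctypes.addressof(buffer)

# Read as a C string: stops at the first 0x00 byte.
as_c_string = ctypes.string_at(address)

# Read with an explicit length: embedded 0x00 bytes are preserved.
with_length = ctypes.string_at(address, len(message))

print(as_c_string, len(as_c_string))
print(with_length, len(with_length))
```

The first read yields the same six bytes as the decoded result shown in the question, while the second returns all ten. If the wrapper does behave this way, the data is present in the barcode and is lost only when the result is copied into Python.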

How to convert numpy array to bytes object without save audio file on disk?

I am now learning to build a TTS project based on Tacotron-2.
Here, the original code in save_wav(wav, path, sr) function has a step to save a numpy array to .wav file by using
wav *= 32767 / max(0.01, np.max(np.abs(wav)))
scipy.io.wavfile.write(path, hparams.sample_rate, wav.astype(np.int16))
However, after obtaining the numpy array with wav *= 32767 / max(0.01, np.max(np.abs(wav))), I want to convert it to an .mp3 file so that it is easier to send back as a streaming response.
Right now, I can convert a .wav bytes object to an .mp3 file, but I don't know how to convert the numpy array to a .wav bytes object.
I searched and found that I apparently need to add a header to the numpy array, but almost all the posts I looked at use modules like scipy.io.wave and audioop, which first save the numpy array to a .wav file and then read it back with open('filename.wav', 'rb').
(This is the link for the scipy.io.wavfile.write module, whose filename parameter should be a string or an open file handle, which, as I understand it, means the generated .wav file is saved on disk.)
Could anyone give any suggestion on how to achieve this?
Use io.BytesIO
A much simpler solution is io.BytesIO, an in-memory binary stream that can be written to and read from like a file:
import io
from scipy.io.wavfile import write
byte_io = io.BytesIO()
write(byte_io, audio_sr, audio_numpy_array)
result_bytes = byte_io.getvalue()
Replace audio_sr and audio_numpy_array with the sample rate and the array of sample values of your data.
You can then work with result_bytes as the bytes of a .wav file (as required).
P.S. Also check this simple gist of how to perform values array -> bytes -> values array for wav file.
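The same in-memory idea can be sketched with only the standard library, which also shows where the .wav header comes from. This is a minimal sketch under stated assumptions: the samples are mono, already scaled to 16-bit integers as in the question's wav *= 32767 / ... step, and the function name pcm16_to_wav_bytes and the test tone are hypothetical.

```python
import io
import math
import struct
import wave


def pcm16_to_wav_bytes(samples, sample_rate: int) -> bytes:
    """Wrap a sequence of 16-bit mono samples in a WAV container, in memory."""
    buffer = io.BytesIO()
    with wave.open(buffer, 'wb') as wav_file:
        wav_file.setnchannels(1)        # mono
        wav_file.setsampwidth(2)        # 2 bytes per sample = 16 bit
        wav_file.setframerate(sample_rate)
        # Pack the integers as little-endian signed 16-bit values.
        wav_file.writeframes(struct.pack('<%dh' % len(samples), *samples))
    # The wave module wrote the RIFF header into the buffer on close.
    return buffer.getvalue()


# One second of a 440 Hz tone as stand-in data.
sample_rate = 8000
tone = [int(32767 * math.sin(2 * math.pi * 440 * n / sample_rate))
        for n in range(sample_rate)]
wav_bytes = pcm16_to_wav_bytes(tone, sample_rate)
print(wav_bytes[:4], wav_bytes[8:12], len(wav_bytes))
```

With a NumPy array, wav.astype(np.int16).tobytes() could be passed to writeframes() in place of the struct.pack call on a little-endian machine.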
I finally solved this problem by writing new modules based on scipy.io.wavfile.write and audio_segment.py from pydub.
Besides, when you want to operate on wave/mp3 bytes without saving them as a .wav/.mp3 file (normally done through some handy API or Python package), you have to add the header manually. This is not too hard a task if you look into the source code of those excellent packages.
