Error decoding Data Matrix that is base256 encoded with pylibdmtx - python

I'm trying to successfully decode Data Matrix barcodes that are base256 encoded using pylibdmtx. When the barcode contains a 0x00 byte the library seems to treat it as a string terminator (null) and ignores the rest of the data in the barcode.
Here is a snippet of code that will create a barcode and decode it:
from pylibdmtx.pylibdmtx import encode, decode
from PIL import Image
message = b'\x16\x05abc\x64\x00\x65\x66g'
print('message:',message)
barcode = encode(message)
img = Image.frombytes('RGB', (barcode.width, barcode.height), barcode.pixels)
# uncomment if you want to save the barcode to a file
#img.save('barcode.png')
decoded = decode(img)
print('decoded:',decoded)
print(' length:',len(decoded[0].data))
Here is the result:
message: b'\x16\x05abcd\x00efg'
decoded: [Decoded(data=b'\x16\x05abcd', rect=Rect(left=9, top=10, width=80, height=79))]
length: 6
The created barcode reads properly with other online tools and dmtxread invoked from a command line.
Is there a limitation with the python wrappers for libdmtx or something I am doing wrong?
Other Info:
This is the simplest example I could come up with to illustrate the problem. The barcodes are much larger and already in production systems.
I did try the python wrappers for the ZXing library and it did not even recognize many of the barcodes. I am open to using other libraries.
Thank you.
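The symptom (decoded length 6, cut exactly at the 0x00) matches what happens when a ctypes-based wrapper copies a C output buffer with `ctypes.string_at(address)` and no length, because that call treats the memory as a NUL-terminated C string. Whether pylibdmtx's `decode` copies libdmtx's output this way is an assumption worth checking in its source. The stdlib-only sketch below reproduces the mechanism with the message from the question:

```python
import ctypes

# The payload from the question: note the 0x00 in the middle.
message = b"\x16\x05abcd\x00efg"

# Place the bytes in a C buffer, as a decoder's output buffer would look.
buf = ctypes.create_string_buffer(message)
addr = ctypes.addressof(buf)

# Without a length, string_at stops at the first NUL byte (C-string semantics).
truncated = ctypes.string_at(addr)
# With an explicit length, every byte is copied, including the 0x00.
complete = ctypes.string_at(addr, len(message))

print(truncated, len(truncated))  # b'\x16\x05abcd' 6
print(complete == message)        # True
```

If the wrapper does read the buffer like this, copying it with the message length instead of relying on NUL termination would keep the trailing bytes.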

Related

Creating QR code containing both text and photo

Is it possible to create a QR code in Python that contains both some text and a photo (a small logo)?
I mean text that is not part of the photo: I will have the text (a string variable) and the photo (e.g. a *.png) separately.
So far I've only seen examples that create a QR code from either text or a photo. I couldn't find an example with both used at the same time.
Basically when I scan my QR code, I would like for it to show my photo (logo) and text information.
Expanding on my comment. QR Codes are used to encode text. So you have to read in your image, convert to base64, append your string with some delimiter and then write out your QR code.
Because we took special steps to encode the data (the image and the string) before storing in a QR code, we must also have a special application to read the QR code, split the base64 string by our delimiter, and then decode the base64 pieces into their appropriate media (writing the binary to a file for the image and printing out the decoded string).
All in all this will look something like the following:
import base64
#open image and convert to b64 string
with open("small.png", "rb") as img_file:
    my_string = base64.b64encode(img_file.read())
#append a message to b64 encoded image, separated by a null byte
my_string = my_string + b'\0' + base64.b64encode(b'some text')
#write out the qrcode
import qrcode
qr = qrcode.QRCode(
    version=2,  # starting size only: make() uses fit=True by default and grows the version if needed
    error_correction=qrcode.constants.ERROR_CORRECT_M,
)
qr.add_data(my_string, optimize=0)
qr.make()
qr.make_image().save("qrcode.png")
#--------Now when reading the qr code:-------#
#open qr code and read in with cv2 (as an example), decode with pyzbar
from pyzbar.pyzbar import decode
import cv2 #importing opencv
img = cv2.imread('qrcode.png', 0)
barcodes = decode(img)
for barcode in barcodes:
    barcodeData = barcode.data.decode("utf-8")
    #split the b64 string by the null byte we wrote
    data = barcodeData.split('\x00')
    #save image to file after decoding b64 (the source was a PNG, so keep the .png extension)
    filename = 'some_image.png'
    with open(filename, 'wb') as f:
        f.write(base64.b64decode(data[0]))
    #print out message after decoding
    print(base64.b64decode(data[1]))
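The packing scheme itself can be checked without any QR library. Below is a minimal stdlib sketch, assuming the same layout as above (base64 image, a single 0x00 separator, base64 text); `pack` and `unpack` are hypothetical helper names, and this version works on the raw bytes (as in `barcode.data`) instead of decoding to `str` first, which is equivalent here because the payload is pure ASCII:

```python
import base64
from typing import Tuple

SEPARATOR = b"\x00"  # 0x00 never occurs in base64 output, so it is a safe delimiter


def pack(image: bytes, text: str) -> bytes:
    """base64(image) + 0x00 + base64(UTF-8 text), the layout used in the answer."""
    return base64.b64encode(image) + SEPARATOR + base64.b64encode(text.encode("utf-8"))


def unpack(payload: bytes) -> Tuple[bytes, str]:
    """Split on the first 0x00 and reverse both base64 encodings."""
    image_b64, text_b64 = payload.split(SEPARATOR, 1)
    return base64.b64decode(image_b64), base64.b64decode(text_b64).decode("utf-8")


image = bytes(range(256))  # stand-in for the contents of small.png
payload = pack(image, "some text")
restored_image, restored_text = unpack(payload)
print(restored_image == image, restored_text)  # True some text
```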
Obviously this is only going to work for a VERY small image and just a little bit of text. You quickly blow out the max size of a QR code, and even before you hit the limits for the standard, you'll hit the limits for qrcode and pyzbar modules.
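To put rough numbers on "VERY small": the largest QR symbol (version 40) holds 2331 bytes in byte mode at error-correction level M, the level used above (2953 at level L). Base64 turns every 3 raw bytes into 4 characters, so the sketch below (hypothetical helper names, same image + 0x00 + text layout) estimates the largest image that can still fit next to a given text. As the answer notes, practical limits in qrcode and pyzbar may be lower than the standard's.

```python
# Byte-mode capacity of a version-40 QR symbol for each error-correction level.
QR_V40_BYTES = {"L": 2953, "M": 2331, "Q": 1663, "H": 1273}


def b64_len(n_bytes: int) -> int:
    """Length of standard base64 output (with '=' padding) for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


def payload_len(image_bytes: int, text_bytes: int) -> int:
    """Size of b64(image) + one separator byte + b64(text)."""
    return b64_len(image_bytes) + 1 + b64_len(text_bytes)


def fits(image_bytes: int, text_bytes: int, level: str = "M") -> bool:
    return payload_len(image_bytes, text_bytes) <= QR_V40_BYTES[level]


def max_image_bytes(text_bytes: int, level: str = "M") -> int:
    """Largest raw image size whose base64 form still fits beside the text."""
    budget = QR_V40_BYTES[level] - 1 - b64_len(text_bytes)
    return max(0, budget // 4 * 3)


print(max_image_bytes(len(b"some text")))  # 1737
```

So with the 9-byte `b'some text'` from the example, the image file itself has to stay under about 1.7 KB even in the best case.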
With this, you can start with a tiny image, encode it together with the text into a QR code, and after decoding end up with the exact same picture and your appended text.
Ultimately, this isn't terribly useful, though, since you need a special application to decode your QR code. The more traditional approach, creating a web page with the content you want to share and a QR code containing a link to that page, is more user-friendly and approachable.
when I scan my QR code, I would like for it to show my photo (logo) and text information
You can put the picture and the text on a public web page, and then encode the URL of the web page in the QR code.

How do I stop creating a broken png when converting from base64 using Python

I've tried this a number of ways and have searched high and low, but no matter what I try (including every post on the subject I could find here), I can't manage to convert my base64 string of an HTML document / canvas containing JavaScript into a PNG that opens.
I'm not getting the incorrect padding error which is quite common (I have ensured 'data:text/html;base64,' is not included at the start of the base64 string.)
I have also checked the base64 string, both by running the original .html file, which renders in the browser with no issue, and by decoding the string with an online decoder.
I know I must be missing something very simple here, but after several hours I'm ready to pull my hair out.
My encoding step is as follows:
htmlSource = bytes(htmlSource,'UTF-8')
fullBase64 = base64.b64encode(htmlSource)
The resultant base64 string is included in my attempts below, which should generate a turquoise oval with shadow on a dirty white background in 4k.
The following attempts all create a png file, only 1kb in size, which cannot be opened - 'It may be damaged or use a file format that Preview doesn’t recognise.':
import base64
img_data = b'PCFET0NUWVBFIGh0bWw+CjxodG1sPgogIDxoZWFkPgogICAgPG1ldGEgbmFtZT0ndmlld3BvcnQnIGNvbnRlbnQ9J3dpZHRoPWRldmljZS13aWR0aCwgaW5pdGlhbC1zY2FsZT0xLjAnPgogIDwvaGVhZD4KPGJvZHk+CjxzdHlsZT4KICAgIGJvZHksIGh0bWwgewogICAgICBwYWRkaW5nOiAwICFpbXBvcnRhbnQ7CiAgICAgIG1hcmdpbjogMCAhaW1wb3J0YW50OwogICAgICBtYXJnaW46IDA7CiAgICB9CiAgICAqIHsKICAgICAgcGFkZGluZzogMDsKICAgICAgbWFyZ2luOiAwOwogICAgfQo8L3N0eWxlPgoKPGNhbnZhcyBpZD0nbXlDYW52YXMnIHN0eWxlPSdvYmplY3QtZml0OiBjb250YWluOyB3aWR0aDogOTl2dzsgaGVpZ2h0OiA5OXZoOyc+CllvdXIgYnJvd3NlciBkb2VzIG5vdCBzdXBwb3J0IHRoZSBIVE1MNSBjYW52YXMgdGFnLjwvY2FudmFzPgoKPHNjcmlwdD4KdmFyIGNhbnZhcyA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKCdteUNhbnZhcycpOwpjYW52YXMud2lkdGggPSA0MDk2OwpjYW52YXMuaGVpZ2h0ID0gNDA5NjsKY2FudmFzLnN0eWxlLndpZHRoID0gJzk5dncnOwpjYW52YXMuc3R5bGUuaGVpZ2h0ID0gJzk5dmgnOwp2YXIgY3R4ID0gY2FudmFzLmdldENvbnRleHQoJzJkJyk7CnZhciBjYW52YXNXID0gY3R4LmNhbnZhcy53aWR0aDsKdmFyIGNhbnZhc0ggPSBjdHguY2FudmFzLmhlaWdodDsKCmN0eC5maWxsU3R5bGUgPSAncmdiYSgyMDAsIDE5NywgMTc3LCAxKSc7CmN0eC5maWxsUmVjdCgwLCAwLCBjYW52YXNXLCBjYW52YXNIKTsKCmN0eC5zaGFkb3dCbHVyID0gY2FudmFzVzsKY3R4LnNoYWRvd0NvbG9yID0gJ3JnYmEoMCwgMCwgMCwgMC4zKSc7CmN0eC5iZWdpblBhdGgoKTsKY3R4LmZpbGxTdHlsZSA9ICdyZ2JhKDUxLCAyMjAsIDE5MSwgMSknOwpjdHguZWxsaXBzZShjYW52YXNXIC8gMiwgY2FudmFzSCAvIDIgLCBjYW52YXNXICogLjQsIGNhbnZhc0ggKiAuNDUsIDAsIDAsIDIgKiBNYXRoLlBJKTsKY3R4LmZpbGwoKTsKCgoKPC9zY3JpcHQ+Cgo8L2JvZHk+CjwvaHRtbD4='
with open("turquoise egg.png", "wb") as fh:
    fh.write(base64.decodebytes(img_data))
Version 2
from binascii import a2b_base64
data = 'PCFET0NUWVBFIGh0bWw+CjxodG1sPgogIDxoZWFkPgogICAgPG1ldGEgbmFtZT0ndmlld3BvcnQnIGNvbnRlbnQ9J3dpZHRoPWRldmljZS13aWR0aCwgaW5pdGlhbC1zY2FsZT0xLjAnPgogIDwvaGVhZD4KPGJvZHk+CjxzdHlsZT4KICAgIGJvZHksIGh0bWwgewogICAgICBwYWRkaW5nOiAwICFpbXBvcnRhbnQ7CiAgICAgIG1hcmdpbjogMCAhaW1wb3J0YW50OwogICAgICBtYXJnaW46IDA7CiAgICB9CiAgICAqIHsKICAgICAgcGFkZGluZzogMDsKICAgICAgbWFyZ2luOiAwOwogICAgfQo8L3N0eWxlPgoKPGNhbnZhcyBpZD0nbXlDYW52YXMnIHN0eWxlPSdvYmplY3QtZml0OiBjb250YWluOyB3aWR0aDogOTl2dzsgaGVpZ2h0OiA5OXZoOyc+CllvdXIgYnJvd3NlciBkb2VzIG5vdCBzdXBwb3J0IHRoZSBIVE1MNSBjYW52YXMgdGFnLjwvY2FudmFzPgoKPHNjcmlwdD4KdmFyIGNhbnZhcyA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKCdteUNhbnZhcycpOwpjYW52YXMud2lkdGggPSA0MDk2OwpjYW52YXMuaGVpZ2h0ID0gNDA5NjsKY2FudmFzLnN0eWxlLndpZHRoID0gJzk5dncnOwpjYW52YXMuc3R5bGUuaGVpZ2h0ID0gJzk5dmgnOwp2YXIgY3R4ID0gY2FudmFzLmdldENvbnRleHQoJzJkJyk7CnZhciBjYW52YXNXID0gY3R4LmNhbnZhcy53aWR0aDsKdmFyIGNhbnZhc0ggPSBjdHguY2FudmFzLmhlaWdodDsKCmN0eC5maWxsU3R5bGUgPSAncmdiYSgyMDAsIDE5NywgMTc3LCAxKSc7CmN0eC5maWxsUmVjdCgwLCAwLCBjYW52YXNXLCBjYW52YXNIKTsKCmN0eC5zaGFkb3dCbHVyID0gY2FudmFzVzsKY3R4LnNoYWRvd0NvbG9yID0gJ3JnYmEoMCwgMCwgMCwgMC4zKSc7CmN0eC5iZWdpblBhdGgoKTsKY3R4LmZpbGxTdHlsZSA9ICdyZ2JhKDUxLCAyMjAsIDE5MSwgMSknOwpjdHguZWxsaXBzZShjYW52YXNXIC8gMiwgY2FudmFzSCAvIDIgLCBjYW52YXNXICogLjQsIGNhbnZhc0ggKiAuNDUsIDAsIDAsIDIgKiBNYXRoLlBJKTsKY3R4LmZpbGwoKTsKCgoKPC9zY3JpcHQ+Cgo8L2JvZHk+CjwvaHRtbD4='
binary_data = a2b_base64(data)
fd = open('turquoise egg.png', 'wb')
fd.write(binary_data)
fd.close()
Version 3
import base64
fileString = 'PCFET0NUWVBFIGh0bWw+CjxodG1sPgogIDxoZWFkPgogICAgPG1ldGEgbmFtZT0ndmlld3BvcnQnIGNvbnRlbnQ9J3dpZHRoPWRldmljZS13aWR0aCwgaW5pdGlhbC1zY2FsZT0xLjAnPgogIDwvaGVhZD4KPGJvZHk+CjxzdHlsZT4KICAgIGJvZHksIGh0bWwgewogICAgICBwYWRkaW5nOiAwICFpbXBvcnRhbnQ7CiAgICAgIG1hcmdpbjogMCAhaW1wb3J0YW50OwogICAgICBtYXJnaW46IDA7CiAgICB9CiAgICAqIHsKICAgICAgcGFkZGluZzogMDsKICAgICAgbWFyZ2luOiAwOwogICAgfQo8L3N0eWxlPgoKPGNhbnZhcyBpZD0nbXlDYW52YXMnIHN0eWxlPSdvYmplY3QtZml0OiBjb250YWluOyB3aWR0aDogOTl2dzsgaGVpZ2h0OiA5OXZoOyc+CllvdXIgYnJvd3NlciBkb2VzIG5vdCBzdXBwb3J0IHRoZSBIVE1MNSBjYW52YXMgdGFnLjwvY2FudmFzPgoKPHNjcmlwdD4KdmFyIGNhbnZhcyA9IGRvY3VtZW50LmdldEVsZW1lbnRCeUlkKCdteUNhbnZhcycpOwpjYW52YXMud2lkdGggPSA0MDk2OwpjYW52YXMuaGVpZ2h0ID0gNDA5NjsKY2FudmFzLnN0eWxlLndpZHRoID0gJzk5dncnOwpjYW52YXMuc3R5bGUuaGVpZ2h0ID0gJzk5dmgnOwp2YXIgY3R4ID0gY2FudmFzLmdldENvbnRleHQoJzJkJyk7CnZhciBjYW52YXNXID0gY3R4LmNhbnZhcy53aWR0aDsKdmFyIGNhbnZhc0ggPSBjdHguY2FudmFzLmhlaWdodDsKCmN0eC5maWxsU3R5bGUgPSAncmdiYSgyMDAsIDE5NywgMTc3LCAxKSc7CmN0eC5maWxsUmVjdCgwLCAwLCBjYW52YXNXLCBjYW52YXNIKTsKCmN0eC5zaGFkb3dCbHVyID0gY2FudmFzVzsKY3R4LnNoYWRvd0NvbG9yID0gJ3JnYmEoMCwgMCwgMCwgMC4zKSc7CmN0eC5iZWdpblBhdGgoKTsKY3R4LmZpbGxTdHlsZSA9ICdyZ2JhKDUxLCAyMjAsIDE5MSwgMSknOwpjdHguZWxsaXBzZShjYW52YXNXIC8gMiwgY2FudmFzSCAvIDIgLCBjYW52YXNXICogLjQsIGNhbnZhc0ggKiAuNDUsIDAsIDAsIDIgKiBNYXRoLlBJKTsKY3R4LmZpbGwoKTsKCgoKPC9zY3JpcHQ+Cgo8L2JvZHk+CjwvaHRtbD4='
decodeit = open('turquoise egg.png', 'wb')
decodeit.write(base64.b64decode((fileString)))
decodeit.close()
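A likely reason all three versions produce a broken file: base64 decoding only reverses base64 encoding, so it returns the original bytes, which here are HTML source text, not rendered pixels. Writing those bytes under a .png name does not make them a PNG. Comparing the first decoded bytes with the 8-byte PNG signature makes this visible; the 20-character prefix below is copied from the question's string:

```python
import base64

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # first 8 bytes of every PNG file

prefix = "PCFET0NUWVBFIGh0bWw+"  # start of the base64 string in the question
head = base64.b64decode(prefix)

print(head)                            # b'<!DOCTYPE html>'
print(head.startswith(PNG_SIGNATURE))  # False
```

Turning the canvas into pixels requires something that actually runs the HTML and JavaScript, such as a browser engine.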
FWIW, I originally used the following code to create a PNG from the HTML without using base64, but it only ever saved the first element the JavaScript drew on the canvas (i.e. the background). Since I need the data in base64 anyway, I thought I would approach it this way in order to capture the complete image:
import imgkit
with open('html.html', 'r') as file:
    imgkit.from_file(file, 'png.png')
Html2Image has provided the solution I was looking for.
Whilst imgkit wasn't saving the fully rendered canvas, taking a screenshot with Html2Image does. Following the Html2Image documentation, I implemented it as follows:
from html2image import Html2Image
hti = Html2Image()
imageW, imageH = 4096, 4096  # e.g. the canvas size set in the HTML
hti.screenshot(
    html_file='html.html',
    size=(imageW, imageH),
    save_as='png.png'
)

Generating and reading QR codes with special characters

I'm writing a Python program that does the following:
Create a QR code > Save to a png file > Open the file > Read the QR code information
However, when the data in the code has special characters, I get garbled output. Here's my code:
import pyqrcode
from PIL import Image
from pyzbar.pyzbar import decode
data = 'Thomsôn Gonçalves Ámaral,325.432.123-21'
file_iso = 'QR_ISO.png'
file_utf = 'QR_Utf.png'
#creating QR codes
qr_iso = pyqrcode.create(data) #creates qr code using iso-8859-1 encoding
qr_utf = pyqrcode.create(data, encoding = 'utf-8') #creates qr code using utf-8 encoding
#saving png files
qr_iso.png(file_iso, scale = 8)
qr_utf.png(file_utf, scale = 8)
#Reading and Identifying QR codes
img_iso = Image.open(file_iso)
img_utf = Image.open(file_utf)
dec_iso = decode(img_iso)
dec_utf = decode(img_utf)
# Reading Results:
print(dec_iso[0].data)
print(dec_iso[0].data.decode('utf-8'))
print(dec_iso[0].data.decode('iso-8859-1'),'\n')
print(dec_utf[0].data)
print(dec_utf[0].data.decode('utf-8'))
print(dec_utf[0].data.decode('iso-8859-1'))
And here's the output:
b'Thoms\xee\x8c\x9e Gon\xe8\xbb\x8blves \xef\xbe\x81maral,325.432.123-21'
Thoms Gon軋lves チmaral,325.432.123-21
Thoms Gon軋lves ï¾maral,325.432.123-21
b'Thoms\xef\xbe\x83\xef\xbd\xb4n Gon\xef\xbe\x83\xef\xbd\xa7alves \xef\xbe\x83\xef\xbc\xbbaral,325.432.123-21'
Thomsテエn Gonテァalves テ[aral,325.432.123-21
Thomsテエn Gonテァalves テ[aral,325.432.123-21
For simple data it works just fine, but when data has characters like 'Á, ç ' and so on this happens.
Any ideas what I should do to fix it?
Additional information:
I'm using python 3.8 and PyCharm IDE
When I scan the generated codes using an Android App, it reads both codes just fine.
I've read this topic: Unicode Encoding and decoding issues in QRCode but it didn't help much
Try to encode the UTF-8 decoded result with shift-jis and decode the result again with UTF-8.
dec_utf[0].data.decode('utf-8').encode('shift-jis').decode('utf-8')
This works at least with your example where the QR code uses UTF-8 as well.
See also https://github.com/NaturalHistoryMuseum/pyzbar/issues/14
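Why this works, under an assumption the linked pyzbar issue points towards: zbar guesses the text encoding of byte-mode QR data, reads these UTF-8 bytes as Shift-JIS, and re-emits the result as UTF-8. The sketch below simulates that misread with Python's own `shift_jis` codec and then applies the fix from this answer; `undo_sjis_misread` is a hypothetical helper name. The simulated bytes match the `dec_utf[0].data` output shown in the question.

```python
def undo_sjis_misread(data: bytes) -> str:
    """Reverse a 'UTF-8 bytes read as Shift-JIS, re-emitted as UTF-8' mix-up."""
    return data.decode("utf-8").encode("shift_jis").decode("utf-8")


original = "Thomsôn Gonçalves Ámaral,325.432.123-21"

# Simulated scanner behaviour (assumption): UTF-8 bytes decoded as Shift-JIS ...
garbled = original.encode("utf-8").decode("shift_jis")
# ... and handed back as UTF-8 bytes, like dec_utf[0].data.
scanned = garbled.encode("utf-8")

print(undo_sjis_misread(scanned) == original)  # True
```

This only round-trips when the misread was lossless: a Shift-JIS lead byte without a valid trail byte would be lost or raise an error, so it is not a universal fix.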
Alright! Got some updates:
Short version:
The answer from user14091216 seems to solve the problem. The line:
dec_utf[0].data.decode('utf-8').encode('shift-jis').decode('utf-8')
does a double decoding, which fixes the problem. I ran lots of tests without any error. The new code is below.
What I've tried and found out - Long version:
Some colleagues suggested that my data was somehow double-encoded. I still don't know why this happens, but from what I've read, it seems to be a problem with the pyzbar lib when it reads data with special characters.
The first thing I tried was to use the BOM (byte order mark).
Based on my original code, I used these lines:
data = '\xEF\xBB\xBF' + 'Thomsôn Gonçalves Ámaral,325.432.123-21'
qr_iso = pyqrcode.create(data) #creates qr code using iso-8859-1 encoding as standard
qr_iso.png(file_iso, scale = 8)
img_iso = Image.open(file_iso)
dec_iso = decode(img_iso)
print(dec_iso[0].data.decode('utf-8'))
And this was the output:
Thomsôn Gonçalves Ámaral,325.432.123-21
Note that even though I created the QR code using 'iso-8859-1' encoding, it only worked when decoded as 'utf-8'.
I also need to treat this data by removing the BOM. That is easy, but it is an additional step. It is worth mentioning that for simpler data (without the special characters), the output didn't include the invisible BOM character.
The solution above works, but at least to me it didn't seem completely right. I was only using it because I didn't have a better one.
I even tried to double-decode the data:
Based on 'python double-decoding' searches, I tried code like this (and some variations):
dec_iso[0].data.decode('iso-8859-1').encode('raw_unicode_escape').decode('iso-8859-1')
dec_utf[0].data.decode('utf-8').encode('raw_unicode_escape').decode('utf-8')
but none of this worked.
The fix:
As suggested, I tried the following line:
dec_utf[0].data.decode('utf-8').encode('shift-jis').decode('utf-8')
And it worked perfectly. I've tested it with over 1800 data strings without getting a single error.
The QR code generation seems to be fine. This line of code only treats the output data from the pyzbar lib, when it reads the QR image (and it doesn't need to be a QR code created by pyqrcode lib specifically).
I haven't been able to decode QR codes generated with 'iso-8859-1' encoding using the same technique. It might be something related to pyzbar, or I simply haven't found the right pattern for the decode-encode-decode process.
So here's a simple code for creating and reading a QR code, based on utf-8 encoding:
import pyqrcode
from PIL import Image
from pyzbar.pyzbar import decode
data = 'Thomsôn Gonçalves Ámaral,325.432.123-21'
file_utf = 'QR_Utf.png'
#creating QR codes
qr_utf = pyqrcode.create(data, encoding = 'utf-8') #creates qr code using utf-8 encoding
#saving png file
qr_utf.png(file_utf, scale = 8)
#Reading and Identifying QR code
img_utf = Image.open(file_utf)
dec_utf = decode(img_utf)
# Decoding Results:
print(dec_utf[0].data.decode('utf-8').encode('shift-jis').decode('utf-8'))
For more info, see also:
iOS: ZBar SDK unicode characters
https://sourceforge.net/p/zbar/support-requests/21/

Save image from byte data with python gdal

I'm retrieving a Sentinel-2 image with metadata from an ArcGIS REST API server. It comes as a byte string. I currently first write the data to a tif-file with f.write(). Then I open it with imageio and later save it as a geoTIFF with gdal.
I would like to save the image directly as a geoTIFF with gdal. But I haven't figured out how to extract the image as a numpy array directly from the byte data, or how to use gdal to write the image directly from the byte data. I use Python 3.8 and Windows 10.
import requests
from imageio import imread
from osgeo import gdal
response = requests.get(im_url, auth=auth) # call image url
im_binary = response.content # the image in binary format
im_newpath = 'testimage.tif'
with open(im_newpath, 'wb') as f:
    f.write(im_binary)
print(type(im_binary))
print(im_binary[0:50])
returns:
<class 'bytes'>
b'II*\x00\x08\x00\x00\x00\x15\x00\x00\x01\x03\x00\x01\x00\x00\x00X\x02\x00\x00\x01\x01\x03\x00\x01\x00\x00\x00\x90\x01\x00\x00\x02\x01\x03\x00\r\x00\x00\x00\n\x01\x00\x00\x03\x01\x03\x00'
My attempts to decode the data have so far given bad results.
decoded = im_binary.decode('utf_16')
UnicodeDecodeError: 'utf-16-le' codec can't decode bytes in position 364-365: illegal UTF-16 surrogate
Ignoring or replacing the errors gives various non-numeric characters. Any advice?
To summarize my comments and insights from the answers:
Nothing seems wrong with the approach. Since images consist of pure binary data, UTF decoding routines are of no use here.
I don't see the need for any other decoding. What is received looks like fine TIFF data.
While writing to a temporary file just to be able to pass it to another library seems clumsy, it is a robust and clean way to proceed. The physical file can probably be avoided by using a BytesIO instance (StringIO is for text, not bytes), but since this requires the full image data to be held in memory, it may not work for huge images. (I'm not sure what the URL passed to imread() would have to look like in that case, which is one more argument for using a file instead.)
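The claim that the payload "looks like fine TIFF data" can be checked from the 50 bytes printed in the question. Below is a minimal stdlib sketch that reads the classic TIFF header and the IFD entries that fit in the sample; `tiff_summary` is a hypothetical name and the parser handles only what this sample needs, not the full TIFF format:

```python
import struct

# Baseline TIFF tag numbers that appear in the sample.
TAG_NAMES = {256: "ImageWidth", 257: "ImageLength", 258: "BitsPerSample", 259: "Compression"}


def tiff_summary(data: bytes) -> dict:
    """Read the header and first IFD entries of a classic TIFF held in memory."""
    if data[:2] == b"II":
        endian = "<"  # little-endian
    elif data[:2] == b"MM":
        endian = ">"  # big-endian
    else:
        raise ValueError("not a TIFF: unknown byte-order mark")
    magic, ifd_offset = struct.unpack_from(endian + "HI", data, 2)
    if magic != 42:
        raise ValueError("not a classic TIFF")
    (n_entries,) = struct.unpack_from(endian + "H", data, ifd_offset)
    info = {"byte_order": data[:2].decode("ascii"), "ifd_offset": ifd_offset, "entries": n_entries}
    for i in range(n_entries):
        entry = ifd_offset + 2 + 12 * i  # each IFD entry is 12 bytes
        if entry + 12 > len(data):
            break  # the printed sample is truncated
        tag, field_type, count = struct.unpack_from(endian + "HHI", data, entry)
        name = TAG_NAMES.get(tag, tag)
        if field_type == 3 and count == 1:
            # A single SHORT is stored inline in the first 2 bytes of the value field.
            (value,) = struct.unpack_from(endian + "H", data, entry + 8)
            info[name] = value
        else:
            info[name] = {"type": field_type, "count": count}
    return info


# The first 50 bytes printed in the question.
sample = (b'II*\x00\x08\x00\x00\x00\x15\x00\x00\x01\x03\x00\x01\x00\x00\x00X\x02\x00\x00'
          b'\x01\x01\x03\x00\x01\x00\x00\x00\x90\x01\x00\x00\x02\x01\x03\x00\r\x00\x00\x00'
          b'\n\x01\x00\x00\x03\x01\x03\x00')
print(tiff_summary(sample))
```

On the sample this reports a little-endian TIFF with 21 IFD entries, 600 x 400 pixels, and 13 BitsPerSample values (one per sample per pixel), which would be consistent with Sentinel-2's 13 spectral bands. Nothing in it is text, so no text codec applies.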

Getting the format of a base64 encoded string using Python 3

I have searched in a lot of different places but could not find the answer to this.
It seems like the suggested way of guessing the extension of a base64-encoded string (the string does not include an extension, and it's a valid image) is to use the PIL package. This is what I am currently doing.
But when I attempt to open the image I get the error cannot identify image file.
Any suggestions on what I might be doing wrong?
#image_content is a base64 encoded string
decodedbytes = base64.decodebytes(str.encode(image_content))
image_stream = StringIO(str(decodedbytes))
image = Image.open(image_stream) #<-----ERROR
filetype = image.format
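In the snippet above, `str(decodedbytes)` produces the text representation `"b'...'"` of the bytes rather than the bytes themselves, and `StringIO` is a text stream, so PIL never sees the image data. With Pillow, the usual fix is to wrap the raw bytes in a binary stream: `Image.open(io.BytesIO(decodedbytes))`. As a stdlib-only alternative for guessing the format, the sketch below checks well-known leading "magic" bytes; the signature table and the `guess_format` name are illustrative, not exhaustive:

```python
import base64

# Leading bytes of a few common image formats (illustrative, not exhaustive).
SIGNATURES = [
    (b"\x89PNG\r\n\x1a\n", "PNG"),
    (b"\xff\xd8\xff", "JPEG"),
    (b"GIF87a", "GIF"),
    (b"GIF89a", "GIF"),
    (b"II*\x00", "TIFF"),
    (b"MM\x00*", "TIFF"),
    (b"BM", "BMP"),
]


def guess_format(b64_text: str):
    """Return a format name for base64-encoded image data, or None if unrecognised."""
    raw = base64.b64decode(b64_text)
    for signature, name in SIGNATURES:
        if raw.startswith(signature):
            return name
    if raw[:4] == b"RIFF" and raw[8:12] == b"WEBP":
        return "WEBP"
    return None


raw = b"\x89PNG\r\n\x1a\n" + bytes(8)  # hypothetical start of a PNG file
print(str(raw).startswith("b'"))       # True: str() gives repr text, not the bytes
print(guess_format(base64.b64encode(raw).decode("ascii")))  # PNG
```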
