Python: Convert PNG string obtained via toDataURL to binary PNG file - python

The toDataURL method (see e.g. https://developer.mozilla.org/de/docs/Web/API/HTMLCanvasElement/toDataURL) gives a string representation of a PNG of the following form:
"data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAUAAAAFCAYAAACNby
blAAAADElEQVQImWNgoBMAAABpAAFEI8ARAAAAAElFTkSuQmCC"
How can I convert such a PNG string to a binary PNG file in python 3 ?

OK, so it was a simple (and maybe stupid) mistake that I made. The first part before the comma, i.e. data:image/png;base64 must be removed, like this
import base64
with open('sample.png', 'wb') as f:
f.write(base64.decodestring(string.split(',')[1].encode()))
does the trick for me. So it is an obvious mistake that you need to remove the header. But I will still leave this as an answer in case it happens to others just as it happened to me. Also look at this thread Python: Ignore 'Incorrect padding' error when base64 decoding concerning padding.

Related

Python - pdfme - writing utf-8 characters to file

I would like to generate report to pdf using pdfme library. I need the Polish characters to be there as well. The example report end with:
with open('document.pdf', 'wb') as f:
build_pdf(document, f)
So I cannot add encoding = "utf-8". Is there any way I can still use Polish characters?
I tried:
Change to write mode and set encoding to utf-8. Getting: "TypeError: write() argument must be str, not bytes".
While having Polish characters add .encode("utf-8"). Example: "Paweł".encode("utf-8"). Getting: "TypeError: value of . attr must be of type str, list or tuple: b'Pawe\xc5\x82'"
In this case, the part of the code responsible for dealing with the unicode characters is the PDF library. The build_pdf call there, for whatever library it is, has to be able to handle any character in "document". And if it fails it is the context for the PDF library, owner of the "build_pdf" call that has to be changed so that it will handle all the characters you need.
"utf-8" is just one form os expressing characters as bytes - aPDF file is a binary file, and it does have internal headers, structures and settings to do its own character encoding handling: your text may endup inside the PDF either encoded as utf-8, or some other, legacy encoding- but that will be transparent for you and anyone using the PDF file.
It may be that the document, if it is text (we don't know if it is plain text, or if it is some object from your library that has already been pre-processed) - but if it is text, and your library says that build_pdf can accept bytes instead, you can encode the document prior to this call:
build_pdf(document.encode('utf-8', f) - but that would be some strange way of working - it is likely that either build_pdf will do the encoding, or whatever process generated the document had already done so.
To get more meaningful help, you have to say which library you are using to geneate the PDF, and include the import lines in your code,including the creation of your document so that we have a minimal reproducible example: i.e. I can copy your code, paste in a .py file here, install the lib, run it, and see a corrupted PDF file with the Polish characters magled: then I, and others, can be able to fix it. Otherwise, this answer is as far as I can get.

Problems decoding German "Umlaute" of QR-Codes with "pyzbar"

I wrote a Python (V3.9.9) program (Windows 10) to decode QR-Codes of type "EPC QR" -> see Wikipedia
Everything is working fine, except, if there are German "Umlaute" (ÄÖÜäöü) within the text of the QR-Code. Here is a sample program, to demonstrate/isolate the problem:
import cv2 # Read image / camera/video input
from pyzbar.pyzbar import decode
img = cv2.imread ("GiroCodeUmlaute.PNG")
print (decode (img))
for code in decode (img):
print (code.type)
print (code.data.decode ("UTF-8"))
And here is the QR-Code for testing:
GiroCodeUmlaute.PNG -> see QR-Code generator
The 6th line of the encoded QR-Code text contains "Ärzte ohne Grenzen".
But when it's decoded with "UTF-8" (which is the correct character set), then "テвzte ohne Grenzen" is displayed.
I think, also the decoded read hex data are looking a bit strange:
[Decoded(data=b'BCD\n002\n1\nSCT\nRLNWATWW\n\xef\xbe\x83\xd0\xb2zte ohne Grenzen...
From where are the 4(!) hex bytes coming? \xef\xbe\x83\xd0\xb2zte
Where is the 'r' of the original text?
The same problem occurs, if this test-program is running under a Raspberry computer.
If this sample QR-Code is scanned by an android mobile app, the "Umlaut" is correct displayed.
From my point of view it looks like a problem of the "pyzbar" module. But maybe I'm doing something wrong?
Every help and tip is appreciated!
Thanks
I faced the same Problem with Swiss Bill QR-Codes on Linux using the python3-qrtools - guess this module is using the zbar-library as well.
A workaround would be to replace the wrong chars in the decoded string. Depending on the output you get out with print (code.data.decode ("UTF-8")) for your Umlaute, I got the following symbols which I need to replace with ä, ö resp. ü:
string = string.replace('瓣', 'ä')
string = string.replace('繹', 'ö')
string = string.replace('羹', 'ü')
It's not a very elegant solution because it does not deal with all special characters which get wrongly decoded.
As I'm calling the python program from a Pascal/Delphi/lazarus Program, I hope to find an other tool somewhere around which reads QR-codes from the webcam and correctly decode it...

Python read .bin data and convert to string

I have multiple bin files, and I want to extract the data from them, but the results i'm getting are pretty weird.
For example, my first file does the following:
path = 'D:\lut.bin'
with open(path, 'rb') as file: # b is important -> binary
fileContent = file.read()
print(fileContent)
Output:
xc7\xfb\x99\x0c\x8e\xf9~7\xb9a\xb1*\x06\xd2o\xb8\xb2 \x81\x8bg\xd2\xc6bE\r\xb9KL7\xa0\xa52\xa5\xd2\x17\x86(\xe9\x02\xbf\xeb\x8fDvk\xe7\x8d\x03\x872\x9fop\xbck\xe1\x94\x02\xdc\xef\x85I\t\xc8\x8d\xdfl\x90\xcf*\xb1\x02(\x16~)\xc7\xa2\x1f\xf6o\xdc\x1en\x84H\xf6%\xfaW/\xee\xbc\xdd^/\x9b\x9a\xe5\x99\xa2\xd7\xe4\x93U\xd4\xef$\xa5\x8aW\xf6\xc9\xb0T\xe3<\x147\xcc\x08}\xc8\x15J3v\n\x9d\x16\xa3\x8d\r\xa2\xc4\x15\xf13!\xa2\x01\x14\xef\xaf\x06\x83p\xa7Ot\x8cr\xdf\xef\xbe\x93\xc2D`y\\\xdb\x8a\x1c\\H\x9cE\xabF\xd6\xe1B\xdd\xbc\x8a\xdb\x06|\x05{!\xf0K25K0\xb9\xfe\xa6n\xd7-\xd1\xcb\xefQ\xd9w\x08{4\x13\xba8\x06\x00}S\xe4\xd8*\xe2\x81f\x8d\xc4P\xde\x88/\xa6q\x7fG\x99\xbd\xa84v\xcfS+\xc6\xc5#\x0ey\xd8\xcd\xf2!\xf8`1\x03k5\xb9\xee\xb3V\xc3">\xdd\xf4\x94\x1b\x83\xf9\xdbe\xfcw\xf4+O\xf4\xf1\xfc\xa2 \xc5\xccq\xd1\xc8dH\x00\xf7K|7\x87\xa8$\xb8\x92^\x90.\xffK\xbf\xf6\xcaHv9l\xa6\x0e\xd5"\xd6`>}f\xfc\xd1\x15\xd0\xf0\x89\xb7\x12\xdf\xc9\xdfn\x97\xc7O\xf8\x05)Ua|\xd6\xd5\x03P\xf3\xcd\x08 \xc6\xc7\xe2"\xae\x1fz\xb9\xbd\x99\x100\x9a\x8d\xeb\x89\xa3T\xa0\xc7S\xcc\xe4h\xbe\xf3R\xe9\x9d\xf4Y\xe91\xa4%\x85>mn\xc3\x1e\x8a}\x04\xd9:\xb5\xde\x01h\x90y\xfe4&\xea\x1d\x9a\xbd\xac\x1a\x8e{\xb2Y\xcb\xc47\xd8\xe2\xf6\xd6\xdc\x91,]\x1d\xca\x90_sb\x86X\xad]\x8e\xe1A\x1a\xaa\xc6\xdf\x1ca#A\x1a\xa2\t!3\x06y\x92\x96\xebg\xdb3\xdd\x9f\xefh\x9d6\x17c0\x0e\xfe\x9a\x06\x06;\x16\xa7\x
I have no idea what this is, but it does not look like readable text, is there a way to even convert this?
My other file looks like this:
U\xff\xf3\xe8d\x00\x00\x00\x01\xa4\x00\x00\x00\x00\x00\x00\x03H\x00\x00\x00\x00LAME3.100UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUUU
Again with the same code above.
I have tried decoding it, I keep getting decoding errors, and text encoding to utf-8 doesn't help either.
I want to get the text from this, these files came with book on the playstore I downloaded.
Bin files are just binary data, meaning each byte can be any value between 0 and 255 (00000000 and 11111111, hex 0x00-0xFF).
Printable characters are a subset of those codes.
This means, not every bin file can be converted to text.
Python tries to visualise the byte stream already by putting those printable characters in place of their \xNN code (where N is a hex digit). The rest of the characters are printed as their codes.
This means the
U\xff\xf3\xe8d\x00\x00\x00\x01\xa4\x00\x00\x00\x00\x00\x00\x03H\x00\x00\x00\x00LAME3.100UU
is in fact
\x55\xff\xf3\xe8\x64\x00\x00\x00\x01\xa4\x00\x00\x00\x00\x00\x00\x03\x48\x00\x00\x00\x00\x4c\x41\x4d\x45\x33\x2e\x31\x30\x30\x55\x55
[Copy this into your Python interpreter as a string (i.e. in quotes) and see how it visually converts itself when displaying/printing!]
The parts:
decoded U,
then not decoded \xff\xf3\xe8
decoded d
not decoded \x00\x00\x00\x01\xa4\x00\x00\x00\x00\x00\x00\x03
decodedH
not decoded \x00\x00\x00\x00
decoded LAME3.100UU
Can you extract some data from it? Depending on the type of the bin, you may probably find some strings directly put in there - like the LAME3.10 that looks like some code/version... but I really doubt that you would find anything useful. It can be literally anything, just dumped there: text, photo, memory dump...
This is very late, but LAME3.100 followed by a bunch of U characters is actually the start of a certain encoding of .mp3 file, and knowing that it may possibly be incomplete, you could try and convert it using https://ffmpeg.org into a proper .mp3 container
After you have ffmpeg in your path, a command such as ffmpeg -i "D:/lut.bin" "D:/lut.mp3" should hopefully decode and re-encode it

Python write as a file, unknown format

So here's a small snippet of a file opened in python. But I'm not sure what kind of formatting this is. Doesn't look like binary. How do I write this back out into a file?
hDuwHkAbG9hZGVyX21jAEAAiQYJHgBpAEAQAHBAAIkGCR4AaQBAEACsQACJBgkeAGkAQBAA5EAAiQYJHgBpAEAQARxAAIkGCR4AaQBAEAFUQACJBgkeAGkAQBABkEAAiQYJHgBpAEAQAchAAIkGCR4AaQBAEAIAQACJBgkeAGkAQBACOEAAiQYJHgBpAEAQAnBAAIkGCR4AaQBAEAKsQACJBgkeAGkAQ
It's a fragment of a base-64 encoded binary file of some sort. Unfortunately, it hasn't been cut at a byte boundary; however, when I insert a letter at the front, the decoded version looks like this:
ÆîÀy�loader_mc�#� �i�#�p#� �i�#�¬#� �i�#�ä#� �i�##� �i�#T#� �i�##� �i�#È#� �i�#�#� �i�#8#� �i�#p#� �i�#¬#� �i�
You can see some clear meaningful ASCII in there. However, what the binary data represents is anyone's guess.
If you just need to decode it into the above binary format, use the base64 module.

Data encoding in python

I download an image from newsgroups with the nntplib module in python. I then want to save the data to the file. I use:
news.group('alt.binaries.misc')
data=''.join(news.body('<DhTgplpHcRsZMBTTw3i35#spot.net>')[-1])
f=open('image.png','wb')
f.write(data)
f.close()
However, the saved file isn't a proper image file.
data is a string of the form:
'\x89PNG=B=C\x1a=C=A=A=A=BIHDR=A=A\x02X=A=A\x01Q\x08\x06=A=A=A\xa8\x81\xd3\x89=A=A=A\tpHYs=A=A\x0b\x13=A=A\x0b\x13\x01=A\x9a\x9c\x18=A=A etc... '
, with a lenght of 309530.
I can tell from the first bytes that the file should be a png file and the size also seems good to me, so I assume that the data is correct.
Does anyone knows what i'm doing wrong?
UPDATE:
I looked in the header of the article and its says: Content-Type: text/plain; charset=ISO-8859-1 Content-Transfer-Encoding: 8bit I don't think this is very helpful with decoding the text, but who knows..
I also compared my string with regular headers of png files. This is \x89PNG\r\n\x1a\n, or \x89PNG\x0d\x0a\x1a\x0a. (as alexis also stated)
I concluded that =B stands for \x0d, =C for \x0a and =A for \x00. I assume that the other \x..'s are not encoded, but i'm not sure (I don't know very much about encodings) update3 shows that they do differ.
What is an encoding that encodes this way?
UPDATE2:
the data: -see below- (repr(data))
UPDATE3:
I was able to save the image with another program and then to open it in python. This is what the data should be. -see below-.
The beginnings look kind of similar, but after that there is a big difference. What the hell is this encoding? it really frustrates me. (BTW, thanks for all the great help so far)
All the files: http://dl.dropbox.com/u/1499291/python-encoding-question/index.html
NULL is replaced with "=A", CR with "=B", LF with "=C", and "=" with "=D", very similar to gZip 8bit ([http://www.imc.org/ietf-usefor/2003/Feb/0575.html][1]).
looks like the data is uuencoded, so you have to decode it, before you write it to your image file.
you can verify this, when you uudecode image.png the file you have written with your above script.
Python has support for this in its uu module.
I don't believe NNTP.body is supposed to automatically decode article content, or is it? Have you looked at the article source? It should specify the encoding.
Anyway the PNG signature should start with \x89PNG, but should then be followed by 0d 0a (CR LF). This ain't it. Could it be base64 encoded, or some such? All those = signs look very familiar.

Categories