decoding base64 images from rtf - python

In my RTF document, I want to extract an image from a string.
The string looks like this:
\pard\pard\qc{\*\shppict{\pict\pngblip\picw320\pich192\picwgoal0\pichgoal0
89504e470d0a1a0a0000000d4948445200000140000000c00802000000fa352d9100000e2949444[.....]6c4f0000000049454e44ae426082
}}
Questions:
1) Is this really base64?
2) How do I decode it using the code below?
import base64
imgData = b"base64code00from007aove007string00bcox007idont007know007where007it007starts007and007ends"
with open("imageToSave.png", "wb") as fh:
    fh.write(base64.decodestring(imgData))
The full RTF text (which shows the image when saved as .rtf) is at:
http://hastebin.com/axabazaroc.tex

No, that's not Base64-encoded data. It is hexadecimal. From the Wikipedia article on the RTF format:
RTF supports inclusion of JPEG, Portable Network Graphics (PNG), Enhanced Metafile (EMF), Windows Metafile (WMF), Apple PICT, Windows Device-dependent bitmap, Windows Device Independent bitmap and OS/2 Metafile picture types in hexadecimal (the default) or binary format in a RTF file.
The binascii.unhexlify() function will decode that back to binary image data for you; you have a PNG image here:
>>> # data contains the hex data from your link, newlines removed
...
>>> from binascii import unhexlify
>>> r = unhexlify(data)
>>> r[:20]
'\x89PNG\r\n\x1a\n\x00\x00\x00\rIHDR\x00\x00\x01@'
>>> from imghdr import test_png
>>> test_png(r, None)
'png'
but of course the \pngblip entry was a clue there. I won't include the image here, it is a rather dull 8-bit 320x192 black rectangle.
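To pull the hex data out of the RTF text automatically, something like the following might work. It is only a minimal sketch: the helper name extract_png_from_rtf is made up for illustration, and it assumes the document contains a single \pngblip picture stored as hexadecimal (not the binary \bin form) whose data runs up to the next closing brace.

```python
import binascii
import re


def extract_png_from_rtf(rtf_text):
    """Return the bytes of the first hex-encoded \\pngblip picture in rtf_text.

    Hypothetical helper: assumes the picture is stored as hexadecimal and
    that its data ends at the first closing brace after \\pngblip.
    """
    start = rtf_text.index("\\pngblip")
    end = rtf_text.index("}", start)
    body = rtf_text[start:end]
    # Drop RTF control words such as \pngblip or \picw320 (plus one
    # optional delimiting space), leaving only the hex digits.
    body = re.sub(r"\\[a-zA-Z]+-?\d* ?", "", body)
    # Remove line breaks and any other whitespace between the digits.
    hex_digits = "".join(body.split())
    return binascii.unhexlify(hex_digits)


# Tiny stand-in for the real document: only the 8-byte PNG signature.
sample = "{\\pict\\pngblip\\picw320\\pich192\\picwgoal0\\pichgoal0\n89504e470d0a1a0a\n}"
print(extract_png_from_rtf(sample))  # b'\x89PNG\r\n\x1a\n'
```

With the real file you would read the .rtf text, call the helper, and write the returned bytes to a file opened in 'wb' mode.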

Related

What format is ipynb storing images as?

https://pastebin.com/czxkGQp1
Here is a link to the ipynb source code. I'm wondering what format these images are saved as. I'm referring to the long string of characters here:
"data": {
"image/png": "iVBORw0KGgoAAAANSUhEUgAAAXcAAAD4CAYAAAAXUaZHAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAgAElEQVR4nO3deXhU9dn/8ffNvu9hJ+z7IktY3PeKWwGlrTy1arWiffTp8tiyKFZcqmhdautWrKj0sWhLEFFww31Fg0oS9rAHQsKesCRkuX9/zNBfxCCBmWEyM5/XdeXKnO+cmXMfTvhw+ObMfczdERGR+FIt2gWIiEj4KdxFROKQwl1EJA4p3EVE4pDCXUQkDtWIdgEALVq08E6dOkW7DBGRmLJ48eLt7p5U0XNVItw7depEWlpatMsQEYkpZrbhSM9pWkZEJA4p3EVE4pDCXUQkDincRUTikMJdRCQOHTXczayDmb1nZsvNbKmZ/To43szM3jaz1cHvTYPjZmZ/MbMsM0s3s
And on and on and on. I'm trying to find a way to load this ipynb into a Python script and save these images to my local machine using Pillow or some other library.
Any help would be greatly appreciated.
That encoding is known as base64, and can be manipulated using Python's base64 module in the standard library. The 64 comes from all lowercase ASCII letters (26), the uppercase letters (26), the digits 0-9 (10), and the characters + and /. The = characters at the end are used for padding out the encoded bytes so the decoding algorithm works.
You can take your string and decode the base64 in a Jupyter notebook and display it with something like:
%matplotlib inline
import base64
import io
from PIL import Image
s = ""
image = base64.b64decode(s)
img = Image.open(io.BytesIO(image))
img
Of course, you can just save the bytes to disk too if you want the file:
image = base64.b64decode(s)
with open(path, 'wb') as file:
    file.write(image)
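To do this for a whole notebook from a script, you could read the .ipynb as JSON and walk the cell outputs. The sketch below assumes the nbformat 4 layout (cells, then outputs, then a data dictionary keyed by MIME type); the function name save_notebook_pngs and the output file naming are my own choices, not anything Jupyter provides.

```python
import base64
import json
from pathlib import Path


def save_notebook_pngs(notebook_path, out_dir):
    """Write every image/png output of a notebook to out_dir as a .png file.

    Hypothetical helper: assumes the nbformat 4 JSON layout.
    Returns the list of paths that were written.
    """
    notebook = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    saved = []
    for cell_index, cell in enumerate(notebook.get("cells", [])):
        # Markdown cells have no "outputs" key, hence the default.
        for output_index, output in enumerate(cell.get("outputs", [])):
            png_b64 = output.get("data", {}).get("image/png")
            if png_b64 is None:
                continue
            # Multi-line strings may be stored as a list of lines.
            if isinstance(png_b64, list):
                png_b64 = "".join(png_b64)
            target = out_dir / f"cell{cell_index}_output{output_index}.png"
            target.write_bytes(base64.b64decode(png_b64))
            saved.append(target)
    return saved
```

Calling save_notebook_pngs("notebook.ipynb", "images") (both names are placeholders) would then leave one PNG per image output in the images directory.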

How to convert to bytes format for PIL images which is similar to normal bytes format

Sorry that I couldn't explain this clearly in the subject.
I used read() to read the entire image file as bytes, and I also used PIL's tobytes() on the same image, but the two sets of bytes look different. Could you please advise on how to get the same bytes that read() returns by using PIL's utilities? From raw encoding to utf-8
Code sample:
path3 = r'path'
with io.open(path3, 'rb') as image_file:
    content1 = image_file.read()
b'\xff\xd8\x ...
Using PIL:
with io.open(path3, 'rb') as image_file:
    content1 = Image.open(image_file).tobytes()
b'\xbf\x91\xc0\xbf\x91\xc0\xbe\x90\xbf\xbe'
In my use case:
from pdf2image import convert_from_bytes
images = convert_from_bytes(open('pp.pdf', 'rb').read())
b=images[0].read() # since this returns list format
AttributeError: 'PpmImageFile' object has no attribute 'read'
Is it possible to have same byte format like read()?
PIL does more than just read the bytes of the image file: it decodes (decompresses) the JPG, PNG, or whatever format you give it. Its tobytes() method returns the raw pixel values.
In the first snippet you are simply reading the bytes of the encoded image file. These will differ from the pixel values for any compressed format, and even an uncompressed format like BMP adds a file header in front of the pixel data.
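If what you need is file-style bytes from a PIL image (for example the pages returned by pdf2image), one option is to save the image into an in-memory buffer with an explicit format. This is a sketch under assumptions: encode_image is a name I made up, and the re-encoded bytes form a valid image file but are not guaranteed to be identical to the bytes of the original file.

```python
import io

from PIL import Image


def encode_image(image, image_format="PNG"):
    """Return the image encoded in image_format, as bytes.

    Hypothetical helper: the result can be written to disk or sent
    anywhere file contents are expected.
    """
    buffer = io.BytesIO()
    image.save(buffer, format=image_format)
    return buffer.getvalue()


# Stand-in for a real image: a small solid red square.
image = Image.new("RGB", (4, 4), (255, 0, 0))
png_bytes = encode_image(image)

# PNG is lossless, so decoding the bytes gives back the same pixels.
round_trip = Image.open(io.BytesIO(png_bytes))
print(round_trip.tobytes() == image.tobytes())
```

In the pdf2image case the same call would presumably be encode_image(images[0], "JPEG") or similar, since each list element is a PIL image rather than a file object.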

Store a base64 image in python memory, then retrieve for use in wxpython/PIL

1) I have an image that I converted to a string. It looks like this:
bytesimage = b'iVBORw0KGgoAAAANSUhEUgA.... etc etc
2) I can convert it to a 'bytesimage.png' using:
def StringToImage(self, stringname, imageoutput):
    imgdata = base64.b64decode(stringname)
    imagename = imageoutput
    with open(imagename, 'wb') as f:
        f.write(imgdata)
3) But then I want to keep that image or string in memory to use in a wxPython interface without needing to save a file. I have seen several related questions where the solution is io.BytesIO, but I just can't connect the steps, and neither wxPython nor PIL seems to read the bytes properly.
So to clarify:
I have an image stored in a string DONE
I can convert that to an image (if needed) but don't want to save it DONE
I need that string OR image (whichever is best) saved to memory NEEDS SOLVING
Then I want to be able to use that image in wxPython (I can open it in PIL first if required)
Any help would be fantastic!
StringIO seems to be the way to go (on Python 3, io.BytesIO plays the same role for bytes). It allows you to pass the decoded string directly to PIL.
import base64
from PIL import Image
import StringIO
# Banana emoji (JPG) as a b64 string.
b64_img_str = '/9j/4AAQSkZJRgABAQEAYABgAAD/4QCKRXhpZgAATU0AKgAAAAgABVEAAAQAAAAIAAAASlEBAAMAAAABA+YAAFECAAEAAAAYAAAAalEDAAEAAAABAAAAAFEEAAEAAAABBgAAAAAAAAAAAAAKAAAACgAAAAoAAAAKAAAACgAAAAoAAAAKAAAACgAAAP//AP///8zMAP8AADBkAP8A/5iYAP/bAEMAAgEBAgEBAgICAgICAgIDBQMDAwMDBgQEAwUHBgcHBwYHBwgJCwkICAoIBwcKDQoKCwwMDAwHCQ4PDQwOCwwMDP/bAEMBAgICAwMDBgMDBgwIBwgMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDAwMDP/AABEIACMAIQMBIgACEQEDEQH/xAAfAAABBQEBAQEBAQAAAAAAAAAAAQIDBAUGBwgJCgv/xAC1EAACAQMDAgQDBQUEBAAAAX0BAgMABBEFEiExQQYTUWEHInEUMoGRoQgjQrHBFVLR8CQzYnKCCQoWFxgZGiUmJygpKjQ1Njc4OTpDREVGR0hJSlNUVVZXWFlaY2RlZmdoaWpzdHV2d3h5eoOEhYaHiImKkpOUlZaXmJmaoqOkpaanqKmqsrO0tba3uLm6wsPExcbHyMnK0tPU1dbX2Nna4eLj5OXm5+jp6vHy8/T19vf4+fr/xAAfAQADAQEBAQEBAQEBAAAAAAAAAQIDBAUGBwgJCgv/xAC1EQACAQIEBAMEBwUEBAABAncAAQIDEQQFITEGEkFRB2FxEyIygQgUQpGhscEJIzNS8BVictEKFiQ04SXxFxgZGiYnKCkqNTY3ODk6Q0RFRkdISUpTVFVWV1hZWmNkZWZnaGlqc3R1dnd4eXqCg4SFhoeIiYqSk5SVlpeYmZqio6Slpqeoqaqys7S1tre4ubrCw8TFxsfIycrS09TV1tfY2dri4+Tl5ufo6ery8/T19vf4+fr/2gAMAwEAAhEDEQA/AP38oorw/wD4KBf8FAvAP/BNn4BP8QviE+oSabJdmwsrKwNst1qdwLae7aGJrmaG3Egt7W5kVJJkMpiEUQknlhhkAND9k39vz4N/t1f8Jf8A8Kj+IXh/x5/wgeqto2t/2bIx+yTjdtddyr5tvJtfyrmLfBN5cnlyPsbHsFfh38EPGGqf8E9vgJ8Jvix4S0rX/Cfiz4Q/DvwsnxX8Kw6jpuuaZ8TfBcqzv/a1pJa6hJYySRG21u5sblbmO4/cvFJEbe5iVv3Er5PhLiylnlKtelKjWoVJU6lOekotWcX5xnBxnCS0al3Tt0Yig6TWt01dNf10egUUUV9Yc4V8n/8ABQDUfgL+3D+xf8ffB2q/EzwfJH8OfD+rnxJq2h6h/a2p/DW4FjfQSXU1vYyi6jkSJb2OS2yhuYRdWzh45ZY2+sK/A/8Aag8G6PYftY6D8LPD7ah+0b4N+DOn+D/hN4hi8B/Cj/hI5vC/hew1h9Vl0/Xmt5pJb6/a40HRoHkGy0jjl1gR6etwWiPBmmPWDws8Q4ubim1GKblJpN8sUk227WStq9AVuZRbtd2PGPHvh39pDxd+x58Qvih8Y7nVvhP4J/4QfU47jSfEeoX76l4svX0vVLG0snfWb281hYbe6v5SkNxcRWpmlge2s5JLqS5H9E/w4/aF8A/GPxV4n0Lwj448H+Ktb8E3f2DxFp+j6zbX11oFxvlTybuKJ2e3k3wzLskCnMUgxlTj8YP2qv8AgoZrGs/tafCjXvid8OfGHw7+E/hHxZ/amk6b47N34JufHmsWqXZ0wwyXLQwQaba38enXdxLfOshL2rfZTHDIK/Qn/gmH/wAEyvE37Efgv4b2XjLxx4Z8US/CvwhqXhLw3a+HvCqaHHaW+qXtlfX32yRZWW/mEun2ix3KW9o7gXEs6TTTl4/yfwZo5zLA18fxBSVHE1pJumk24RirQVSbblOq7tycm5JWV
opKK9jPMyw+KxPLhoxjGKStFPlXpdtu+7u279dj7Gooor9nPHCv5YdU/bu+Jmvf8Fufhd+zjqGqeH9R+EXwT/aA03wT4E0u78K6TcX/AIW0fT/E1pbWtta6m9sdQTENjaxySfaPMnWICV5MtkooA6D/AILcf8FQPjV/wTu/4Lr/ALQUnwd8ReH/AAfqGqf2It1qn/CHaLqGqyxSeH9GL2/226tJbkW5a3hfyBIIg6bwm4lj/S98J/hboPwO+FnhnwT4Wsf7L8M+D9KtdE0iz86Sf7JZ20KQwReZIzSPtjRV3OzMcZJJyaKKAOgooooA/9k='
# Decode back to the original bytes
new_img_str = base64.b64decode(b64_img_str)
# Use StringIO to provide an in-memory buffer that we can use
# to pass the image string to PIL.
sio = StringIO.StringIO(new_img_str)
img = Image.open(sio)
# Display the image
img.show()
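The code above is written for Python 2. On Python 3 the StringIO module is gone and decoded image data is bytes, so the in-memory buffer would be an io.BytesIO. A minimal standard-library sketch follows; the helper name b64_to_buffer is made up, and the short base64 string is only the 8-byte PNG signature, standing in for real image data.

```python
import base64
import io


def b64_to_buffer(b64_data):
    """Decode base64 text or bytes into a seekable in-memory binary stream."""
    return io.BytesIO(base64.b64decode(b64_data))


# Placeholder input: decodes to the PNG signature only, not a full image.
buffer = b64_to_buffer(b"iVBORw0KGgo=")
print(buffer.read(8))  # behaves like a file opened in 'rb' mode
buffer.seek(0)         # rewind before handing the buffer to a reader

# With Pillow installed and real image data, the buffer can be passed
# straight to PIL, which accepts file-like objects:
#     img = Image.open(buffer)
```

Nothing is written to disk at any point; the buffer lives in memory for as long as you keep a reference to it.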

Getting the format of a base64 encoded string using Python 3

I have searched in a lot of different places but could not find the answer to this.
It seems like the suggested way of guessing the extension of a base64-encoded string (the string does not have an extension in it, and it is a valid image) is to use the PIL package. This is what I am currently doing.
But when I attempt to open the image, I get the error "cannot identify image file".
Any suggestions on what I might be doing wrong?
# image_content is a base64-encoded string
decodedbytes = base64.decodebytes(str.encode(image_content))
image_stream = StringIO(str(decodedbytes))
image = Image.open(image_stream) #<-----ERROR
filetype = image.format
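A likely cause is that str(decodedbytes) turns the decoded bytes into their printable representation (text starting with b'...'), so PIL is no longer looking at image data; wrapping decodedbytes itself in io.BytesIO and passing that to Image.open should avoid this. If you only need the format name, PIL is not strictly required. The sketch below swaps in a different technique, comparing the first decoded bytes against well-known file signatures; guess_image_format is a hypothetical name, and the table covers only PNG, JPEG, and GIF.

```python
import base64

# Leading bytes ("magic numbers") of a few common image formats.
_SIGNATURES = (
    (b"\x89PNG\r\n\x1a\n", "png"),
    (b"\xff\xd8\xff", "jpeg"),
    (b"GIF87a", "gif"),
    (b"GIF89a", "gif"),
)


def guess_image_format(b64_text):
    """Return 'png', 'jpeg', or 'gif' for base64 image data, else None."""
    raw = base64.b64decode(b64_text)
    for signature, name in _SIGNATURES:
        if raw.startswith(signature):
            return name
    return None


# Placeholder input: the base64 form of the PNG signature alone.
print(guess_image_format("iVBORw0KGgo="))  # png
```

This only inspects the header, so it does not confirm that the rest of the data is a valid image; opening it with PIL is still the more thorough check.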

How to convert image data type from windows sql server to image using python?

I need to convert compressed image column data from windows sql server to image file and save it to file system.
data is in github gist
I am using Python 2.7.2, Pillow on mac.
Thank you !
I opened your gist in my browser and saved it (Save as...) to a file named 'chenchi.txt'.
I then used this program to convert the hex-encoded string to raw bytes and load them into Pillow to make an image out of it:
from PIL import Image
import StringIO
import binascii
# In your case, 's' will be the string from the field
# in the database.
s = open("chenchi.txt").read()
# chop off the '0x' at the front.
s = s[2:]
# Decode it to binary.
binary = binascii.unhexlify(s)
# Wrap the bytes in a memory stream that can be read like a file.
bytes = StringIO.StringIO(binary)
# Use pillow to read the memory stream into an image (it autodetects the format).
im = Image.open(bytes)
# And show it. Or you could .save() it.
im.show()
It worked for me using b16decode.
My image exported from SQL looks something like this: 'FFD8FFE000104A46494600010101004800480000FFE13...'
So I had to convert the content and save it to a file.
import base64

source = 'data.dat'
destination = 'data.jpg'
with open(source, 'r') as f:
    # strip() drops a trailing newline, which b16decode would reject
    content = f.read().strip()
content = base64.b16decode(content)
with open(destination, 'wb') as g:
    g.write(content)
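Both answers come down to cleaning the exported text and decoding the hex. On Python 3 (an assumption, since the question used 2.7) the two variants could be folded into one helper along these lines; hex_to_bytes is a made-up name, and it accepts an optional leading '0x', either letter case, and stray whitespace.

```python
def hex_to_bytes(hex_text):
    """Decode a hex dump such as '0xFFD8FFE0...' into raw bytes.

    Hypothetical helper: whitespace is removed and a leading '0x'
    (as SQL Server prints binary columns) is ignored.
    """
    cleaned = "".join(hex_text.split())
    if cleaned[:2].lower() == "0x":
        cleaned = cleaned[2:]
    # bytes.fromhex accepts upper-case and lower-case digits alike.
    return bytes.fromhex(cleaned)


# Placeholder input: the first four bytes of a JPEG file.
print(hex_to_bytes("0xFFD8FFE0\n"))  # b'\xff\xd8\xff\xe0'
```

The returned bytes can be written to a file opened in 'wb' mode or wrapped in io.BytesIO for Pillow.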
