Get list of result files converted via Wand - python

ImageMagick is an open-source software suite for displaying, converting, and editing raster image files. Wand is a ctypes-based ImageMagick binding for Python.
How do I get the list of image files produced by a Wand conversion?
For example, there is a 2-page PDF file file.pdf and I converted it to 2 JPEG files, file-0.jpg and file-1.jpg. How do I get the list ['file-0.jpg', 'file-1.jpg']?
Currently I simply use glob:
from glob import glob
from wand.image import Image

with Image(filename='file.pdf') as original:
    with original.clone() as converted:
        converted.format = 'jpeg'
        converted.save(filename='file.jpg')
images = glob('*.jpg')
But maybe there is a more idiomatic way through the Wand library itself?

You can use Image.sequence. Each item in the sequence has an index attribute.
from wand.image import Image

with Image(filename='file.pdf') as img:
    img.save(filename='file.jpg')
    if len(img.sequence) > 1:
        images = ['file-{0.index}.jpg'.format(x) for x in img.sequence]
    else:
        images = ['file.jpg']
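The naming rule used in the answer above can be pulled into a small helper so that it works for any output name, not only file.jpg. This is a minimal sketch: expected_output_names is a hypothetical function (not part of Wand), and it assumes the "name-N.ext" pattern described in the question, so the result is a prediction of the file names rather than something Wand reports.

```python
import os


def expected_output_names(filename, page_count):
    """Predict the file names written when saving a multi-page image.

    Hypothetical helper: it assumes the "name-N.ext" pattern from the
    question and does not ask Wand or ImageMagick what was written.
    """
    if page_count <= 1:
        # A single page keeps the requested name unchanged.
        return [filename]
    root, ext = os.path.splitext(filename)
    return ['{0}-{1}{2}'.format(root, i, ext) for i in range(page_count)]


print(expected_output_names('file.jpg', 2))
```

With Wand, page_count would be len(img.sequence), read inside the with block before or after calling img.save().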

Related

How to convert multi-page Image bytes into PDF or TIFF bytes in memory with Python?

I am looking for a solution to convert Image bytes into PDF bytes in memory only.
For my web application, it takes in pdf/tiff documents (can be multi-paged) for information extraction.
I am adding in an image preprocessing step at the start of the pipeline. However, this step is only applicable for images as I am using OpenCV2. Thus, the pdf/tiff file is converted into image(s) for preprocessing. However, to send the file for information extraction I will need to join them back together, as there is a different logic flow for the first vs the subsequent pages.
I was previously using a workaround (referencing local path of merged pdf) but now I would like to remove the dependency and do everything in-memory. This is so that I will be able to deploy the application on the cloud.
import io

import numpy as np
from PIL import Image

image = Image.open(io.BytesIO(file_str))
num_frames = image.n_frames
# Loop through each page of the TIFF file
for i in range(num_frames):
    image.seek(i)
    file_array = np.array(image)
    file_array = file_array.astype(np.uint8) * 255
    # Preprocessing (removed for simplicity)
# TODO: Merge back into PDF file
Edit:
Simple answer: I can't do this in memory. Instead, I used the tempfile library to save the files in a temporary directory and delete that directory once I am done. That goes some way towards achieving the "in memory" aspect.
Writing (not reading) multi-page PDF files is possible using Pillow. For the below solution, I used pdf2image for converting some multi-page PDF file to a list of Pillow Image objects. So, please adapt that according to your existing code.
from PIL import Image
import pdf2image
import numpy as np

# Read pages from PDF to Pillow Image objects
frames_in = pdf2image.convert_from_path('path/to/your/file.pdf')

# Enumerate frames, and preprocess
frames_out = []
for i, frame in enumerate(frames_in):
    # Convert to NumPy array
    frame = np.array(frame)
    # Preprocessing for the first page
    if i == 0:
        frame[:100, ...] = [255, 0, 0]
    # Preprocessing for the other pages
    else:
        frame[:100, ...] = [0, 0, 255]
    # Convert back to Pillow Image object, and append to output list
    frames_out.append(Image.fromarray(frame))

frames_out[0].save('output.pdf', save_all=True, append_images=frames_out[1:])
When using some sample PDF, the output looks the same, but with a red rectangle on the first page, and a blue rectangle on the second page.
----------------------------------------
System information
----------------------------------------
Platform: Windows-10-10.0.16299-SP0
Python: 3.9.1
PyCharm: 2021.1.1
NumPy: 1.20.2
pdf2image 1.14.0
Pillow: 8.2.0
----------------------------------------
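Since the question asks for PDF bytes rather than a file on disk, the same save call can target an io.BytesIO buffer instead of a path. The following is a minimal sketch, assuming the frames are already Pillow Image objects; frames_to_pdf_bytes is a hypothetical helper name, and the two solid-colour pages only stand in for real preprocessed frames.

```python
import io

from PIL import Image


def frames_to_pdf_bytes(frames):
    """Serialize a list of Pillow images into one multi-page PDF, in memory."""
    # Convert to RGB so the result does not depend on which image modes
    # the PDF writer accepts.
    rgb_frames = [frame.convert('RGB') for frame in frames]
    buffer = io.BytesIO()
    # A file-like object has no extension, so the format is given explicitly.
    rgb_frames[0].save(
        buffer,
        format='PDF',
        save_all=True,
        append_images=rgb_frames[1:],
    )
    return buffer.getvalue()


# Two synthetic pages stand in for the preprocessed frames.
pages = [Image.new('RGB', (40, 50), color) for color in ('red', 'blue')]
pdf_bytes = frames_to_pdf_bytes(pages)
print(len(pdf_bytes), pdf_bytes[:5])
```

The returned bytes can be passed on directly, for example as an HTTP response body, without touching the file system.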

How to save (write) a list of images from a dataset into a new folder - openCV Python?

I'm very new to OpenCV/Python. I use Python 3.7 and OpenCV 4, running in a Jupyter notebook. The question: I want to save just 1,000 images from a dataset of 10,000 pictures, extracting them and writing only those 1,000 JPEG files to a new folder. Is this possible using the OpenCV package in Python? I already have a list of the 1,000 image names.
If you just need to copy files, you don't even need OpenCV:
import os
import shutil

out_folder_path = '...'
in_folder_path = '...'
images_to_save_names = [...]
for image_name in images_to_save_names:
    cur_image_path = os.path.join(in_folder_path, image_name)
    cur_image_out_path = os.path.join(out_folder_path, image_name)
    shutil.copyfile(cur_image_path, cur_image_out_path)
If you have the image names and their binary data from some specific dataset file (.csv, .hdf, etc.), you can use cv2.imwrite(path, image) instead of copying.
Assuming you have OpenCV correctly installed on your machine, you can first read the images with img = cv.imread(filename) and then write them with cv.imwrite(filename, img).
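The copy loop from the first answer can be made a little more robust by creating the output folder and reporting names that are not found. This is a sketch only: copy_selected and the folder names are hypothetical, and the temporary directory with fake files exists just to make the example runnable.

```python
import shutil
import tempfile
from pathlib import Path


def copy_selected(names, src_dir, dst_dir):
    """Copy the named files from src_dir to dst_dir.

    Returns (copied, missing) so the caller can see which names
    were not present in the source folder.
    """
    src_dir, dst_dir = Path(src_dir), Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    copied, missing = [], []
    for name in names:
        source = src_dir / name
        if source.is_file():
            shutil.copyfile(source, dst_dir / name)
            copied.append(name)
        else:
            missing.append(name)
    return copied, missing


# Demo with a throwaway dataset of three fake image files.
with tempfile.TemporaryDirectory() as tmp:
    dataset = Path(tmp) / 'dataset'
    dataset.mkdir()
    for name in ('a.jpeg', 'b.jpeg', 'c.jpeg'):
        (dataset / name).write_bytes(b'fake image data')
    subset = Path(tmp) / 'subset'
    copied, missing = copy_selected(['a.jpeg', 'c.jpeg', 'z.jpeg'], dataset, subset)
    subset_files = sorted(p.name for p in subset.iterdir())

print(copied, missing, subset_files)
```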

Merge multiple base64 images into one

I have multiple base64 strings that are images (one string = one image). Is there a way to combine them and decode them to a single image file, i.e. merge multiple base64 strings and output a single image file?
I'm not sure how I would approach this using Pillow (or if I even need it).
Further clarification:
The source images are TIFFs that are encoded into base64
When I say "merge", I mean turning multiple images into a multi-page image like you see in a multi-page PDF
I dug through the Pillow documentation (v5.3) and found something that seems to work. Basically, there are two phases to this:
1. Decode the base64 strings into TIFF images
2. Append them together and save to disk
Example using Python 3.7:
from PIL import Image
import io
import base64

base64_images = ["asdfasdg...", "asdfsdafas..."]
image_files = []
for base64_string in base64_images:
    buffer = io.BytesIO(base64.b64decode(base64_string))
    image_file = Image.open(buffer)
    image_files.append(image_file)
# save() returns None, so its result is not assigned
image_files[0].save(
    'output.tiff',
    save_all=True,
    append_images=image_files[1:]
)
In the above code, I first create PIL Image objects from bytes buffers in order to do the decoding in memory, but you can probably use .save() and create a bunch of tempfiles instead if I/O isn't a concern.
Once I have all the PIL Image objects, I take the first image (assuming they are in the desired order in the base64_images list) and append the rest of the images with the append_images keyword argument. The resulting image has all the frames in one output file.
I assume this pattern is extensible to any image format that supports the save_all and append_images keyword arguments. The Pillow documentation should let you know if it is supported.
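If the merged result is needed as bytes rather than as a file on disk, the same pattern can write into an io.BytesIO buffer. This is a minimal sketch under that assumption; merge_base64_images and sample_base64_tiff are hypothetical helpers, and the tiny generated TIFFs only replace the elided base64 strings so that the example runs.

```python
import base64
import io

from PIL import Image


def merge_base64_images(base64_images):
    """Decode base64 image strings and return one multi-page TIFF as bytes."""
    frames = [
        Image.open(io.BytesIO(base64.b64decode(encoded)))
        for encoded in base64_images
    ]
    buffer = io.BytesIO()
    # A buffer has no file extension, so the format is passed explicitly.
    frames[0].save(buffer, format='TIFF', save_all=True, append_images=frames[1:])
    return buffer.getvalue()


def sample_base64_tiff(color):
    """Build a small single-page TIFF and return it base64-encoded (demo data)."""
    buffer = io.BytesIO()
    Image.new('RGB', (8, 8), color).save(buffer, format='TIFF')
    return base64.b64encode(buffer.getvalue()).decode('ascii')


merged = merge_base64_images([sample_base64_tiff('red'), sample_base64_tiff('blue')])

# Reopen the merged bytes to confirm both pages are present.
with Image.open(io.BytesIO(merged)) as check:
    page_count = check.n_frames
print(page_count)
```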

building tiff stack with wand

How can I achieve this with Wand library for python:
convert *.png stack_of_multiple_pngs.tiff
?
In particular, how can I read every png image, pack them into a sequence and then save the image as tiff stack:
with Image(filename='*.tiff') as img:
    img.save(filename='stack_of_multiple_pngs.tiff')
I understand how to do it for gifs though, i.e. as described in docs. But what about building a sequence as a list and appending every new image I read as a SingleImage()?
Having trouble figuring it out right now.
With wand you would use Image.sequence, not a wildcard filename *.
from wand.image import Image
from glob import glob

# Get list of all images filenames to include
image_names = glob('*.tiff')

# Create new Image, and extend sequence
with Image() as img:
    img.sequence.extend( [ Image(filename=f) for f in image_names ] )
    img.save(filename='stack_of_multiple_pngs.tiff')
The sequence_test.py file under the test directory will have better examples of working with the image sequence.
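One detail worth handling when building a stack from glob results: glob returns names in arbitrary order, and a plain sort puts page-10 before page-2. The sketch below shows one way to order file names numerically before extending the sequence; natural_key is a hypothetical helper and the file names are made up for illustration.

```python
import re


def natural_key(name):
    """Sort key that compares runs of digits as numbers, not as text."""
    parts = re.split(r'(\d+)', name)
    # re.split with a capturing group puts the digit runs at odd indices.
    return [int(part) if i % 2 else part.lower() for i, part in enumerate(parts)]


names = ['page-10.png', 'page-2.png', 'page-1.png']
ordered = sorted(names, key=natural_key)
print(ordered)
```

In the answer above, this would mean building the list as sorted(glob('*.tiff'), key=natural_key) before passing it to img.sequence.extend().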

Wand Python multi-size icon

I'm trying to use Wand to create a multi-size ICO file, but I can't find anything that covers this, only ordinary conversion to ICO. I've found "Sequences":
https://wand.readthedocs.org/en/latest/roadmap.html
Sequences look like what I need, but I only see samples that read multiple images, not ones that create them. Am I missing something, or is it not possible?
Or is it possible to do this using PIL/Pillow?
You can append() a single image to Image.sequence list. For example:
from wand.color import Color
from wand.image import Image
with Image(width=32, height=32, background=Color('red')) as ico:
    with Image(width=16, height=16, background=Color('green')) as s16:
        ico.sequence.append(s16)
    ico.save(filename='multisized.ico')
I had a similar problem, but with creating a multi-page PDF from multiple JPEG files. In ImageMagick I used the -adjoin option. In Wand I did the following:
from glob import glob
from wand.image import Image

files = glob('*.jpg')
with Image() as orig:  # create empty Image object
    for f in files:
        page = Image(filename=f)
        orig.sequence.append(page)
    orig.save(filename='result.pdf')
