To train an image classification model, I load the input data as NumPy arrays, and I deal with thousands of images. Currently I loop through each image and convert it into a NumPy array, as shown below.
import glob
from time import time

import cv2
import numpy as np

images_list = glob.glob(r'C:\Datasets\catvsdogs\cat\*.jpg')

tem_arr_list = []
start = time()
for image_path in images_list:
    # cv2.imread already returns a NumPy array
    temp_arr = np.array(cv2.imread(image_path))
    tem_arr_list.append(temp_arr)
print("Total time taken {}".format(time() - start))
Running this loop takes a long time when the dataset is large, so I tried a list comprehension instead:
tem_arr_list = [np.array(cv2.imread(image_path)) for image_path in images_list]
which is slightly quicker than the loop, but still not fast. I'm looking for any other way to reduce the time this operation takes. Any help or suggestion will be appreciated.
Use a multiprocessing pool to load the data in parallel. On my PC the CPU count is 16. I tried loading 100 images, and below you can see the times taken.
import glob
import multiprocessing
from time import time

import cv2

def load_image(image_path):
    return cv2.imread(image_path)

if __name__ == '__main__':
    image_path_list = glob.glob('*.png')

    try:
        cpus = multiprocessing.cpu_count()
    except NotImplementedError:
        cpus = 2  # arbitrary default

    pool = multiprocessing.Pool(processes=cpus)

    start = time()
    images = pool.map(load_image, image_path_list)
    print("Total time taken using multiprocessing pool {} seconds".format(time() - start))

    images = []
    start = time()
    for image_path in image_path_list:
        images.append(load_image(image_path))
    print("Total time taken using for loop {} seconds".format(time() - start))

    start = time()
    images = [load_image(image_path) for image_path in image_path_list]
    print("Total time taken using list comprehension {} seconds".format(time() - start))
Output:
Total time taken using multiprocessing pool 0.2922379970550537 seconds
Total time taken using for loop 1.4935636520385742 seconds
Total time taken using list comprehension 1.4925990104675293 seconds
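If the bottleneck is disk I/O rather than decoding, a thread pool may also be worth benchmarking against the process pool: OpenCV releases the GIL inside imread, and threads avoid pickling the decoded arrays back to the parent process. A minimal sketch, with the glob pattern and thread count as placeholders to adapt:
import glob
from multiprocessing.pool import ThreadPool

import cv2

if __name__ == '__main__':
    image_path_list = glob.glob('*.png')

    # threads share memory, so the decoded arrays are not serialized
    # between processes as they are with multiprocessing.Pool
    with ThreadPool(processes=8) as pool:  # tune to your machine
        images = pool.map(cv2.imread, image_path_list)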
I have several binary images (GeoTIFF) that I want to perform binary closing on. Because there are at least 2,600 images, I want to shorten the execution time of each operation. I'm using Python and have found several packages that include this operation, among them SciPy and scikit-image:
import time

import numpy as np
import skimage.morphology as sm
from scipy.ndimage import binary_closing, grey_closing

....

arr = some_binary_image_as_numpy_array

start = time.time()
disk = sm.disk(5)
closed_arr = sm.binary_closing(arr, disk)
end = time.time()
print(end - start)

start = time.time()
closed_arr = binary_closing(arr, structure=np.ones((10, 10)))
end = time.time()
print(end - start)
output:
8.373806715011597
9.53745985031128
I know the scikit-image disk isn't the same as np.ones((10, 10)), so the two calls are not entirely equivalent. However, when I run SciPy's greyscale closing (grey_closing), I get this result:
start = time.time()
closed_arr = grey_closing(arr, size=(10, 10))
end = time.time()
print(end - start)
output:
4.35299277305603
On 10 out of the 11 images I tried, the set of pixels equal to 1 was identical for grey_closing and binary_closing. I know SciPy's grey_closing also has a structure argument, but when I use it, the execution time increases to about 30 seconds. So my question is: is SciPy's grey_closing with the size argument equivalent to SciPy's binary_closing with the structure argument?
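Not a proof, but you can check the equivalence empirically on your own data before relying on it. A small sketch, where arr stands for one of your binary arrays as in the question; note that the two functions default to different boundary handling (binary_closing pads with zeros, grey_closing reflects), so any mismatches are most likely to sit at the image border:
import numpy as np
from scipy.ndimage import binary_closing, grey_closing

# arr is one of your binary (0/1) images
binary_result = binary_closing(arr, structure=np.ones((10, 10)))
grey_result = grey_closing(arr, size=(10, 10))

# binary_closing returns booleans; cast the grey result to compare
print(np.array_equal(binary_result, grey_result.astype(bool)))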
I would like to use scipy.fftpack.fft (and rfft) inside a multiprocessing setup, but I have observed significant performance losses due to an apparent incompatibility between scipy.fftpack and multiprocessing, which makes the parallelization almost useless. Although the issue seems well known, I could not find a solution on the web that avoids the performance loss. Below is a minimal example showing the issue:
import time
import multiprocessing as mp

import numpy as np
from scipy.fftpack import fft, ifft

def costly_function(n_mean: int):
    start = time.time()
    x = np.ones(16385, dtype=float)
    for n in range(n_mean):
        fft(ifft(x))
    return (time.time() - start) * 1000.

n_run = 24

# ===== sequential test
sequential_times = [costly_function(500) for _ in range(n_run)]
print(f"time per run (sequential): {np.mean(sequential_times):.2f}+-{np.std(sequential_times):.2f}ms")

# ===== parallel test
with mp.Pool(12) as pool:
    parallel_times = pool.map(costly_function, [500 for _ in range(n_run)])
print(f"time per run (parallel): {np.mean(parallel_times):.2f}+-{np.std(parallel_times):.2f}ms")
On a 12-core machine under Ubuntu with Python 3.10, I get the following result:
>> time per run (sequential): 510.55+-64.64ms
>> time per run (parallel): 1254.26+-114.47ms
Note: none of the following additions resolved the problem:
import os
os.environ['OPENBLAS_NUM_THREADS'] = '1'
os.environ['MKL_NUM_THREADS'] = '1'
os.environ['OMP_NUM_THREADS'] = '1'
os.environ['NUMEXPR_NUM_THREADS'] = '1'
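One workaround worth trying, assuming your real workload can be batched: the newer scipy.fft module (which supersedes scipy.fftpack) accepts a workers argument and parallelizes over the extra axes of a multi-dimensional input, so the transforms themselves can use several cores without any process pool. A sketch under that assumption:
import numpy as np
from scipy.fft import fft, ifft  # scipy.fft supersedes scipy.fftpack

# stack the 24 runs into one 2-D batch; the backend then spreads the
# row-wise transforms across up to 12 worker threads
x = np.ones((24, 16385), dtype=float)
y = fft(ifft(x, workers=12), workers=12)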
I have a function that creates image difference hashes and stores them in a dict in Python:
import glob

import dhash
from alive_progress import alive_bar
from wand.image import Image

def get_photo_hashes(dir='./'):
    file_list = {}
    imgs = glob.glob(dir + '*.jpg')
    total = len(imgs)
    with alive_bar(total) as bar:
        for i in imgs:
            with Image(filename=i) as image:
                row, col = dhash.dhash_row_col(image)
                hash_val = dhash.format_hex(row, col)
                file_list[i] = hash_val
            bar()
    return file_list
The performance of hashing a folder of 10,000 JPEG images (500 kB to 1 MB each) is surprisingly slow, around 2 hashes per second. How can I improve the performance of this function? Thanks.
Multiprocessing would be ideal for this, as the hashing is CPU-intensive. Here's what I suggest:
import glob

import dhash
from wand.image import Image
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager

PATH = '*.jpg'

def makehash(t):
    filename, d = t
    with Image(filename=filename) as image:
        row, col = dhash.dhash_row_col(image)
        d[filename] = dhash.format_hex(row, col)

def main():
    with Manager() as manager:
        d = manager.dict()
        with ProcessPoolExecutor() as executor:
            executor.map(makehash, [(jpg, d) for jpg in glob.glob(PATH)])
        print(d)

if __name__ == '__main__':
    main()
Some stats:
I have a folder containing 129 JPGs. The average size of each file is >12MB. The net processing time is ~19s
I like @JCaesar's answer a lot and decided to have a play with it. I created 1,000 JPEGs of around 500 kB each with:
parallel magick -size 640x480 xc: +noise random {}.jpg ::: {1..1000}
Then I tried the code from his answer using Wand and got 21.3 s for 1,000 images. I then switched to PIL and, using the same images, the time dropped to 9.6 s. I then had a think and realised that the perceptual hash algorithm converts the image to greyscale and shrinks it to 8x8 pixels, and that the JPEG library has a "shrink-on-load" feature which you can use in PIL by calling Image.draft(newMode, newSize). That reduces the loading time and also the amount of I/O needed. Enabling that feature brings the time down further, to 6 s for the same images. The code looks like this:
#!/usr/bin/env python3
# https://stackoverflow.com/a/70709538/2836621

import glob

import dhash
from PIL import Image
from concurrent.futures import ProcessPoolExecutor
from multiprocessing import Manager

PATH = '*.jpg'

def makehash(t):
    filename, d = t
    with Image.open(filename) as image:
        # ask the JPEG decoder to shrink while loading, which cuts
        # both decoding work and I/O
        image.draft('L', (32, 32))
        row, col = dhash.dhash_row_col(image)
        d[filename] = dhash.format_hex(row, col)

def main():
    with Manager() as manager:
        d = manager.dict()
        with ProcessPoolExecutor() as executor:
            executor.map(makehash, [(jpg, d) for jpg in glob.glob(PATH)])
        print(d)

if __name__ == '__main__':
    main()
I have a set of large TIFF images (3200 x 3200). I want to load them in a Jupyter notebook, compute averages, and create a NumPy array. To make it concrete: each movie clip is composed of 10 TIFF images, and there are several movie clips. I want to compute the average of the 10 frames for each movie and return an array of size (number of movie clips x 3200 x 3200).
import numpy as np
import tifffile

def uselibtiff(f):
    tif = tifffile.imread(f)  # read the TIFF file into a NumPy array
    return tif

def avgFrames(imgfile_Movie, im_shape):
    # sum all frames of one movie; imgfile_Movie is a list of the TIFF
    # image files belonging to that movie
    im = np.zeros(im_shape)
    for img_frame in imgfile_Movie:
        frame_im = uselibtiff(img_frame)
        im += frame_im
    return im

# imgfiles_byMovie is a 2D list
im_avg_all = np.zeros((num_movies, im_shape[0], im_shape[1]))

for n, imgfile_Movie in enumerate(imgfiles_byMovie):
    print(n)
    #start = timeit.default_timer()
    results = avgFrames(imgfile_Movie, im_shape) / num_frames
    #end = timeit.default_timer()
    #print(end-start)
    im_avg_all[n] = results
So far I've noticed that avgFrames takes about 3 seconds per movie (10 frames each). I wanted to speed things up and tried using multiprocessing in Python.
import multiprocessing as mp
print("Number of processors: ", mp.cpu_count())
pool = mp.Pool(mp.cpu_count())
results = [pool.apply(avgFrames, args=(imgfile_Movie, im_shape)) for imgfile_Movie in imgfiles_byMovie]
pool.close()
Unfortunately, the above code runs very slowly, and I have to terminate it without seeing the results. What am I doing wrong here?
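Part of the problem is that Pool.apply blocks until each call completes, so the list comprehension above runs the movies strictly one after another while still paying the interprocess overhead. A sketch using starmap instead, reusing avgFrames and the variables from the question (note that worker functions generally need to be importable, which can be awkward inside a Jupyter notebook on some platforms):
import multiprocessing as mp
import numpy as np

if __name__ == '__main__':
    args = [(imgfile_Movie, im_shape) for imgfile_Movie in imgfiles_byMovie]
    with mp.Pool(mp.cpu_count()) as pool:
        # starmap unpacks each tuple and distributes the calls
        # across the worker processes in parallel
        results = pool.starmap(avgFrames, args)
    im_avg_all = np.array(results) / num_frames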
I am performing a DCT on a Raspberry Pi. I've broken the image into 8x8 blocks. Initially I performed the DCT in a nested for loop (without multithreading) and observed that it takes about 18 seconds for a 512x512 image. Here's the code with multiple threads:
#!/usr/bin/env python
from __future__ import print_function, division
import time

start_time = time.time()

import cv2
import numpy as np
import sys
import pylab as plt
import threading
import Queue

from numpy import empty, arange, exp, real, imag, pi
from numpy.fft import rfft, irfft
from pprint import pprint

queue = Queue.Queue()

if len(sys.argv) > 1:
    im = cv2.imread(sys.argv[1])
else:
    im = cv2.imread('baboon.jpg')

im = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
h, w = im.shape[:2]
DF = np.zeros((h, w))
Nb = 8

def dct2(y):
    M = y.shape[0]
    N = y.shape[1]
    a = empty([M, N], float)
    b = empty([M, N], float)
    for i in range(M):
        a[i, :] = dct(y[i, :])
    for j in range(N):
        b[:, j] = dct(a[:, j])
    queue.put(b)

def dct(y):
    N = len(y)
    y2 = empty(2 * N, float)
    y2[:N] = y[:]
    y2[N:] = y[::-1]
    c = rfft(y2)
    phi = exp(-1j * pi * arange(N) / (2 * N))
    return real(phi * c[:N])

def Main():
    jobs = []
    for row in range(0, h, Nb):
        for col in range(0, w, Nb):
            f = im[row:row + Nb, col:col + Nb]
            thread = threading.Thread(target=dct2(f))
            jobs.append(thread)
            df = queue.get()
            DF[row:row + Nb, col:col + Nb] = df
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()

if __name__ == "__main__":
    Main()
    cv2.imwrite('dct_img.jpg', DF)
    print("--- %s seconds ---" % (time.time() - start_time))
    plt.imshow(DF, cmap='Greys')
    plt.show()
    cv2.waitKey(0)
    cv2.destroyAllWindows()
After switching to multiple threads, this code takes about 25 seconds to execute. What's wrong? Have I implemented multithreading incorrectly? I want to reduce the time taken to perform the DCT as much as possible (to 1-5 seconds). Any suggestions? Is there any other concept or method (I've read posts on multiprocessing) that would significantly reduce my execution and processing time?
Due to the GIL, all your threads are executed in sequence (not in parallel). So you might want to switch to multiprocessing. Another option is Numba, which can greatly speed up ordinary Python code and can also release the GIL.
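For illustration, here is a rough sketch of the multiprocessing route, reusing dct from the question plus a dct2 variant that returns its result instead of posting it to a queue. It assumes the im, h, w, Nb and DF variables from the question and is written in Python 3 syntax:
import multiprocessing as mp
import numpy as np

def dct2_block(block):
    # same row pass then column pass as dct2 in the question,
    # but returning the transformed block
    a = np.array([dct(row) for row in block])
    return np.array([dct(col) for col in a.T]).T

if __name__ == '__main__':
    coords = [(r, c) for r in range(0, h, Nb) for c in range(0, w, Nb)]
    blocks = [im[r:r + Nb, c:c + Nb] for r, c in coords]
    with mp.Pool(mp.cpu_count()) as pool:
        results = pool.map(dct2_block, blocks)
    for (r, c), df in zip(coords, results):
        DF[r:r + Nb, c:c + Nb] = df
For blocks as small as 8x8, the per-task pickling overhead can swallow the gains, so handing each worker a whole row of blocks at a time is likely to parallelize better.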
In Python, you should use multithreading for performance only when mixing I/O and CPU tasks. For your problem you should use multiprocessing.
Maybe the other posters are right about the GIL, but OpenCV as well as NumPy release the GIL, so I would at least expect some speedup from a multithreaded solution.
I would look at how many threads you are creating simultaneously. It's probably a lot, since you start one for each 8x8-pixel sub-image. (Each time a thread is taken off the CPU and replaced by another, it incurs a small overhead, which in sum becomes quite noticeable when you have many threads.)
If this is the case, you will probably gain performance by not starting them all at once, but starting only about as many as you have CPU cores (a few more, a few less, just experiment) and only starting the next thread once one has finished.
Look at the answers to this question on how to do this with minimal effort, and see the sketch below.
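A minimal sketch of that idea with concurrent.futures, which caps the number of live threads for you. It reuses dct2_block from the sketch above and the im, h, w, Nb and DF variables from the question (Python 3 syntax):
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import cpu_count

coords = [(row, col) for row in range(0, h, Nb) for col in range(0, w, Nb)]
blocks = [im[row:row + Nb, col:col + Nb] for row, col in coords]

# at most cpu_count() threads are alive at once; the executor hands
# out new blocks as earlier ones finish
with ThreadPoolExecutor(max_workers=cpu_count()) as executor:
    for (row, col), df in zip(coords, executor.map(dct2_block, blocks)):
        DF[row:row + Nb, col:col + Nb] = df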