How could I increase the speed? - python

I am using the code below for an image-processing study. It works functionally, but it is too slow: one step takes up to 10 seconds.
I need it to run faster to reach my goal.
import numpy
import glob, os
import cv2
input = cv2.imread(path)
def nothing(x): # for trackbar
    pass
windowName = "Image"
cv2.namedWindow(windowName)
cv2.createTrackbar("coef", windowName, 0, 25000, nothing)
condition = True
while (condition):
    coef = cv2.getTrackbarPos("coef", windowName)
    temp_img = input
    row = temp_img.shape[0]
    col = temp_img.shape[1]
    print(coef)
    red = []
    green = []
    for i in range(row):
        for y in range(col):
            # temp_img[i][y][0] = 0
            temp_img[i][y][1] = temp_img[i][y][1]* (coef / 100)
            temp_img[i][y][1] = temp_img[i][y][2] * (1 - (coef / 100))
            # relative_diff = value_g - value_r
    # temp =cv2.resize(temp,(1000,800))
    cv2.imshow(windowName, temp_img)
    # cv2.imwrite("output2.jpg", temp)
    print("fin")
    # cv2.waitKey(0)
    if cv2.waitKey(30) >= 0:
        condition = False
cv2.destroyAllWindows()
Does anybody have an idea how to get a faster result?

It's not entirely clear to me what kind of object temp_img is, but if it behaves like a numpy array, you could replace your loop with
temp_img[:,:,1] = temp_img[:,:,1]*(coef/100)
temp_img[:,:,2] = temp_img[:,:,2]*(1-coef/100)
which should give a significant speed-up if your array is large. Such whole-array operations are implemented in highly optimised compiled code, whereas Python loops are generally quite slow.
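To make the comparison concrete, here is a minimal sketch of the loop and the sliced version side by side. The function names and the random test image are made up for illustration, and the float conversion plus clipping to 0..255 are my own assumptions about how out-of-range values should be handled. Both functions work on a copy, because `temp_img = input` in the question does not copy the array, so each pass of the while loop would rescale the already scaled image.

```python
import numpy as np

def scale_channels_loop(img, coef):
    """Reference version: per-pixel Python loop (slow)."""
    out = img.astype(np.float64)          # astype returns a copy, img is untouched
    factor = coef / 100
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j, 1] = out[i, j, 1] * factor
            out[i, j, 2] = out[i, j, 2] * (1 - factor)
    return np.clip(out, 0, 255).astype(np.uint8)

def scale_channels_vectorised(img, coef):
    """Same arithmetic applied to whole channels at once."""
    out = img.astype(np.float64)
    factor = coef / 100
    out[:, :, 1] *= factor
    out[:, :, 2] *= 1 - factor
    return np.clip(out, 0, 255).astype(np.uint8)

# Small random stand-in for the image returned by cv2.imread
rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(40, 60, 3), dtype=np.uint8)
print(np.array_equal(scale_channels_loop(img, 30),
                     scale_channels_vectorised(img, 30)))  # True
```

On a real photo the vectorised version avoids millions of Python-level indexing operations, which is where the seconds go.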
Edit based on comments:
Since you're working with large images and have some expensive operations that need an unscaled version but only need to be executed once, your code could be structured like this:
import cv2
import numpy as np #plus any other imports you need
def nothing(x): #trackbar callback
    pass
def expensive_operations(image, *args, **kwargs):
    #do all your expensive operations like object detection
    ...
def scale_image(image, scale):
    #create a scaled version of image
    ...
def cheap_operations(scaled_image, windowName):
    #perform cheap operations, e.g.
    coef = cv2.getTrackbarPos("coef", windowName)
    temp_img = np.copy(scaled_image)
    temp_img[:,:,1] = temp_img[:,:,1]* (coef / 100)
    temp_img[:,:,2] = temp_img[:,:,2] * (1 - (coef / 100))
    cv2.imshow(windowName, temp_img)
input = cv2.imread(path)
windowName = "Image"
cv2.namedWindow(windowName)
cv2.createTrackbar("coef", windowName, 0, 25000, nothing)
condition = True
expensive_results = expensive_operations(input) #possibly with some more args and keyword args
scaled_image = scale_image(input, scale) #scale is whatever factor you choose
while condition:
    cheap_operations(scaled_image, windowName)
    if cv2.waitKey(30) >= 0:
        condition = False
cv2.destroyAllWindows()

I do this kind of thing in nip2. It's an image processing spreadsheet that can manipulate huge images quickly. It has no problems doing this kind of operation on any size image at 60fps.
I made you an example workspace: http://www.rollthepotato.net/~john/coeff.ws
Here's what it looks like working on a 1gb starfield image:
You can drag the slider to change coeff. The processed image updates instantly as you drag. You can zoom and pan around the processed image to check details and adjust coeff.
The underlying image processing library is libvips, which has a Python binding, pyvips. In pyvips, your program would be:
import pyvips
def adjust(image, coeff):
    return image * [1, coeff / 100, 1 - coeff / 100]
Though that's without the GUI elements, of course.


Memory leak in Midas depthmap computation?

I'm using MiDaS in much the same way as Hugging Face's demo.
My issue is that RAM usage increases with each depth map computation.
Here is the full code.
#!venv/bin/python3
from pathlib import Path
import psutil
import numpy as np
import torch
import cv2
def make_model():
    model_type = "DPT_BEiT_L_512" # MiDaS v3.1 - Large
    midas = torch.hub.load("intel-isl/MiDaS", model_type)
    device = torch.device("cuda")
    midas.to(device)
    midas.eval()
    midas_transforms = torch.hub.load("intel-isl/MiDaS", "transforms")
    transform = midas_transforms.dpt_transform
    return {"transform": transform,
            "device": device,
            "midas": midas
            }
def inference(cv_image, model):
    """Make the inference."""
    transform = model['transform']
    device = model["device"]
    midas = model["midas"]
    input_batch = transform(cv_image).to(device)
    with torch.no_grad():
        prediction = midas(input_batch)
        prediction = torch.nn.functional.interpolate(
            prediction.unsqueeze(1),
            size=cv_image.shape[:2],
            mode="bilinear",
            align_corners=False,
        ).squeeze()
    output = prediction.cpu().numpy()
    formatted = (output * 255 / np.max(output)).astype('uint8')
    return formatted
# Create Midas "DPT_BEiT_L_512" - MiDaS v3.1 - Large
model = make_model()
image_dir = Path('.') / "all_images"
for image_file in image_dir.iterdir():
    ram_usage = psutil.virtual_memory()[2]
    print("image", ram_usage)
    cv_image = cv2.imread(str(image_file))
    _ = inference(cv_image, model)
In short:
Create the model "DPT_BEiT_L_512".
Define the function inference.
Loop over the images in the directory all_images.
For each image: call cv2.imread, then
compute the depth map (without keeping the result in memory).
I see that RAM usage keeps rising.
Variation:
I've tried reading only one image (so only one cv2.imread) and, in the loop, only adding random noise to that image. Up to random noise, the inference function always receives the same image.
In that case, the RAM usage is stable.
QUESTIONS:
Where does the memory leak come from?
Do I have to "reset" something between two inferences?
EDIT: some variations
Variation 1: always the same image
Replace the iterdir loop with this:
cv_image = cv2.imread("image.jpg")
for i in range(1, 100):
    ram_usage = psutil.virtual_memory()[2]
    print(i, ram_usage)
    _ = inference(cv_image, model)
Here there is no memory leak.
Variation 2: do not compute the depth map
for image_file in image_dir.iterdir():
    ram_usage = psutil.virtual_memory()[2]
    print("image", ram_usage)
    cv_image = cv2.imread(str(image_file))
    # _ = inference(cv_image, model)
The memory leak does not occur.
I deduce that cv2.imread itself does not cause the leak.
Variation 3: same image, random noise
cv_image = cv2.imread("image.jpg")
for i in range(1, 100):
    ram_usage = psutil.virtual_memory()[2]
    print(i, ram_usage)
    noise = np.random.randn(
        cv_image.shape[0], cv_image.shape[1], cv_image.shape[2]) * 20
    noisy_img = cv_image + noise
    noisy_img = np.clip(noisy_img, 0, 255)
    _ = inference(noisy_img, model)
No leak in this version.

Pixel is there but .getpixel isn't detecting it

I'm currently having an issue with my program that I'm not sure how to fix.
I am doing the following:
x = 0
y = 0
im = ImageGrab.grab()
time.sleep(1)
while True:
    xy = (x, y)
    x = x + 1
    if im.getpixel(xy) == (0,158,187):
        time.sleep(0.3)
        pyautogui.click(x,y)
        break
    if x >= 1200:
        x = 0
        y = y + 1
        print('cant find pixel')
    if y >= 950:
        y = 0
        x = 0
It works about 90% of the time, and then at random it just says it can't detect the pixel, even though the pixel is definitely there.
EDIT: I managed to catch the following error in the 10% of cases where it fails:
AttributeError: 'NoneType' object has no attribute 'getpixel'
This makes no sense to me, since I call im = ImageGrab.grab() beforehand and it works 90% of the time.
You should check that your ImageGrab.grab() call was successful before using the data, so something like:
im = ImageGrab.grab()
if im is not None:
    ... # process the image here
You'll be there all day if you run double for loops over an image and call a function for every pixel! Try to get into the habit of using vectorised Numpy code for images in Python.
Basically, you appear to be testing if any pixel in your 1200x950 image matches all three RGB components (0,158,187).
You can do that with Numpy like this:
np.any(np.all(na==(0,158,187), axis=-1))
In the demo below the double for loops take 800ms and the Numpy test takes 20ms, so 40x faster.
#!/usr/bin/env python3
import numpy as np
from PIL import Image
def loopy(im):
    for x in range(im.width):
        for y in range(im.height):
            if im.getpixel((x,y)) == crucialPixel:
                return True
    return False
def me(im):
    # Make image into Numpy array
    na = np.array(im)
    # Test if there is any pixel where all RGB components match crucialPixel
    return np.any(np.all(na==crucialPixel, axis=-1))
# Define our beloved crucial pixel
crucialPixel = (0,158,187)
# Construct a new, solid black image
im = Image.new('RGB', (1200,950))
# Neither should find crucialPixel in black image
result = loopy(im)
result = me(im)
# Insert the crucial pixel
im.putpixel((600,475), crucialPixel)
# Both should find crucialPixel
result = loopy(im)
result = me(im)
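The test above only says whether the pixel exists, while the question also needs its position in order to click it. A minimal sketch of that step follows, assuming the image is already an H x W x 3 Numpy array (for a PIL image, `np.array(im)`); `find_pixel` is a hypothetical helper name.

```python
import numpy as np

def find_pixel(na, colour):
    """Return (x, y) of the first pixel equal to colour, or None if absent."""
    # np.all(..., axis=-1) gives an H x W mask; argwhere lists (row, col) pairs
    matches = np.argwhere(np.all(na == colour, axis=-1))
    if matches.size == 0:
        return None
    row, col = matches[0]
    return int(col), int(row)  # x is the column index, y is the row index

na = np.zeros((950, 1200, 3), dtype=np.uint8)  # black 1200x950 image
print(find_pixel(na, (0, 158, 187)))           # None
na[475, 600] = (0, 158, 187)                   # row 475, column 600
print(find_pixel(na, (0, 158, 187)))           # (600, 475)
```

Note the swap between array order (row, column) and screen order (x, y) before passing the result to something like pyautogui.click.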

When to use multiprocessing.Queue over multiprocessing.Pool? When is there a need to use multiprocessing.Queue?

I have tried multiprocessing.dummy.Pool and multiprocessing.Pool in multiple deep learning projects. I am having a hard time understanding multiprocessing.Queue; I don't understand why it is needed. Is there a particular situation where it is useful?
As an example, I have the following target function:
def process_detection( det_, dims ,classes):
    W = dims[0]
    H = dims[1]
    classes = classes
    boxes = []
    confidences=[]
    classIDs=[]
    classes_pred=[]
    for detection in det_:
        xcenter, ycenter, width, height = np.asarray([W, H, W, H]) * detection[0:4]
        confidence_encoded = detection[5:] # (80,) array
        index_class = np.argmax(confidence_encoded) #index of max confidence
        confidence = confidence_encoded[index_class] # float value of confidence (probability)
        # print(classes)
        class_predicted = classes[index_class] # class predicted
        if confidence > 0.5:
            if class_predicted == "person":
                print("{} , {:.2f}".format(class_predicted, confidence))
                # continue
            topX = int(xcenter - width/2.)
            topY = int(ycenter - height/2.)
            width = int(width)
            height = int(height)
            confidence = float(confidence)
            bbox = [topX, topY, width, height]
            boxes.append(bbox)
            confidences.append(confidence)
            classIDs.append(index_class)
            classes_pred.append(class_predicted)
    return [boxes, confidences, classIDs, classes_pred]
I am using multiprocessing.Pool.starmap to process a list of bounding boxes predicted by YOLOv3. The relevant function is below:
def main():
    pool = Pool(processes=os.cpu_count()) # make a process pool for multi-processing
    path = Path("..")
    classes = open(str(path.joinpath("coco.names")), "r").read().strip().split("\n")
    colors_array = np.random.randint(0,255,(len(classes),3),dtype="uint8")
    colors = {cls_:clr for cls_,clr in zip(classes, colors_array)}
    # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    # reading the video
    # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    cap = cv2.VideoCapture(str(path.joinpath("video_.mp4")))
    _, frame = cap.read()
    if frame is None:
        print(f"FRAME IS NOT READ")
    else:
        # frame = resize(frame, width=500)
        H, W = frame.shape[0:2]
    # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    # <model>
    # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    configPath = path.joinpath("yolov3.cfg")
    weightsPath = path.joinpath("yolov3.weights")
    net = cv2.dnn.readNetFromDarknet(str(configPath), str(weightsPath))
    ln = net.getLayerNames()
    ln = [ln[i[0] - 1] for i in net.getUnconnectedOutLayers()]
    writer = None
    boxes = []
    confidences = []
    classIDs = []
    classes_pred = []
    fps_ = FPS().start()
    i = 0
    while True:
        # pool = Pool(processes=os.cpu_count()) # make a process pool for multi-processing
        try:
            if writer is None:
                writer = cv2.VideoWriter("./detections.avi", cv2.VideoWriter_fourcc(*"MJPG"), int(cap.get(cv2.CAP_PROP_FPS)), (W, H))
                # after this writer will not be none
            _, frame = cap.read() # reading the frame
            # frame = resize(frame, width=W) # resizing the frame
            blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416),
                                         swapRB=True, crop=False) # yolov3 version
            net.setInput(blob)
            start = time()
            detections = net.forward(ln)
            end = time()
            print(f"{(end-start):.2f} seconds taken for detection")
            # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
            # MULTIPROCESSING
            # %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
            results = pool.starmap_async(process_detection, zip(detections, repeat((W,H)) , repeat(classes) ) )
            boxes, confidences, classIDs, classes_pred = results.get()[1]
            #%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
            cleaned_indices = cv2.dnn.NMSBoxes(boxes, confidences, 0.5 , 0.3) # 0.3 --> nms threshold
            print(f"TOTAL INDICES CLEANED ARE {len(cleaned_indices):00d}")
            if len(cleaned_indices)>0:
                for cleaned_idx in cleaned_indices.flatten():
                    BOX = boxes[cleaned_idx]
                    color = [int(i) for i in colors[classes_pred[cleaned_idx]]]
                    # print(colors[cleaned_idx], cleaned_idx)
                    cv2.rectangle(frame, (BOX[0],BOX[1]), (BOX[0]+BOX[2], BOX[1]+BOX[3]),color, 1, cv2.LINE_AA)
                    text = f"{classes_pred[cleaned_idx]} : {confidences[cleaned_idx]:.2f}"
                    cv2.putText(frame, text, (BOX[0], BOX[1] - 5), cv2.FONT_HERSHEY_SIMPLEX,
                                0.5, color, 2)
            writer.write(frame)
(pool is closed OUTSIDE while loop).
When is multiprocessing.Queue actually needed?
Can I make this code more efficient using multiprocessing.Queue?
In general it is not necessary (nor useful) to use a Pool and a Queue together.
The way a Pool is most useful is to run the same code with different data in parallel on multiple cores to increase throughput. That is, using the map method and its variants. This is useful for situations where the calculation done on each data-item is independent of all the others.
Mechanisms like Queue and Pipe are for communicating between different processes.
If you need a Queue or a Pipe in a pool worker, then the calculations done by that pool worker are by definition not independent. At best, that reduces the performance of the Pool because the pool workers might have to wait for data to become available. At worst, it might stall the Pool completely if all the workers are busy waiting for data to appear from a Queue.
How to use a Pool
If you expect that all the calculations will take approximately the same time, just use the map method. This will return when all calculations are finished. And the returned values are guaranteed to be in the same order as the submitted data.
(Hint: there is little point in using the _async methods when the next thing you do is to call the get method on the result object.)
If some calculations take (much) longer than others, I would suggest using imap_unordered. This will return an iterator that will start yielding results as soon as they are ready. The results will be in the order that they finished, not in the order they were submitted, so you should add some identifier to the result to enable you to tell to which input data the result belongs.
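As a rough illustration of the two approaches described above, here is a small sketch. The function names and the fake workload are made up, and it imports the thread-backed `multiprocessing.dummy.Pool` (mentioned in the question) purely so that it runs anywhere without pickling concerns. It exposes the same `map` and `imap_unordered` methods, so for CPU-bound work you would change the import to `from multiprocessing import Pool`.

```python
import time
from multiprocessing.dummy import Pool  # thread-backed, same interface as multiprocessing.Pool

def slow_square(item):
    """Work on one (identifier, value) pair and return (identifier, result)."""
    index, value = item
    time.sleep(0.01 * (value % 3))  # pretend some items take longer than others
    return index, value * value

def squares_with_map(values, workers=4):
    """map: blocks until everything is done; results come back in input order."""
    with Pool(workers) as pool:
        pairs = pool.map(slow_square, list(enumerate(values)))
    return [result for _, result in pairs]

def squares_with_imap_unordered(values, workers=4):
    """imap_unordered: results arrive as they finish; the identifier places them."""
    results = [None] * len(values)
    with Pool(workers) as pool:
        for index, result in pool.imap_unordered(slow_square, enumerate(values)):
            results[index] = result
    return results

if __name__ == "__main__":
    print(squares_with_map([1, 2, 3, 4, 5]))
    print(squares_with_imap_unordered([1, 2, 3, 4, 5]))
```

Neither version needs a Queue: the pool already handles passing inputs to the workers and collecting results from them.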

Python Apply Low Pass Filter On Some Data

I have a .txt file which includes some frame difference of a video.
The project is to remove noise and stabilize a video using these frame differences and a low pass filter.
The Vibrated2.txt file is:
0.341486, -0.258215
0.121945, 1.27605
-0.0811261, 0.78985
-0.0269414, 1.59913
-0.103227, 0.518159
0.274445, 1.69945
...
How can I apply a low pass filter to this data?
I tried this but it didn't work!
import cv2
import numpy as np
from scipy.signal import butter, lfilter
video= cv2.VideoCapture('Vibrated2.avi')
freq = (video.get(cv2.CAP_PROP_FPS))
cutoff = 5
data = np.loadtxt('Vibrated2.txt', delimiter=',')
b, a = butter(5, (cutoff/freq), btype='low', analog=False)
data = lfilter(b, a, data)
Any help? Any idea?
I am not exactly sure how your txt file is structured, but if you want to apply a low pass filter to your frame differencing output, I guess you want to make it binary?
def icv_check_threshold(pixel_value, desired_minimum_value):
    if pixel_value < desired_minimum_value:
        return False
    else:
        return True
And for the frame differencing:
def icv_pixel_frame_differencing(frame_1, frame_2):
    # first convert frames to numpy arrays to make it easier to work with
    first_frame = np.asarray(frame_1, dtype=np.float32)
    second_frame = np.asarray(frame_2, dtype=np.float32)
    # then compute frame dimensions
    frame_width = int(first_frame[0].size)
    frame_height = int(first_frame.size/frame_width)
    # we then create a stock image for differencing output
    frame_difference = np.zeros((frame_height, frame_width), np.uint8)
    for i in range(frame_width):
        for j in range(frame_height):
            # compute the absolute difference between the current frame and first frame
            frame_difference[j, i] = abs(first_frame[j, i] - second_frame[j, i])
            # check if the threshold = 25 is satisfied, if not set pixel value to 0, else to 255
            # comment out code below to obtain result without threshold / non-binary
            if icv_check_threshold(frame_difference[j, i], 25):
                frame_difference[j, i] = 255
            else:
                frame_difference[j, i] = 0
    cv2.imwrite("differenceC.jpg", frame_difference)
    cv2.imwrite("frame50.jpg", first_frame)
    cv2.imwrite("frame51.jpg", second_frame)
    return frame_difference
I hope that this helps. Also here is a link to a project with frame differencing I was working on.
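Coming back to the Butterworth attempt in the question, two details may explain why it "didn't work". First, lfilter filters along the last axis by default, which for an (n_frames, 2) array means across the two values of each row rather than over time. Second, without the fs argument, butter expects the cutoff as a fraction of the Nyquist frequency (fps / 2), not of fps. Below is a minimal sketch under those assumptions. The synthetic data stands in for Vibrated2.txt, the function name is hypothetical, and filtfilt (zero-phase filtering) is swapped in for lfilter as my own choice to avoid a time lag.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def lowpass_motion(data, fps, cutoff_hz=5.0, order=5):
    """Low-pass filter each column of an (n_frames, 2) array along time."""
    # fs= lets butter take the cutoff directly in Hz
    b, a = butter(order, cutoff_hz, btype="low", fs=fps)
    # axis=0 filters down the rows (time), not across the two columns
    return filtfilt(b, a, data, axis=0)

# Synthetic stand-in for the file contents: slow drift plus fast jitter
fps = 30.0
t = np.arange(200) / fps
slow = np.column_stack([np.sin(2 * np.pi * 0.5 * t),
                        np.cos(2 * np.pi * 0.5 * t)])
jitter = 0.5 * np.column_stack([np.sin(2 * np.pi * 12 * t),
                                np.sin(2 * np.pi * 13 * t)])
smoothed = lowpass_motion(slow + jitter, fps)
print(smoothed.shape)  # (200, 2)
```

With the real file you would pass the array from np.loadtxt and the frame rate read from the video; the 5 Hz cutoff must stay below half the frame rate.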

Fast RGB Thresholding in Python (possibly some smart OpenCV code?)

I need to do some fast thresholding of a large amount of images, with a specific range for each of the RGB channels, i.e. remove (make black) all R values not in [100;110], all G values not in [80;85] and all B values not in [120;140]
Using the Python bindings to OpenCV gives me fast thresholding, but it thresholds all three RGB channels to a single value:
cv.Threshold(cv_im,cv_im,threshold+5, 100,cv.CV_THRESH_TOZERO_INV)
cv.Threshold(cv_im,cv_im,threshold-5, 100,cv.CV_THRESH_TOZERO)
Alternatively I tried to do it manually by converting the image from PIL to numpy:
arr=np.array(np.asarray(Image.open(filename).convert('RGB')).astype('float'))
for x in range(img.size[1]):
    for y in range(img.size[0]):
        bla = 0
        for j in range(3):
            if arr[x,y][j] > threshold2[j] - 5 and arr[x,y][j] < threshold2[j] + 5 :
                bla += 1
        if bla == 3:
            arr[x,y][0] = arr[x,y][1] = arr[x,y][2] = 200
        else:
            arr[x,y][0] = arr[x,y][1] = arr[x,y][2] = 0
While this works as intended, it is horribly slow!
Any ideas as to how I can get a fast implementation of this?
Many thanks in advance,
Bjarke
I think the inRange opencv method is what you are interested in. It will let you set multiple thresholds simultaneously.
So, with your example you would use
# Remember -> OpenCV stores things in BGR order
lowerBound = cv.Scalar(120, 80, 100);
upperBound = cv.Scalar(140, 85, 110);
# this gives you the mask for those in the ranges you specified,
# but you want the inverse, so we'll add bitwise_not...
cv.InRange(cv_im, lowerBound, upperBound, cv_rgb_thresh);
cv.Not(cv_rgb_thresh, cv_rgb_thresh);
Hope that helps!
You can do it with numpy in a much faster way if you don't use loops.
Here's what I came up with:
def better_way():
    img = Image.open("rainbow.jpg").convert('RGB')
    arr = np.array(np.asarray(img))
    R = [(90,130),(60,150),(50,210)]
    red_range = np.logical_and(R[0][0] < arr[:,:,0], arr[:,:,0] < R[0][1])
    green_range = np.logical_and(R[1][0] < arr[:,:,1], arr[:,:,1] < R[1][1])
    blue_range = np.logical_and(R[2][0] < arr[:,:,2], arr[:,:,2] < R[2][1])
    # logical_and combines two inputs only (a third positional argument is the output array)
    valid_range = np.logical_and(np.logical_and(red_range, green_range), blue_range)
    arr[valid_range] = 200
    arr[np.logical_not(valid_range)] = 0
    outim = Image.fromarray(arr)
    outim.save("rainbowout.jpg")
import timeit
t = timeit.Timer("your_way()", "from __main__ import your_way")
print(t.timeit(number=1))
t = timeit.Timer("better_way()", "from __main__ import better_way")
print(t.timeit(number=1))
The omitted your_way function was a slightly modified version of your code above. This way runs much faster:
$ python pyrgbrange.py
10.8999910355
0.0717720985413
That's 10.9 seconds vs. 0.07 seconds.
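The same idea can be written more compactly with broadcasting, comparing all three channels against per-channel bounds in one go. This is only a sketch: `threshold_rgb` is a hypothetical name, the bounds are the inclusive ranges from the question, and the tiny 2x2 array stands in for a real image loaded as an H x W x 3 RGB array.

```python
import numpy as np

def threshold_rgb(arr, low, high, keep=200):
    """Set pixels whose R, G and B all lie in [low, high] to keep, the rest to 0."""
    low = np.asarray(low)
    high = np.asarray(high)
    # (arr >= low) broadcasts the 3 bounds over every pixel; np.all collapses channels
    inside = np.all((arr >= low) & (arr <= high), axis=-1)
    out = np.zeros_like(arr)
    out[inside] = keep
    return out

arr = np.zeros((2, 2, 3), dtype=np.uint8)
arr[0, 0] = (105, 82, 130)   # inside all three ranges
arr[0, 1] = (105, 82, 200)   # blue outside its range
result = threshold_rgb(arr, low=(100, 80, 120), high=(110, 85, 140))
print(result[0, 0], result[0, 1])
```

Unlike the version above, this returns a new array and leaves the input untouched, which is a design choice rather than a requirement.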
The PIL point function takes a table of 256 values for each band of the image and uses it as a mapping table. It should be pretty fast. Here's how you would apply it in this case:
def mask(low, high):
    return [x if low <= x <= high else 0 for x in range(0, 256)]
img = img.point(mask(100,110)+mask(80,85)+mask(120,140))
Edit: The above doesn't produce the same output as your numpy example; I followed the description rather than the code. Here's an update:
def mask(low, high):
    return [255 if low <= x <= high else 0 for x in range(0, 256)]
img = img.point(mask(100,110)+mask(80,85)+mask(120,140)).convert('L').point([0]*255+[200]).convert('RGB')
This does a few conversions on the image, making copies in the process, but it should still be faster than operating on individual pixels.
If you stick to using OpenCV, then just cv.Split the image into multiple channels first and then cv.Threshold each channel individually. I'd use something like this (untested):
# Temporary images for each color channel
b = cv.CreateImage(cv.GetSize(orig), orig.depth, 1)
g = cv.CloneImage(b)
r = cv.CloneImage(b)
cv.Split(orig, b, g, r, None)
# Threshold each channel using individual lo and hi thresholds
channels = [ b, g, r ]
thresh = [ (B_LO, B_HI), (G_LO, G_HI), (R_LO, R_HI) ]
for ch, (lo, hi) in zip(channels, thresh):
    cv.Threshold(ch, ch, hi, 100, cv.CV_THRESH_TOZERO_INV)
    cv.Threshold(ch, ch, lo, 100, cv.CV_THRESH_TOZERO)
# Compose a new RGB image from the thresholded channels (if you need it)
dst = cv.CloneImage(orig)
cv.Merge(b, g, r, None, dst)
If your images are all the same size, then you can re-use the created images to save time.
