I'm trying to recognize Lego bricks from a video camera using OpenCV. It performs far worse than simply running detect.py in YOLOv5. So I ran some experiments on still images, and found that going through OpenCV still performs dramatically worse. Is there any clue? Here are the experiments I did.
This is the result from detect.py by just running
python detect.py --weights runs/train/yolo/weights/best.pt --source legos.jpg
This is the result from openCV by implementing this
import torch
import cv2
import numpy as np
model = torch.hub.load('.', 'custom', path='runs/train/yolo/weights/last.pt', source='local')
cap = cv2.VideoCapture('legos.jpg')
while cap.isOpened():
    ret, frame = cap.read()
    # Make detections
    results = model(frame)
    cv2.imshow('YOLO', np.squeeze(results.render()))
    if cv2.waitKey(0) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
If I simply do this, it gives a pretty good result
import torch
results = model('legos.jpg')
results.show()
Any ideas?
Your model was probably trained on RGB images, while OpenCV reads images in BGR order. Try converting the colour space accordingly. Example:
import torch
import cv2
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')
# read image and convert to RGB
img = cv2.imread('zidane.jpg')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
# make detections
results = model(img)
# render results and convert back to BGR
results.render()
out = cv2.cvtColor(img, cv2.COLOR_RGB2BGR)
cv2.imshow('YOLO', out)
cv2.waitKey(-1)
cv2.destroyAllWindows()
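To make the channel-order point concrete, here is a minimal sketch that uses only NumPy. It assumes the frame is an H x W x 3 uint8 array in OpenCV's BGR order; the helper name bgr_to_rgb is hypothetical and not part of OpenCV or YOLOv5. For 3-channel images, reversing the last axis has the same effect as cv2.cvtColor(frame, cv2.COLOR_BGR2RGB).

```python
import numpy as np

def bgr_to_rgb(frame):
    """Return a contiguous copy of a BGR frame with channels reordered to RGB."""
    # Reversing the last axis swaps the blue and red channels; green stays put.
    return np.ascontiguousarray(frame[..., ::-1])

# A single pure-blue pixel as OpenCV would store it: (B, G, R).
blue_bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
blue_rgb = bgr_to_rgb(blue_bgr)
print(blue_rgb[0, 0].tolist())  # [0, 0, 255]
```

In the capture loop from the question, the same idea would mean passing the converted frame to the model instead of the raw frame returned by cap.read().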
Related
I am trying to run code to show the video output of my webcam and all I am getting is a single picture. Here is my code:
import cv2
captureDevice = cv2.VideoCapture(0)
while True:
    check, frame=captureDevice.read()
    print(check)
    print(frame)
    gray=cv2.cvtColor(gray, cv2.COLOR_BGR2GRAY)
    cv2.imshow('Capturing', gray)
    key=cv2.waitKey(1)
    if key==ord('q'):
        break
captureDevice.release()
cv2.destroyAllWindows()
gray=cv2.cvtColor(gray, cv2.COLOR_BGR2GRAY)
Here you pass gray to cv2.cvtColor before it is assigned, so you must have defined it somewhere else earlier. In a fresh interpreter this line raises a NameError. (Try restarting your notebook and you will see.)
Change it to:
gray=cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
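The question prints the check flag returned by read() but never tests it. As a hedged sketch, the loop could be wrapped in a small generator that stops as soon as a read fails. The name iter_frames is hypothetical; it only assumes an object with a read() method that returns a (success, frame) pair, as cv2.VideoCapture does.

```python
def iter_frames(capture):
    """Yield frames from a capture-like object until a read fails."""
    while True:
        ok, frame = capture.read()
        # Stop on a failed read instead of passing None on to cvtColor/imshow.
        if not ok or frame is None:
            break
        yield frame
```

With OpenCV installed, usage would look like `for frame in iter_frames(captureDevice): ...`, with the grayscale conversion and cv2.imshow call inside the loop body.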
I used ZBar and OpenCV to read the QR code in the image below, but both failed to detect it. For ZBar, I use the pyzbar library as the Python wrapper. For some images the QR code is detected correctly, while other, very similar images fail. My phone camera can read the QR code in the uploaded image, which means it is a valid one. Below is the code snippet:
from pyzbar.pyzbar import decode
from pyzbar.pyzbar import ZBarSymbol
import cv2
# zbar
results = decode(cv2.imread(image_path), symbols=[ZBarSymbol.QRCODE])
print(results)
# opencv
qr_decoder = cv2.QRCodeDetector()
data, bbox, rectified_image = qr_decoder.detectAndDecode(cv2.imread(image_path))
print(data, bbox)
What type of pre-processing will help to increase the rate of success for detecting QR codes?
zbar, which does some preprocessing of its own, does not detect the QR code; you can verify this by running zbarimg image.jpg.
Good binarization is useful here. I got this to work using the kraken.binarization.nlbin() function of the Kraken library. The library is for OCR, but works very well for QR codes, too, by using non-linear processing. The Kraken binarization code is here.
Here is the code for the sample:
from kraken import binarization
from PIL import Image
from pyzbar.pyzbar import decode
from pyzbar.pyzbar import ZBarSymbol
image_path = "image.jpg"
# binarization using kraken
im = Image.open(image_path)
bw_im = binarization.nlbin(im)
# zbar
decode(bw_im, symbols=[ZBarSymbol.QRCODE])
[Decoded(data=b'DE-AAA002065', type='QRCODE', rect=Rect(left=1429, top=361, width=300, height=306), polygon=[Point(x=1429, y=361), Point(x=1429, y=667), Point(x=1729, y=667), Point(x=1723, y=365)])]
The following picture shows the clear image of the QR code after binarization:
I had a similar issue, and Seanpue's answer got me on the right track for this problem. Since I was already using the OpenCV library for image processing rather than PIL, I used it to perform Otsu's Binarization using the directions in an OpenCV tutorial on Image Thresholding. Here's my code:
import cv2
from pyzbar.pyzbar import decode
from pyzbar.pyzbar import ZBarSymbol
image_path = "qr.jpg"
# preprocessing using opencv
im = cv2.imread(image_path, cv2.IMREAD_GRAYSCALE)
blur = cv2.GaussianBlur(im, (5, 5), 0)
ret, bw_im = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY+cv2.THRESH_OTSU)
# zbar
decode(bw_im, symbols=[ZBarSymbol.QRCODE])
[Decoded(data=b'DE-AAA002065', type='QRCODE', rect=Rect(left=1429, top=362, width=300, height=305), polygon=[Point(x=1429, y=362), Point(x=1430, y=667), Point(x=1729, y=667), Point(x=1724, y=366)])]
Applying the gaussian blur is supposed to remove noise from the picture to make the binarization more effective, but for my application it didn't actually make much difference. What was vital was to convert the image to grayscale to make the threshold function work (done here by opening the file with the cv2.IMREAD_GRAYSCALE flag).
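For readers curious what the Otsu step does, here is a small NumPy-only sketch of the criterion: pick the threshold that maximises the between-class variance of the grayscale histogram. It is an illustration under the assumption of a 2-D uint8 input, not OpenCV's actual implementation, and the function names are hypothetical. It also shows why a single-channel image is needed: the histogram is built over one intensity value per pixel.

```python
import numpy as np

def otsu_threshold(gray):
    """Return the threshold that maximises between-class variance (Otsu's criterion)."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256, dtype=np.float64)
    w0 = np.cumsum(hist)                 # number of pixels with value <= t
    w1 = hist.sum() - w0                 # number of pixels with value > t
    s0 = np.cumsum(hist * levels)        # intensity sum of pixels <= t
    s1 = s0[-1] - s0                     # intensity sum of pixels > t
    valid = (w0 > 0) & (w1 > 0)          # both classes must be non-empty
    between = np.zeros(256)
    mean_gap = s0[valid] / w0[valid] - s1[valid] / w1[valid]
    between[valid] = w0[valid] * w1[valid] * mean_gap ** 2
    return int(np.argmax(between))

def binarize(gray):
    """Map pixels above the Otsu threshold to 255 and the rest to 0."""
    return np.where(gray > otsu_threshold(gray), 255, 0).astype(np.uint8)

# Synthetic example: dark "modules" (value 60) on a light background (value 190).
demo = np.full((8, 8), 190, dtype=np.uint8)
demo[2:6, 2:6] = 60
print(np.unique(binarize(demo)).tolist())  # [0, 255]
```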
QReader usually works quite well for these cases.
from qreader import QReader
import cv2
if __name__ == '__main__':
    # Initialize QReader
    detector = QReader()
    img = cv2.cvtColor(cv2.imread('92iKG.jpg'), cv2.COLOR_BGR2RGB)
    # Detect and Decode the QR
    print(detector.detect_and_decode(image=img))
The output of this code for this QR code is:
DE-AAA002065
I am just starting out with Python and I am attempting to write code that does real-time OCR on a portion of my screen. I was certain this code would work, but it just throws a bunch of Tesseract errors. Does the image need to be saved for Tesseract to work? Is there a better OCR library for this task? The OpenCV part works perfectly and displays the image.
import numpy as np
import cv2
from PIL import ImageGrab
import pytesseract
while True:
    orig_img = ImageGrab.grab(box)
    np_im = np.array(orig_img)
    img = cv2.cvtColor(np_im, cv2.COLOR_BGR2GRAY)
    text = pytesseract.image_to_string(img)
    cv2.imshow('window',img)
    if cv2.waitKey(25) & 0xFF == ord('q'):
        cv2.destroyAllWindows()
    print(text)
I fixed it. I was not aware that I needed to install Tesseract itself on my PC. I also added
im = Image.fromarray(img)
im.save("img.png")
to save the image
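Since the root cause was a missing Tesseract install, a quick pre-flight check can save some confusion. The sketch below uses only the standard library and assumes the binary is named tesseract and is expected on PATH; the helper name tesseract_on_path is hypothetical. If the binary lives somewhere else, pytesseract can be pointed at it through pytesseract.pytesseract.tesseract_cmd.

```python
import shutil

def tesseract_on_path(executable="tesseract"):
    """Return True if the Tesseract binary can be found on PATH."""
    return shutil.which(executable) is not None

if not tesseract_on_path():
    print("Tesseract binary not found; install it or point pytesseract at it.")
```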
I'm not sure what the problem is. The code runs without showing any error, but I don't see any output.
import numpy as np
import cv2
cap = cv2.VideoCapture("https://youtu.be/_3elg-_1m_c")
while(cap.isOpened()):
    ret, frame = cap.read()
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    cv2.imshow('frame',gray)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
If I use a video saved on my laptop, I can see the output, but for online videos I cannot.
I am using the code below, but I get a black image. Could you please help me rectify the error?
import cv2
import numpy as np
c = cv2.VideoCapture(0)
while(1):
    _,f = c.read()
    cv2.imshow('e2',f)
    if cv2.waitKey(5)==27:
        break
cv2.destroyAllWindows()
Update: See github.com/opencv/opencv/pull/11880 and linked conversations; only a few backends support -1 as an index.
Although this is an old post, this answer can help people who are still facing the same problem. If you have a single webcam but it renders all black, use cv2.VideoCapture(-1). This will get you the working camera.
Just change cv2.waitKey(0) to cv2.waitKey(30) and this issue will be resolved.
I faced the same problem. Updating neither OpenCV nor the webcam driver helped. I am using Kaspersky as my antivirus. When I disabled Kaspersky, the black output problem was solved.
BTW, I can see the running .py file in the Kaspersky console > reports > host intrusion prevention. It reports: application privilege control rule triggered - application: myfile.py, result: blocked: access to video capturing devices
Try this:
import cv2
import numpy as np
cap = cv2.VideoCapture(0)
while(True):
    ret, frame = cap.read()
    cv2.imshow('frame',frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()
This worked for me:
I did a pip install imutils. imutils is a library with a series of convenience functions that make basic image processing operations such as translation, rotation, resizing, skeletonization, displaying Matplotlib images, sorting contours, and detecting edges much easier with OpenCV on both Python 2.7 and Python 3.
import cv2
import imutils
cap = cv2.VideoCapture(0) # video capture source camera (Here webcam of laptop)
ret, frame = cap.read() # return a single frame in variable `frame`
while (True):
    # gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    (grabbed, frame) = cap.read()
    frame = imutils.resize(frame, width=400)
    cv2.imshow('img1', frame) # display the captured image
    if cv2.waitKey(1) & 0xFF == ord('q'): # save and quit on pressing 'q'
        cv2.imwrite('capture.png', frame)
        cv2.destroyAllWindows()
        break
cap.release()
Try using -0 as the index and pause any antivirus that is running.
import cv2
import numpy as np
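cap = cv2.VideoCapture(-0)  # note: -0 evaluates to 0 in Python
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 640)   # property id 3
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 480)  # property id 4
while(True):
    success, img = cap.read()
    cv2.imshow('frame',img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break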
cap.release()
cv2.destroyAllWindows()
I faced the same issue after many calls with:
cap = cv2.VideoCapture(0)
and it was solved when I changed the index to 1:
cap = cv2.VideoCapture(1)
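Since the working index differs between machines (0, 1, or -1 on some backends), one option is to probe a few indices rather than hard-coding one. The sketch below is an assumption-laden illustration: first_working_index is a hypothetical helper, and it takes the capture factory as an argument so that it does not depend on any particular backend. It only checks that a frame can be read, not that the frame is non-black.

```python
def first_working_index(open_capture, indices=range(4)):
    """Return the first index whose capture opens and delivers a frame, else None."""
    for index in indices:
        cap = open_capture(index)
        try:
            if not cap.isOpened():
                continue
            ok, frame = cap.read()
            if ok and frame is not None:
                return index
        finally:
            # Always release the device, including when we return early.
            cap.release()
    return None
```

With OpenCV installed, this would be called as `first_working_index(cv2.VideoCapture)`, and the returned index then passed to cv2.VideoCapture for the real capture loop.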
In my case, just disabling Kaspersky solved the problem.