Detecting a smaller image (.png) within a larger image retrieved with ImageGrab? - python

I've spent the past hour researching this simple topic, but all of the answers I've come across have been very complex and, as a noob to Python, I've been unable to incorporate any of them into my program.
I am trying to make an AI play a browser version of the Piano Tiles game. As of now, I'm simply trying to take a capture of the game's window (a small portion of my computer screen) and then search that capture for a .png of the game's start button. From there I will go on to CLICK that start button, but that's a problem for another time.
How can I check whether an Image contains the contents of a .png file?
Here is my current code:
from PIL import ImageGrab as ig, ImageOps as io, Image
import pyautogui
import bbox
def grabStart(window):
    #The start button
    start = Image.open("res/StartButton.PNG")
    start = io.grayscale(start)
    #This is the part I need to figure out. The following is just pseudocode
    if window.contains(start): #I know that this doesn't actually work. Just pseudocode
        #I'd like to return the location of 'start' in one of the following forms
        return either: (x1, y1, x2, y2), (x1, y1, width, height), (a coordinate within 'start')
def grabGame():
    #The coordinates of the entire game window
    x1 = 2222
    y1 = 320
    x2 = 2850
    y2 = 1105
    #The entire screen of the game
    window = ig.grab(bbox = (x1, y1, x2, y2))
    window = io.grayscale(window)
    return window
grabStart(grabGame())

Try pyautogui.locate(). The function takes two arguments: the first is the image to find, the second is the image to search in. It only works on images, so for a live window you need to grab a screenshot first or consider another option. Also, pyautogui's locate functions are essentially a wrapper over PIL, so if you run into efficiency issues you might want to translate the locate() call into its PIL equivalent.

Here's a way of doing it. I just leave the program running, then open, close and move a preview of the button around the screen to see whether it spots the button and reports the coordinates correctly.
#!/usr/bin/env python3
from PIL import ImageGrab as ig, Image
import pyautogui as ag
def checkButton(button, window):
    try:
        # confidence= requires OpenCV (opencv-python) to be installed
        location = ag.locate(button, window, confidence=0.8)
        print(f'location: {location[0]},{location[1]},{location[2]},{location[3]}')
    except Exception:
        # Not found: depending on the pyautogui version, locate() either raises or returns None
        print('Not found')
# Load button just once at startup
button = Image.open("button.png")
# Loop, looking for button
while True:
    window = ig.grab()
    checkButton(button, window)
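One detail worth keeping in mind for the original question: the box returned by locate() is relative to the image that was searched, not to the screen. If the window was grabbed with a bbox, the grab origin has to be added back before clicking. A minimal sketch, where to_screen_center is a hypothetical helper name and the origin is the (x1, y1) from the question's grabGame():

```python
def to_screen_center(box, window_origin):
    """Convert a (left, top, width, height) box found inside a grabbed
    region into the absolute screen coordinates of the box's center."""
    left, top, width, height = box
    origin_x, origin_y = window_origin
    return (origin_x + left + width // 2,
            origin_y + top + height // 2)


# Hypothetical values: a 100x40 button found at (10, 20) inside a window
# that was grabbed with bbox=(2222, 320, 2850, 1105).
print(to_screen_center((10, 20, 100, 40), (2222, 320)))
```

The resulting point could then be passed to pyautogui.click(x, y).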

Related

pyautogui live image locator

I've got a problem in Python:
import pyautogui as a
while True:
    pixel = a.locateOnScreen("example.png")
    if pixel == None: continue
    pixel = a.center(pixel)
    data = [pixel.x , pixel.y]
    a.moveTo(data[0],data[1])
This code finds the picture and moves the mouse onto it, but it's too slow: every time the loop starts over it loads the file again, which slows it down.
I want it to work live.
I tried:
import pyautogui as a
from IPython.display import Image
f = Image("example.png")
while True:
    pixel = a.locateOnScreen(f)
    pixel = a.center(pixel)
    data = [pixel.x , pixel.y]
    a.moveTo(data[0],data[1])
but it says the Image object has no attribute named mode.
I want to place the pointer on the center of the picture.
You are using an IPython.display.Image object, but that is the wrong class. Use PIL.Image instead:
import pyautogui as a
from PIL import Image
f = Image.open("example.png")
while True:
    pixel = a.locateOnScreen(f)
    if pixel is None: continue
    pixel = a.center(pixel)
    data = [pixel.x , pixel.y]
    a.moveTo(data[0],data[1])
I confirmed this works with pyautogui version 0.9.53
That said, I don't think that the loading of the image is your performance bottleneck here, but rather the search algorithm itself. Pyautogui's documentation states that locateOnScreen can take some time. You can try to search a smaller region on the screen. Citing:
These “locate” functions are fairly expensive; they can take a full
second to run. The best way to speed them up is to pass a region
argument (a 4-integer tuple of (left, top, width, height)) to only
search a smaller region of the screen instead of the full screen:
import pyautogui
pyautogui.locateOnScreen('someButton.png', region=(0,0, 300, 400))
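Building on the region tip, one possible refinement (my own assumption, not something taken from the pyautogui documentation) is to search the full screen once and afterwards only search a small area around the last hit. region_around is a hypothetical helper, and the margin and screen size are illustrative values:

```python
def region_around(box, margin, screen_size):
    """Return a (left, top, width, height) region that surrounds `box`
    by `margin` pixels on each side, clipped to the screen."""
    left, top, width, height = box
    screen_w, screen_h = screen_size
    new_left = max(0, left - margin)
    new_top = max(0, top - margin)
    new_right = min(screen_w, left + width + margin)
    new_bottom = min(screen_h, top + height + margin)
    return (new_left, new_top, new_right - new_left, new_bottom - new_top)


# Hypothetical last hit: a 50x30 image found at (100, 200) on a 1920x1080 screen.
print(region_around((100, 200, 50, 30), margin=20, screen_size=(1920, 1080)))
```

The tuple can be passed as region= to locateOnScreen on the next iteration. If the image is not found there, falling back to a full-screen search keeps the loop from losing track of it.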

Display continuously change of an image in python

I am writing a Python program that gradually changes an image step by step, adjusting each pixel by a small amount in each step. To visualize what the program is doing at runtime, I want it to display the image at each step, always overwriting the currently shown image so that it doesn't open a bunch of display windows.
I already tried matplotlib, opencv and skimage, each with its own way of displaying an image and updating the frame content while the program runs:
# using skimage
viewer = ImageViewer(image)
viewer.show(main_window=False) # set the parameter to false so that the code doesn't block here but continues computation
..other code..
viewer.update_image(new_image)
# using matplotlib
myplot = plt.imshow(image)
plt.show(block=False)
.. other code..
myplot.set_data(new_image)
plt.show()
# using opencv
cv2.imshow('image',image)
.. other code ..
cv2.imshow('image', new_image)
I always ran into the problem that when it was supposed to open a frame with an image, it did not display the image but only a black screen. Weirdly enough, when I ran the code in IntelliJ in debug-mode and hit a breakpoint after the display-function, it worked.
What can I do so that it is displayed correctly when running the program normally and not with a breakpoint?
Here's the thing: I think your program does work, but it runs to completion and exits unless you tell it to pause, which is why your breakpoint strategy works.
Try pausing after showing the image:
You can ask for user input. The program will pause until you enter some input.
You can put the program thread to sleep for a specified amount of time. This freezes your program for that time, but you'll be able to see the image if it has already been rendered.
Edit -
Since opencv's waitKey method is working for you now, you can use it again to prevent the program from closing the image window. Use waitKey(0) as the last statement of your program. It waits indefinitely for a key press and returns the pressed key's code. Press any key to continue (but remember to keep your image window in focus or it won't work), and the program should then close if waitKey(0) is the final statement.
Also, I've struck through the pausing options I suggested earlier, because I'm unsure whether they would have helped. I think the waitKey method is more sophisticated and pauses the program without freezing it.
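The reason waitKey matters is that OpenCV's HighGUI windows only process their GUI events (including drawing) while waitKey is running, so an imshow call without it can leave the window black. A minimal sketch of an update loop, assuming opencv-python is installed; show_steps and its backend parameter are hypothetical names, the latter only there so the loop can be exercised without a display:

```python
def show_steps(frames, window_name="image", delay_ms=1, backend=None):
    """Show each frame in the same window and return how many were shown.

    `frames` is any iterable of images (NumPy arrays when using cv2).
    `backend` defaults to the cv2 module; it is a hypothetical hook so a
    stand-in object can be supplied instead.
    """
    if backend is None:
        import cv2 as backend  # assumption: opencv-python is installed
    shown = 0
    for frame in frames:
        # Reusing the same window name overwrites the previous image.
        backend.imshow(window_name, frame)
        # waitKey lets the GUI event loop run so the frame is actually drawn.
        key = backend.waitKey(delay_ms) & 0xFF
        shown += 1
        if key == ord("q"):  # allow quitting early with the 'q' key
            break
    return shown
```

In the question's setting, each step of the pixel-adjusting loop would yield the updated image, and a final cv2.waitKey(0) keeps the last frame visible until a key is pressed.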
Well, I am still not sure what exactly your goal is, but here is a piece of code that modifies an image inside a window whenever the upper button is pressed.
from tkinter import Tk, Canvas, Button, Label, PhotoImage, mainloop
import random
WIDTH, HEIGHT = 800, 600
def modify_image():
    print("modifying image...")
    for x in range(1000):
        img.put('#%06x' % random.randint(0, 16777215),  # 6 char hex color
                (random.randint(0, WIDTH - 1), random.randint(0, HEIGHT - 1)))  # (x, y)
        canvas.update_idletasks()  # redraw so the change is visible
    print("done")
canvas = Canvas(Tk(), width=WIDTH, height=HEIGHT, bg="#000000")
canvas.pack()
Button(canvas, text="modifying image", command=modify_image).pack()
img = PhotoImage(width=WIDTH, height=HEIGHT)
Label(canvas, image=img).pack()
mainloop()
The function modify_image() adds 1000 random pixels to the image within the main window. Note that tkinter is part of Python's standard library.

Clicking on pixel for certain process with python

Hello, I am writing a program to automate mouse clicks on certain pixels of another program, but I don't want a second program to get in the way of that click. My program is going to look for a green pixel in a certain part of the screen and click it, but if another program or image that is green is in the way, I don't want it to click on that.
I just want it to click on the process/program I choose, and not on whatever happens to be on the screen.
If anyone could give me some tips on this, that would be helpful.
To get the title of the focused window (so that you only click while the right window is focused) use:
from win32gui import GetWindowText, GetForegroundWindow
print(GetWindowText(GetForegroundWindow()))
Run this with each of your windows focused to learn their titles, then use if statements on the title to stop clicking when the wrong window is in front.
To click on the pixels you can use win32con and win32api:
import win32api, win32con
from time import sleep
def click(x,y):
    win32api.SetCursorPos((x,y))
    win32api.mouse_event(win32con.MOUSEEVENTF_LEFTDOWN,0,0)
    sleep(0.01)
    win32api.mouse_event(win32con.MOUSEEVENTF_LEFTUP,0,0)
To find the pixels and click on them, use PyAutoGUI:
import pyautogui
width = 1920
height = 1080
while WindowIsFocused:
    pic = pyautogui.screenshot()
    for x in range(0, width, 1):
        for y in range(0, height, 1):
            r, g, b = pic.getpixel((x, y))
            # click only on pixels with exactly this colour
            if (r, g, b) == (252, 200, 118):
                click(x,y)
                print("Clicked")
Set width and height to your screen resolution, and set WindowIsFocused to True while the window you want is focused. Use a separate function for that check (it should run constantly).
I hope that I could help. For any questions ask me. :)
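The two pieces above could be combined so that the scan only covers the part of the screen where the target program sits and only runs while that program is in the foreground. A rough sketch under those assumptions: find_pixel and is_focused are hypothetical helper names, the image can be any object with a Pillow-style getpixel method (such as the result of pyautogui.screenshot()), and is_focused needs Windows with pywin32 installed.

```python
def find_pixel(image, target, region):
    """Return the first (x, y) inside `region` whose colour equals `target`,
    or None if there is no match.

    `region` is (left, top, width, height); `target` is an (r, g, b) tuple.
    """
    left, top, width, height = region
    for y in range(top, top + height):
        for x in range(left, left + width):
            # Compare only the first three channels so RGBA images also work.
            if tuple(image.getpixel((x, y))[:3]) == target:
                return (x, y)
    return None


def is_focused(title_fragment):
    """True if the foreground window's title contains `title_fragment`.
    Windows only; assumes the pywin32 package is installed."""
    from win32gui import GetWindowText, GetForegroundWindow
    return title_fragment in GetWindowText(GetForegroundWindow())
```

A loop could then call is_focused("My Game") (the title is a placeholder) before each screenshot and pass the result of find_pixel to the click function, skipping the click whenever either check fails.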
I'm sorry, I'm not familiar with sending input to other programs, but I did a little research and found a library called PyWin32 that should satisfy your need. You can search for its documentation or try your luck finding videos about this particular library on YouTube.
Anyway, hopefully this helps point you in the right direction, and feel free to ask any questions.

How do I open .gif file using tkinter without getting error "Too early to create image"?

I cannot start my Python program. I've a problem that I cannot open a .gif file, and I cannot figure out how!
I keep getting a long error message:
"RuntimeError: Too early to create image"
I have moved the gif files into the same project file as the code, and I tried looking online, but everyone uses different packages, and I just cannot find a way around it. I also have the gifs open on pycharm.
Here is my code:
import random
from tkinter import *
sign = random.randint(0, 1)
if (sign == 1):
    photo = PhotoImage(file="X.gif")
else:
    photo = PhotoImage(file="O.gif")
My overall goal is to show an image like a finished tic tac toe game, with randomly placed X's and O's, and there does not have to be any specific order like 3 in a row. Here is the homework problem:
Display a frame that contains nine labels. A label may display an image icon for X or an image icon for O, as shown in Figure 12.27c. What to display is randomly decided.
Use the Math.random() method to generate an integer 0 or 1, which corresponds to displaying an X or O image icon. These images are in the files x.gif and o.gif.
I can see from the code that you are using PhotoImage before creating a main window, which gives you a RuntimeError. The message "Too early to create image" means exactly that: the image cannot be created while there is no active Tk window.
Some people prefer other modules because they give more flexibility to resize, reshape, invert and so on (for example Pillow: from PIL import Image, ImageTk; see "How to use PIL in Tkinter").
Now back to your code.
You can randomise the "O" and "X" images without even using if-else.
I created the main window before creating the image.
Make sure the images you are using are in the same directory as the script.
import random
from tkinter import *
sign = random.choice( ["X.gif", "O.gif"] )
print(sign,"photo has been selected")
root = Tk()
Photo = PhotoImage(file=sign)
display_photo = Label(root, image=Photo)
display_photo.pack()
mainloop()
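For the full homework task (nine labels), the same idea can be extended to a 3x3 grid. This is only a sketch: choose_marks and build_board are hypothetical helper names, and the GIF paths are assumed to exist next to the script. The images are created after the root window and a reference to them is returned, since Tkinter images can disappear if they are garbage collected.

```python
import random


def choose_marks(count=9, rng=random):
    """Pick 'X' or 'O' at random for each of `count` cells."""
    return [rng.choice(["X", "O"]) for _ in range(count)]


def build_board(root, image_files):
    """Fill `root` with a 3x3 grid of labels showing random X/O images.

    `image_files` maps 'X' and 'O' to GIF paths (assumed to exist).
    Returns the images so the caller can keep a reference to them.
    """
    from tkinter import Label, PhotoImage
    # Create the images only after the Tk root exists.
    images = {mark: PhotoImage(master=root, file=path)
              for mark, path in image_files.items()}
    for index, mark in enumerate(choose_marks()):
        Label(root, image=images[mark]).grid(row=index // 3, column=index % 3)
    return images


# Usage (needs a display and the two GIF files):
#   from tkinter import Tk
#   root = Tk()
#   images = build_board(root, {"X": "X.gif", "O": "O.gif"})
#   root.mainloop()
```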

Get global mouse position and use in python program?

As part of a larger project, I am trying to create a snapshot tool that works similarly to the Mac OS X snapshot. It should take a first click and a second click, and return an image of the rectangle those two clicks define.
I have some Python functions that take a first point (x, y) and a second point (x, y) and create a snapshot of the rectangle that those points span on the screenshot. The missing piece is getting the mouse locations of the first and second click, then passing that data to the Python program to create the snapshot.
In other words, the flow of the program should be:
first click (save x, y)
second click (save x2, y2)
run snapshot.py using the saved clicked data to return the screenshot
I've only found solutions that can return the position of the pointer within a frame. If it helps, I'm using "import gtk" and "from Xlib import display"
edit: I have tried to use Tkinter to make an invisible frame that covers the whole screen. The idea was to use that invisible frame to get the exact coordinates of two mouse clicks, and then the invisible frame would disappear, pass the coordinates on to the screenshot function, and it would be done. However, the code I've been writing doesn't keep the frame transparent.
edit 2: This code can create a window, make it transparent, size it to the screen, then return the mouse coordinates on that window. I can use this to simply return the mouse coordinates on two clicks, then remove the window and send those coordinates to the snapshot code. When I run the below code line-by-line in the python shell, it works perfectly. However, whenever I run the code as a whole, it seems to skip the part where it makes the window transparent. Even if I copy and paste a block of code that includes the 'attributes("-alpha", 0.1)' into the python shell, it ignores that line.
from Tkinter import *
root = Tk()
root.attributes('-alpha', 0.1)
maxW = root.winfo_screenwidth()
maxH = root.winfo_screenheight()
root.geometry("{0}x{1}+0+0".format(maxW, maxH))
def callback(event):
    print "clicked at: ", event.x, "and: ", event.y
    root.bind("<Button-1>", callback)
def Exit(event):
    root.destroy()
root.bind("<Escape>", Exit)
# root.overrideredirect(True)
root.mainloop()
I am open to using any c or c++ code, or any language's code, to return the coordinates of the mouse on a click. This guy wrote some code to actually make the computer click at given points, which may be on the same track as my problem.
It's just an indentation problem: you bound the callback inside the callback by mistake. Try this instead:
root.geometry("{0}x{1}+0+0".format(maxW, maxH))
def callback(event):
    print "clicked at: ", event.x, "and: ", event.y
root.bind("<Button-1>", callback)
EDIT
Ok, here's a theory - maybe when you run it from the command line, it takes longer for the root window to appear, for some reason, so you set the alpha before it exists, and the alpha option gets ignored. Give this a try:
root.wait_visibility(root)
root.attributes('-alpha', 0.1)
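Putting the pieces of this thread together, the two-click flow from the question might look like the sketch below. It is written for Python 3 (tkinter rather than Tkinter), bbox_from_clicks and ask_for_region are hypothetical names, and whether the -alpha attribute takes effect depends on the platform and window manager.

```python
def bbox_from_clicks(first, second):
    """Turn two clicked points into a (left, top, right, bottom) box,
    regardless of the order or direction in which they were clicked."""
    (x1, y1), (x2, y2) = first, second
    return (min(x1, x2), min(y1, y2), max(x1, x2), max(y1, y2))


def ask_for_region():
    """Show a see-through full-screen window, wait for two clicks and
    return their bounding box, or None if Escape was pressed first.
    Needs a display."""
    import tkinter as tk
    root = tk.Tk()
    root.wait_visibility(root)      # set alpha only once the window exists
    root.attributes("-alpha", 0.1)
    root.geometry("{0}x{1}+0+0".format(root.winfo_screenwidth(),
                                       root.winfo_screenheight()))
    clicks = []

    def on_click(event):
        # x_root/y_root are screen coordinates rather than window-relative.
        clicks.append((event.x_root, event.y_root))
        if len(clicks) == 2:
            root.destroy()

    root.bind("<Button-1>", on_click)
    root.bind("<Escape>", lambda event: root.destroy())
    root.mainloop()
    return bbox_from_clicks(*clicks) if len(clicks) == 2 else None
```

The returned box has the same (left, top, right, bottom) layout that PIL's ImageGrab.grab(bbox=...) expects, so it could be handed to the snapshot code directly.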
