Python Image processing screenshots

I'm trying to write a program that will detect when my mouse pointer changes icon and automatically send out a mouse click. Is there a better way to do this than to take screenshots and parse the image for the mouse icon?
EDIT:
I'm running my program on Windows 7.
I'm trying to learn some image processing and automate a simple Flash game I made.
Rules: when the cursor changes shape, click to get a point.
Also what imaging modules for python will allow you to take a specific size screenshot not just the whole screen? This question has moved to a new thread: "Taking Screen shots of specific size"

The way to do this in Windows is to install a global hook with either SetWindowsHookEx or SetWinEventHook. (Alternatively, you could build a DLL that embeds Python and hooks into the browser or its Flash wrapper app and do it less intrusively from within the app, but that's much more work.)
The message you want is WM_SETCURSOR. Note that this is the message sent by Windows to the app to ask whether it wants to change the cursor, not a message sent when the cursor changes. So, IIRC, you will want to install both a WH_CALLWNDPROC hook and a WH_CALLWNDPROCRET hook, and check GetCursorInfo before and after to see whether the app has changed the cursor.
So, how do you do this from Python? Honestly, if you don't already know both win32api and friends from the pywin32 package, and how to write Windows message procs in some language, you probably don't want to. If you do want to, I'd start off with the (abandoned) pyHook project from UNC Assist. Even if you can't get it working, it's full of useful source code.
You should also search SO for [python] SetWinEventHook and [python] SetWindowsHookEx, and google around a bit; there are some examples out there (I even wrote one here somewhere…)
You can look at higher-level wrapper frameworks like pywinauto and winGuiAuto, but as far as I know, none of them has much help for capturing events.
I believe there are other tools, maybe AutoIt, that have all the functionality you need, but not as a Python module. (AutoIt, for example, has its own VB-like scripting language instead.)
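If installing hooks is more than you need, a cruder alternative is to poll GetCursorInfo and react whenever the cursor handle changes. The sketch below is an illustration of that polling idea, not of the hook approach described above. The names watch_cursor, current_cursor_handle and click are made up for this example, the Windows calls go through ctypes instead of pywin32, and the polling interval is an arbitrary assumption.

```python
import sys
import time


def current_cursor_handle():
    """Windows only: return the handle of the cursor currently shown."""
    import ctypes
    from ctypes import wintypes

    class CURSORINFO(ctypes.Structure):
        _fields_ = [
            ("cbSize", wintypes.DWORD),
            ("flags", wintypes.DWORD),
            ("hCursor", wintypes.HANDLE),
            ("ptScreenPos", wintypes.POINT),
        ]

    info = CURSORINFO()
    info.cbSize = ctypes.sizeof(CURSORINFO)
    if not ctypes.windll.user32.GetCursorInfo(ctypes.byref(info)):
        raise ctypes.WinError()
    return info.hCursor


def click():
    """Windows only: press and release the left mouse button in place."""
    import ctypes

    MOUSEEVENTF_LEFTDOWN, MOUSEEVENTF_LEFTUP = 0x0002, 0x0004
    ctypes.windll.user32.mouse_event(MOUSEEVENTF_LEFTDOWN, 0, 0, 0, 0)
    ctypes.windll.user32.mouse_event(MOUSEEVENTF_LEFTUP, 0, 0, 0, 0)


def watch_cursor(get_handle, on_change, polls, interval=0.01):
    """Poll get_handle() and call on_change(old, new) whenever it differs.

    Returns the number of changes seen. get_handle is injectable so the
    loop can be exercised without Windows.
    """
    last = get_handle()
    changes = 0
    for _ in range(polls):
        time.sleep(interval)
        now = get_handle()
        if now != last:
            on_change(last, now)
            changes += 1
            last = now
    return changes


if __name__ == "__main__" and sys.platform == "win32":
    # Roughly ten seconds of polling; click every time the cursor changes.
    watch_cursor(current_cursor_handle, lambda old, new: click(), polls=1000)
```

Polling can miss a change that appears and reverts between two polls, which is the price paid for avoiding the hook machinery.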

Related

How to capture the screen output of a running program?

Is it possible to take screenshots of a running program (with GUI) from another python program ?
If so, what could be the steps and libraries that I could use ? (On Windows)
For example, let's say I have calc.exe running. I'd want to take screenshots of what is displayed to the user from myprogram.py.
My goal is to analyze what's displayed on the monitored program.
If it's not possible to isolate the screenshot to a running predefined program, I think I will have to take screenshots of the fullscreen but it's not very practical.

Capturing a screenshot is easy. Just install the Python Imaging Library and use the ImageGrab.grab() function to return an Image instance with the screenshot.
Capturing a specific window is a little more complicated, because you need the window coordinates. I recommend installing the win32api modules (pywin32) and using a little module called winGuiAuto.py. Once you do that, you can do something like this:
import win32gui, winGuiAuto
from PIL import ImageGrab
hwnd = winGuiAuto.findTopWindow(title)  # title: text of the target window's caption
rect = win32gui.GetWindowRect(hwnd)  # (left, top, right, bottom) in screen coordinates
image = ImageGrab.grab(rect)
However, capturing the screen is the easy part. If you want to analyze the contents from screenshots, you're in for a lot of complications. This is probably the wrong approach for doing what you want and should be left as a last resort.
In most cases, it's easier to use the Windows API to read the contents of a window's elements directly, but that won't work with some third-party GUI toolkits. That's not within the scope of your question, so I'm not detailing it here, but you should read the source of the winGuiAuto.py module mentioned above for examples of how to do that, and also check out the pywinauto library.

The ImageGrab module in PIL works on Windows only (newer Pillow releases also support macOS and Linux). The pyscreenshot module is a cross-platform replacement that copies the contents of the screen to a PIL or Pillow image in memory. Read more at the link below.
https://pypi.python.org/pypi/pyscreenshot

Hijacking, redirecting, display output with Python 2.7

I am a new programmer with little experience, but I am in the process of learning Python 2.7. I use Python(x,y) and Spyder, as the programs are called, on Windows 7.
The main packages I'm using are numpy, PIL and potentially win32gui.
I am currently trying to write a program to mine information from a third-party application. This is against their wishes and they have made it difficult. I'm using ImageGrab and then numpy to get some results. This, however, or so I believe, forces me to keep the window I want to read in focus, which is not optimal.
I'm wondering if there is any way to hijack the whole window and redirect the output directly into a "virtual" copy, just so I can have it running in the background?
When looking at the demos for win32api, I found a script called desktopmanager that is supposed to create new desktops. I never got it to work, probably because I'm running Windows 7. I don't really know how multiple desktops work, but if they run in parallel, there may be a way to create a new desktop around a current window. I don't know how; it's just a thought so far.
The reason it's not working for me is not that it fails to create a new desktop; it's that once the desktop has been created, I can't return from it. Neither the taskbar icon nor the taskbar itself ever appears.

One approach that might work would be to do something like so:
get the window handle (FindWindow() or something similar, there are a few ways to do this)
get the window dimensions (GetClientRect() or GetWindowRect())
get the device context for the window (GetWindowDC())
get the image data from the window (BitBlt() or similar)
It is possible that you will need elevated privileges to access another process's window DC; if so, you may need to inject code or a DLL into the target process space to do this.
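The four steps above can be sketched with pywin32 roughly as follows. This is a minimal sketch, not a tested recipe for any particular application: capture_window and rect_size are names invented for this example, the window is looked up by its exact title, and a 32-bit display depth is assumed for the returned pixel data. Some applications draw in ways that BitBlt cannot read, in which case the result will be black.

```python
def rect_size(rect):
    """Return (width, height) of a (left, top, right, bottom) rectangle."""
    left, top, right, bottom = rect
    return right - left, bottom - top


def capture_window(title):
    """Windows only: copy a window's pixels into a bitmap.

    Returns (width, height, raw_bytes). Requires the pywin32 package.
    """
    import win32con
    import win32gui
    import win32ui

    # 1. Get the window handle.
    hwnd = win32gui.FindWindow(None, title)
    if not hwnd:
        raise RuntimeError("window not found: %r" % title)

    # 2. Get the window dimensions.
    width, height = rect_size(win32gui.GetWindowRect(hwnd))

    # 3. Get the device context for the whole window.
    hwnd_dc = win32gui.GetWindowDC(hwnd)
    src_dc = win32ui.CreateDCFromHandle(hwnd_dc)
    mem_dc = src_dc.CreateCompatibleDC()
    bitmap = win32ui.CreateBitmap()
    bitmap.CreateCompatibleBitmap(src_dc, width, height)
    mem_dc.SelectObject(bitmap)
    try:
        # 4. Copy the image data from the window into the bitmap.
        mem_dc.BitBlt((0, 0), (width, height), src_dc, (0, 0), win32con.SRCCOPY)
        raw = bitmap.GetBitmapBits(True)
    finally:
        # Release the GDI objects in reverse order of creation.
        win32gui.DeleteObject(bitmap.GetHandle())
        mem_dc.DeleteDC()
        src_dc.DeleteDC()
        win32gui.ReleaseDC(hwnd, hwnd_dc)
    return width, height, raw
```

On a 32-bit display the returned bytes are in BGRX order, so Pillow's Image.frombuffer("RGB", (width, height), raw, "raw", "BGRX", 0, 1) should be able to wrap them for further processing with numpy.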
HTH.

Get content from open window in Linux

I want to collect data from an open window in Linux and eventually parse it.
For example, suppose a terminal window is open. I need to retrieve all the data that appears in that window. After retrieval, I would parse it to get the specific commands entered.
So is it possible to do that? If so, how? I would prefer to use python to code this entire thing.
I am making a guess that first I would have to get some sort of ID for the open window and then use some kind of library to get the content from the window whose ID I have got.
Please help. I am quite a newbie.

You can (ab)use the assistive technologies support (for screen readers and such) that exists in the toolkit libraries. Whether it will work is toolkit-specific: Gtk and Qt have this support, but others (like Tk, Fltk, etc.) may or may not.
The Linux Desktop Testing Project is a Python toolkit that uses these interfaces for testing GUI applications, so you can either use it or look at how it works and do something similar.

I think the correct answer may be "with some difficulty". Essentially, the contents of a window are a bitmap. This bitmap is drawn on by a whole slew of primitives (including "display this octet-string, using that encoding and a specific font"), but the window contents are still "just pixels".
Getting the "just pixels" is pretty straightforward, as these things go. You open a session to the X server and say "give me the contents of window W" and it hands them over.
Doing something useful with it is, unfortunately, a completely different matter, as you'd potentially have to (essentially) OCR the bitmap for what you want.
If you decide to take that route, have a look at the source of xwd, as that does, essentially, that.

Do you have some sort of control over the execution of the terminal? In that case, you can use the script command in the terminal session to log all interaction to a file and then read and parse the file.
$ script myfile
Script started, file is myfile
$ ls
...
$ exit
Script done, file is myfile
$ parse_file.py myfile
If the terminal is running inside of screen, you have other options as well. Screen has logging built in, and screen -X sends commands to a running screen session (see man screen).
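The parse_file.py step in the session above is not shown, so here is one guess at what it might look like. It is a minimal sketch under loud assumptions: the function name commands_from_typescript is invented, the shell prompt is assumed to be exactly "$ ", and only simple ANSI colour/cursor escape sequences are stripped. Backspaces and line editing recorded by script are not handled.

```python
import re

# Matches common ANSI CSI sequences such as colour codes ("\x1b[1;32m").
ANSI_ESCAPE = re.compile(r"\x1b\[[0-9;?]*[A-Za-z]")


def commands_from_typescript(text, prompt="$ "):
    """Return the commands typed at the given prompt in a `script` log."""
    commands = []
    for raw_line in text.splitlines():
        line = ANSI_ESCAPE.sub("", raw_line)
        if line.startswith(prompt):
            command = line[len(prompt):].strip()
            if command:
                commands.append(command)
    return commands


if __name__ == "__main__":
    import sys

    if len(sys.argv) > 1:
        # Typescript files may contain stray bytes, so decode leniently.
        with open(sys.argv[1], "rb") as handle:
            log = handle.read().decode("utf-8", "replace")
        for command in commands_from_typescript(log):
            print(command)
```

A prompt that includes the host name or working directory would need a regular expression in place of the fixed prefix.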

How to get the key pressed in Python without pressing Enter?

I saw a solution here, but I don't want to wait until a key is pressed. I want to get the last key pressed.

The related question may help you, as @S.Lott mentioned: Detect in python which keys are pressed
I am writing in, though, to give you some advice: don't worry about that.
What kind of program are you trying to produce?
Programs running in a terminal usually don't have an interface in which getting "live" keystrokes is interesting. Not nowadays. For programs running in the terminal, you should worry about a useful command-line user interface, using optparse or other modules.
For interactive programs, you should use a GUI library and create a decent UI for your users instead of reinventing the wheel. Which would be better for what you are trying to do? The user clicks on an icon and a window opens on the screen, with a couple of buttons on it and half a dozen or so menu options packed under a "File" menu, like all the other windows on the screen; or a black terminal opens up, with an 80s-looking text interface with some blue-highlighted menu options and so on? You can use Tkinter for simple windowed applications, as it comes pre-installed with Python on Windows, so your users don't have to worry about installing additional libraries.
Rephrasing it just to be clear: any program that requires a user interface should either use a GUI library or have a web interface. It is a waste of your time, and that of your users, to try to create a UI operating over the terminal. We are not in 1989 any more.
If you absolutely need a text interface, you should look at the ncurses library. That is better than trying to reinvent the wheel.

http://code.activestate.com/recipes/134892/
I think it's what you need.
P.S. Oops, I didn't see it's the same solution you rejected... why, by the way?
Edit:
Do you know about this?
from msvcrt import getch
It works only on Windows, however...
(and it is generalised in the above link)
from here: http://www.daniweb.com/forums/thread115282.html
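Since the question asks for the last key pressed without waiting, msvcrt.getch can be combined with msvcrt.kbhit, which reports whether a keypress is pending. The following is a minimal sketch of that idea; last_key_pressed is a made-up name, and the kbhit and getch parameters exist only so the draining logic can be tried without a Windows console.

```python
def last_key_pressed(kbhit=None, getch=None):
    """Return the most recent pending keypress, or None if there is none.

    Never blocks: it only reads keys that are already waiting. By default
    it uses msvcrt, which is available on Windows only.
    """
    if kbhit is None or getch is None:
        import msvcrt  # Windows only
        kbhit, getch = msvcrt.kbhit, msvcrt.getch
    last = None
    while kbhit():
        # Discard older keys and keep only the newest one.
        last = getch()
    return last
```

Calling last_key_pressed() from a main loop gives the newest key typed since the previous call. Special keys such as arrows arrive from msvcrt as two reads, which this sketch does not recombine.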

Programmatically launching standalone Adobe flashplayer on Linux/X11

The standalone flashplayer takes no arguments other than a .swf file when you launch it from the command line. I need the player to go full screen, no window borders and such. This can be accomplished by hitting ctrl+f once the program has started. I want to do this programmatically as I need it to launch into full screen without any human interaction.
My guess is that I need to somehow get a handle to the window and then send it an event that looks like the "ctrl+f" keystroke.
If it makes any difference, it looks like flashplayer is a gtk application and I have python with pygtk installed.
UPDATE (the solution I used... thanks to ypnos' answer):
./flashplayer http://example.com/example.swf & sleep 3 && ~/xsendkey -window "Adobe Flash Player 10" Control+F

You can use a dedicated application which sends the keystroke to the window manager, which should then pass it to flash, if the window starts as being the active window on the screen. This is quite error prone, though, due to delays between starting flash and when the window will show up.
For example, your script could do something like this:
flashplayer *.swf &
sleep 3 && xsendkey Control+F
The application xsendkey can be found here: http://people.csail.mit.edu/adonovan/hacks/xsendkey.html
If no specific window is given, it will send the keystroke to the root window, which is handled by your window manager. You could also try to figure out the window ID first, using xprop or something related to it.
Another option is a window manager that can remember your settings and apply them automatically. Fluxbox, for example, provides this feature. You could set Fluxbox to make the window decor-less and stretch it over the whole screen, if flashplayer supports being resized. This is also not so nice, as it would probably affect every flashplayer window you ever open.

I actually did this a long time ago, but it wasn't pretty. What we did was use the Sawfish window manager and write a hook to recognize the flashplayer window, then strip all the decorations and snap it to full screen.
This may be possible without using the window manager, by registering for X window creation events from an external application, but I'm not familiar enough with X11 to tell you how that would be done.
Another option would be to write a pygtk application that embedded the standalone flash player inside a gtk.Socket and then resized itself. After a bit of thought, this might be your best bet.

nspluginplayer --fullscreen src=path/to/flashfile.swf
which is from the nspluginwrapper project (http://gwenole.beauchesne.info//en/projects/nspluginwrapper)

"Another option would be to write a pygtk application that embedded the standalone flash player inside a gtk.Socket and then resized itself. After a bit of thought, this might be your best bet."
This is exactly what I did. In addition to that, my player scales flash content via Xcomposite, Xfixes and Cairo. A .deb including the Python source can be found here:
http://www.crutzi.info/crutziplayer

I've done this using openbox, with a similar mechanism to the one that bmdhacks mentions. The thing that I did note from this was that the standalone flash player performed considerably worse fullscreen than the same player in a maximised undecorated window (which, annoyingly, is not properly fullscreen because of the menubar). I was wondering about running it with a custom gtk theme to make the menu invisible. That's just a performance issue, though. If fullscreen currently works OK, then it's unnecessarily complicated. I was running on an OLPC XO, where performance is more of an issue.
I didn't have much luck with nspluginplayer (too buggy I think).
Ultimately I had the luxury of making the flash that was running, so I could simply place code into the flash itself. By a similar token, since you can embed flash within flash, it should be possible to make a little stub swf that goes fullscreen automatically and contains the target swf.

You have to use this ActionScript 3 command:
stage.displayState = StageDisplayState.FULL_SCREEN;
See Adobe's ActionScript 3 programming documentation.
But be careful: in full screen, you will lose display performance!
I've had this problem, and it is worse under Linux!
