TL;DR: Is there a Python library that lets me grab an application window's current frame as an image and then write a modified frame back to that window?
So the whole story is that I want to write an application in Python that does something similar to Lossless Scaling and Magpie. I want to grab an application window (a video game window, for example), get the current frame as an image, use a machine learning/deep learning algorithm (like FSR or DLSS) to upscale that image, and then replace the application's current frame with the upscaled one.
So far I have been playing around with some upscaling algorithms, like the one from Real-ESRGAN, but my main problem now is how to upscale the video game's frames in real time. The only thing I've found that does anything related is PyAutoGUI, but that package only lets you take screenshots of an application; it can't write graphics back to it.
I hope I have clarified my problem; feel free to comment if you still have any questions.
Thank you for reading this post, and have a good day.
Doing this with Python is going to be very difficult. A lot of the performance in this sort of thing comes from avoiding as many memory copies as possible, and Python's idioms for string and bytes processing unfortunately introduce quite a few additional copies in the course of any idiomatic program. I say this as a die-hard Python fan who is constantly trying to cram Python in everywhere it doesn't belong: you'd be better off doing this in Rust.
Update: After receiving feedback from some folks with more direct experience in this sort of thing, I may have overstated the difficulty here. Many ML tools in Python provide zero-copy access: you can easily access and manipulate memory-mapped data from numpy, and there is even a CUDA array-interface protocol for doing the same with data in GPU memory. So while it's not exactly easy, as long as your operations are implemented as numpy operations rather than pure-Python pixel-by-pixel logic, it shouldn't be much harder than other Python machine-learning applications that need native APIs to reach their source data.
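For example, numpy can wrap an existing buffer without copying it. A minimal sketch (the bytearray here is a stand-in for whatever buffer a real capture API would hand you):

```python
import numpy as np

# Stand-in for a captured BGRA frame; a real capture API would hand
# you a buffer like this (memoryview, bytearray, mmap, ...).
width, height = 1920, 1080
raw = bytearray(width * height * 4)

# np.frombuffer wraps the existing memory without copying it;
# writes through `frame` modify `raw` in place.
frame = np.frombuffer(raw, dtype=np.uint8).reshape(height, width, 4)
frame[..., 3] = 255  # e.g. force the alpha channel, no extra copies
```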
However, there's no way to access framebuffer data directly from Python, so step 1 is going to be writing your own bindings over the relevant DirectX APIs. Since Magpie is open source, you can see which APIs it's using in its various C++ "Frame Source" backends; this one, for example, looks relevant: https://github.com/Blinue/Magpie/blob/42cfcba1222b07e4cec282eaff639aead229f123/Runtime/GraphicsCaptureFrameSource.cpp#L87
You can then look those APIs up on MSDN; that one, for example, is here: https://learn.microsoft.com/en-us/uwp/api/windows.graphics.capture.direct3d11captureframepool.createfreethreaded?view=winrt-22621
CFFI is a good choice for writing native wrappers: https://cffi.readthedocs.io/en/latest/
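To give a taste of what CFFI wrappers look like, here's a minimal sketch calling a plain Win32 function. (GetSystemMetrics is just an easy example I picked; the capture APIs above are WinRT/COM interfaces and need considerably more plumbing.)

```python
from cffi import FFI

ffi = FFI()
# Declare the C signature; cffi parses ordinary C declarations
# rather than requiring a generated wrapper.
ffi.cdef("int GetSystemMetrics(int nIndex);")

# dlopen works for Windows DLLs as well as Unix shared objects.
user32 = ffi.dlopen("user32.dll")

SM_CXSCREEN, SM_CYSCREEN = 0, 1  # documented Win32 constants
print(user32.GetSystemMetrics(SM_CXSCREEN),
      user32.GetSystemMetrics(SM_CYSCREEN))
```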
Gluing these together appropriately is left as an exercise for the reader :).
Related
I want to write some code to do acoustic analysis and I'm trying to determine the proper tool(s) for the job. I would normally write something like this in Python using numpy and scipy and possibly Cython for the analysis part. I've discovered that the world of Python audio libraries is a bit chaotic, with scads of very limited packages in various states of development.
I've also come across a bunch of audio/acoustic specific languages like SuperCollider, Faust, etc. that seem to make the audio processing easy but may be limited in terms of IO and analysis capability.
I'm currently working on Linux with ALSA and PulseAudio installed by default. I would prefer not to involve any of the various and sundry other audio packages, like JACK, if possible, though that is not a hard requirement.
My primary interest in this question is to determine whether there is a domain specific language that will provide for quicker prototyping and testing or whether a general language like Python is more appropriate. Thanks.
I've got a lot of experience with SuperCollider and Python (with and without Numpy). I do a lot of audio analysis, and I'm afraid the answer depends on what you want to do.
If you want to create systems that will input OR output audio in real time, then Python is not a good choice. The audio I/O libraries (as you say) are a bit sketchy. There's also a fundamental issue that Python's garbage collector is not really designed for realtime stuff. You should use a system that is designed from the ground up for realtime. SuperCollider is nice for this, and as caseyanderson notes, some of the standard building-blocks for audio analysis are right there. There are other environments too.
If you want to do hardcore work such as applying various machine learning algorithms, not necessarily in real time (i.e. if you can get away with reading/writing WAV files rather than live audio), then you should use a general-purpose programming language with wide support and an ecosystem of good libraries for the extras you want. Using Python with libs such as numpy and scikit-learn works great for this. It's good for quick prototyping, but it lacks solid realtime audio and has far fewer of the standard audio building-blocks; those are two important things which hold you back when prototyping audio pipelines.
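To illustrate the offline kind of analysis, here's a minimal sketch computing a spectral centroid from a WAV file (assuming PCM data; "input.wav" is a placeholder path):

```python
import numpy as np
from scipy.io import wavfile

# Offline analysis: read a whole WAV file, no realtime I/O involved.
rate, samples = wavfile.read("input.wav")
if samples.ndim > 1:              # mix down to mono if stereo
    samples = samples.mean(axis=1)

spectrum = np.abs(np.fft.rfft(samples))
freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)

# Spectral centroid: amplitude-weighted mean frequency, a rough
# "brightness" measure (cf. SpecCentroid.kr mentioned below).
centroid = (freqs * spectrum).sum() / spectrum.sum()
print(f"spectral centroid: {centroid:.1f} Hz")
```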
So, then, you're caught between these two options. Depending on your application you may be able to combine the two by manipulating the audio I/O in a realtime environment, and using OSC messaging or shell scripts to communicate with an external Python process. The limitation there is that you can't really throw masses of data around between the two (you can't sensibly pipe all your audio across to some other process, that'd be silly).
SuperCollider has lots of support for things along these lines, both as externals/plugins or Quarks. That said, it depends exactly what you want to do. If you are simply looking to detect events, Onsets.kr would be fine. If you are looking for frequency/pitch information, Pitch or Tartini would work (I find Tartini to be more accurate). If you are trying to track amplitude, a combination of Amplitude.ar and some simple math would also work.
Similarly, there is SpecCentroid.kr (for a kind of brightness analysis), Loudness.kr, SpecFlatness.kr, etc.
The above are all pretty general, and there are lots more (the JoshUGens externals package has some interesting FFT-related acoustics stuff). So I would recommend downloading the program, joining the mailing list if you have further questions, and poking around in the externals, Quarks, and standard UGens.
Nonetheless, since I am not sure what you are trying to do, I cannot make more concrete recommendations than the above combined with my feeling that it makes the most sense to go to SC for this, rather than writing all of your own tools in Python from scratch.
I'm not 100% sure what you want to do, but as an additional suggestion I would put forth: SPEAR with scripting in Common Lisp. If what you are doing involves a great deal of spectral analysis, then you can do the heavy lifting in SPEAR and script all of it using Common Lisp with Common Music. SPEAR has some great tools for editing out very specific partials.
I am using Python inside another application (CINEMA 4D) to create a nice connection to our issue tracker (Jira) from within the application. The rationale is to make it really easy for our plugin users to report and track bugs, with things like machine specs, screenshots, or scene files (including textures) attached automatically.
So far it has been a really smooth ride, and the integration is coming along great. I started grabbing the icons for issue priorities, projects, issue types, etc. from Jira as well, so they can be displayed for a better overview. To read the image files I am using CINEMA 4D functionality that is available inside its Python binding.
The problem now is that most icons from Jira come in GIF format, and the CINEMA 4D SDK doesn't read GIF files directly (actually it does read them, but only through a back door that lets users load them; I can't use that functionality through Python or the SDK). So I need another way to read GIF files.
There are a few questions on Stack Overflow that go in this direction, but they all seem to recommend PIL. That doesn't feel like the right solution, for a few reasons:
While that looks nice, it's not part of the standard distribution and seems to be really only maintained for Windows (even though there are builds for Mac OS X).
It also seems to install itself into the current system installation of Python, but CINEMA 4D comes with its own, so I'd have to rip it apart and distribute it with my plugin.
And then it is quite large, while I really only want a compact script for a compact solution (preferably out of the box, but that doesn't seem to be an option).
I was wondering if there is a simpler or at least more compact way. Since GIF seems to be a relatively simple file format, I am wondering if there may even be a simple parser as a python function/class.
I found a link where somebody disassembles a GIF file's embedded frames but doesn't actually grab the image contents: Python, how i can get gif frames
I'm fine with putting in some time on my own, and I would've already been coding away if the file format was something uncompressed, but I am a little reluctant since the compression seems to raise the bar slightly.
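For what it's worth, the header and palette really are simple; it's only the LZW-compressed image data that raises the bar. A sketch of the easy part, using only the standard library ("icon.gif" is a placeholder path):

```python
import struct

def read_gif_header(path):
    """Parse just the GIF header and logical screen descriptor,
    stopping short of the LZW-compressed image data."""
    with open(path, "rb") as f:
        signature = f.read(6)
        if signature not in (b"GIF87a", b"GIF89a"):
            raise ValueError("not a GIF file")

        # Logical screen descriptor: 7 bytes, little-endian.
        width, height, packed, bg_index, aspect = struct.unpack(
            "<HHBBB", f.read(7))

        has_gct = bool(packed & 0x80)          # global color table flag
        gct_size = 2 ** ((packed & 0x07) + 1)  # palette entry count

        palette = []
        if has_gct:
            raw = f.read(3 * gct_size)
            palette = [tuple(raw[i:i + 3]) for i in range(0, len(raw), 3)]

        return {"version": signature[3:].decode(),
                "size": (width, height),
                "palette": palette}

print(read_gif_header("icon.gif"))
```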
I want to learn about graphical libraries by myself and toy with them a bit. I built a small program that defines lines and shapes as lists of pixels, but I cannot find a way to access the screen directly so that I can display the points on the screen without any intermediate.
What I mean is that I do not want to use any prebuilt graphical library such as gnome, cocoa, etc. I usually use Python to code and my program uses Python, but I can also code and integrate C modules with it.
I am aware that accessing the screen hardware directly takes away the multiplatform side of Python, but I disregard it for the sake of learning. So: is there any way to access the hardware directly in Python, and if so, what is it?
No, Python isn't the best choice for this type of raw hardware access to the video card. I would recommend writing C in DOS. Well, actually, I don't recommend it. It's a horrible thing to do. But, it's how I learned to do it, and it's probably about as friendly as you are going to get for accessing hardware directly without any intermediate.
I say DOS rather than Linux or NT because neither of those will give you direct access to the video hardware without writing a driver. That means having to learn the whole driver API, and you need to invoke a lot of "magic" that won't be very obvious, because writing a video driver in Windows NT is fairly complicated.
I say C rather than Python because it gives you real pointers, and the ability to do stupid things with them. Under DOS, you can write to arbitrary physical memory addresses in C, which seems to be what you want. Trying to get Python working at all under an OS terrible enough to allow you direct hardware access would be a frustration in itself, even if you only wanted to do simple stuff that Python is good at.
And, as others have said, don't expect to use anything that you learn with this project in the real world. It may be interesting, but if you tried to write a real application in this way, you'd promptly be shot by whoever would have to maintain your code.
This seems a great self-learning path, and I would add my two cents' worth and suggest you consider looking at the GObject, Cairo, and Pygame modules some time.
The Python GObject module may be at a higher level than your current interest, but it enables pixel-level drawing with Cairo (see the home page) as well as providing a general base for portable GUI apps in Python.
Pygame also has pixel-level methods, as well as access methods to the graphics drivers (at a higher level).
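A minimal sketch of pixel-level drawing with pygame (the window size, color, and event loop are arbitrary choices of mine):

```python
import pygame

pygame.init()
screen = pygame.display.set_mode((320, 240))

# Draw a diagonal line one pixel at a time, no drawing primitives.
for i in range(240):
    screen.set_at((i, i), (255, 0, 0))  # (x, y), RGB

pygame.display.flip()  # push the backbuffer to the screen

# Keep the window open until it is closed.
running = True
while running:
    for event in pygame.event.get():
        if event.type == pygame.QUIT:
            running = False
pygame.quit()
```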
This is rather an old thread now, but I stumbled upon it while musing the same question.
I used to program in assembly language. In my day, drawing on screen was simply(?) a matter of poking a value into a memory location. The value turned a pixel on or off and defined its colour.
The term 'poke' comes from BASIC, by the way, not assembler. In assembler, you had to write a value into a data register, then tell the processor where to put the data using another command and an address register, usually specified in hexadecimal! And each different processor had its own assembly language. But heck, was the code fast!
As hardware progressed, I found that graphics hardware programming became more and more complex. There's much more to it now than simply defining a pixel. The graphics subsystem has its own processor -- or processors -- and it's that that you've got to learn to talk to. The processor doesn't just plonk stuff in memory locations. (I believe that what was for a while the fastest supercomputer in the world ran on graphics chips!) 'Plonk' is not a BASIC command, by the way.
Sorry; I digress. In answer to the original poster's query, I believe the goal of understanding the graphics-drawing process could best be achieved by experimenting with a Raspberry Pi. It's Python-compatible and hence perfect for the job. Its hardware is well documented, and it's cheap and easy to use.
Hope this helps someone, Cheers, M
I've been sitting on this idea of working on a "networked intelligence" to look into some interesting ideas on the nature of intelligence and computers. I've decided to go about it by designing small robotic agents that utilize PDP across some medium (i.e. wifi/IR or something, to be decided) so they can gather large quantities of data independently, then process the data and find trends in it efficiently by working together as a "supercomputer" (I always think it's odd using that term, but it's apt: one is utilizing multiple independent processing units in unison). I'm aware that Python has some PDP libraries available, and I was hoping to program the robots on little Arduinos. I've got a strong idea of how to do every component of the system except actually implementing the PDP architecture across it.
TL;DR? I want to make a bunch of little robots that can essentially connect together to form a small supercomputer and share and amalgamate information across all the agents. Is it feasible to create a PDP program that will freely relinquish parts of its processing power and then add in new ones?
I'm a pretty strong programmer, so if it's a matter of complexity and time, I'm willing to apply myself; but if it's an issue of having to strip apart some BIOS software and write in assembly, then I'd rather not. I'm not as familiar with PDP ideas as I would like to be, so if you have any recommended reading to get me started, it would be much appreciated.
Another note: the language and platform are completely open to change; I'd just like to see concrete evidence that one is better than another.
Interesting idea; it reminds me of sensor networks.
You may find that an Arduino is a little underpowered for what you want. Perhaps it would be more efficient, and easier, to send the data back to a PC for processing.
If you want to continue with the Arduino idea, you could implement MapReduce, which is a fairly simple construct that lets you write distributed programs very easily.
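To make the shape of the model concrete, here's a toy word-count in plain Python (the "shuffle" is just a dict here; in a real deployment each phase runs on different nodes):

```python
from collections import defaultdict

def map_phase(document):
    # Emit (key, value) pairs; here, one (word, 1) pair per word.
    for word in document.split():
        yield word, 1

def reduce_phase(word, counts):
    # Fold each key's values into a single result.
    return word, sum(counts)

documents = ["robots share data", "robots process data"]

groups = defaultdict(list)  # the "shuffle": group values by key
for doc in documents:
    for key, value in map_phase(doc):
        groups[key].append(value)

results = [reduce_phase(k, v) for k, v in groups.items()]
print(results)  # [('robots', 2), ('share', 1), ('data', 2), ('process', 1)]
```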
I have a write up on the basics of MapReduce.
There is the famous Hadoop implementation, as well as Disco (Python/Erlang) and a very simple shell implementation called BashReduce that Last.fm created.
Looking for a non-cloud-based, open-source app for doing data transformation; though for a killer (and I mean killer) app built just for data transformations, I might be willing to spend up to $1000.
I've looked at Perl, Kapow Katalyst, Pentaho Kettle, and more.
Perl, Python, and Ruby are clearly languages, but I've been unable to find any frameworks/DSLs built just for processing data. As development environments for this they're lacking: no built-in GUIs for building regexes, no input/output handling (CSV, XML, JDBC, REST, etc.), no debugger for testing rows and rows of data. They're not bad, just not what I'm looking for, which is a GUI built for complex data transformations. That said, I'd love it if the GUI/app file were in a scripting language and not just stored in some non-human-readable XML/ASCII format.
Kapow Katalyst is made for accessing data via HTTP (HTML, CSS, RSS, JavaScript, etc.). It's got a nice GUI for transforming unstructured text, but that's not its core value offering, and it's way, way too expensive. It does an okay job of traversing document namespace paths; I'm guessing it's just XPath on the back-end, since the syntax appears to be the same.
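If it is XPath, the same traversal is available for free in Python via lxml; a minimal sketch (the document here is made up):

```python
from lxml import etree

# The kind of path expression those selectors resemble; lxml gives
# you the same traversal in open-source Python.
doc = etree.fromstring("<feed><item><title>hello</title></item></feed>")
print(doc.xpath("/feed/item/title/text()"))  # ['hello']
```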
Pentaho Kettle has a nice GUI for input/output to most common data stores, and its own take on data processing, which is okay and has only a small learning curve. Kettle's debugger is OK in that the data is easy to see, but errors and exceptions are not threaded with the output, and there's no way to really debug an issue: you can't reload the output/error/exception, only view the system feedback. All that said, Kettle data transformation is _______ well, let's just say it left me feeling like I must be missing something, because I was completely puzzled by "if it's not possible, just write the transformation in JavaScript"; umm, what?
So, any suggestions? Do realize that I haven't really spec'd out any transformations, but if you really use a product for data munging, I'd like to know about it; even Excel, I guess.
In general, I'm currently looking for a product able to handle 1,000-100,000 rows with 10-100 columns. It'd be super cool if it could profile data sets, a feature Kettle sort of has, but not done super well. I'd also like built-in unit testing: I build control sets of data, then run my changes against the control set. Then I'd like to be able to selectively filter out rows and columns as I build out the transformation, without altering the build; for example, I run a data set through the transformation and filter the results, and on the next run those rows are automatically blocked at the first "logical" occurrence, which in turn means less data to look at and a reduced runtime per enhanced iteration. What would be crazy nice is if, as I filtered out the rows/columns, the app tracked them (and filtered the output) and unit-tested/highlighted any changes. If I made a change that would affect the application's logs and its ability to track the unit tests by my "breaking a branch", it would give me a warning, let me dump the data for the stored branch, and/or track the primary keys for differences in the next generation of output, or even attempt to match them using fuzzy logic. And yes, I know this is a pipe dream, but hey, figured I'd ask, just in case there's something out there I've just never seen.
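The control-set idea at least is easy to sketch in plain Python (the transformation here is a made-up placeholder):

```python
# Run a transformation over a frozen input and diff against a
# known-good output, so any change to the pipeline fails the test.
def transform(rows):
    # Placeholder transformation: uppercase the second column.
    return [(key, value.upper()) for key, value in rows]

CONTROL_INPUT = [(1, "foo"), (2, "bar")]
CONTROL_OUTPUT = [(1, "FOO"), (2, "BAR")]

def test_against_control_set():
    assert transform(CONTROL_INPUT) == CONTROL_OUTPUT

test_against_control_set()  # raises AssertionError on regression
```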
Feel free to comment, I'd be happy to answer any questions, or offer additional info.
Google Refine?
Talend will need more than 5 minutes of your time, perhaps closer to an hour, to begin wiring up basic transformations, and it can fulfill your requirement to keep transformations under version control as well. You described a pipeline process that can be done easily in Talend once you know how: multiple inputs and outputs in a project, with the same raw data going through various transformations and filters until it arrives as the final output you desire. Then you can schedule your jobs to repeat the process over similar data. Go back and spend more time with Talend, and I'm sure you'll succeed in what you need.
I also happen to be one of the committers on Google Refine, and I use Talend in my daily work. I actually sometimes model my transformations for Talend first in Google Refine. (Sometimes I even use Refine to clean up borked ETL transforms themselves! LOL) I can tell you that my experience with Talend played a small part in a few of the features of Google Refine. For instance, both Talend and Google Refine have the concept of an expression editor for your transformations (Talend drops down to the Java language for this if need be).
Google Refine will never be an ETL tool, in the sense that we have not designed it to compete in the space where ETL is typically used: large data-warehouse back-end processing and transformations. However, we designed Google Refine to complement existing ETL tools like Talend by allowing easy live previewing, so you can make informed decisions about your transformations and cleanup; and if your data isn't incredibly huge, you might opt to perform what you need within Refine itself.
I'm not sure exactly what kind of data or exactly what kind of transformations you're trying to do, but if it's primarily mathematical transformation, perhaps you can try FreeMat, Octave, or SciLab. If it's more data-warehouse-style munging, try open source ETL tools like Clover, Talend, JasperETL Community Edition, or Jitterbit.