I've been sitting on this idea of building a "networked intelligence" to explore some interesting questions about the nature of intelligence and computers. My plan is to design small robotic agents that use PDP (parallel distributed processing) across some medium (Wi-Fi, IR, or something else, to be decided) so that they can gather large quantities of data independently and then process that data and find trends in it efficiently by operating together as a "supercomputer" (I always find it odd using that term, but it's apt: one is utilizing multiple independent processing units in unison). I'm aware that Python has some PDP libraries available, and I was hoping to program the robots on little Arduinos. I've got a strong idea of how to do every component of the system except actually implementing the PDP architecture across it.
TL;DR: I want to make a bunch of little robots that can essentially connect together to form a small supercomputer and share and amalgamate information across all the agents. Is it feasible to create a PDP program that can freely relinquish parts of its processing power and then add in new ones?
I'm a pretty strong programmer, so if it's a matter of complexity and time, I'm willing to apply myself; but if it's an issue of having to strip apart some BIOS software and write in assembly, then I'd rather not. I'm not as familiar with PDP ideas as I would like to be, so if you have any recommended reading to get me started, it would be much appreciated.
Another note: the language and platform are completely open to change; I'd just like to see concrete evidence that one option is better than another.
Interesting idea; it reminds me of sensor networks.
You may find that an Arduino is a little underpowered for what you want. Perhaps it would be more efficient and easier to send the data back to a PC for processing.
If you want to continue with the Arduino idea, you could implement MapReduce, which is a fairly simple construct that allows you to write distributed programs very easily.
I have a write up on the basics of MapReduce.
There is the famous Hadoop implementation, as well as Disco (Python/Erlang) and a very simple shell implementation called BashReduce that Last.fm created.
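Here's a minimal single-process sketch of the MapReduce model in Python, using the classic word-count example; frameworks like Hadoop or Disco distribute the map and reduce phases across machines, but the model you program against is essentially this:

    from itertools import groupby
    from operator import itemgetter

    def mapper(line):
        # Map phase: emit a (key, value) pair for every word in the line.
        for word in line.split():
            yield word.lower(), 1

    def reducer(word, counts):
        # Reduce phase: combine all values emitted under the same key.
        return word, sum(counts)

    def map_reduce(lines):
        # Shuffle phase: sort the emitted pairs so equal keys are adjacent.
        pairs = sorted(kv for line in lines for kv in mapper(line))
        return [reducer(key, (v for _, v in group))
                for key, group in groupby(pairs, key=itemgetter(0))]

    print(map_reduce(["the quick brown fox", "the lazy dog"]))
    # [('brown', 1), ('dog', 1), ('fox', 1), ('lazy', 1), ('quick', 1), ('the', 2)]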
TL;DR: Is there a Python library that lets me grab an application window's frame as an image and write a modified frame back to that same application?
So the whole story is that I want to write an application in Python that does something similar to Lossless Scaling and Magpie. I want to grab an application window (a video game window, for example), get the current frame as an image, use some machine learning/deep learning algorithm (like FSR or DLSS) to upscale that image, then replace the application's current frame with the upscaled one.
So far, I have been playing around with some upscaling algorithms like the one from Real-ESRGAN, but now my main problem is how to upscale the video game images in real time. The only thing I've found that does something related to what I need is PyAutoGUI, but that package only lets you take screenshots of an application, not write graphics back to it.
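For illustration, the capture half is easy enough with PyAutoGUI (the region values here are arbitrary); it's the write-back half that I can't find:

    import numpy as np
    import pyautogui

    # Capture the screen (or a region of it) as a PIL image...
    shot = pyautogui.screenshot(region=(0, 0, 1280, 720))
    # ...and view it as an H x W x 3 RGB array to feed an upscaling model.
    frame = np.array(shot)
    print(frame.shape)  # e.g. (720, 1280, 3)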
I hope I have clarified my problem; feel free to comment if you still have any questions.
Thank you for reading this post, and have a good day.
Doing this with Python is going to be very difficult. A lot of the performance involved in this sort of thing is in avoiding as many memory copies as possible, and Python's idiom for string and bytes processing unfortunately makes quite a few additional copies in the course of any idiomatic program. I say this as a die-hard Python fan who is constantly trying to cram Python in everywhere it doesn't belong: you'd be better off doing this in Rust.
Update: After receiving some feedback from folks with more direct experience in this sort of thing, I may have overstated the difficulty here. Many ML tools in Python provide zero-copy access; you can easily access and manipulate memory-mapped data from numpy, and there is even a CUDA protocol for doing this to data in GPU memory. So while it's not exactly easy, as long as your operations are implemented as numpy operations and not as pure-Python pixel-by-pixel logic, it shouldn't be much harder than other Python machine-learning applications that require native APIs for accessing their source data.
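To illustrate the zero-copy point, here is a small sketch with the frame buffer simulated by a plain bytearray rather than a real capture API:

    import numpy as np

    raw = bytearray(1920 * 1080 * 4)  # stand-in for a BGRA frame from a capture API

    # frombuffer + reshape create *views* over the existing memory: no copies.
    frame = np.frombuffer(raw, dtype=np.uint8).reshape(1080, 1920, 4)

    # Whole-frame numpy operation, done in place; no pure-Python per-pixel loop.
    np.clip(frame, 0, 200, out=frame)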
However, there's no way to access framebuffer data directly from python, so step 1 is going to be writing your own bindings over the relevant DirectX APIs. Since Magpie is open source, you can see which APIs it's using, for example, in its various C++ "Frame Source" backends. For example, this looks relevant: https://github.com/Blinue/Magpie/blob/42cfcba1222b07e4cec282eaff639aead229f123/Runtime/GraphicsCaptureFrameSource.cpp#L87
You can then look those APIs up on MSDN; that one, for example, is here: https://learn.microsoft.com/en-us/uwp/api/windows.graphics.capture.direct3d11captureframepool.createfreethreaded?view=winrt-22621
CFFI is a good choice for writing native wrappers: https://cffi.readthedocs.io/en/latest/
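The basic ABI-level CFFI pattern looks like this; a toy libc call stands in for the Direct3D entry points you would actually declare, and on Windows you would dlopen the specific DLL rather than None:

    from cffi import FFI

    ffi = FFI()
    # Declare the C signatures you intend to call, copied from the headers.
    ffi.cdef("""
        int puts(const char *s);
    """)
    # Load the library; None means the C runtime on POSIX systems.
    C = ffi.dlopen(None)

    C.puts(b"hello from native code")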
Gluing these together appropriately is left as an exercise for the reader :).
I want to write some code to do acoustic analysis and I'm trying to determine the proper tool(s) for the job. I would normally write something like this in Python using numpy and scipy and possibly Cython for the analysis part. I've discovered that the world of Python audio libraries is a bit chaotic, with scads of very limited packages in various states of development.
I've also come across a bunch of audio/acoustic specific languages like SuperCollider, Faust, etc. that seem to make the audio processing easy but may be limited in terms of IO and analysis capability.
I'm currently working on Linux with ALSA and PulseAudio installed by default. I would prefer not to involve any of the various and sundry other audio packages like JACK if possible, though that is not a hard requirement.
My primary interest in this question is to determine whether there is a domain specific language that will provide for quicker prototyping and testing or whether a general language like Python is more appropriate. Thanks.
I've got a lot of experience with SuperCollider and Python (with and without Numpy). I do a lot of audio analysis, and I'm afraid the answer depends on what you want to do.
If you want to create systems that will input OR output audio in real time, then Python is not a good choice. The audio I/O libraries (as you say) are a bit sketchy. There's also a fundamental issue that Python's garbage collector is not really designed for realtime stuff. You should use a system that is designed from the ground up for realtime. SuperCollider is nice for this, and as caseyanderson notes, some of the standard building-blocks for audio analysis are right there. There are other environments too.
If you want to do hardcore work such as applying various machine learning algorithms, not necessarily in real time (i.e. if you can get away with reading/writing WAV files rather than live audio), then you should use a general-purpose programming language with wide support and an ecosystem of good libraries for the extra things you want. Using Python with libs such as numpy and scikit-learn works great for this, and it's good for quick prototyping; but it lacks solid realtime audio and has far fewer of the standard audio building-blocks, two important things which hold you back when prototyping audio pipelines.
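As a sketch of that offline workflow (the filename is hypothetical, and a mono file is assumed):

    import numpy as np
    from scipy.io import wavfile

    # Non-realtime: read the whole file at once, then crunch it with numpy.
    rate, samples = wavfile.read("recording.wav")
    samples = samples.astype(np.float64)

    # Dominant frequency via an FFT of the full signal.
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / rate)
    print("dominant frequency: %.1f Hz" % freqs[np.argmax(spectrum)])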
So, then, you're caught between these two options. Depending on your application you may be able to combine the two by manipulating the audio I/O in a realtime environment, and using OSC messaging or shell scripts to communicate with an external Python process. The limitation there is that you can't really throw masses of data around between the two (you can't sensibly pipe all your audio across to some other process, that'd be silly).
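The Python end of such an OSC bridge can be tiny. This sketch assumes the python-osc package and sclang's default UDP port 57120; the /analysis address is made up:

    from pythonosc.udp_client import SimpleUDPClient

    # sclang listens on UDP port 57120 by default.
    client = SimpleUDPClient("127.0.0.1", 57120)

    # Send a small analysis result (e.g. amplitude and pitch) to SuperCollider;
    # bulk audio should stay on the SC side, as noted above.
    client.send_message("/analysis", [0.42, 440.0])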
SuperCollider has lots of support for things along these lines, both as externals/plugins or Quarks. That said, it depends exactly what you want to do. If you are simply looking to detect events, Onsets.kr would be fine. If you are looking for frequency/pitch information, Pitch or Tartini would work (I find Tartini to be more accurate). If you are trying to track amplitude, a combination of Amplitude.ar and some simple math would also work.
Similarly, there is SpecCentroid.kr (for a kind of brightness analysis), Loudness.kr, SpecFlatness.kr, etc.
The above are all pretty general, and there are lots more (the JoshUGens externals package has some interesting FFT-related acoustics stuff). So I would recommend downloading the program, joining the mailing list (if you have further questions), which lives here, and poking around in the Externals, Quarks, and Standard UGens.
Nonetheless, since I am not sure what you are trying to do, I cannot make more concrete recommendations than the above combined with my feeling that it makes the most sense to go to SC for this, rather than writing all of your own tools in Python from scratch.
I'm not 100% sure what you want to do, but as an additional suggestion I would put forth: Spear, with scripting in Common Lisp. If what you are doing involves a great deal of spectral analysis, you can do the heavy lifting in Spear and script all of it using Common Lisp with Common Music. Spear has some great tools for editing out very specific partials.
I am looking at developing on an FPGA, but it would be easier for me to write the code in Python or Scala and have it converted to VHDL or Verilog.
I want to have many sensors hooked up to a device, and as the data comes in, calculations are done very quickly so it can be displayed on a video wall, so the FPGA would have as input dozens of sensors and several video controllers for the wall.
This one is a library for code written in Scala. For this one, I am curious whether having code written in a mix of Java and Scala would affect what it generates.
http://simplifide.com/drupal6/
This is a Python-to-VHDL converter.
http://www.myhdl.org/doku.php
With both of these I am curious as to the limitations.
I would prefer simplifide, as I am stronger at Scala than at Python, but it seems that MyHDL may be a more robust platform, just from some basic looking around.
UPDATE:
The reason for the FPGA is that it can do multiple tasks at one time very well, so when the data comes in, depending on the needs of the users for a given experiment, it would be easy to change the code on the FPGA to adapt to those needs.
So, for example, suppose you have 8 x 3 weather sensors on each floor of an office building (temperature, wind speed, and barometric pressure, with 8 of each per floor), and you add sensors to measure the deformation of the walls; then a real-time interface that reads all of these in at the same time and keeps updating the visual display may be helpful.
This is a made-up example, but it explains why an FPGA would be useful; otherwise I would need many different DSPs feeding into a computer for the visual display, whereas an FPGA can do it faster, since it is hardware, with lower power needs.
There are two open-source libraries that can help make development easier, but I am not certain which would be the better platform for converting a program to VHDL/Verilog.
This is just one example. If I want to do a quantum circuit simulation on an FPGA, as this article suggests (http://www.cc.gatech.edu/computing/nano/documents/Radecka%20-%20FPGA%20Emulation%20of%20Quantum%20Circuits.pdf), then it would be easier to do it as a program than to build up a large circuit by hand.
Yes, there is a Python-style HDL available, and it's free: MyHDL.
It will generate VHDL or Verilog. It can also simulate the code and write out a .vcd waveform dump, which you can view in GTKWave.
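A minimal sketch of what MyHDL code and conversion look like, assuming the modern @block API (MyHDL 0.11); modbv gives free wrap-around:

    from myhdl import block, always_seq, Signal, ResetSignal, modbv

    @block
    def counter(clk, reset, count):
        """8-bit up-counter that wraps around, clocked on the rising edge."""
        @always_seq(clk.posedge, reset=reset)
        def logic():
            count.next = count + 1
        return logic

    clk = Signal(bool(0))
    reset = ResetSignal(0, active=1, isasync=False)
    count = Signal(modbv(0)[8:])

    inst = counter(clk, reset, count)
    inst.convert(hdl='VHDL')   # or hdl='Verilog'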
Alternatively, if you want to work directly with the generated VHDL code, you can simulate it with GHDL; Google it and you'll find lots of resources. There is also an OS distribution available, Fedora Electronics Lab, which has all the tools to develop modern electronics.
All of these are open source; build and simulate using these tools.
To flash the design onto the FPGA you need either the Xilinx or Altera toolchain to generate the bitstreams and flash them. All the best!
If you can afford it, I don't think anything will make your life easier than National Instruments' FPGA add-on for LabVIEW. The visual environment of LabVIEW is a reasonable fit for FPGA programming, and it takes care of many of the annoying details for you (unless you must worry about them as part of the algorithm, e.g. by building pipelines to hit your clock-speed targets). Also, you may find that NI's real-time (non-FPGA), DSP, DAQ, or other solutions are adequate for your needs.
This is a made-up example, but it explains why an FPGA would be useful; otherwise I would need many different DSPs feeding into a computer for the visual display, whereas an FPGA can do it faster, since it is hardware, with lower power needs.
This depends entirely on the exact nature of the algorithms you need to execute.
There are two open-source libraries that can help make development easier, but I am not certain which would be the better platform for converting a program to VHDL/Verilog.
This is just one example. If I want to do a quantum circuit simulation on an FPGA, as this article suggests (http://www.cc.gatech.edu/computing/nano/documents/Radecka%20-%20FPGA%20Emulation%20of%20Quantum%20Circuits.pdf), then it would be easier to do it as a program than to build up a large circuit by hand.
It looks like you're looking for a High-Level Synthesis (HLS) tool, which neither of those is. Code generators can definitely help with generating RTL for algorithms, but you'll still have to get your hands dirty with HDLs for everything else.
I think you are looking for the kind of tooling that Modaë Technologies is offering. You can start with either Ruby or Python code, at the algorithmic/behavioral level. Their tools are capable of inferring data types automatically and converting the code to HDL (currently VHDL) at the RTL level.
One or two years ago I worked with MyHDL and LabVIEW. I wrote HDL in MyHDL, exported it as VHDL, and imported it as external IP in LabVIEW.
LabVIEW
It's nice for FPGA development, because its graphical representation lets you keep track of the clocks consumed by each sequential branch. Pipelining and slicing your algorithm graphically is worth a lot for making sure that the correct values are processed together.
However, when it comes to generating constants, initializer lists, or recursive structures, the mapping from the visual representation to actual hardware is ... sub-optimal.
MyHDL
It's basically Python, but the syntax looks a lot like Verilog, with decorators such as @always playing the role of Verilog's always blocks.
You're free to use any valid Python code in your HDL code. For testing purposes it can make sense to write a test function in plain Python before actually implementing it at the register-transfer level (RTL).
When generating recursive structures you have the usual for statement.
Need a look-up table (LUT) for your algorithm? One line of list comprehension: Done.
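For example, a sketch using MyHDL's documented ROM-inference pattern, where indexing a tuple built by a comprehension converts to a case statement:

    from math import pi, sin
    from myhdl import block, always_comb, Signal, intbv

    # The one-liner: a 256-entry sine table built with a comprehension.
    SINE = tuple(int(127 * sin(2 * pi * i / 256)) + 128 for i in range(256))

    @block
    def sine_rom(addr, dout):
        @always_comb
        def read():
            dout.next = SINE[int(addr)]
        return read

    addr = Signal(intbv(0)[8:])
    dout = Signal(intbv(0)[8:])
    sine_rom(addr, dout).convert(hdl='VHDL')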
For a full list of features, see the website.
Summary
LabVIEW is great for getting started, because you can focus on the actual implementation.
As soon as you have mastered thinking in parallel and at the RTL level, and you move on to implementing more complex algorithms, you may find MyHDL better at:
managing your IP without being bound to a proprietary platform
testing your code with the full power of Python
sharing, versioning, etc.
I am working on a big project that involves a lot of web-based and AI work. I am extremely comfortable with Python, though my only concern is concurrent programming and scaling this project to make it work on clusters. Hence Clojure: for the AI work, for its support for calling Java functions, and to bring in concurrent programming.
Is it a good idea to do all the web-based API work with Python and let Clojure take care of most of the concurrent AI work?
Edit:
Let me explain the interaction in detail. Python would be doing most of the dirty work (scraping, image processing, improving the database, and all that). Clojure, if possible, would either deal with the database or get the data from Python. I expect something along the lines of CPython-style linking between Python and Clojure.
Edit2:
This might be a foolish question to ask, but as this is a rather long-term project that will evolve quite a bit and go through several iterations: is Clojure a language that is here to stay? Is it portable enough?
I built an embarrassingly parallel number-crunching application with a backend in Clojure (on an arbitrary number of machines) and a frontend in Ruby on Rails. I don't particularly like RoR, but this was a zero-budget project at the time and we had a Rails programmer at hand who was willing to work for free.
The Clojure part consisted of (roughly) a controller, number crunching nodes, and a server implementing a JSON-over-HTTP API which was the interface to the Rails web app. The Clojure nodes used RabbitMQ to talk to each other. Because we defined clear APIs between different parts of the application, it was easy to later rewrite the frontend in Clojure (because that better suited our needs).
If you're working on a distributed project with a long life span and continuous development effort, it could make sense to design the application as a number of separate modules that communicate through well defined APIs (json, bson, ... over AMQP, HTTP, ... or a database). That means you can get started quickly using a language you're comfortable with, and rewrite parts in another language at a later stage if necessary.
I can't see a big problem with using Python for the web apps and Clojure for the concurrent data crunching / back end code. I assume you would use something like JSON over http for the communications between the two, which should work fine.
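The Python side of that JSON-over-HTTP hand-off is only a few lines; the endpoint name and port here are made up:

    import requests

    # Hand a batch of work to the Clojure service and read back the result.
    payload = {"task": "classify", "image_ids": [1, 2, 3]}
    resp = requests.post("http://localhost:8080/crunch", json=payload, timeout=30)
    resp.raise_for_status()
    print(resp.json())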
I'd personally use Clojure for both (using e.g. the excellent Noir as a web framework and Korma for the database stuff), but if, as you say, your experience is mostly in Python, then it probably makes sense to stick with Python from a productivity perspective (in the short term at least).
To answer the questions regarding the future of Clojure:
It's definitely here to stay. It has a very active community and is probably one of the "hottest" JVM languages right now (alongside Scala and Groovy). It seems to be doing particularly well in the big data / analytics space.
Clojure has a particular advantage in terms of library support, since it can easily make use of any Java library. This is a huge advantage for a new language from a practical perspective, since it immediately solves what is usually one of the biggest issues in getting a new language ecosystem off the ground.
Clojure is a new language that is still undergoing quite a lot of development. If you choose to use Clojure, you should be aware that you will need to put in some effort to stay current and keep your code up to date with the latest Clojure versions. I've personally not found this to be an issue, but it may come as a surprise to people used to more "stable" languages like Java.
Clojure is very portable - it will basically run anywhere that you can get a reasonably modern JVM, which is pretty much everywhere nowadays.
If you can build both sides to use data and pure(ish) functions to communicate, then this should work very well. Wrapping your Clojure functions in web services that take and return JSON (or, preferably, Clojure forms) should make them accessible to your Python-based front end with no extra fuss.
Of course it's more fun to write it in Clojure all the way through. ;)
If this is a long-term project, then building clean functional interfaces (ones that take and return values) that exchange data becomes even more important, because it will give you the ability to evolve the components independently.
In such scenarios I personally like to proceed in the sequence below.
Divide the system into subsystems with a "very clear" definition of what each subsystem does, and that definition should follow the principle of "do one thing and keep it simple". At this stage don't think about languages etc.
Choose the platforms (not languages) on which these subsystems will run, e.g. JVM, Python VM, Node.js, CLR (Mono), or other VMs. Try to select a few platforms, or if possible just one, as that makes life easier down the road in terms of complexity.
Choose the languages to program those platforms in. This is very subjective, but for the JVM you can go with Clojure or Jython (if you like dynamic languages, as I do).
As far as Clojure's future is concerned, it is a language developed by a "community of amazing programmers" and not by some corporation. I hope that clears your doubt about the "long term" concern with Clojure. By the way, Clojure is a Lisp, so you can modify the language the way you want and fix things yourself even if no one does it for you.
Looking for a non-cloud-based, open-source app for doing data transformation; though for a killer (and I mean killer) app built just for data transformations, I might be willing to spend up to $1000.
I've looked at Perl, Kapow Katalyst, Pentaho Kettle, and more.
Perl, Python, and Ruby are clearly languages, but I have been unable to find any frameworks/DSLs in them just for processing data; meaning they're really not great development environments for this: there are no built-in GUIs for building regexes, no ready-made input/output handling (CSV, XML, JDBC, REST, etc.), and no debugger for testing rows and rows of data. They're not bad either, just not what I'm looking for, which is a GUI built for complex data transformations; that said, I'd love it if the GUI/app file were in a scripting language, and NOT just stored in some non-human-readable XML/ASCII file.
Kapow Katalyst is made for accessing data via HTTP (HTML, CSS, RSS, JavaScript, etc.). It's got a nice GUI for transforming unstructured text, but that's not its core value offering, and it's way, way too expensive. It does an okay job of traversing document namespace paths; I'm guessing it's just XPath on the back end, since the syntax appears to be the same.
Pentaho Kettle has a nice GUI for input/output with most common data stores, and its own take on handling data processing, which is okay and has only a small learning curve. Kettle's debugger is OK in that the data is easy to see, but errors and exceptions are not threaded with the output, and there is no way to really debug an issue; meaning you can't reload the output/error/exception, you're only able to view the system feedback. All that said, Kettle's data transformation is _______ well, let's just say it left me feeling like I must be missing something, because I was completely puzzled by "if it's not possible, just write the transformation in JavaScript"; umm, what?
So, any suggestions? Do realize that I haven't really spec'd out any transformations, but figure that if you really use a product for data munging, I'd like to know about it; even Excel, I guess.
In general, though, I'm currently looking for a product that can handle 1,000-100,000 rows with 10-100 columns. It'd be super cool if it could profile data sets, which is a feature Kettle sort of has, but doesn't do super well. I'd also like built-in unit testing, meaning I'm able to build out control sets of data and run changes against the control set. Then I'd like to be able to selectively filter out rows and columns as I build out the transformation without altering the build; for example, I run a data set through a transformation and filter the results, and on the next run those sets are automatically blocked at the first "logical" occurrence, which in turn would mean less data to "look at" and a reduced runtime per each enhanced iteration. What would be crazy nice is if, as I'm filtering out rows/columns, the app tracked those (and filtered the output accordingly) and unit-tested/highlighted any changes. If I made a change that would affect the application logs and their ability to track the unit tests because I "broke a branch", it would give me a warning and let me dump the stored data branch, and/or track the primary keys for differences in the next generation of output, or even attempt to match them using fuzzy logic. And yes, I know this is a pipe dream, but hey, I figured I'd ask, just in case there's something out there I've just never seen.
Feel free to comment, I'd be happy to answer any questions, or offer additional info.
Google Refine?
Talend will need more than 5 minutes of your time, perhaps closer to an hour, to begin to wire up basic transformations and fulfill your requirement of keeping transformations under version control. You described a pipeline process that can be done easily in Talend once you know how: multiple inputs and outputs in a project, with the same raw data going through various transformations and filterings until it arrives at the final output you want. Then you can schedule your jobs to repeat the process over similar data. Go back and spend more time with Talend, and you'll succeed in what you need, I'm sure.
I also happen to be one of the committers on Google Refine, and I use Talend in my daily work. I actually sometimes model my transformations for Talend first in Google Refine. (Sometimes I even use Refine to perform cleanup on borked ETL transforms themselves! LOL) I can tell you that my experience with Talend played a small part in a few of the features of Google Refine. For instance, both Talend and Google Refine have the concept of an expression editor for your transformations (Talend goes down to the Java language for this if need be).
Google Refine will never be an ETL tool, in the sense that we have not designed it to compete in the space where ETL is typically used: large data-warehouse back-end processing and transformations. However, we designed Google Refine to complement existing ETL tools like Talend by allowing easy live previewing so you can make informed decisions about your transformations and cleanup; and if your data isn't incredibly huge, then you might opt to perform what you need within Refine itself.
I'm not sure exactly what kind of data or exactly what kind of transformations you're trying to do, but if it's primarily mathematical transformation, perhaps you can try FreeMat, Octave, or SciLab. If it's more data-warehouse-style munging, try open source ETL tools like Clover, Talend, JasperETL Community Edition, or Jitterbit.