I noticed a lack of good SoundFont-compatible synthesizers written in Python, so a month or so ago I started work on my own (for reference, it's here). Making it was also a challenge I set for myself.
I keep coming up against the same problem again and again, summarized by this:
To play sound, a stream of data with a more-or-less constant rate of flow must be sent to the audio device
To synthesize sound in real time based on user input, little-to-no buffering can be used
Thus, there is a cap on the amount of time one 'buffer generation loop' can take
Python, as a language, simply cannot run fast enough to synthesize sound within this time limit (the arithmetic sketch below shows how tight the budget is)
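To make that cap concrete: at a typical 44.1 kHz sample rate with a 512-sample buffer (illustrative values, not necessarily my synth's actual settings), each pass of the generation loop has under 12 ms:

```python
SAMPLE_RATE = 44_100   # samples per second (typical CD-quality rate)
BUFFER_SIZE = 512      # samples per buffer; an illustrative, common choice

# The device drains BUFFER_SIZE samples in this many milliseconds, so the
# generation loop must produce the next buffer within the same window.
budget_ms = BUFFER_SIZE / SAMPLE_RATE * 1000
print(f"time budget per buffer: {budget_ms:.1f} ms")   # ~11.6 ms
```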
The problem is not my code, or at least, I've optimized it to extreme levels: using local variables in time-sensitive parts of the code, avoiding attribute (dot) lookups inside loops, using itertools for iteration, using C-implemented built-ins like max, changing thread-switching parameters, doing as few calculations as possible, making approximations; the list goes on.
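For illustration, this is the shape of the hot loop after those optimizations (a sketch with made-up names, not the actual synth code; voices are assumed to expose a next_sample() method):

```python
def mix_block(voices, n_samples):
    # Hypothetical mixing loop showing the micro-optimizations above.
    out = [0.0] * n_samples
    sample_range = range(n_samples)        # bind once, reuse as a local
    for voice in voices:
        next_sample = voice.next_sample    # hoist the attribute (dot) lookup
        for i in sample_range:
            out[i] += next_sample()
    peak = max(map(abs, out), default=0.0) # C-implemented built-in, fast
    if peak > 1.0:
        inv = 1.0 / peak                   # multiply instead of dividing per sample
        out = [s * inv for s in out]
    return out
```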
Using PyPy helps, but even that starts to struggle before long.
It's worth noting that, at best, my synth can currently play about 25 notes simultaneously. But this isn't enough: FluidSynth, a synth written in C, caps polyphony at 128 notes per instrument, and it supports multiple instruments at a time.
Is my assertion that Python simply cannot be used to write a synthesizer correct? Or am I missing something very important?
What are you trying to achieve?:
I am currently creating a reaction time and accuracy task that involves comparing visually presented numerical information with auditory numerical information. The visual numerical information will be presented in three forms: Arabic numerals (e.g. 5), number words (e.g. five), and non-symbolic magnitudes (a picture of 5 dots). The visual and auditory numerical information will be presented sequentially. After the presentation of the 2nd stimulus, participants respond whether the two stimuli convey the same information or not: they press “a” if the numerical information is the same and “l” if it's different.
Apart from varying the format of the visual numerical stimuli, I also intend to vary the stimulus onset asynchrony (SOA), i.e. the time interval between the two stimuli. I have 7 levels of SOA (±750, ±500, ±250, and 0 ms), which led me to structure my experiment as shown in the attached picture.
One set of fixation_cross and VA_750ms (for example) constitutes a block. Hence, in total, there are 7 blocks here (only 4 are pictured). I have already randomized the trials within each block. The next step is to randomize the presentation of these blocks, with one block denoting one level of SOA/time interval (e.g. +750 ms). To do this, I've placed a loop around all the blocks, titled “blocknames” in the picture. The experiment still runs fine, but the blocks are not being randomized.
I understand that there was a post addressing the randomization of blocks, but I felt it was more specific to experiments that only have one routine. That approach is not very feasible for my case, considering that I need to vary the time interval between the two numerical stimuli within a trial.
What did you try to make it work?:
Nevertheless, I’ve tried to create an excel file with the names of the excel files in each condition - across all routines, the excel files actually contain the same information, but they’re just named differently according to what the condition name is (e.g AV500ms, VA750ms). In this case, the experiment still works, but the blocks are still not being randomized.
What specifically went wrong when you tried that?:
With the same excel file, I also tried to label my conditions as $condsFile instead of using the exact document location, but this was what I got instead.
At the same time, I was wondering if I could incorporate my SOA/time interval levels into Excel instead - how would this be carried out in Builder?
This might be some useful background info on my PsychoPy setup and laptop.
OS (e.g. Win10): Win 10
PsychoPy version (e.g. 1.84.x): 2020.1.3
Standard Standalone? (y/n): Yes
I apologize if this has been asked a few times before; I've tried to apply those solutions to what my experiment requires, but to no avail. I'm also quite new to PsychoPy and not sure how to proceed from here. I would really appreciate any advice on this!
This isn't really a programming question per se, as it can be addressed entirely by using the graphical Builder interface of PsychoPy. In the future, you should probably address such questions to the dedicated support forum at https://discourse.psychopy.org rather than here at Stack Overflow.
In essence, your experiment should have a much simpler structure. Embed your two trial routines within a trials loop. After that loop, insert your break routine. Lastly, embed the whole lot within an outer blocks loop; i.e. your experiment will show only three routines and two loops, not the very long structure you currently have. The nested loops mean the two trial routines will run on every trial, while the break routine will run only once per block.
The key aspect to controlling the block order is the outer blocks loop. Connect it to a conditions file that looks like this:
condition_file
block_1.csv
block_2.csv
block_3.csv
block_4.csv
block_5.csv
block_6.csv
block_7.csv
And set the loop to be "random".
In the inner trials loop, put the variable name $condition_file in the conditions file field. The order of blocks will then be randomised across your subjects.
The other key aspect you need to learn is to control more of the task using variables contained within each of your conditions files. e.g. you are currently creating a separate routine for each SOA value (e.g. AV500ms and AV750ms). Instead, you should have just a single routine, called say AV, and make the timings of the stimulus components within that routine be controlled by a variable from your conditions file.
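For example, a block file such as block_1.csv could carry the per-trial variables, with the SOA (in seconds) as one column for the single AV routine to read. These column names are just hypothetical placeholders:
number,format,SOA
5,digit,0.75
5,word,-0.5
5,dots,0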
A key principle of programming is DRY: Don't Repeat Yourself (and although you aren't directly programming, under the hood, PsychoPy Builder will generate a Python program for you). Creating multiple routines that differ only in one respect is an indicator that things are not being specified optimally. By having only one routine, if you need to alter it in some way, you only have to do it once, rather than repeat it 7 times. The latter approach is very fragile and hard to maintain, and can easily lead to errors.
There is a resource on controlling blocks of trials here:
https://www.psychopy.org/builder/blocksCounterbalance.html
I am currently working on an ML/NLP project and I want to measure the execution time of certain parts, and also potentially predict how long the execution will take. For example, I want to measure the ML training process (including sub-processes like data preprocessing). I have been looking online and have come across different Python modules that can measure the execution time of functions (like time or timeit). However, I still haven't found a concrete solution for predicting the time it will take a function to execute. I have thought about running the code several times, saving the (data_size, time) values, and then using those to extrapolate for future data. I also thought about updating this estimate with the time it took to run several subparts of a function (i.e. seeing how much of the process has been computed and how long it took, then using that to adjust the estimate of the time left).
However, I am not sure about any of this, and I wanted to see if there are better options out there that I wasn't aware of, so if anyone has a better idea, I'd be thankful if you could share it.
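Here is a minimal sketch of the extrapolation idea, assuming the (data_size, seconds) measurements have already been collected and that the cost grows roughly polynomially (the numbers below are hypothetical placeholders):

```python
import numpy as np

# Hypothetical measurements: (data_size, seconds) pairs, e.g. collected
# with time.perf_counter() around the training call.
sizes = np.array([1_000, 5_000, 10_000, 50_000])
times = np.array([0.8, 4.1, 8.5, 44.0])

# Fit a low-degree polynomial; pick the degree to match the algorithm's
# expected complexity (1 for O(n), 2 for O(n^2), ...).
predict = np.poly1d(np.polyfit(sizes, times, deg=1))
print(f"predicted time for 100k samples: {predict(100_000):.1f} s")
```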
Have you looked into using profiling? It gives a detailed breakdown of function execution times, the number of calls, etc. You execute the script under the profiler, and it produces the detailed breakdown.
https://docs.python.org/3/library/profile.html#module-cProfile
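For example (the train function here is just a stand-in for your own code):

```python
import cProfile

def train():
    # Stand-in for your actual training / preprocessing code.
    sum(i * i for i in range(10_000_000))

# Prints call counts and cumulative times per function, sorted by
# cumulative time. For a whole script, you can instead run:
#   python -m cProfile -s cumtime your_script.py
cProfile.run("train()", sort="cumtime")
```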
If you want live progress reports, there are a couple of libraries I've seen. https://pypi.org/project/tqdm/
https://pypi.org/project/progressbar2/
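For instance, wrapping any iterable in tqdm gives a live progress bar with an ETA, which doubles as a rough in-flight prediction of the time remaining:

```python
from tqdm import tqdm

for batch in tqdm(range(1_000), desc="training"):
    pass  # stand-in for processing one batch
```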
Hope these help!
I am working on a machine learning project in Python, and many times I have found myself rerunning an algorithm with different tweaks each time (changing a few parameters, different normalization, some extra feature engineering, etc.). Each time, most of the computation is the same except for a few steps. I can, of course, save some intermediate states to disk and load them next time instead of computing the same thing over and over again.
The thing is that there are so many such intermediate results that manually saving them and keeping a record of them would be a pain. I looked at a Python decorator here that can make things a bit easier. However, the problem with that implementation is that it always returns the result from the first time you called the function, even when your function has arguments and should therefore produce different results for different arguments. I really need to memoize the output of a function with different arguments.
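For instance, the standard library's functools.lru_cache does key on the arguments (unlike the decorator above), but it only caches in memory and requires hashable arguments, so it neither persists between runs nor copes with array-like data:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def preprocess(dataset_name, normalization):
    # Stand-in for an expensive step; results are cached per distinct
    # (dataset_name, normalization) pair, not just the first call.
    print(f"computing {dataset_name} / {normalization} ...")
    return f"{dataset_name}-{normalization}-features"

preprocess("train", "zscore")   # computed
preprocess("train", "zscore")   # returned from the cache
preprocess("train", "minmax")   # different arguments, computed again
```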
I googled extensively on this topic, and the closest thing I found is IncPy by Philip Guo. IncPy (Incremental Python) is an enhanced Python interpreter that speeds up script execution by automatically memoizing (caching) the results of long-running function calls and re-using those results, rather than re-computing them, when it is safe to do so.
I really like the idea and think it would be very useful for data science and machine learning, but the code was written nine years ago for Python 2.6 and is no longer maintained.
So my question is: are there any alternative automatic caching/memoization techniques in Python that can handle relatively large datasets?
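To make the requirement concrete, here is a sketch of the usage pattern I'm after, written with joblib's Memory, one maintained library that provides disk-backed, argument-aware memoization and handles numpy arrays:

```python
import numpy as np
from joblib import Memory

memory = Memory("./cachedir", verbose=1)   # on-disk cache directory

@memory.cache
def expensive_step(data, power):
    # Recomputed only for new (data, power) combinations; otherwise the
    # result is loaded from disk, even across separate runs of the script.
    return np.linalg.matrix_power(data.T @ data, power)

X = np.random.rand(500, 50)
expensive_step(X, 3)   # computed and cached
expensive_step(X, 3)   # loaded from the on-disk cache
```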
Is there a particular software resource monitor that researchers or academics use to compare execution time and other resource usage metrics between programming environments? For instance, if I have a routine implemented in C++, Python, and MATLAB, all identical in function and with similar implementations, how would I make an objective, measurable comparison as to which is the most efficient? Likewise, is there a tool that could also analyze performance between versions of the same code, to track improvements in processing efficiency? Please try to answer this question without generalizations like "oh, C++ is always more efficient than Python, and Python will always be more efficient than MATLAB."
The correct way is to write timing tests: get the current time before the actual algorithm starts, and get the current time again after it ends. There are ways to do that in C++, Python, and MATLAB.
You must not treat the results as 100% precise, because of OS scheduling and so on, but it is a good way to compare before/after results.
A good way to get more precise results is to run your code multiple times.
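A minimal Python version of that advice (C++ has std::chrono::steady_clock and MATLAB has tic/toc for the same job):

```python
import time

def benchmark(fn, repeats=10):
    # Take the minimum over several runs: scheduler noise only ever makes
    # a run slower, so the fastest run is closest to the true cost.
    times = []
    for _ in range(repeats):
        start = time.perf_counter()   # high-resolution monotonic clock
        fn()
        times.append(time.perf_counter() - start)
    return min(times)

print(f"{benchmark(lambda: sum(range(1_000_000))):.4f} s")
```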
I am not quite sure if I should ask this question here.
I want to make an art project.
I want to use voice as input and an image as output.
The image changes according to the sound.
How can I realise this? I need real-time behaviour, or at least a delay under 50 ms.
At first I thought it would be better to use a microcontroller.
But I want to compute large images; maybe a microcontroller can't do this.
For example, I want to compute 10,000 moving objects.
Could I realise this with Windows/Linux/a microcontroller?
It would be very good if I could use Python.
Or do you think Processing is a better choice?
Do you need more details?
Have you thought about using a graphical dataflow environment like Pure Data (Pd) or Max? Max is a commercial product, but Pd is free.
Even if you don't end up using Pd for your final project, it makes an excellent rapid prototyping tool. Whilst the graphics processing capabilities of Pd are limited, there are extensions such as Gridflow and Gem, which may help you. Certainly with Pd you can analyse incoming sound using the [fiddle~] object, which will give you the overall pitch and frequency/amplitude of individual partials and [env~], which will give you RMS amplitude. You could then very easily map changes in sound (pitch, amplitude, timbre) to various properties of an image such as colour, shapes, number of elements and so on in Gem or Gridflow.
10k moving objects sounds like a heck of a lot even on a modern desktop GPU! Calculating all of those positions on-the-fly is going to consume a lot of resources. I think even with a dedicated C++ graphics library like openFrameworks, this might be a struggle. You might want to consider an optimisation strategy like pre-rendering aspects of the image, and using the real-time audio control to determine which pre-rendered components are displayed at any given time. This might give the illusion of control over 10k objects, when in reality much of it is pre-rendered.
Good luck!
The above answer is a good one; Pd is very flexible. But if you want something more code-oriented and better suited to mixing with MCUs, Processing might be better.
Another good way to do it would be to use Csound with the Csound Python API. Csound has a steep learning curve, but it has tons of audio analysis functionality and is very good for running with low latency. You could definitely analyse an input signal in real time and then send control values out to a graphics environment, scripting both with Python.