I'm using audiomath under Python 3.7 on Windows 10.
I have been using audiomath to standardize my audio files for EEG analyses. It has been very useful for every parameter except this one: I keep getting stuck when trying to create fade-ins, fade-outs, or Hann windows.
I've run the same code on other machines with other versions of Python and Numpy and I still get the same error.
from audiomath import Sound, MakeRise
import numpy
sound01 = Sound('mySample.wav')
soundFadedIn = sound01.MakeHannWindow(5)
soundFadedIn.Play()
(error traceback omitted)
As pointed out by @WarrenWeckesser, this was a bug in audiomath, which has been fixed as of audiomath version 1.16.0.
Note that MakeHannWindow only returns the Hann weighting itself (with duration and sampling frequency matched to sound01); it does not return the sound multiplied by the weighting, as you seem to have assumed. What you're trying to do may be better accomplished using the .Fade() method (which was also affected by the same bug).
With a small modification, your approach is one way to do it (it always gives you a symmetric fade-in and fade-out; the argument specifies the duration, in seconds, of the plateau in the middle):
from audiomath import Sound
sound01 = Sound('mySample.wav')
soundFadedInAndOut = sound01 * sound01.MakeHannWindow(5) # note the multiplication
Here's another way, where you instead specify the durations of the rising and falling sections explicitly and separately (the fade doesn't have to be symmetric, and either of the two durations can be 0):
from audiomath import Sound
sound01 = Sound('mySample.wav')
soundFadedInAndOut = sound01.Copy().Fade(risetime=0.5, falltime=0.5, hann=True)
Finally, if for some reason you're unable or unwilling to upgrade audiomath to 1.16, a workaround for the bug you're reporting might be to use the Shoulder() function from audiomath.Signal to generate your windowing function:
import audiomath as am, numpy as np
x = am.Sound('mySample.wav')
endFadeIn, startFadeOut = 0.5, x.duration-0.5
t = np.linspace(0, x.duration, x.nSamples) # in seconds
window = am.Signal.Shoulder(t, [0, endFadeIn, startFadeOut, x.duration]) # it's a numpy array, not a Sound
faded = x * window # but you can still multiply a Sound by it
faded.Play()
I know Numba does not support all Python features nor all NumPy features.
However, I really need to speed up the execution time of the following function, block_reduce, which is available in the scikit-image library (I haven't installed the whole package, which is very big; I've just taken block_reduce and view_as_blocks from it).
Here is the original code (I've only removed the examples from the docstring).
block_reduce.py
import numpy as np
from numpy.lib.stride_tricks import as_strided
def block_reduce(image, block_size, func=np.sum, cval=0):
    """
    Taken from scikit-image to avoid installation (it's very big).

    Down-sample image by applying function to local blocks.

    Parameters
    ----------
    image : ndarray
        N-dimensional input image.
    block_size : array_like
        Array containing down-sampling integer factor along each axis.
    func : callable
        Function object which is used to calculate the return value for each
        local block. This function must implement an ``axis`` parameter such
        as ``numpy.sum`` or ``numpy.min``.
    cval : float
        Constant padding value if image is not perfectly divisible by the
        block size.

    Returns
    -------
    image : ndarray
        Down-sampled image with same number of dimensions as input image.
    """
    if len(block_size) != image.ndim:
        raise ValueError("`block_size` must have the same length "
                         "as `image.shape`.")

    pad_width = []
    for i in range(len(block_size)):
        if block_size[i] < 1:
            raise ValueError("Down-sampling factors must be >= 1. Use "
                             "`skimage.transform.resize` to up-sample an "
                             "image.")
        if image.shape[i] % block_size[i] != 0:
            after_width = block_size[i] - (image.shape[i] % block_size[i])
        else:
            after_width = 0
        pad_width.append((0, after_width))

    image = np.pad(image, pad_width=pad_width, mode='constant',
                   constant_values=cval)

    blocked = view_as_blocks(image, block_size)
    return func(blocked, axis=tuple(range(image.ndim, blocked.ndim)))
def view_as_blocks(arr_in, block_shape):
    """Block view of the input n-dimensional array (using re-striding).

    Blocks are non-overlapping views of the input array.

    Parameters
    ----------
    arr_in : ndarray
        N-d input array.
    block_shape : tuple
        The shape of the block. Each dimension must divide evenly into the
        corresponding dimensions of `arr_in`.

    Returns
    -------
    arr_out : ndarray
        Block view of the input array.
    """
    if not isinstance(block_shape, tuple):
        raise TypeError('block needs to be a tuple')

    block_shape = np.array(block_shape)
    if (block_shape <= 0).any():
        raise ValueError("'block_shape' elements must be strictly positive")

    if block_shape.size != arr_in.ndim:
        raise ValueError("'block_shape' must have the same length "
                         "as 'arr_in.shape'")

    arr_shape = np.array(arr_in.shape)
    if (arr_shape % block_shape).sum() != 0:
        raise ValueError("'block_shape' is not compatible with 'arr_in'")

    # -- restride the array to build the block view
    new_shape = tuple(arr_shape // block_shape) + tuple(block_shape)
    new_strides = tuple(arr_in.strides * block_shape) + arr_in.strides

    arr_out = as_strided(arr_in, shape=new_shape, strides=new_strides)
    return arr_out
test_block_reduce.py
import numpy as np
import time
from block_reduce import block_reduce
image = np.arange(3*3*1000).reshape(3, 3, 1000)
# DO NOT REPORT THIS... COMPILATION TIME IS INCLUDED IN THE EXECUTION TIME!
start = time.time()
block_reduce(image, block_size=(3, 3, 1), func=np.mean)
end = time.time()
print("Elapsed (with compilation) = %s" % (end - start))
# NOW THE FUNCTION IS COMPILED, RE-TIME IT EXECUTING FROM CACHE
start = time.time()
block_reduce(image, block_size=(3, 3, 1), func=np.mean)
end = time.time()
print("Elapsed (after compilation) = %s" % (end - start))
I went through many issues with this code.
For example, Numba does not support function-type parameters. Even if I work around that by passing a string instead (for example, func would be the string "sum" instead of np.sum), I run into many more issues with features unsupported by Numba (np.pad, isinstance, the tuple constructor, etc.).
Working through each issue individually turned out to be very painful. For example, I tried copying the NumPy source for np.pad into block_reduce.py and decorating it with numba.jit, but that only led to further problems.
If there is a smart way to use Numba despite all these unsupported features I would be happy with it.
Otherwise, is there any alternative to Numba? I know there is PyPy, which I've never used. If PyPy is a solution to my problem, note that I only need this single script, block_reduce.py, to run under PyPy; the rest of the project should keep running under CPython.
I was also thinking of writing a C extension module, which I've never done, but if it's worth trying I will.
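For reference, the kind of specialisation that tends to stay within Numba's supported subset looks like the sketch below: fix the reduction to a mean, assume a 3D image whose shape is already divisible by the block size (so np.pad is never needed), and write the block loops explicitly. The function name and the divisibility assumption are mine, for illustration only.

# A hypothetical Numba-friendly specialisation of block_reduce: mean
# reduction only, 3D input only, and the image shape is assumed to be
# exactly divisible by the block size, so no padding is required.
import numpy as np
from numba import njit

@njit
def block_mean_3d(image, b0, b1, b2):
    n0 = image.shape[0] // b0
    n1 = image.shape[1] // b1
    n2 = image.shape[2] // b2
    out = np.empty((n0, n1, n2), dtype=np.float64)
    for i in range(n0):
        for j in range(n1):
            for k in range(n2):
                s = 0.0
                for x in range(b0):
                    for y in range(b1):
                        for z in range(b2):
                            s += image[i*b0 + x, j*b1 + y, k*b2 + z]
                out[i, j, k] = s / (b0 * b1 * b2)
    return out

# Equivalent to block_reduce(image, block_size=(3, 3, 1), func=np.mean)
# for inputs meeting the assumptions above:
# result = block_mean_3d(np.arange(3*3*1000).reshape(3, 3, 1000), 3, 3, 1)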
Have you tried detailed profiling of your code? If you are dissatisfied with the performance of your program, a tool such as cProfile or py-spy can be very helpful: it identifies the bottlenecks in your program and shows which parts specifically need to be sped up.
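For example, a minimal cProfile session for the code in question might look like this (the output file name stats.prof is an arbitrary choice):

# Profile one call to block_reduce and print the ten most expensive
# functions by cumulative time; cProfile and pstats ship with Python.
import cProfile
import pstats
import numpy as np
from block_reduce import block_reduce

image = np.arange(3 * 3 * 1000).reshape(3, 3, 1000)
cProfile.run('block_reduce(image, block_size=(3, 3, 1), func=np.mean)',
             'stats.prof')
pstats.Stats('stats.prof').sort_stats('cumulative').print_stats(10)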
That being said, as @CJR noted, if your program spends the bulk of its compute time inside NumPy, there is likely no reason to worry about speeding it up with a just-in-time compiler or similar modifications to your setup. As explained in more detail here, NumPy is fast because it implements compute-intensive tasks in compiled languages, abstracting that away so you don't have to worry about it.
Depending on what exactly you are planning to do, it is possible that your efficiency could be improved by parallelism, but this is not something I would worry about yet.
To end on a more general note: while optimizing code efficiency is of course very important, it should be done carefully and deliberately. As Donald Knuth famously put it, "premature optimization is the root of all evil (or at least most of it) in programming". See this Stack Exchange thread for some more discussion.
I would like to distribute an integer, for example 20, into four parts, following a given probability for each part: p = [0.02, 0.5, 0.3, 0.18].
The corresponding python code is:
frequency=np.random.choice([1,2,3,4],20,p=[0.02,0.5,0.3,0.18])
from collections import Counter
np.fromiter(Counter(frequency).values(), dtype=np.float32)
# Out[86]:
# array([8., 8., 4.], dtype=float32)
However, I have on the order of 1e8 parts, and the number to distribute is not 20 but around 1e10, so Python is really slow. For example:
frequency=np.random.choice([i for i in range (10**7)],16**10,p=[0.0000001 for i in range(10**7)])
from collections import Counter
r=np.fromiter(Counter(frequency).values(), dtype=np.float32)
Now it simply yields a MemoryError. I think tensorflow-gpu should be able to overcome this issue, since the output result is only of size 10**7. Does anyone know how to do this?
There are a few issues to think about here.
If you run the code on a GPU, it will never work, because GPUs are not made for storage but for fast computation, so the memory on a GPU is smaller than on a CPU. However, this code can produce a memory error on a CPU too, as it did on my machine. So we first try to overcome that.
Overcoming the MemoryError on CPU:
The line producing the MemoryError is line 1 itself:
In [1]: frequency = np.random.choice([i for i in range (10**7)],16**10,p=[0.0000
...: 001 for i in range(10**7)])
...:
---------------------------------------------------------------------------
MemoryError Traceback (most recent call last)
The reason for this is that the output of line 1 is not of size 10**7 but 16**10. Since that is what causes the MemoryError, the goal should be to never create an array of that size.
To do this, we reduce the size of the sample by a factor and loop over blocks of that reduced size factor times, so that each block is storable. On my machine, a factor of 1000000 does the trick. Once a block has been sampled, we use Counter to turn it into a dictionary of frequencies. The advantage is that the dictionary of frequencies, when converted to a list or a NumPy array, will never exceed size 10**7, so it causes no memory error.
As some elements might not appear in every sampled block, instead of converting each Counter into a list directly, we update a running dictionary with the counts from every iteration, preserving the frequencies of individual elements.
Once the whole loop is done, we convert the accumulated dictionary to a NumPy array. I have added a progressbar to track the progress, since the computation might take a lot of time. Also, in your specific case you don't need to pass the parameter p to np.random.choice(), as the distribution is uniform anyway.
import numpy as np
import tensorflow as tf
from click import progressbar
from collections import Counter

def large_uniform_sample_frequencies(factor=1000000, total_elements=10**7,
                                     sample_size=16**10):
    # A Counter accumulates frequencies across iterations
    # (Counter.update adds counts rather than overwriting them)
    counter_dict = Counter()

    # Wrap the loop in a progressbar to track progress
    with progressbar(range(factor)) as bar:
        for iteration in bar:
            # Generate a random sample of size (16 ** 10) / factor;
            # passing an int to np.random.choice avoids building a 10**7 list
            frequency = np.random.choice(total_elements,
                                         sample_size // factor)

            # Update the accumulated frequencies
            counter_dict.update(Counter(frequency))

    return np.fromiter(counter_dict.values(), dtype=np.float32)
Using tensorflow-gpu:
Since you have mentioned tensorflow-gpu, I assume you either want to get rid of the MemoryError using tensorflow-gpu, or want to run this in conjunction with tensorflow-gpu while using a GPU.
To solve the MemoryError itself, you may try the tf.multinomial() function, which has the same effect as np.random.choice(), as shown here. However, it is unlikely to help, because the problem is storing data of a certain size, not performing some alternate computation.
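For reference, a minimal sketch of that sampling with the TF 1.x API might look like the following; the uniform-logits setup and the sample size of 1000 are my assumptions for illustration:

# tf.multinomial as an analogue of np.random.choice (TF 1.x API):
# all-zero logits give a uniform distribution over the categories.
import tensorflow as tf

n_categories = 10**7
logits = tf.zeros([1, n_categories])                # uniform distribution
samples = tf.multinomial(logits, num_samples=1000)  # shape (1, 1000), int64

with tf.Session() as sess:
    print(sess.run(samples)[0, :10])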
If you want to run this as part of training some model, for instance, you can use Distributed TensorFlow to place this part of the computation graph on the CPU as a PS task, using the code given above. Here is the final code for that:
# Mention the devices for PS and worker tasks
ps_dev = '/cpu:0'
worker_dev = '/gpu:0'

# Toggle True to place computation on CPU
# and False to place it on the least loaded GPU
is_ps_task = True

# Set device for a PS task
if is_ps_task:
    device_setter = tf.train.replica_device_setter(worker_device=worker_dev,
                                                   ps_device=ps_dev,
                                                   ps_tasks=1)

# Allocate the computation to CPU
with tf.device(device_setter):
    freqs = large_uniform_sample_frequencies()
I am using the Anaconda suite with IPython 3.6.1 and their accelerate package. There is a cufft sub-package in it with two functions, fft and ifft. These, as far as I understand, take in a NumPy array and output to a NumPy array, both in system RAM; i.e. all GPU memory and transfer between system and GPU memory is handled automatically, and GPU memory is released when the function ends. This all seems very nice and works for me.
However, I would like to run multiple fft/ifft calls on the same array, extracting just one number from the array each time. It would be nice to keep the array in GPU memory to minimize system <-> GPU transfer. Am I correct that this is not possible using this package? If so, is there another package that would do the same? I have noticed the reikna project, but that doesn't seem to be available in Anaconda.
What I am doing (and would like to do efficiently on the GPU) is shown, in short, here using numpy.fft:
import math as m
import numpy as np
import numpy.fft as dft

nr = 100
nh = 2**16
h = np.random.rand(nh)*1j
H = np.zeros(nh, dtype='complex64')
h[10] = 1
r = np.zeros(nr, dtype='complex64')
fftscale = m.sqrt(nh)
corr = 0.12j

for i in np.arange(nr):
    r[i] = h[10]
    H = dft.fft(h, nh)/fftscale
    h = dft.ifft(h*corr)*fftscale

r[nr-1] = h[10]
print(r)
Thanks in advance!
So I found ArrayFire, which seems rather easy to work with.
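For anyone landing here later, a rough sketch of the loop above with arrayfire-python follows; I haven't verified it against a specific ArrayFire version, so treat the exact calls (af.randu, af.fft, af.ifft, Dtype.c32) as assumptions to check against the documentation.

# Rough arrayfire-python sketch: the array stays resident in GPU memory
# between FFT calls, and only single elements are pulled back to the host.
import arrayfire as af

nh = 2**16
h = af.randu(nh, dtype=af.Dtype.c32)  # complex array living on the GPU
corr = 0.12j

for i in range(100):
    H = af.fft(h)             # computed and kept in GPU memory
    h = af.ifft(h * corr)
    sample = h[10]            # only this element crosses to the host

print(sample)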
Given two large arrays of 3D points (I'll call the first "source" and the second "destination"), I needed a function that would return, for each element of "source", the index of its closest element in "destination", with this limitation: I can only use NumPy... so no SciPy, pandas, numexpr, Cython...
To do this I wrote a function based on the "brute force" answer to this question: I iterate over the elements of source, find the closest element in destination, and return its index. Due to performance concerns, and again because I can only use NumPy, I tried multithreading to speed it up. Here are both the threaded and unthreaded functions and how they compare in speed on an 8-core machine.
import timeit
import numpy as np
from numpy.core.umath_tests import inner1d
from multiprocessing.pool import ThreadPool
def threaded(sources, destinations):
    # Define worker function
    def worker(point):
        dlt = (destinations-point)  # delta between destinations and given point
        d = inner1d(dlt, dlt)       # get squared distances
        return np.argmin(d)         # return closest index

    # Multithread!
    p = ThreadPool()
    return p.map(worker, sources)

def unthreaded(sources, destinations):
    results = []
    for i in range(len(sources)):
        dlt = (destinations-sources[i])  # delta between destinations and given point
        d = inner1d(dlt, dlt)            # get squared distances
        results.append(np.argmin(d))     # append closest index
    return results
# Setup the data
n_destinations = 10000 # 10k random destinations
n_sources = 10000 # 10k random sources
destinations= np.random.rand(n_destinations,3) * 100
sources = np.random.rand(n_sources,3) * 100
#Compare!
print 'threaded: %s'%timeit.Timer(lambda: threaded(sources,destinations)).repeat(1,1)[0]
print 'unthreaded: %s'%timeit.Timer(lambda: unthreaded(sources,destinations)).repeat(1,1)[0]
Results:
threaded: 0.894030461056
unthreaded: 1.97295164054
Multithreading seems beneficial, but I was hoping for more than a 2x speed-up, given that the real-life datasets I deal with are much larger.
All recommendations to improve performance (within the limitations described above) will be greatly appreciated!
OK, I've been reading the Maya documentation on Python, and I came to these conclusions/guesses:
They're probably using CPython inside (several references to that documentation and not any other).
They're not fond of threads (lots of non-thread-safe methods).
Given the above, I'd say it's better to avoid threads. Because of the GIL, this is a common problem, and there are several ways to work around it:
Build a C/C++ extension and use threads inside it. Personally, I'd only try to get SIP to work, and then move on.
Use multiprocessing. Even if your custom Python distribution doesn't include it, you can get a working version, since it's all pure Python code. multiprocessing is not affected by the GIL because it spawns separate processes; a minimal sketch follows below.
One of the above should work out for you. If not, try another parallel tool (after some serious praying).
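As that minimal sketch of the multiprocessing route applied to the brute-force search above: the helper name closest is mine, it replaces inner1d with a plain sum of squares, and it must live at module level so Pool.map can pickle it.

# Brute-force nearest-neighbour search with multiprocessing instead of
# threads; each worker process computes closest() for some of the points.
import numpy as np
from functools import partial
from multiprocessing import Pool

def closest(point, destinations):
    dlt = destinations - point        # deltas to every destination
    d = (dlt * dlt).sum(axis=1)       # squared distances
    return np.argmin(d)               # index of the closest destination

if __name__ == '__main__':
    destinations = np.random.rand(10000, 3) * 100
    sources = np.random.rand(10000, 3) * 100
    p = Pool()
    indices = p.map(partial(closest, destinations=destinations), sources)
    p.close()
    p.join()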
On a side note, if you're using outside modules, be mindful of matching Maya's Python version. This may be the reason why you couldn't build SciPy. Of course, SciPy has a huge codebase, and Windows is not the most forgiving platform to build things on.
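Separately from parallelism, and staying within the NumPy-only constraint of the question, the search itself can be fully vectorised in chunks; the following sketch (function name and chunk size are mine) removes the per-point Python loop entirely:

# Chunked, fully vectorised nearest-neighbour search in pure NumPy.
# Since |s - d|^2 = |s|^2 - 2 s.d + |d|^2 and |s|^2 is constant per source
# point, argmin over (|d|^2 - 2 s.d) gives the same result; chunking keeps
# the intermediate (chunk x n_destinations) matrix small.
import numpy as np

def closest_indices(sources, destinations, chunk=1024):
    out = np.empty(len(sources), dtype=np.intp)
    d2 = (destinations ** 2).sum(axis=1)       # |d|^2, computed once
    for start in range(0, len(sources), chunk):
        s = sources[start:start + chunk]
        cross = s.dot(destinations.T)          # s.d for the whole chunk
        out[start:start + chunk] = np.argmin(d2 - 2.0 * cross, axis=1)
    return out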
The following program loads two images with PyGame, converts them to Numpy arrays, and then performs some other Numpy operations (such as FFT) to emit a final result (of a few numbers). The inputs can be large, but at any moment only one or two large objects should be live.
A test image is about 10M pixels, which translates to 10MB once it's greyscaled. It gets converted to a Numpy array of dtype uint8, which after some processing (applying Hamming windows), is an array of dtype float64. Two images are loaded into arrays this way; later FFT steps result in an array of dtype complex128. Prior to adding the excessive gc.collect calls, the program memory size tended to increase with each step. Additionally, it seems most Numpy operations will give a result in the highest precision available.
Running the test (sans the gc.collect calls) on my 1GB Linux machine results in prolonged thrashing, which I have not waited for. I don't yet have detailed memory use stats -- I tried some Python modules and the time command to no avail; now I'm looking into valgrind. Watching PS (and dealing with machine unresponsiveness in the later stages of the test) suggests a maximum memory usage of about 800 MB.
A 10 million cell array of complex128 should occupy 160 MB. Having (ideally) at most two of these live at one time, plus the not-insubstantial Python and Numpy libraries and other paraphernalia, probably means allowing for 500 MB.
I can think of two angles from which to attack the problem:
Discarding intermediate arrays as soon as possible. That's what the gc.collect calls are for -- they seem to have improved the situation, as it now completes with only a few minutes of thrashing ;-). I think one can expect that memory-intensive programming in a language like Python will require some manual intervention.
Using less-precise Numpy arrays at each step. Unfortunately the operations that return arrays, like fft2, do not appear to allow the type to be specified.
So my main question is: is there a way of specifying output precision in Numpy array operations?
More generally, are there other common memory-conserving techniques when using Numpy?
Additionally, does Numpy have a more idiomatic way of freeing array memory? (I imagine this would leave the array object live in Python, but in an unusable state.) Explicit deletion followed by immediate GC feels hacky.
import sys
import numpy
import pygame
import gc

def get_image_data(filename):
    im = pygame.image.load(filename)
    im2 = im.convert(8)
    a = pygame.surfarray.array2d(im2)
    hw1 = numpy.hamming(a.shape[0])
    hw2 = numpy.hamming(a.shape[1])
    a = a.transpose()
    a = a*hw1
    a = a.transpose()
    a = a*hw2
    return a

def check():
    gc.collect()
    print 'check'

def main(args):
    pygame.init()
    pygame.sndarray.use_arraytype('numpy')
    filename1 = args[1]
    filename2 = args[2]
    im1 = get_image_data(filename1)
    im2 = get_image_data(filename2)
    check()
    out1 = numpy.fft.fft2(im1)
    del im1
    check()
    out2 = numpy.fft.fft2(im2)
    del im2
    check()
    out3 = out1.conjugate() * out2
    del out1, out2
    check()
    correl = numpy.fft.ifft2(out3)
    del out3
    check()
    maxs = correl.argmax()
    maxpt = maxs % correl.shape[0], maxs / correl.shape[0]
    print correl[maxpt], maxpt, (correl.shape[0] - maxpt[0], correl.shape[1] - maxpt[1])

if __name__ == '__main__':
    args = sys.argv
    exit(main(args))
This answer on SO says "Scipy 0.8 will have single precision support for almost all the fft code", and SciPy 0.8.0 beta 1 is just out. (I haven't tried it myself, cowardly.)
If I understand correctly, you are calculating a convolution between two images. The SciPy package contains a dedicated module for that (scipy.ndimage), which might be more memory-efficient than the "manual" approach via Fourier transforms. It would be good to try it instead of going through NumPy.
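As a rough sketch of that direction (using scipy.signal.fftconvolve rather than ndimage, as an assumption on my part; flipping the second image turns the convolution into the cross-correlation the code above computes):

# Correlation-peak search via SciPy: flipping im2 turns fftconvolve's
# convolution into a cross-correlation; float32 inputs would halve the
# memory of the intermediates relative to float64.
import numpy as np
from scipy.signal import fftconvolve

def correlation_peak(im1, im2):
    corr = fftconvolve(im1, im2[::-1, ::-1], mode='same')
    return np.unravel_index(np.argmax(corr), corr.shape)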