Old-school C programmer trying to get with the times and learn Python. I'm struggling to see how to use vectorization effectively to replace for loops. I get the basic concept that Python can apply mathematical functions to entire matrices in a single statement, and that's really cool. But I seldom work with purely mathematical relationships. Almost all my for loops apply CONDITIONAL logic.
Here's a very simple example to illustrate the concept:
import numpy as np
# Initial values
default = [1,2,3,4,5,6,7,8]
# Override values should only replace initial values when not nan
override = [np.nan,np.nan,3.5,np.nan,5.6,6.7,np.nan,8.95]
# I wish I knew how to replace this for loop with a single line of vectorized code
for i in range(len(default)):
    if not np.isnan(override[i]):  # Only override when the override value is other than nan
        default[i] = override[i]
default
I have a feeling that for loop could be eliminated with a single python statement that only overwrites values of default with values of override that are not np.nan. But I can't see how to do it.
This is just a simplified example to illustrate the concept. My real question is whether or not vectorization is generally useful to replace for loops with conditional logic, or if it's only applicable to mathematical relationships, where the benefits and method of achieving them are obvious. All of my real code challenges are much more complex and the conditional logic is more complex than just a simple "only use this value if it's non-nan".
I found hundreds of articles online about how to use vectorization in Python, but they all seem to focus on replacing mathematical calculations in for loops. All my for loops involve conditional logic. Can vectorization help me or am I trying to fit a square peg in a round hole?
Thanks!
First things first, the vectorized version:
override_is_not_nan = np.logical_not(np.isnan(override))
np.where(override_is_not_nan, override, default)
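If you want to keep the result, assign it back; np.where returns a new array, and it will be float (nan forces a float dtype):
default = np.where(override_is_not_nan, override, default)
# array([1.  , 2.  , 3.5 , 4.  , 5.6 , 6.7 , 7.  , 8.95])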
As for your real question, vectorization is useful because it lets the work run as parallel operations over whole arrays.
And not just for multi-core CPUs.
Considering today's GPUs have thousands of cores, using tensors with similar code can make it run much faster.
How much faster? That depends on your data, implementation and hardware.
Evidently, the combination of vectorization with GPUs is part of what enabled the huge progress in the field of Deep Learning.
List comprehensions are usually the preferred one-line alternative to for loops in Python. It is possible to throw a conditional into the comprehension as well.
In this specific case we iterate over elements of default and override by zipping them together and replace values of default according to the conditional check.
>>> [y if not(np.isnan(y)) else x for (x,y) in zip(default, override)]
[1, 2, 3.5, 4, 5.6, 6.7, 7, 8.95]
To answer your broader question about vectorization and speedups, the answer unfortunately is: it depends. There are situations where a simple for loop performs better than its vectorized counterparts. List comprehensions, for example, are mainly about improving the readability of code rather than providing a serious speedup.
The answers on this question address this in more detail.
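If you want to measure it on your own data, here is a small benchmark sketch (the data sizes and timing harness below are made up for illustration) comparing the original loop, the list comprehension, and np.where using timeit:
import timeit
import numpy as np

default = list(range(1, 100_001))
override = [v + 0.5 if v % 3 == 0 else np.nan for v in default]
default_arr = np.array(default, dtype=float)
override_arr = np.array(override)

def with_loop():
    out = list(default)
    for i in range(len(out)):
        if not np.isnan(override[i]):
            out[i] = override[i]
    return out

def with_comprehension():
    return [y if not np.isnan(y) else x for (x, y) in zip(default, override)]

def with_numpy():
    return np.where(np.isnan(override_arr), default_arr, override_arr)

for fn in (with_loop, with_comprehension, with_numpy):
    print(fn.__name__, timeit.timeit(fn, number=10))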
First find the indices where the non-nan values are located.
Replace the values at those indices in the default array with the corresponding override array values.
import numpy as np
np_default = np.array(default).astype(float) # Convert default to a NumPy array of floats
non_nan_indices = np.where(~np.isnan(override)) # Get non nan indices
np_default[non_nan_indices] = np.array(override)[non_nan_indices] # Replacing the values at non-nan indices
np_default # Returns array([1. , 2. , 3.5 , 4. , 5.6 , 6.7 , 7. , 8.95])
Vectorization is where numpy comes to your help: it takes advantage of the typed nature of its arrays, which results in much faster operations. See [BlogPost] for details.
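As a small variation (not from the original answer), you can also index with the boolean mask directly instead of going through np.where:
mask = ~np.isnan(override)
np_default[mask] = np.array(override)[mask]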
I have used
scipy.signal.lfilter(coefficient, 1, input, axis=0)
for filtering a signal in Python with 9830000 samples (I have to use axis=0 to get a similar answer to MATLAB). Compared to MATLAB's
filter(coefficient, 1, input)
which takes seconds, it takes a very long time (minutes), and it gets worse when I have to filter several times. Is there any suggestion for that? I have tried numpy as well, with the same timing issue. Is there any other filter that would give me similar answers?
You can use ifilter from itertools to implement a faster filter.
The problem with the built-in function filter from Python is that it returns a list, which is memory-consuming when you are dealing with lots of data. Therefore, itertools.ifilter is a better option because it calls the function only when needed.
Some other ideas depending on your implementation are:
If you are designing an FIR filter, you can apply convolve.
Or you can use correlation to get faster results, because it uses the FFT.
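For instance, a minimal sketch of FIR filtering via FFT-based convolution with SciPy; the filter taps and signal below are made up for illustration:
import numpy as np
from scipy.signal import lfilter, fftconvolve

coefficient = np.ones(101) / 101        # hypothetical FIR taps (moving average)
signal = np.random.randn(1_000_000)     # hypothetical long 1-D signal

out_lfilter = lfilter(coefficient, 1, signal)  # what the question currently uses

# For a pure FIR filter, FFT-based convolution is often much faster on long signals;
# trimming the 'full' convolution to the signal length matches lfilter's output.
out_fft = fftconvolve(signal, coefficient, mode='full')[:signal.size]

print(np.allclose(out_lfilter, out_fft))  # True up to floating-point error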
I am just learning to use dask and read many threads on this forum related to Dask and for loops. But I am still unclear how to apply those solutions to my problem. I am working with climate data that are functions of (time, depth, location). The 'location' coordinate is a linear index such that each value corresponds to a unique (longitude, latitude). I am showing below a basic skeleton of what I am trying to do, assuming var1 and var2 are two input variables. I want to parallelize over the location parameter 'nxy', as my calculations can proceed simultaneously at different locations.
for loc in range(0,nxy):  # nxy = total no. of locations
    for it in range(0,ntimes):
        out1 = expression1 involving ( var1(loc), var2(it,loc) )
        out2 = expression2 involving ( var1(loc), var2(it,loc) )
        # <a dozen more output variables>
My questions:
(i) Many examples illustrating the use of 'delayed' show something like "delayed(function)(arg)". In my case, I don't have too many (if any) functions, but lots of expressions. If 'delayed' only operates at the level of functions, should I convert each expression into a function and add a 'delayed' in front?
(ii) Should I wrap the entire for loop shown above inside a function and then call that function using 'delayed'? I tried doing something like this but might not be doing it correctly as I did not get any speed-up compared to without using dask. Here's what I did:
def test_dask(n):
    for loc in range(0,n):
        # same code as before
    return var1  # just returning one variable for now

var1 = delayed(test_dask)(nxy)
var1.compute()
Thanks for your help.
Every delayed task adds about 1ms of overhead. So if your expression is slow (maybe you're calling out to some other expensive function), then yes dask.delayed might be a good fit. If not, then you should probably look elsewhere.
In particular, it looks like you're just iterating through a couple arrays and operating element by element. Please be warned that Python is very slow at this. You might want to not use Dask at all, but instead try one of the following approaches:
Find some clever way to rewrite your computation with Numpy expressions
Use Numba
Also, given the terms you're using, like lat/lon/depth, it may be that Xarray is a good project for you.
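For the second question, a minimal sketch of the usual dask.delayed pattern, with one task per location rather than one task wrapping the whole loop; the data shapes and the process_location function below are stand-ins, not code from the original post:
import numpy as np
import dask
from dask import delayed

ntimes, nxy = 100, 500                 # made-up sizes
var1 = np.random.randn(nxy)
var2 = np.random.randn(ntimes, nxy)

def process_location(loc):
    # placeholder for the real per-location work (expression1, expression2, ...)
    out1 = var1[loc] * var2[:, loc].sum()
    out2 = var1[loc] + var2[:, loc].mean()
    return out1, out2

# One delayed task per location, so Dask can schedule locations in parallel.
# This only pays off if each task does substantially more than ~1 ms of work.
tasks = [delayed(process_location)(loc) for loc in range(nxy)]
results = dask.compute(*tasks)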
I am trying to use pandas pd.DataFrame.where as follows:
df.where(cond=mask, other=df.applymap(f))
Where f is a user defined function to operate on a single cell. I cannot use other=f as it seems to produce a different result.
So basically I want to evaluate the function f at all cells of the DataFrame which does not satisfy some condition which I am given as the mask.
The above usage using where is not very efficient as it evaluates f immediately for the entire DataFrame df, whereas I only need to evaluate it at some entries of the DataFrame, which can sometimes be very few specific entries compared to the entire DataFrame.
Is there an alternative usage/approach that could be more efficient in solving this general case?
As you correctly stated, df.applymap(f) is evaluated before df.where(). I'm fairly certain that df.where() is a quick function and is not the bottleneck here.
It's more likely that df.applymap(f) is inefficient, and there's usually a faster way of doing f in a vectorized manner. Having said that, if you do believe this is impossible, and f is itself slow, you could modify f to leave the input unchanged wherever your mask is True (those cells keep their original values anyway). This is most likely going to be really slow though, and you'll definitely prefer trying to vectorize f instead.
If you really must do it element-wise, you could use a NumPy array:
result = df.values.copy()  # copy so the original DataFrame is not modified in place
for i, j in zip(*np.where(~mask)):  # cells where the condition fails, i.e. where f's output is actually used
    result[i, j] = f(result[i, j])
It's critical that you use a NumPy array for this, rather than .iloc or .loc in the dataframe, because indexing a pandas dataframe is slow.
You could compare the speed of this with .applymap; for the same operation, I don't think .applymap is substantially faster (if at all) than simply a for loop, because all pandas does is run a for loop of its own in Python (maybe Cython? But even that only saves on the overhead, and not the function itself). This is different from 'proper' vectorization, because vector operations are implemented in C.
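For reference, a self-contained sketch of the whole pattern; the DataFrame, mask and f below are invented for illustration:
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12, dtype=float).reshape(3, 4))
mask = df % 2 == 0            # keep cells where True, transform where False
f = lambda x: x * 100         # stand-in for an expensive per-cell function

result = df.values.copy()
for i, j in zip(*np.where(~mask)):
    result[i, j] = f(result[i, j])

out = pd.DataFrame(result, index=df.index, columns=df.columns)
# should match the df.where version while only calling f where its output is used
print(out.equals(df.where(mask, other=df.applymap(f))))  # True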
I wrote a program using normal Python, and I now think it would be a lot better to use numpy instead of standard lists. The problem is there are a number of things where I'm confused how to use numpy, or whether I can use it at all.
In general, how do np.arrays work? Are they dynamic in size like a C++ vector, or do I have to declare their length and type beforehand like a standard C++ array? In my program I've got a lot of cases where I create a list
ex_list = [] and then cycle through something and append to it with ex_list.append(some_lst). Can I do something like that with a numpy array? What if I knew the size of ex_list, could I declare an empty one and then add to it?
If I can't, let's say I only read from this list, would it be worth it to convert it to numpy afterwards, i.e. is accessing a numpy array faster?
Can I do more complicated operations for each element using a numpy array (not just adding 5 to each etc), example below.
full_pallete = [(int(1+i*(255/127.5)),0,0) for i in range(0,128)]
full_pallete += [col for col in right_palette if col[1]!=0 or col[2]!=0 or col==(0,0,0)]
In other words, does it make sense to convert to a numpy array and then cycle through it using something other than for loop?
Numpy arrays can be appended to (see http://docs.scipy.org/doc/numpy/reference/generated/numpy.append.html), although in general calling the append function many times in a loop has a heavy performance cost - it is generally better to pre-allocate a large array and then fill it as necessary. This is because the arrays themselves do have fixed size under the hood, but this is hidden from you in python.
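A minimal sketch of the difference (sizes and values are made up):
import numpy as np

n = 100_000

# Growing with np.append: every call copies the whole array, so this is O(n**2) overall.
grown = np.array([], dtype=float)
for i in range(n):
    grown = np.append(grown, i * 0.5)

# Pre-allocating and filling in place: one allocation, then cheap element writes.
prealloc = np.empty(n, dtype=float)
for i in range(n):
    prealloc[i] = i * 0.5

print(np.array_equal(grown, prealloc))  # True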
Yes, Numpy is well designed for many operations similar to these. In general, however, you don't want to be looping through numpy arrays (or arrays in general in python) if they are very large. By using inbuilt numpy functions, you basically make use of all sorts of compiled speed up benefits. As an example, rather than looping through and checking each element for a condition, you would use numpy.where().
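For instance, a conditional selection in the spirit of your palette code could look like this (the threshold and values are invented for illustration):
import numpy as np

values = np.array([3, 250, 17, 128, 99, 255])

# Instead of: [v if v < 128 else 0 for v in values]
clipped = np.where(values < 128, values, 0)
print(clipped)  # -> 3, 0, 17, 0, 99, 0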
The real reason to use numpy is to benefit from pre-compiled mathematical functions and data processing utilities on large arrays - both those in the core numpy library as well as many other packages that use them.
I have a numpy script that is currently running quite slowly.
It spends the vast majority of its time performing the following operation inside a loop:
terms=zip(Coeff_3,Coeff_2,Curl_x,Curl_y,Curl_z,Ex,Ey,Ez_av)
res=[np.dot(C2,array([C_x,C_y,C_z]))+np.dot(C3,array([ex,ey,ez])) for (C3,C2,C_x,C_y,C_z,ex,ey,ez) in terms]
res=array(res)
Ex[1:Nx-1]=res[1:Nx-1,0]
Ey[1:Nx-1]=res[1:Nx-1,1]
It's the list comprehension that is really slowing this code down.
In this case, Coeff_3 and Coeff_2 are length-1000 lists whose elements are 3x3 numpy matrices, and Ex, Ey, Ez, Curl_x, etc. are all length-1000 numpy arrays.
I realize it might be faster if I did things like setting up a single 3x1000 E vector, but I have to perform a significant amount of averaging of different E vectors between steps, which would make things very unwieldy.
Curiously, however, I perform this operation twice per loop (once for Ex, Ey, once for Ez), and performing the same operation for the Ez's takes almost twice as long:
terms2=zip(Coeff_3,Coeff_2,Curl_x,Curl_y,Curl_z,Ex_av,Ey_av,Ez)
res2=array([np.dot(C2,array([C_x,C_y,C_z]))+np.dot(C3,array([ex,ey,ez])) for (C3,C2,C_x,C_y,C_z,ex,ey,ez) in terms2])
Anyone have any idea what's happening? Forgive me if it's anything obvious, I'm very new to Python.
As pointed out in previous comments, use array operations. np.hstack(), np.vstack(), np.outer() and np.inner() are useful here. Your code could become something like this (not sure about your dimensions):
Cxyz = np.vstack((Curl_x,Curl_y,Curl_z))
C2xyz = np.dot(C2, Cxyz)
...
Check the shapes of your results to make sure you translated your problem right. Sometimes numexpr can also speed up such tasks significantly with little extra effort.
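If you want to vectorize the whole list comprehension at once, np.einsum can do the batched matrix-vector products; the sketch below assumes Coeff_2 and Coeff_3 can be stacked into (N, 3, 3) arrays (the shapes and random data are stand-ins for your real values):
import numpy as np

N = 1000
Coeff_2 = np.random.randn(N, 3, 3)                 # stand-in for your list of 3x3 matrices
Coeff_3 = np.random.randn(N, 3, 3)
Curl_x, Curl_y, Curl_z = np.random.randn(3, N)     # stand-ins for the length-N arrays
Ex, Ey, Ez_av = np.random.randn(3, N)

Cxyz = np.stack([Curl_x, Curl_y, Curl_z], axis=1)  # shape (N, 3)
Exyz = np.stack([Ex, Ey, Ez_av], axis=1)           # shape (N, 3)

# For each n: res[n] = Coeff_2[n] @ Cxyz[n] + Coeff_3[n] @ Exyz[n]
res = np.einsum('nij,nj->ni', Coeff_2, Cxyz) + np.einsum('nij,nj->ni', Coeff_3, Exyz)
print(res.shape)  # (1000, 3)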