Vectorize a lookup between two numbers - python

I am trying to find a way to vectorize the following for loop using NumPy. This loop is making my code really drag. The problem I am having is that I need to look up a value in the dictionary, d, based on the index where the value, val, falls within a range in the array, row.
for i in range(len(row)-1):
    if row[i] < val <= row[i+1]:
        return d[i] * row[-1]
I would imagine that I could use np.where and np.logical_and to test whether val falls between two numbers in the array, but then I need the index to grab the value from the dictionary, and that is the part that I just can't seem to figure out without the loop.

Thanks to Divakar's comment, I think the right answer is to replace the entire for loop with this NumPy one-liner:
np.vectorize(d.get)(np.searchsorted(row[:-1], vals, side='left') - 1) * row[-1]
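A minimal runnable sketch of that searchsorted approach, with a hypothetical sorted row (last element the multiplier) and dictionary d:

```python
import numpy as np

# Hypothetical inputs, shaped as in the question: sorted bin edges in row,
# with row[-1] acting as a multiplier, and d mapping bin index -> value.
row = np.array([0.0, 1.0, 2.0, 3.0, 10.0])
d = {0: 5.0, 1: 6.0, 2: 7.0}
vals = np.array([0.5, 1.5, 2.5])

# searchsorted finds the first index where row >= val, so subtracting 1
# gives the i satisfying row[i] < val <= row[i+1], all without a loop.
idx = np.searchsorted(row[:-1], vals, side='left') - 1
result = np.vectorize(d.get)(idx) * row[-1]  # -> [50., 60., 70.]
```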

Related

Is there a faster Numpy function to do an add.at, but with two sets of indices?

I'm trying to do an operation that is pretty similar to a numpy.add.at, but with two pairs of indices, and I'm wondering if there's a faster way to do this with numpy or something else rather than a for loop, which is running pretty slowly.
The following works, but I'm trying to do it faster:
for x, y in indices:
    A[B[x,y]] += C[x,y]
where the values obtained for B[x,y] will have a lot of duplicates, so B[1,1] may be equal to B[1,2].
numpy.add.at(A, indices, C) is pretty close, but doesn't get me there, as B basically maps the indices into another space. I'm hoping there's a faster way to do this with numpy or something else, probably without an explicit loop.

Fastest way to calculate new list from empty list in python

I want to perform calculations on a list and assign this to a second list, but I want to do this in the most efficient way possible as I'll be using a lot of data. What is the best way to do this? My current version uses append:
f = time_series_data
output = []
for i, f in enumerate(time_series_data):
    if f > x:
        output.append(calculation with f)
    # etc etc
Should I use append, or declare the output list as a list of zeros at the beginning?
Appending the values is not slower than the other possible ways to accomplish this.
The code looks fine, and creating a list of zeros would not help. It could even create problems, since you may not know in advance how many values will pass the condition f > x.
Since you wrote etc etc, I am not sure how long the loop body is or what operations you need to do there. If possible, try using a list comprehension; that would be a little faster.
You can have a look at the article below, which compares the speed of list creation using three methods: list comprehension, append, and pre-initialization.
https://levelup.gitconnected.com/faster-lists-in-python-4c4287502f0a
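For comparison, the loop above written as a list comprehension, with a placeholder threshold and calculation standing in for the real ones:

```python
# Hypothetical threshold and calculation, standing in for the real ones.
x = 5
time_series_data = [2, 7, 13, 4]

def calculation(f):
    return f * 2  # placeholder for the real computation

# One pass, no repeated append calls in Python bytecode.
output = [calculation(f) for f in time_series_data if f > x]  # -> [14, 26]
```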

Numpy mean for big array dataset using For loop Python

I have a big dataset in array form, arranged like this:
[Image: rainfall amounts arranged in array form]
The average (mean) for each latitude and longitude along axis=0 is computed with this declaration:
Lat = data[:,0]
Lon = data[:,1]
rain1 = data[:,2]
rain2 = data[:,3]
...
rain44 = data[:,44]
rainT = [rain1, rain2, rain3, rain4, ..., rain44]
mean = np.mean(rainT)
The result was awesome, but the computation takes time, so I turned to a for loop to ease the calculation. At the moment the script I use is like this:
mean = []
lat = data[:,0]
lon = data[:,1]
for x in range(2, 46):
    rainT = data[:,x]
    mean = np.mean(rainT, axis=0)
print mean
But a weird result appeared. Anyone?
First, you probably meant the for loop to accumulate the subarrays rather than keep replacing rainT with another slice on each iteration. Only the last assignment matters, so the code averages just that one subarray, rainT = data[:,45], and it therefore also divides by the wrong number of elements. Both of these mistakes contribute to the weird result.
Second, numpy should be able to average elements faster than a Python for loop can do it since that's just the kind of thing that numpy is designed to do in optimized native code.
Third, your original code copies a bunch of subarrays into a Python list, then asks numpy to average that. You should get much faster results by asking numpy to average the relevant subarray without making a copy, something like this:
rainT = data[:,2:] # this gets a view onto data[], not a copy
mean = np.mean(rainT)
That computes an average over all the rainfall values, like your original code.
If you want an average for each latitude or some such, you'll need to do it differently. You can average over an array axis, but latitude and longitude aren't axes in your data[].
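The difference between the two kinds of mean can be sketched on a tiny made-up array (lat, lon, then three rainfall columns per row):

```python
import numpy as np

# Made-up data: lat, lon, then three rainfall readings per row.
data = np.array([[1.0, 10.0, 2.0, 4.0, 6.0],
                 [2.0, 20.0, 1.0, 1.0, 1.0]])

overall = np.mean(data[:, 2:])          # one number over all rainfall values -> 2.5
per_row = np.mean(data[:, 2:], axis=1)  # one mean per lat/lon row -> [4., 1.]
```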
Thanks friends, you are giving me such inspiration. Here is the working script based on #Jerry101's ideas just now, but I decided NOT to apply a Python loop. The new declaration would be like this:
lat1 = data[:,0]
lon1 = data[:,1]
rainT = data[:,2:46]  # this is the step that I was missing earlier
mean = np.mean(rainT, axis=1) * 24  # make average daily rainfall for each lat and lon
mean2 = np.array([lat1, lon1, mean])
mean2 = mean2.T
np.savetxt('average-daily-rainfall.dat2', mean2, fmt='%9.3f')
And finally the result is exactly the same as the program written in Fortran.

Calculate the value of a function for each row in a matrix without iteration through all rows

I'm developing a genetic program, and by now the whole algorithm appears to be fine (albeit slow...).
I'm iterating through lists of real values, one at a time, and then applying a function to each list. The format is something like:
trainingset = [[3.32,55,33,22], [3.322,5,3,223], [23.32,355,33,122], ...]
Each inner list represents a line in the set, and the last item of that list is the result of the regression in that line/individual.
The function I use is some thing like:
def getfitness(individual, set):
    ...
    for elem in set:
        # apply the function individual to elem
        fitness = fitness + (elem[-1] - (result of individual with the parameters of elem))
    fitness = RMS(fitness)
    return fitness
So, what I'd like to know is: is there a way of calculating the function in one go? Are there any libs that can do this? I've been looking at matrices in numpy, but to no avail.
Thanks in advance.
Jorge
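For what it's worth, if an individual can be expressed in terms of NumPy array operations, the whole training set can be evaluated in one go. A sketch assuming a hypothetical linear individual (the real individuals would be whatever expressions the genetic program evolves):

```python
import numpy as np

trainingset = np.array([[3.32, 55, 33, 22],
                        [3.322, 5, 3, 223],
                        [23.32, 355, 33, 122]])

X = trainingset[:, :-1]  # the parameters of every line
y = trainingset[:, -1]   # last item: the regression result for that line

# Hypothetical individual: a linear combination of the parameters.
weights = np.array([1.0, 0.5, -0.2])
predictions = X @ weights  # evaluates every row at once, no Python loop

errors = y - predictions
fitness = np.sqrt(np.mean(errors ** 2))  # RMS of the per-row errors
```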

Conditionally add 1 to int element of a NumPy record array

I have a large NumPy record array of 250 million rows by 9 columns (MyLargeRec), and I need to add 1 to the 7th column (dtype = "int") if the index of that row is in another list of 300,000 integers (MyList). If this were a normal Python list, I would use the following simple code...
for m in MyList:
    MyLargeRec[m][6] += 1
However, I cannot seem to get similar functionality with the NumPy record array. I have tried a few options, such as nditer, but they will not let me select the specific indices I want.
Now you may say that this is not what NumPy was designed for, so let me explain why I am using this format: it only takes 30 mins to build the record array from scratch, whereas it takes over 24 hours using a conventional 2D list format. I spent all of yesterday trying to find a way to do this and could not; I eventually converted it to a list using...
MyLargeList = list(MyLargeRec)
so I could use the simple code above to achieve what I want; however, this took 8.5 hours to run.
Therefore, can anyone tell me, first: is there a method to achieve what I want within a NumPy record array? And second, if not, any ideas on the best methods within Python 2.7 to create, update and store such a large 2D matrix?
Many thanks
Tom
your_array[index_list, 6] += 1
NumPy allows you to construct some pretty neat slices. This selects the 7th column (index 6) of all rows in your list of indices and adds 1 to each. (Note that if an index appears multiple times in your list of indices, this will still only add 1 to the corresponding cell.)
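A small sketch of that duplicate-index caveat, shown on a plain 2-D array for simplicity, with np.add.at as the unbuffered alternative for when duplicates should count:

```python
import numpy as np

rec = np.zeros((5, 9), dtype=int)
idx = [1, 3, 3]  # index 3 appears twice

rec[idx, 6] += 1
# Fancy-index += applies each duplicate only once: rec[3, 6] is 1, not 2.

rec2 = np.zeros((5, 9), dtype=int)
np.add.at(rec2, (idx, 6), 1)
# np.add.at is unbuffered, so duplicates accumulate: rec2[3, 6] is 2.
```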
This code...
for m in MyList:
    MyLargeRec[m][6] += 1
does actually work; silly question by me.
