increment values in a numpy array multiple times [duplicate] - python
Simple Version:
if I do this:
import numpy as np
a = np.zeros(2)
a[[1, 1]] += np.array([1, 1])
I get [0, 1] as the output, but I would like [0, 2]. Is that possible somehow, using implicit numpy looping instead of looping over it myself?
What-I-actually-need-to-do version:
I have a structured array that contains an index, a value, and some boolean value. I would like to sum those values at those indices, based on the boolean. Clearly that can be done with a simple loop, but it seems like it should be possible with clever numpy indexing (as above).
For example, I have an array with 5 elements that I want to populate from the array with values, indices, and conditions:
import numpy as np
size = 5
nvalues = 10
np.random.seed(1)
a = np.zeros(nvalues, dtype=[('val', float), ('ix', int), ('cond', bool)])
a = np.rec.array(a)
a.val = np.random.rand(nvalues)
a.cond = (np.random.rand(nvalues) > 0.3)
a.ix = np.random.randint(size, size=nvalues)
# obvious solution
obvssum = np.zeros(size)
for i in a:
    if i.cond:
        obvssum[i.ix] += i.val
# is something like this possible?
doesntwork = np.zeros(size)
doesntwork[a[a.cond].ix] += a[a.cond].val
print(doesntwork)
print(obvssum)
Output:
[ 0. 0. 0.61927097 0.02592623 0.29965467]
[ 0. 0. 1.05459336 0.02592623 1.27063303]
I think what's happening here is that if a[a.cond].ix were guaranteed to be unique, my method would work just fine, as noted in the simple example.
This is what the at method of NumPy ufuncs is for:
output = numpy.zeros(size)
numpy.add.at(output, a[a.cond].ix, a[a.cond].val)
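As a minimal sketch tying this back to the simple version at the top (same arrays as in the question): np.add.at is unbuffered, so repeated indices accumulate, unlike the fancy-indexed +=.
import numpy as np

a = np.zeros(2)
a[[1, 1]] += np.array([1, 1])            # buffered: index 1 only receives one increment
print(a)                                 # [0. 1.]

b = np.zeros(2)
np.add.at(b, [1, 1], np.array([1, 1]))   # unbuffered: both increments land on index 1
print(b)                                 # [0. 2.]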
Related
Using permuted list to populate two-dimensional array with special order
I want to create a 2D array containing tuples or lists which requires a particular order. Using itertools.product I am capable of creating the required permutations:
import itertools
import numpy as np
elements = 2
n = 3
temp = []
for tuples in itertools.product(np.arange(elements, -1, -1), repeat=n):
    if sum(tuples) == elements:
        temp.append(tuples)
print temp
This will print:
Out[1277]: array([[2,0,0], [1,1,0], [1,0,1], [0,2,0], [0,1,1], [0,0,2]])
The array should then be created to yield:
array = [[(2,0,0),(1,1,0),(0,2,0)],
         [(1,0,1),(0,1,1),(0,0,0)],
         [(0,0,2),(0,0,0),(0,0,0)]]
and is subsequently used to compute a dot product:
array2 = [1,5,10]
np.dot(array, array2)
Out[1278]: array([2,6,10,11,15,0,20,0,0])
However, itertools does not yield the order I am looking for. Therefore, I am using argsort and basically 1D arrays in the end:
array = itertools.product(np.arange(elements,-1,-1), repeat=n)
sortedArray = array[array[:,1].argsort()]
print sortedArray
Out[1279]: array([[2,0,0], [1,0,1], [0,0,2], [1,1,0], [0,1,1], [0,2,0]])
result = np.dot(sortedArray, array2)
This works fine in combination with np.pad to restore the original size (3x3 = 9):
np.pad(result, (0, array.size - result.size), "constant")
Out[1280]: array([2,6,10,11,15,20,0,0,0])
However, the order is not retained. The reason for doing this is a second reference array that uses the same structure as the array above, which can be raveled:
reference = [[foo,bar,baz],
             [bar,bar,0],
             [foo, 0, 0]]
np.ravel(reference)
Out[1281]: array([foo,bar,baz,bar,bar,0,foo,0,0])
I am looking for a solution that does not need this work-around.
Python: check if every element in a list exists a specific time
I have the following question. I have a list of ranges like this:
parameterRanges2 = [(1,5),(1,5),(1,7),(1,7),(0,10),(1,20),(1,3),(0,1)]
And I have a numpy array like this:
arr = np.array([[2.0,4.0,3.0,5.0,1.0,3.0,2.0,4.0,2.0,4.0,3.0,5.0,1.0,3.0,2.0,4.0,2.0,4.0,3.0,5.0],
                [4.0,2.0,3.0,4.0,2.0,4.0,5.0,1.0,2.0,4.0,1.0,3.0,4.0,2.0,3.0,5.0,1.0,3.0,4.0,2.0],
                [2.0,3.0,4.0,6.0,7.0,1.0,2.0,3.0,5.0,6.0,1.0,2.0,4.0,5.0,6.0,2.0,3.0,4.0,5.0,6.0],
                [6.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,2.0,2.0,3.0,4.0,5.0],
                [8.0,9.0,0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,7.0,8.0,9.0,0.0,1.0,2.0,3.0,4.0,5.0,6.0],
                [11.0,13.0,14.0,16.0,17.0,19.0,1.0,3.0,4.0,6.0,7.0,9.0,10.0,11.0,13.0,14.0,16.0,17.0,19.0,1.0],
                [1.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,3.0,3.0,3.0,3.0,1.0,1.0,1.0,1.0,1.0,2.0],
                [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0]])
Every parameterRange tuple in the list corresponds with the subarray in the numpy array. Is there a way to check if all elements in the corresponding range exist at least one time? So, for example, that in the first sublist of the numpy array all the numbers 1, 2, 3, 4, 5 exist at least one time, that in the second sublist they exist at least one time, that in the third sublist the numbers 1, 2, 3, 4, 5, 6, 7 exist at least one time, and so on.
Exploiting that the ranges are integer, we can give an O(nm) solution, n x m being the shape of arr. The algorithm works as follows: discard all non-integer elements and all elements that are outside their range; use np.add.at to efficiently (O(mn)) generate bincounts for the in-range numbers; then count the above-threshold bins in each row and compare to the size of the range.
import numpy as np

parameterRanges2 = np.array([(1,5),(1,5),(1,7),(1,7),(0,10),(1,20),(1,3),(0,1)])
arr = np.array([[2.0,4.0,3.0,5.0,1.0,3.0,2.0,4.0,2.0,4.0,3.0,5.0,1.0,3.0,2.0,4.0,2.0,4.0,3.0,5.0],
                [4.0,2.0,3.0,4.0,2.0,4.0,5.0,1.0,2.0,4.0,1.0,3.0,4.0,2.0,3.0,5.0,1.0,3.0,4.0,2.0],
                [2.0,3.0,4.0,6.0,7.0,1.0,2.0,3.0,5.0,6.0,1.0,2.0,4.0,5.0,6.0,2.0,3.0,4.0,5.0,6.0],
                [6.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,2.0,2.0,3.0,4.0,5.0],
                [8.0,9.0,0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,7.0,8.0,9.0,0.0,1.0,2.0,3.0,4.0,5.0,6.0],
                [11.0,13.0,14.0,16.0,17.0,19.0,1.0,3.0,4.0,6.0,7.0,9.0,10.0,11.0,13.0,14.0,16.0,17.0,19.0,1.0],
                [1.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,3.0,3.0,3.0,3.0,1.0,1.0,1.0,1.0,1.0,2.0],
                [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0]])

min_occ = 2
dp = np.diff(parameterRanges2, axis=-1)
m = np.zeros((parameterRanges2.shape[0], np.max(dp) + 2), dtype=int)
arr = arr - parameterRanges2[:, :1]
ia = arr.astype(int)
idx = np.where((arr == ia) & (ia >= 0) & (ia <= dp), ia, -1)
np.add.at(m, (np.arange(parameterRanges2.shape[0])[:, None], idx), 1)
res = (m[:, :-1] >= min_occ).sum(axis=-1) == dp.ravel() + 1
print(res)
Output:
[ True True False True False False True True]
There may be a more efficient way just using Numpy functions, but the code below works. I can't think of a simple Numpy way to do it, since we can't make a standard Numpy array of ranges when the rows aren't all the same length.
import numpy as np

arr = np.array([
    [2.0,4.0,3.0,5.0,1.0,3.0,2.0,4.0,2.0,4.0,3.0,5.0,1.0,3.0,2.0,4.0,2.0,4.0,3.0,5.0],
    [4.0,2.0,3.0,4.0,2.0,4.0,5.0,1.0,2.0,4.0,1.0,3.0,4.0,2.0,3.0,5.0,1.0,3.0,4.0,2.0],
    [2.0,3.0,4.0,6.0,7.0,1.0,2.0,3.0,5.0,6.0,1.0,2.0,4.0,5.0,6.0,2.0,3.0,4.0,5.0,6.0],
    [6.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,2.0,2.0,3.0,4.0,5.0],
    [8.0,9.0,0.0,1.0,2.0,3.0,4.0,5.0,6.0,7.0,7.0,8.0,9.0,0.0,1.0,2.0,3.0,4.0,5.0,6.0],
    [11.0,13.0,14.0,16.0,17.0,19.0,1.0,3.0,4.0,6.0,7.0,9.0,10.0,11.0,13.0,14.0,16.0,17.0,19.0,1.0],
    [1.0,1.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,2.0,3.0,3.0,3.0,3.0,1.0,1.0,1.0,1.0,1.0,2.0],
    [0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,1.0,0.0,0.0]
])

parameterRanges2 = [(1,5),(1,5),(1,7),(1,7),(0,10),(1,20),(1,3),(0,1)]
ranges = [np.arange(u, v+1, dtype='float64') for u, v in parameterRanges2]
print([np.all(np.isin(u, v)) for u, v in zip(ranges, arr)])
output
[True, True, True, True, False, False, True, True]
Creating index array in numpy - eliminating double for loop
I have some physical simulation code, written in python and using numpy/scipy. Profiling the code shows that 38% of the CPU time is spent in a single doubly nested for loop - this seems excessive, so I've been trying to cut it down. The goal of the loop is to create an array of indices, showing which elements of a 1D array the elements of a 2D array are equal to:
indices[i,j] = where(1D_array == 2D_array[i,j])
As an example, if
1D_array = [7.2, 2.5, 3.9]
and
2D_array = [[7.2, 2.5]
            [3.9, 7.2]]
we should have
indices = [[0, 1]
           [2, 0]]
I currently have this implemented as
for i in range(ni):
    for j in range(nj):
        out[i, j] = (1D_array - 2D_array[i, j]).argmin()
The argmin is needed as I'm dealing with floating point numbers, and so the equality is not necessarily exact. I know that every number in the 1D array is unique, and that every element in the 2D array has a match, so this approach gives the correct result. Is there any way of eliminating the double for loop?
Note: I need the index array to perform the following operation:
f = complex_function(1D_array)
output = f[indices]
This is faster than the alternative, as the 2D array has a size of NxN compared with 1xN for the 1D array, and the 2D array has many repeated values. If anyone can suggest a different way of arriving at the same output without going through an index array, that could also be a solution.
In pure Python you can do this using a dictionary in O(N) time; the only time penalty is going to be the Python loop involved:
>>> arr1 = np.array([7.2, 2.5, 3.9])
>>> arr2 = np.array([[7.2, 2.5], [3.9, 7.2]])
>>> indices = dict(np.hstack((arr1[:, None], np.arange(3)[:, None])))
>>> np.fromiter((indices[item] for item in arr2.ravel()), dtype=arr2.dtype).reshape(arr2.shape)
array([[ 0.,  1.],
       [ 2.,  0.]])
The dictionary method that some others have suggested might work, but it requires that you know ahead of time that every element in your target array (the 2d array) has an exact match in your search array (your 1d array). Even when this should be true in principle, you still have to deal with floating point precision issues; for example, try .1 * 3 == .3. Another approach is to use numpy's searchsorted function. searchsorted takes a sorted 1d search array and any target array, and finds the closest element in the search array for every item in the target array. I've adapted this answer for your situation; take a look at it for a description of how the find_closest function works.
import numpy as np

def find_closest(A, target):
    order = A.argsort()
    A = A[order]
    idx = A.searchsorted(target)
    idx = np.clip(idx, 1, len(A)-1)
    left = A[idx-1]
    right = A[idx]
    idx -= target - left < right - target
    return order[idx]

array1d = np.array([7.2, 2.5, 3.9])
array2d = np.array([[7.2, 2.5],
                    [3.9, 7.2]])

indices = find_closest(array1d, array2d)
print(indices)
# [[0 1]
#  [2 0]]
To get rid of the two Python for loops, you can do all of the equality comparisons "in one go" by adding new axes to the arrays (making them broadcastable with each other). Bear in mind that this produces a new array containing len(arr1)*len(arr2) values. If this is a very big number, this approach could be infeasible depending on the limitations of your memory. Otherwise, it should be reasonably quick:
>>> (arr1[:,np.newaxis] == arr2[:,np.newaxis]).argmax(axis=1)
array([[0, 1],
       [2, 0]], dtype=int32)
If you need to get the index of the closest matching value in arr1 instead, use:
np.abs(arr1[:,np.newaxis] - arr2[:,np.newaxis]).argmin(axis=1)
Compare two numpy arrays and insert nans
I have two numpy arrays with the dimensions (120, 360), one of the arrays consists of integers and zeros the second consists of floats. I want to replace the values of the second array with nans everywhere there is an integer in the first array. Is there an easy and efficient way to do this? Also I'd like to replace the integers in the first array with nans and change zeros to ones. Thanks in advance.
You can achieve this easily with logical indexing into the array:
arr2[arr1 != 0] = numpy.NaN
However, integer arrays don't support NaNs, so you'd have to convert your first array to a float array, i.e.
arr1 = arr1.astype(float)
arr1[arr1 != 0.0] = numpy.NaN
arr1[arr1 == 0.0] = 1.0
Set up the arrays:
>>> import numpy as np
>>> x = np.array([[1,0],[0,4]], dtype=int)
>>> y = np.array([[1.1, 2.2],[3.3, 4.4]], dtype=float)
You can easily set the second array to nan where you want, like this:
>>> y[x != 0] = np.nan
>>> y
array([[ nan,  2.2],
       [ 3.3,  nan]])
Then convert the first array to floats (since NaN is not an integer) and set the values you want:
>>> x = x.astype(float)
>>> x[x != 0] = np.nan
>>> x[x == 0] = 1
>>> x
array([[ nan,   1.],
       [  1.,  nan]])
As a comment on the previous answers, I don't think comparing floats with == is that good an idea, and I think some operations are wasted. What about creating a temporary array mask = (X != 0) and using it as an index?
>>> X = X.astype(float)
>>> X[mask] = np.nan
>>> X[~mask] = 1
I don't know your purpose for replacing values with NaNs, but you may want to consider using numpy's masked arrays instead (similar to Pierre's answer, but numpy has built-in mask support!):
import numpy.ma

# mask out values in arr2 where there is a non-zero integer in arr1
arr2 = numpy.ma.masked_array(arr2, mask=arr1)

# mask out the non-zero integers in arr1, and set all remaining values (the zeros) to 1
arr1 = numpy.ma.masked_array(arr1, mask=(arr1 != 0))
arr1[~arr1.mask] = 1

No integer-to-float conversion is needed, and this allows you to use a lot of numpy's functionality without getting into problems. For example, calculating the mean of an array with NaNs is certainly a bad idea; with a masked array, this is no problem.
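To illustrate that last point, here is a minimal sketch (the small example arrays are made up for demonstration): the masked array's mean() simply skips the masked entries, whereas a plain mean over an array containing NaNs is poisoned and you would need a NaN-aware function such as np.nanmean instead.
import numpy as np
import numpy.ma as ma

arr1 = np.array([[0, 3], [7, 0]])           # non-zero integers mark values to ignore
arr2 = np.array([[1.5, 2.0], [8.0, 4.5]])   # float data

masked = ma.masked_array(arr2, mask=arr1)   # 2.0 and 8.0 are masked out
print(masked.mean())                        # 3.0 -- only 1.5 and 4.5 are averaged

with_nans = arr2.copy()
with_nans[arr1 != 0] = np.nan
print(with_nans.mean())                     # nan -- the plain mean is poisoned by NaNs
print(np.nanmean(with_nans))                # 3.0 -- requires a NaN-aware function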
How do I remove all zero elements from a NumPy array?
I have a rank-1 numpy.array of which I want to make a boxplot. However, I want to exclude all values equal to zero in the array. Currently, I solve this by looping over the array and copying each value to a new array if it is not equal to zero. However, as the array consists of 86 000 000 values and I have to do this multiple times, this takes a lot of patience. Is there a more intelligent way to do this?
For a NumPy array a, you can use a[a != 0] to extract the values not equal to zero.
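A minimal sketch of that boolean-mask indexing (the values here are made up):
import numpy as np

a = np.array([0.0, 1.3, 0.0, 2.7, 4.1, 0.0])
print(a[a != 0])   # [1.3 2.7 4.1] -- keeps only the non-zero entries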
This is a case where you want to use masked arrays: it keeps the shape of your array and it is automatically recognized by all numpy and matplotlib functions.
import numpy as np
import matplotlib.pyplot as plt

X = np.random.randn(1000, 5)
X[np.abs(X) < .1] = 0           # some zeros
X = np.ma.masked_equal(X, 0)
plt.boxplot(X)                  # masked values are not plotted

# other functionality of masked arrays
X.compressed()   # get normal array with masked values removed
X.mask           # get a boolean array of the mask
X.mean()         # it automatically discards masked values
I decided to compare the runtime of the different approaches mentioned here. I've used my library simple_benchmark for this. The boolean indexing with array[array != 0] seems to be the fastest (and shortest) solution. For smaller arrays the MaskedArray approach is very slow compared to the other approaches; for moderately sized arrays, however, it is about as fast as the boolean indexing approach and there is not much difference between them. Here is the code I've used:
from simple_benchmark import BenchmarkBuilder
import numpy as np

bench = BenchmarkBuilder()

@bench.add_function()
def boolean_indexing(arr):
    return arr[arr != 0]

@bench.add_function()
def integer_indexing_nonzero(arr):
    return arr[np.nonzero(arr)]

@bench.add_function()
def integer_indexing_where(arr):
    return arr[np.where(arr != 0)]

@bench.add_function()
def masked_array(arr):
    return np.ma.masked_equal(arr, 0)

@bench.add_arguments('array size')
def argument_provider():
    for exp in range(3, 25):
        size = 2**exp
        arr = np.random.random(size)
        arr[arr < 0.1] = 0  # add some zeros
        yield size, arr

r = bench.run()
r.plot()
You can index with a Boolean array. For a NumPy array A:
res = A[A != 0]
You can use Boolean array indexing as above, bool type conversion, np.nonzero, or np.where. Here's some performance benchmarking:
# Python 3.7, NumPy 1.14.3
np.random.seed(0)
A = np.random.randint(0, 5, 10**8)

%timeit A[A != 0]          # 768 ms
%timeit A[A.astype(bool)]  # 781 ms
%timeit A[np.nonzero(A)]   # 1.49 s
%timeit A[np.where(A)]     # 1.58 s
I would like to suggest that you simply use NaN for cases like this, where you want to ignore some values but still want to keep the procedure as statistically meaningful as possible. So:
In []: X = randn(1e3, 5)
In []: X[abs(X) < .1] = NaN
In []: isnan(X).sum(0)
Out[]: array([82, 84, 71, 81, 73])
In []: boxplot(X)
A simple line of code can get you an array that excludes all '0' values:
np.argwhere(array)
Example:
import numpy as np
array = [0, 1, 0, 3, 4, 5, 0]
array2 = np.argwhere(array)
print(array2)
[[1]
 [3]
 [4]
 [5]]
[i for i in Array if i != 0.0] if the numbers are floats, or [i for i in Array if i != 0] if the numbers are ints.