Who can explain how to loop add an element to the numpy array by condition?
I wrote some code that should do add element 2 if i element of array A is 0 and add element 1 if i element of array A is not 0.
Here is the code itself:
import numpy as np
def finalconcat(somearray):
for i in somearray:
arraysome=[]
if somearray[i]==0:
arraysome=np.append(arraysome,[2],axis=0)
else:
arraysome=np.append(arraysome,[1],axis=0)
return arraysome
Let me give you an example:
A=np.array([1,0,2,3,4,5])
C=finalconcat(B)
print(C)
It should come out:
[1,2,1,1,1,1]
But it comes out like:
[1.]
Please explain to me what is wrong here, I just don't understand what could be wrong...
You have several issues:
arraysome=[] is inside your loop so for each iteration of somearray you are emptying arraysome. Consequently, you can never end up with more than one element in arraysome when you are all done.
You have for i in somearray. On each iteration i will be the next element of somearray; it will not be iterating indices of the array. Yet later you have if somearray[i]==0:. This should just be if i==0:.
If you want the resulting elements of arraysome to be integers rather than floats, then you should initialize it to be a an numpy array of integers.
You have C=finalconcat(B), but B is not defined.
You should really spend some time reading the PEP 8 – Style Guide for Python Code.
import numpy as np
def finalconcat(somearray):
arraysome = np.array([], dtype=np.int)
for i in somearray:
if i == 0:
arraysome = np.append(arraysome, [2], axis=0)
else:
arraysome = np.append(arraysome, [1], axis=0)
return arraysome
a = np.array([1, 0, 2, 3, 4, 5])
c = finalconcat(a)
print(c)
Prints:
[1 2 1 1 1 1]
For iteration like this it's better to use lists. np.append is just a poorly named cover for np.concatenate, which returns a whole new array with each call. List append works in-place, and is more efficient. And easier to use:
def finalconcat(somearray):
rec = [2 if i==0 else 1 for i in somearray]
# arr = np.array(rec)
return rec
In [31]: a = np.array([1, 0, 2, 3, 4, 5])
In [32]: np.array([2 if i==0 else 1 for i in a])
Out[32]: array([1, 2, 1, 1, 1, 1])
But it's better to use whole-array methods, such as:
In [33]: b = np.ones_like(a)
In [34]: b
Out[34]: array([1, 1, 1, 1, 1, 1])
In [35]: b[a==0] = 2
In [36]: b
Out[36]: array([1, 2, 1, 1, 1, 1])
or
In [37]: np.where(a==0, 2, 1)
Out[37]: array([1, 2, 1, 1, 1, 1])
Related
Say I have two arrays:
a = np.asarray([0,1,2])
b = np.asarray([3,7,10])
Is there a fast way to create:
c = np.asarray([0,0,0,1,1,1,1,2,2,2])
# index 3 7 10
This can be done using a for loop but I wonder if there is a fast internal numpy function that achieves the same thing.
You can use diff to get the successive differences, r_ to add the first b value and repeat to duplicate the values:
a = np.asarray([0, 1, 2])
b = np.asarray([3, 7, 10])
c = np.repeat(a, np.r_[b[0], np.diff(b)])
Output: array([0, 0, 0, 1, 1, 1, 1, 2, 2, 2])
I want to create 2d tensor (or numpy array, doesn't really matter), where every row will be cycle shifted first row. I do it using for loop:
import torch
import numpy as np
a = np.random.rand(33, 11)
miss_size = 64
lp_order = a.shape[1] - 1
inv_a = -np.flip(a, axis=1)
mtx_size = miss_size+lp_order # some constant
mtx_row = torch.cat((torch.from_numpy(inv_a), torch.zeros((a.shape[0], miss_size - 1 + a.shape[1]))), dim=1)
mtx_full = mtx_row.unsqueeze(1)
for i in range(mtx_size):
mtx_row = torch.roll(mtx_row, 1, 1)
mtx_full = torch.cat((mtx_full, mtx_row.unsqueeze(1)), dim=1)
unsqueezing is needed because I stack 2d tensors into 3d tensor
Is there more efficient way to do that? Maybe linear algebra trick or more pythonic approach.
You can use scipy.linalg.circulant():
scipy.linalg.circulant([1, 2, 3])
# array([[1, 3, 2],
# [2, 1, 3],
# [3, 2, 1]])
I believe you can achieve this using torch.gather by constructing the appropriate index tensor. This approach works with batches too.
If we take this approach, the objective is to construct an index tensor where each value refers to an index in mtx_row (along the last dimension here dim=1). In this case, it would be shaped (3, 3):
tensor([[0, 1, 2],
[2, 0, 1],
[1, 2, 0]])
You can achieve this by broadcasting torch.arange with its own transpose and applying modulo on the resulting matrix:
>>> idx = (n-torch.arange(n)[None].T + torch.arange(n)[None]) % n
tensor([[0, 1, 2],
[2, 0, 1],
[1, 2, 0]])
Let mtx_row be shaped (2, 3):
>>> mtx_row
tensor([[0.3523, 0.0170, 0.1875],
[0.2156, 0.7773, 0.4563]])
From there you need to idx and mtx_row so they have the same shapes:
>>> idx_ = idx[None].expand(len(mtx_row), -1, -1)
>>> val_ = mtx_row[:, None].expand(-1, n, -1)
Then we can apply torch.gather on the last dimension dim=2:
>>> val_.gather(-1, idx_)
tensor([[[0.3523, 0.0170, 0.1875],
[0.1875, 0.3523, 0.0170],
[0.0170, 0.1875, 0.3523]],
[[0.2156, 0.7773, 0.4563],
[0.4563, 0.2156, 0.7773],
[0.7773, 0.4563, 0.2156]]])
My goal is to take a list of np.arrays and create an associated list or array that classifies each as having a duplicate or not. Here's what I thought would work:
www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
uniques, counts = np.unique(www, axis = 0, return_counts = True)
counts = [1 if x > 1 else 0 for x in counts]
count_dict = dict(zip(uniques, counts))
[count_dict[i] for i in www]
The desired output for this case would be :
[1, 1, 0]
because the first and second element have another copy within the original list. It seems that the problem is that I cannot use a np.array as a key for a dictionary.
Suggestions?
First convert www to a 2D Numpy array then do the following:
In [18]: (counts[np.where((www[:,None] == uniques).all(2))[1]] > 1).astype(int)
Out[18]: array([1, 1, 0])
here we use broadcasting for check the equality of all www rows with uniques array and then using all() on last axis to find out which of its rows are completely equal to uniques rows.
Here's the elaborated results:
In [20]: (www[:,None] == uniques).all(2)
Out[20]:
array([[ True, False],
[ True, False],
[False, True]])
# Respective indices in `counts` array
In [21]: np.where((www[:,None] == uniques).all(2))[1]
Out[21]: array([0, 0, 1])
In [22]: counts[np.where((www[:,None] == uniques).all(2))[1]] > 1
Out[22]: array([ True, True, False])
In [23]: (counts[np.where((www[:,None] == uniques).all(2))[1]] > 1).astype(int)
Out[23]: array([1, 1, 0])
In Python, lists (and numpy arrays) cannot be hashed, so they can't be used as dictionary keys. But tuples can! So one option would be to convert your original list to a tuple, and to convert uniques to a tuple. The following works for me:
www = [np.array([1, 1, 1]), np.array([1, 1, 1]), np.array([2, 1, 1])]
www_tuples = [tuple(l) for l in www] # list of tuples
uniques, counts = np.unique(www, axis = 0, return_counts = True)
counts = [1 if x > 1 else 0 for x in counts]
# convert uniques to tuples
uniques_tuples = [tuple(l) for l in uniques]
count_dict = dict(zip(uniques_tuples, counts))
[count_dict[i] for i in www_tuples]
Just a heads-up: this will double your memory consumption, so it may not be the best solution if www is large.
You can mitigate the extra memory consumption by ingesting your data as tuples instead of numpy arrays if possible.
I am attempting to write a code that searches a numpy array for cases where the value in the fifth column does not have 50. If it does not I wish to remove it.
This is what I have so far:
for rows in range(len(b)):
if b[:,4].any() != 50:
b = np.delete(b, b[rows])
However, I keep getting the following error:
too many indices for array
Lets run the calculation with some diagnositic prints. Note where the error occurs. That's important! (We shouldn't just keep trying things without isolating the problem!)
In [2]: b=np.array([[0,1,2],[1,2,3],[2,1,2]])
In [3]: for row in range(len(b)):
...: print(row)
...: if b[:,2].any() !=2:
...: print(b[row])
...: b = np.delete(b, b[row])
...:
0
[0 1 2]
1
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-3-04dc188d9a2b> in <module>()
1 for row in range(len(b)):
2 print(row)
----> 3 if b[:,2].any() !=2:
4 print(b[row])
5 b = np.delete(b, b[row])
IndexError: too many indices for array
So the error occurs on the 2nd iteration (row 1). Something is wrong with the b after the delete. What is the new value of b?
In [4]: b
Out[4]: array([1, 2, 3, 2, 1, 2])
b is a 1d array, not the 2d we started with. That explains the error, right? Something must be wrong with the use of delete. Maybe we need to check its documentation????
Look at the axis parameter:
axis : int, optional
The axis along which to delete the subarray defined by `obj`.
If `axis` is None, `obj` is applied to the flattened array.
We didn't specify an axis, so the delete was applied to the flattened array, and result was flattened - 1d.
But even if I specify an axis I get an error (I won't get into that), which prompts me to look more carefully at the if condition:
In [10]: b[:,2]
Out[10]: array([2, 3, 2])
In [11]: b[:,2].any()
Out[11]: True
In [12]: b[:,2]!=2
Out[12]: array([False, True, False])
Applying any to the column don't make sense - it just checks if any values in the column are not 0. Instead we want to test the column against the target, getting a boolean that matches the column in size.
We can use that boolean directly as row selection mask
In [13]: b[_,:]
Out[13]: array([[1, 2, 3]])
No need to iterate.
Another problem with your iteration. You iterate on the range(3), [0,1,2]. But inside the loop you try to remove a row from b, changing the size of b. That going to give problems when you try to index b[row] by number, right? When iterating, in Python or numpy, be careful about modifying the object that you are iterating over.
Sorry to be long winded about this, but it looks like you need some basic debugging guidance.
Here's a basic list approach:
In [15]: [row for row in b if row[2]!=2]
Out[15]: [array([1, 2, 3])]
I'm iterating on the rows, not their indices, and for each row checking the column value, and keeping that row if the check is True. We could do that with np.delete, but a list comprehension is clearer (and faster).
It would be better to provide b and desired output, but if i understand it correctly, you could use:
import numpy as np
b = np.array([[50, 2, 3, 4, 5, 6],
[4, 50, 6, 7, 8, 9],
[1, 1, 1, 1, 50, 9]])
array([[50, 2, 3, 4, 5, 6],
[ 4, 50, 6, 7, 8, 9],
[ 1, 1, 1, 1, 50, 9]])
Then you can check which rows contain 50 in the 5th column using
b[:, 4] == 50
array([False, False, True])
and feed this Boolean array back to b to select the desired columns:
b[b[:, 4] == 50]
which leaves you with one row in this case
array([[ 1, 1, 1, 1, 50, 9]])
I have a signal where I want to find the average height of the values. This is done by finding the zero crossings and calculating the max and min between each zero crossing, then averaging these values.
My problem occurs when I want to use np.where() to find where the signal is crossing zero. When I use np.where() I get the result in a tuple, but I want it in an array where I can count the amount of times zero is crossed.
I am new to Python and coming from Matlab it is a bit confusing with all the different classes. As you can see, I get an error because nu = len(zero_u) gives 1 as a result, because the whole array is written in a tuple as one element.
Any ideas how to go around this?
The code looks like this:
import numpy as np
def averageheight(f):
rms = np.std(f)
f = f + (rms * 10**-6)
# Find zero crossing
fsign = np.sign(f)
fdiff = np.diff(fsign)
zero_u = np.asarray(np.where(fdiff > 0)) + 1
zero_d = np.asarray(np.where(fdiff < 0)) + 1
nu = len(zero_u)
nd = len(zero_d)
value_max = np.zeros((nu, 1))
value_min = np.zeros((nu, 1))
imaxvec = np.zeros((nu, 1))
iminvec = np.zeros((nu, 1))
if (nu > 2) and (nd > 2):
if zero_u[0] > zero_d[0]:
zero_d[0] = []
nu = len(zero_u)
nd = len(zero_d)
ncross = np.fmin(nu, nd)
# Find Maxima:
for ic in range(0, ncross - 1):
up = int(zero_u[ic])
down = int(zero_d[ic])
fvec = f[up:down]
value_max[ic] = np.amax(fvec)
index_max = value_max.argmax()
imaxvec[ic] = up + index_max - 1
# Find Minima:
for ic in range(0, ncross - 2):
down = int(zero_d[ic])
up = int(zero_u[ic+1])
fvec = f[down:up]
value_min[ic] = np.amin(fvec)
index_min = value_min.argmin()
iminvec[ic] = down + index_min - 1
# Remove spurious values, bumps and zero_d
thr = rms/3
maxfind = np.where(value_max < thr)
for i in range(0, len(maxfind)):
imaxfind = np.where(value_max == maxfind[i])
imaxvec[imaxfind] = 0
value_max[imaxfind] = 0
minfind = np.where(value_min > -thr)
for j in range(0, len(minfind)):
iminfind = np.where(value_min == minfind[j])
value_min[iminfind] = 0
iminvec[iminfind] = 0
# Find Average Height
avh = np.mean(value_max) - np.mean(value_min)
else:
avh = 0
return avh
np.where, and np.nonzero even more so, clearly explains that it returns a tuple, with one array for each dimension of the condition array:
In [71]: arr = np.random.randint(-5,5,10)
In [72]: arr
Out[72]: array([ 3, 4, 2, -3, -1, 0, -5, 4, 2, -3])
In [73]: arr.shape
Out[73]: (10,)
In [74]: np.where(arr>=0)
Out[74]: (array([0, 1, 2, 5, 7, 8]),)
In [75]: arr[_]
Out[75]: array([3, 4, 2, 0, 4, 2])
That Out[74] tuple can be used directly as an index.
You can also extract the array from the tuple:
In [76]: np.where(arr>=0)[0]
Out[76]: array([0, 1, 2, 5, 7, 8])
That, I think is a better choice than the np.asarray(np.where(...))
This convention for where becomes clearer when we use it on a 2d array
In [77]: arr2 = arr.reshape(2,5)
In [78]: np.where(arr2>=0)
Out[78]: (array([0, 0, 0, 1, 1, 1]), array([0, 1, 2, 0, 2, 3]))
In [79]: arr2[_]
Out[79]: array([3, 4, 2, 0, 4, 2])
Again we are indexing with a tuple. arr2[1,3] is really arr2[(1,3)]. The values in [] indexing brackets are actually passed to the indexing function as a tuple of values.
np.argwhere applies transpose to the result of where, producing an array:
In [80]: np.transpose(np.where(arr2>=0))
Out[80]:
array([[0, 0],
[0, 1],
[0, 2],
[1, 0],
[1, 2],
[1, 3]])
That's the same indexing arrays, but arranged in a 2d column matrix.
If you need the count of where without the actual values, a slightly faster function is
In [81]: np.count_nonzero(arr>=0)
Out[81]: 6
In fact np.nonzero uses the count to first determine the size of the arrays that it will return.