I want to create an array with a format, and the values originate from another array. My input array consists out of three columns. I want to create an array with in the first row all values from the third column if the second column is equal. So in this example the first three values in the second column are equal, so in the new array i want the third value of each row in the new array.
a =
[[1, 1, 4],
[2, 1, 6],
[3, 1, 7],
[4, 2, 0],
[5, 2, 7],
[6, 3, 1]]
result:
b =
[[4, 6 , 7],
[0, 7],
[1]]
I tried:
c = []
x = 1
for row in a:
if row[0] == x
c.extend[row[2]]
else:
x = x + 1
c.append(row[2])
But the result is a list of all 3rd values
a = np.asarray(a)
c = []
for i in range(a[-1,1]): #a[-1,1] is the maximum that will occur
save = a[a[:,1]==i] # take all the ones that have i in the second entry
c.append(save[:,2]) # of those add the last entry
It's important, that ais converted to a np.array for this.
If the second column is sorted, you can use np.diff to find out the index where the value changes and then split on it:
np.split(a[:,2], np.flatnonzero(np.diff(a[:,1]) != 0)+1)
# [array([4, 6, 7]), array([0, 7]), array([1])]
The below works for me:
import numpy as np
c = [[]]
x = 1
for row in a:
if row[1] == x:
c[-1].append(row[2])
else:
x = x + 1
c.append([row[2]])
c = np.asarray(c)
Related
I want to clean my data reducing the number of duplicates. I do not want to delete ALL duplicates.
How can I get a numpy array with certain number of duplicates?
Suppose, I have
x = np.array([[1,2,3],[1,2,3],[5,5,5],[1,2,3],[1,2,3]])
and I set number of duplicates as 2.
And the output should be like
x
>>[[1,2,3],[1,2,3],[5,5,5]]
or
x
>>[[5,5,5],[1,2,3],[1,2,3]]
It does not meter in my task
Even though using list appending as an intermediate step is not always a good idea when you already have numpy arrays, in this case it is by far the cleanest way to do it:
def n_uniques(arr, max_uniques):
uniq, cnts = np.unique(arr, axis=0, return_counts=True)
arr_list = []
for i in range(cnts.size):
num = cnts[i] if cnts[i] <= max_uniques else max_uniques
arr_list.extend([uniq[i]] * num)
return np.array(arr_list)
x = np.array([[1,2,3],
[1,2,3],
[1,2,3],
[5,5,5],
[1,2,3],
[1,2,3],])
reduced_arr = n_uniques(x, 2)
This was kind of tricky, but you can actually do that without loops and preserving the relative order in the original array with something like this (in this case the first repetitions are preserved):
import numpy as np
def drop_extra_repetitions(x, max_reps):
# Find unique rows
uniq, idx_inv, counts = np.unique(x, axis=0, return_inverse=True, return_counts=True)
# Compute number of repetitions of each different row
counts_clip = np.minimum(counts, max_reps)
# Array alternating between valid unique row indices and -1 ([0, -1, 1, -1, ...])
idx_to_repeat = np.stack(
[np.arange(len(uniq)), -np.ones(len(uniq), dtype=int)], axis=1).ravel()
# Number of repetitions for each of the previous indices
idx_repeats_clip = np.stack([counts_clip, counts - counts_clip], axis=1).ravel()
# Valid unique row indices are repetead at most max_reps,
# extra repetitions are filled with -1
idx_clip_sorted = np.repeat(idx_to_repeat, idx_repeats_clip)
# Sorter for inverse index - that is, sort the indices in the input array
# according to their corresponding unique row index
sorter = np.argsort(idx_inv)
# The final inverse index is the same as the original but with -1 on extra repetitions
idx_inv_final = np.empty(len(sorter), dtype=int)
idx_inv_final[sorter] = idx_clip_sorted
# Return the array reconstructed from the inverse index without the positions with -1
return uniq[idx_inv_final[idx_inv_final >= 0]]
x = [[5, 5, 5], [1, 2, 3], [1, 2, 3], [5, 5, 5], [1, 2, 3], [1, 2, 3]]
max_reps = 2
print(drop_extra_repetitions(x, max_reps))
# [[5 5 5]
# [1 2 3]
# [1 2 3]
# [5 5 5]]
If you do not need to preserve the order at all, then you can simply do:
import numpy as np
def drop_extra_repetitions(x, max_reps):
uniq, counts = np.unique(x, axis=0, return_counts=True)
# Repeat each unique row index at most max_reps
ret_idx = np.repeat(np.arange(len(uniq)), np.minimum(counts, max_reps))
return uniq[ret_idx]
x = [[5, 5, 5], [1, 2, 3], [1, 2, 3], [5, 5, 5], [1, 2, 3], [1, 2, 3]]
max_reps = 2
print(drop_extra_repetitions(x, max_reps))
# [[1 2 3]
# [1 2 3]
# [5 5 5]
# [5 5 5]]
I'm trying to delete the last few rows from a numpy array. I'm able to delete 0 to i rows with the following code.
for i, line in enumerate(two_d_array1):
if all(v == 0 for v in line):
pass
else:
break
two_d_array2 = np.delete(two_d_array1, slice(0, i), axis=0)
Any suggestions on how to do the same for the end of the array?
for i, line in enumerate(reversed(two_d_array2)):
if all(v == 0 for v in line):
pass
else:
break
two_d_array3 = np.delete(two_d_array2, **slice(0, i)**, axis=0)
You can use slice notation for your indexing.
To remove the last n rows from an array:
a = np.array(range(10)).reshape(5, 2)
>>> a
array([[0, 1],
[2, 3],
[4, 5],
[6, 7],
[8, 9]])
n = 2 # Remove last two rows of array.
>>> a[:-n, :]
array([[0, 1],
[2, 3],
[4, 5]])
To remove the first n rows from an array:
>>> a[n:, :] # Remove first two rows.
array([[4, 5],
[6, 7],
[8, 9]])
you can also use :
array_name[:-n]
This is the efficient approach with best time complexity than the previous one
I have the following problem:
Let's say I have an array defined like this:
A = np.array([[1,2,3],[4,5,6],[7,8,9]])
What I would like to do is to make use of Numpy multiple indexing and set several elements to 0. To do that I'm creating a vector:
indices_to_remove = [1, 2, 0]
What I want it to mean is the following:
Remove element with index '1' from the first row
Remove element with index '2' from the second row
Remove element with index '0' from the third row
The result should be the array [[1,0,3],[4,5,0],[0,8,9]]
I've managed to get values of the elements I would like to modify by following code:
values = np.diagonal(np.take(A, indices, axis=1))
However, that doesn't allow me to modify them. How could this be solved?
You could use integer array indexing to assign those zeros -
A[np.arange(len(indices_to_remove)), indices_to_remove] = 0
Sample run -
In [445]: A
Out[445]:
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]])
In [446]: indices_to_remove
Out[446]: [1, 2, 0]
In [447]: A[np.arange(len(indices_to_remove)), indices_to_remove] = 0
In [448]: A
Out[448]:
array([[1, 0, 3],
[4, 5, 0],
[0, 8, 9]])
Suppose I have an array like this:
a = np.array([[2,1],
[4,2],
[1,3],...]
I want to retrieve the elements of the second column where the corresponding elements in the first column match some condition. So something like
a[a[:,0] == np.array([2,4]),1] (?)
should give
np.array([1,2])
While this uses list to collect results and requires a for loop, this collects the second column values once the first column has passed some criteria (in a list of acceptable results in this case).
a = np.array([[2, 1],
[4, 2],
[1, 3]])
b = []
criteria = [2, 4]
for entry in a:
if entry[0] in criteria:
b.append(entry[1])
b = np.array(b)
You could create a mask based off of your first column, and then use that to mask off the second column.
import numpy as np
a = np.array([[2, 1],
[4, 2],
[1, 3]])
mask = np.logical_or(a[:,0] == 2, a[:,0] == 4)
b = a[:,1][mask]
print(b)
Returns:
[1, 2]
It could get a little clumsy if you have many values you want to compare to.
More specifically, I have a list of rows/columns that need to be ignored when choosing the max entry. In other words, when choosing the max upper triangular entry, certain indices need to skipped. In that case, what is the most efficient way to find the location of the max upper triangular entry?
For example:
>>> a
array([[0, 1, 1, 1],
[1, 2, 3, 4],
[4, 5, 6, 6],
[4, 5, 6, 7]])
>>> indices_to_skip = [0,1,2]
I need to find the index of the min element among all elements in the upper triangle except for the entries a[0,1], a[0,2], and a[1,2].
You can use np.triu_indices_from:
>>> np.vstack(np.triu_indices_from(a,k=1)).T
array([[0, 1],
[0, 2],
[0, 3],
[1, 2],
[1, 3],
[2, 3]])
>>> inds=inds[inds[:,1]>2] #Or whatever columns you want to start from.
>>> inds
array([[0, 3],
[1, 3],
[2, 3]])
>>> a[inds[:,0],inds[:,1]]
array([1, 4, 6])
>>> max_index = np.argmax(a[inds[:,0],inds[:,1]])
>>> inds[max_index]
array([2, 3]])
Or:
>>> inds=np.triu_indices_from(a,k=1)
>>> mask = (inds[1]>2) #Again change 2 for whatever columns you want to start at.
>>> a[inds][mask]
array([1, 4, 6])
>>> max_index = np.argmax(a[inds][mask])
>>> inds[mask][max_index]
array([2, 3]])
For the above you can use inds[0] to skip certains rows.
To skip specific rows or columns:
def ignore_upper(arr, k=0, skip_rows=None, skip_cols=None):
rows, cols = np.triu_indices_from(arr, k=k)
if skip_rows != None:
row_mask = ~np.in1d(rows, skip_rows)
rows = rows[row_mask]
cols = cols[row_mask]
if skip_cols != None:
col_mask = ~np.in1d(cols, skip_cols)
rows = rows[col_mask]
cols = cols[col_mask]
inds=np.ravel_multi_index((rows,cols),arr.shape)
return np.take(arr,inds)
print ignore_upper(a, skip_rows=1, skip_cols=2) #Will also take numpy arrays for skipping.
[0 1 1 6 7]
The two can be combined and creative use of boolean indexing can help speed up specific cases.
Something interesting that I ran across, a faster way to take upper triu indices:
def fast_triu_indices(dim,k=0):
tmp_range = np.arange(dim-k)
rows = np.repeat(tmp_range,(tmp_range+1)[::-1])
cols = np.ones(rows.shape[0],dtype=np.int)
inds = np.cumsum(tmp_range[1:][::-1]+1)
np.put(cols,inds,np.arange(dim*-1+2+k,1))
cols[0] = k
np.cumsum(cols,out=cols)
return (rows,cols)
Its about ~6x faster although it does not work for k<0:
dim=5000
a=np.random.rand(dim,dim)
k=50
t=time.time()
rows,cols=np.triu_indices(dim,k=k)
print time.time()-t
0.913508892059
t=time.time()
rows2,cols2,=fast_triu_indices(dim,k=k)
print time.time()-t
0.16515994072
print np.allclose(rows,rows2)
True
print np.allclose(cols,cols2)
True