How to insert rows multiple times to a numpy array?

I have an array that I need to check if it has missing values in its rows. The second column must follow a sequence and if a missing value is found I need to insert it.
[[123 1 0
123 2 0
123 4 0
123 5 0
123 8 0
123 9 0
...]]
In this example I'd need to insert at row 2 the values [123 3 0] and at row 4 [[123 6 0], [123 7 0]].
I am iterating over the array row by row, checking for missing rows and using numpy.insert to add them. However, numpy.insert returns a copy every time it is called, and each insertion shifts the index at which the following rows have to be inserted.
Is this a reasonable way to do it?

Look at this way without using insert:
import numpy as np
x = np.array([[123, 1, 0],
[123, 2, 0],
[123, 4, 0],
[123, 5, 0],
[123, 8, 0],
[123, 9, 0]])
y = np.zeros((x[-1, 1], x.shape[1]))
y[x[:,1] - 1] = x
indexes = np.where((y[:,0] == 0) & (y[:,1] == 0) & (y[:,2] == 0))[0]
y[indexes] = [[123, i + 1, 0] for i in indexes]
So now,
print(y)
[[123., 1., 0.]
[123., 2., 0.]
[123., 3., 0.]
[123., 4., 0.]
[123., 5., 0.]
[123., 6., 0.]
[123., 7., 0.]
[123., 8., 0.]
[123., 9., 0.]]
Hope this can help you :)
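A variation on the approach above, as a minimal sketch: it keeps the integer dtype and does not require the sequence to start at 1. The helper name fill_missing_rows is made up for illustration, and the sketch assumes that the second column is sorted and unique and that filler rows should reuse the first row's value in column 0 and zeros elsewhere.

```python
import numpy as np

def fill_missing_rows(x):
    """Return a copy of x with one row for every value missing from column 1."""
    first, last = x[0, 1], x[-1, 1]
    # Start from filler rows: same value in column 0, zeros in the other columns.
    full = np.zeros((last - first + 1, x.shape[1]), dtype=x.dtype)
    full[:, 0] = x[0, 0]
    full[:, 1] = np.arange(first, last + 1)
    # Overwrite the slots whose sequence number already exists in the input.
    full[x[:, 1] - first] = x
    return full

x = np.array([[123, 1, 0],
              [123, 2, 0],
              [123, 4, 0],
              [123, 5, 0],
              [123, 8, 0],
              [123, 9, 0]])
y = fill_missing_rows(x)
print(y)
```

Because the output is allocated once, no repeated copies are made and no insertion index has to be tracked.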

Related

Get output after matrix operation

I have a matrix A:
[[ 1 2]
[ 3 4]
[ 5 6]
[ 7 8]
[ 9 10]]
And I have matrix B:
[[1 0 0]
[0 1 0]
[1 0 0]
[0 0 1]
[0 1 0]]
And my desired Output is :
Matrix C:
[[1 0 0]
[0 3 0]
[5 0 0]
[0 0 7]
[0 9 0]]
That is, I would like to take the first column of matrix A and substitute its values into matrix B wherever B contains a 1. The problem is that I need to do it using matrix operations in NumPy, i.e. without using loops.
So far I have done the following. Please help me do it in easy steps.
mat_A = np.array([[1,2],[3,4],[5,6],[7,8],[9,10]])
mat_B = np.array([[1,0,0],[0,1,0],[1,0,0],[0,0,1],[0,1,0]])
mat_A1 = np.zeros(mat_B.shape)
mat_A1[:mat_A.shape[0],:mat_A.shape[1]] = mat_A
mat_A1[:,1] = np.zeros(5)
print(mat_A1)
mat_A2 = np.zeros(mat_B.shape)
mat_A2[:mat_A.shape[0],:mat_A.shape[1]] = mat_A
mat_A2[:,0] = np.zeros(5)
print(mat_A2)
print(mat_B)
My Output is :
[[1. 0. 0.]
[3. 0. 0.]
[5. 0. 0.]
[7. 0. 0.]
[9. 0. 0.]]
[[ 0. 2. 0.]
[ 0. 4. 0.]
[ 0. 6. 0.]
[ 0. 8. 0.]
[ 0. 10. 0.]]
[[1 0 0]
[0 1 0]
[1 0 0]
[0 0 1]
[0 1 0]]
If I multiply, I get different output. Please help me get Matrix C.
I want to do it WITHOUT USING LOOP and only using numpy and matrix operations.

Here's a solution without the use of for loops:
import numpy as np
mat_A = np.array([[1,2],[3,4],[5,6],[7,8],[9,10]])
mat_B = np.array([[1,0,0],[0,1,0],[1,0,0],[0,0,1],[0,1,0]])
mat_C = mat_B.copy()
mask = (mat_C[...] == 1) #Create a mask
mat_C[mask] = mat_A[...,0] #Replace masked values by the ones in mat_A's first column
print(mat_C)
Create a mask and use it to index into mat_C to assign the values of the first column of mat_A to the 1's that were in mat_B.
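The mask assignment above relies on every row of mat_B containing exactly one 1, so that the five values line up with the five masked positions. As a sketch of an alternative that does not need that assumption, the first column of mat_A can be broadcast across mat_B with a plain element-wise product:

```python
import numpy as np

mat_A = np.array([[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]])
mat_B = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 1, 0]])

# mat_A[:, [0]] has shape (5, 1), so it broadcasts over the 3 columns of mat_B.
mat_C = mat_B * mat_A[:, [0]]
print(mat_C)
```

This only works as a substitution because mat_B holds nothing but 0s and 1s.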

You could do this..
C = np.zeros((B.shape))
for i in range(A.shape[0]):
    C[i,:]=B[i,:]*A[i,0]
result:
array([[1., 0., 0.],
[0., 3., 0.],
[5., 0., 0.],
[0., 0., 7.],
[0., 9., 0.]])
You could also do this, which is a bit more general in case the data you provided is only an example of what you are really working with:
replace_val = 1
for i in range(B.shape[0]):
    for j in range(B.shape[1]):
        if B[i,j] == replace_val:
            C[i,j] = A[i,0]
same result
EDIT : this way works with no loops
vals_to_change = np.where(B==1)
C[vals_to_change] = A[vals_to_change[0],0]*B[vals_to_change]
same result

filter tensorflow array with specific condition over numpy array

I have a tensorflow array named tf_array and a numpy array named np_array. I want to find specific rows in tf_array with regard to np_array.
tf_array = tf.constant(
[[9.968594, 8.655439, 0., 0. ],
[0., 8.3356, 0., 8.8974 ],
[0., 0., 6.103182, 7.330564 ],
[6.609862, 0., 3.0614321, 0. ],
[9.497023, 0., 3.8914037, 0. ],
[0., 8.457685, 8.602337, 0. ],
[0., 0., 5.826657, 8.283971 ]])
I also have np_array:
np_array = np.matrix(
[[2, 5, 1],
[1, 6, 4],
[0, 0, 0],
[2, 3, 6],
[4, 2, 4]])
Now I want to keep the elements of tf_array for which a combination of n of their row indexes (here n is 2) appears in a row of np_array. What does that mean?
For example, in the first column of tf_array the indexes that hold a value are (0, 3, 4). Is there any row in np_array that contains one of the pairs (0,3), (0,4) or (3,4)? There is no such row, so all the elements in that column become zero.
For the second column of tf_array the index pairs are (0,1), (0,5) and (1,5). The pair (1,5) appears in the first row of np_array, which is why we keep those elements in tf_array.
So the final result should be like this:
[[0. 0. 0. 0. ]
[0. 8.3356 0. 8.8974 ]
[0. 0. 6.103182 7.330564 ]
[0. 0. 3.0614321 0. ]
[0. 0. 3.8914037 0. ]
[0. 8.457685 8.602337 0. ]
[0. 0. 5.826657 8.283971 ]]
I am looking for a very efficient approach as I have large number of data.
Update1
With the code below I could get this mask, which is True where there is a value and False where the entry is zero:
[[ True True False False]
[False True False True]
[False False True True]
[ True False True False]
[ True False True False]
[False True True False]
[False False True True]]
with tf.Session() as sess:
    where = tf.not_equal(tf_array, 0.0)
    print(sess.run(where))
But how can I compare this matrix with np_array?
Thank you in advance!

Here is the solution from https://stackoverflow.com/a/56510832/7207392 with the necessary modifications. For the sake of simplicity I use np.array for all data. I'm no tensorflow expert, so if translating is not entirely straightforward, you'll have to ask somebody else how to do it.
import numpy as np
def f(a1, a2, n):
    N,M = a1.shape
    a1p = np.concatenate([a1,np.zeros((1,a1.shape[1]),a1.dtype)], axis=0)
    a2 = np.sort(a2, axis=1)
    a2[:,1:][a2[:,1:]==a2[:,:-1]] = N
    y,x = np.where(np.count_nonzero(a1p[a2], axis=1) >= n)
    out = np.zeros_like(a1p)
    out[a2[y],x[:,None]] = a1p[a2[y],x[:,None]]
    return out[:-1]
a1 = np.array(
[[9.968594, 8.655439, 0., 0. ],
[0., 8.3356, 0., 8.8974 ],
[0., 0., 6.103182, 7.330564 ],
[6.609862, 0., 3.0614321, 0. ],
[9.497023, 0., 3.8914037, 0. ],
[0., 8.457685, 8.602337, 0. ],
[0., 0., 5.826657, 8.283971 ]])
a2 = np.array(
[[2, 5, 1],
[1, 6, 4],
[0, 0, 0],
[2, 3, 6],
[4, 2, 4]])
print(f(a1,a2,2))
Output:
[[0. 0. 0. 0. ]
[0. 8.3356 0. 8.8974 ]
[0. 0. 6.103182 7.330564 ]
[0. 0. 3.0614321 0. ]
[0. 0. 3.8914037 0. ]
[0. 8.457685 8.602337 0. ]
[0. 0. 5.826657 8.283971 ]]
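To cross-check a vectorized solution like the one above, a plain loop over the rows of the index array is easier to reason about. This is only a reference sketch: the name filter_reference is made up, and it encodes my reading of the question, namely that an element is kept when at least n of the row indexes in one row of a2 hold a nonzero value in that column.

```python
import numpy as np

def filter_reference(a1, a2, n):
    """Keep a1[i, j] if i is in some row of a2 whose indexes hit >= n nonzeros in column j."""
    out = np.zeros_like(a1)
    for row in a2:
        idx = np.unique(row)  # drop repeated indexes such as [0, 0, 0]
        keep = np.count_nonzero(a1[idx], axis=0) >= n  # one flag per column
        sel = np.ix_(idx, keep)  # row indexes crossed with the kept columns
        out[sel] = a1[sel]
    return out

a1 = np.array(
    [[9.968594, 8.655439, 0., 0.],
     [0., 8.3356, 0., 8.8974],
     [0., 0., 6.103182, 7.330564],
     [6.609862, 0., 3.0614321, 0.],
     [9.497023, 0., 3.8914037, 0.],
     [0., 8.457685, 8.602337, 0.],
     [0., 0., 5.826657, 8.283971]])
a2 = np.array(
    [[2, 5, 1],
     [1, 6, 4],
     [0, 0, 0],
     [2, 3, 6],
     [4, 2, 4]])
result = filter_reference(a1, a2, 2)
print(result)
```

The loop runs once per row of a2, so it is slower than the vectorized version when a2 is large, but it is handy for testing on small inputs.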

One efficient way you can try is to build bit flags for each row that record which values are present; for (0,3,4) the flag is 1<<0 | 1<<3 | 1<<4. You end up with an array of flag values. The << and | operators work element-wise on NumPy integer arrays.
Do the same for the other array; I guess tf arrays are just wrapped NumPy arrays.
Once you have the two arrays of flags, take the bitwise "and" of them. Where your condition is true for a row, the result has at least two nonzero bits. Counting the bits can also be done efficiently; search for that.
However, this won't work with floats: you'll need to convert those to fairly small ints.
import numpy as np
arr_one = np.array(
[[2, 5, 1],
[1, 6, 4],
[0, 0, 0],
[2, 3, 6],
[4, 2, 4]])
arr_two = np.array(
[[2, 0, 7],
[1, 3, 4],
[5, 5, 6],
[1, 3, 6],
[4, 2, 4]])
print('1 << arr_one.T[0] ' , 1 << arr_one.T[0] )
arr_one_flags = 1 << arr_one.T[0] | 1 << arr_one.T[1] | 1 << arr_one.T[2]
print('arr_one_flags ', arr_one_flags)
arr_two_flags = 1 << arr_two.T[0] | 1 << arr_two.T[1] | 1 << arr_two.T[2]
arr_and = arr_one_flags & arr_two_flags
print('arr_and ', arr_and)
def get_bit_count(value):
    n = 0
    while value:
        n += 1
        value &= value-1
    return n
arr_matches = np.array([get_bit_count(x) for x in arr_and])
print('arr_matches ', arr_matches )
arr_two_filtered = arr_two[arr_matches > 1]
print('arr_two_filtered ', arr_two_filtered )
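The bit counting above still runs a Python loop per row. As a rough sketch of a fully vectorized variant, each flag can be shifted by every bit position at once and the low bits summed. The helper names row_flags and count_set_bits and the n_bits parameter are my own, and the sketch assumes all values are non-negative integers smaller than n_bits.

```python
import numpy as np

def row_flags(arr):
    """Pack each row of small non-negative ints into a single integer bit flag."""
    return np.bitwise_or.reduce(1 << arr, axis=1)

def count_set_bits(flags, n_bits=32):
    """Count the set bits of every flag by testing bits 0 .. n_bits - 1."""
    shifts = np.arange(n_bits)
    # flags[:, None] >> shifts has shape (len(flags), n_bits); "& 1" isolates each bit.
    return ((flags[:, None] >> shifts) & 1).sum(axis=1)

arr_one = np.array([[2, 5, 1], [1, 6, 4], [0, 0, 0], [2, 3, 6], [4, 2, 4]])
arr_two = np.array([[2, 0, 7], [1, 3, 4], [5, 5, 6], [1, 3, 6], [4, 2, 4]])

arr_and = row_flags(arr_one) & row_flags(arr_two)
arr_matches = count_set_bits(arr_and)
arr_two_filtered = arr_two[arr_matches > 1]
print(arr_matches)
print(arr_two_filtered)
```

The temporary array has n_bits columns, so memory use grows with the largest value that has to be represented.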

Where clause with numpy with single array and / or empty_like

I am trying to figure out how the np.where clause works. I create a simple df:
np.random.seed(1)
df = pd.DataFrame(np.random.randint(0, 10, size=(3, 4)), columns=list('ABCD'))
print(df)
A B C D
0 5 8 9 5
1 0 0 1 7
2 6 9 2 4
Now when I implement:
print(np.where(df.values, 1, np.nan))
I receive:
[[ 1. 1. 1. 1.]
[ nan nan 1. 1.]
[ 1. 1. 1. 1.]]
But when I create an empty_like array from df and pass it to np.where, I receive this:
print(np.where(np.empty_like(df.values), 1, np.nan))
[[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]
[ 1. 1. 1. 1.]]
Really could use help on explaining how where clause works on a single array.

np.empty_like()
Docs:-
numpy.empty_like(prototype, dtype=None, order='K', subok=True)
Return a new array with the same shape and type as a given array.
>>> a = ([1,2,3], [4,5,6]) # a is array-like
>>> np.empty_like(a)
array([[-1073741821, -1073741821, 3], #random
[ 0, 0, -1073741821]])
np.empty_like() creates an array of the same shape and type as the given array, but its contents are uninitialized, so it holds whatever arbitrary values happened to be in memory. This array now goes into np.where()
numpy.where()
Docs:-
numpy.where(condition[, x, y])
Return elements that are chosen from x or y depending on condition.
Example:-
>>> a = np.arange(10)
>>> a
array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
>>> np.where(a < 5, a, 10*a)
array([ 0, 1, 2, 3, 4, 50, 60, 70, 80, 90])
>>> np.where(a,1,np.nan)
array([nan, 1., 1., 1., 1., 1., 1., 1., 1., 1.])
In Python, any nonzero number is treated as True and zero is treated as False.
When np.where() is given an array as its condition, the array itself acts as the condition: each element counts as True when it is nonzero and False when it is 0. So the "True" elements are replaced by 1 and the "False" elements by np.nan. In your example the arbitrary values left in the np.empty_like() array happened to be nonzero, which is why every element came out as 1.
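A small sketch to make the truthiness rule concrete, using the values from the question's DataFrame. np.zeros_like stands in for np.empty_like here because its contents are predictable; that substitution is mine, not part of the question.

```python
import numpy as np

values = np.array([[5, 8, 9, 5],
                   [0, 0, 1, 7],
                   [6, 9, 2, 4]])

# Passing the array itself as the condition is the same as testing "!= 0".
implicit = np.where(values, 1, np.nan)
explicit = np.where(values != 0, 1, np.nan)

# An array that is zero everywhere selects np.nan for every element.
all_zero = np.where(np.zeros_like(values), 1, np.nan)

print(implicit)
print(all_zero)
```

With np.empty_like the result depends on whatever was in memory, so it can differ from run to run.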
Reference:-
numpy.where()
numpy.empty_like()

How to split an array based on minimum row value using vectorization

I am trying to figure out how to take the following for loop that splits an array based on the index of the lowest value in the row and use vectorization. I've looked at this link and have been trying to use the numpy.where function but currently unsuccessful.
For example if an array has n columns, then all the rows where col[0] has the lowest value are put in one array, all the rows where col[1] are put in another, etc.
Here's the code using a for loop.
import numpy
a = numpy.array([[ 0. 1. 3.]
[ 0. 1. 3.]
[ 0. 1. 3.]
[ 1. 0. 2.]
[ 1. 0. 2.]
[ 1. 0. 2.]
[ 3. 1. 0.]
[ 3. 1. 0.]
[ 3. 1. 0.]])
result_0 = []
result_1 = []
result_2 = []
for value in a:
    if value[0] <= value[1] and value[0] <= value[2]:
        result_0.append(value)
    elif value[1] <= value[0] and value[1] <= value[2]:
        result_1.append(value)
    else:
        result_2.append(value)
print(result_0)
>>[array([ 0. 1. 3.]), array([ 0. 1. 3.]), array([ 0. 1. 3.])]
print(result_1)
>>[array([ 1. 0. 2.]), array([ 1. 0. 2.]), array([ 1. 0. 2.])]
print(result_2)
>>[array([ 3. 1. 0.]), array([ 3. 1. 0.]), array([ 3. 1. 0.])]

First, use argsort to see where the lowest value in each row is:
>>> a.argsort(axis=1)
array([[0, 1, 2],
[0, 1, 2],
[0, 1, 2],
[1, 0, 2],
[1, 0, 2],
[1, 0, 2],
[2, 1, 0],
[2, 1, 0],
[2, 1, 0]])
Note that the first column of this result holds, for each row, the index of the column with the smallest value.
Now you can build the results:
>>> sortidx = a.argsort(axis=1)
>>> [a[sortidx[:,0] == i] for i in range(a.shape[1])]
[array([[ 0., 1., 3.],
[ 0., 1., 3.],
[ 0., 1., 3.]]),
array([[ 1., 0., 2.],
[ 1., 0., 2.],
[ 1., 0., 2.]]),
array([[ 3., 1., 0.],
[ 3., 1., 0.],
[ 3., 1., 0.]])]
So it is done with only a single loop over the columns, which will give a huge speedup if the number of rows is much larger than the number of columns.
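If only the position of the minimum matters, argmin gives it directly without a full sort. A minimal sketch with made-up data; ties go to the lowest column index, which matches the <= comparisons in the question's loop.

```python
import numpy as np

a = np.array([[0., 1., 3.],
              [1., 0., 2.],
              [3., 1., 0.],
              [1., 2., 0.]])

# Index of the smallest value in every row (first occurrence on ties).
min_col = a.argmin(axis=1)

# One boolean-mask selection per column, so the loop is over columns only.
groups = [a[min_col == i] for i in range(a.shape[1])]
for i, group in enumerate(groups):
    print(i, group.tolist())
```

The row [1., 2., 0.] is included on purpose: its smallest value is in column 2 while column 0 is neither smallest nor largest, which is the case where grouping by sort rank instead of by the column index of the minimum would go wrong.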

This is not the best solution since it relies on simple python loops and is not very efficient when you start dealing with large data sets but it should get you started.
The idea is to create a list of "buckets", one per column (sized by the longest row). Then, for each row in a, enumerate its elements, pick the smallest one and use its offset to append the row to the matching bucket. The last loop prints the buckets.
Solution using loops:
import numpy
import pprint
# random data set
a = numpy.array([[0, 1, 3],
[0, 1, 3],
[0, 1, 3],
[1, 0, 2],
[1, 0, 2],
[1, 0, 2],
[3, 1, 0],
[3, 1, 0],
[3, 1, 0]])
# create a list of results as big as the depth of elements in an entry
results = list()
for l in range(max(len(i) for i in a)):
    results.append(list())
# don't do the following because all the references to the lists will be the same and you get dups:
# results = [[]]*max(len(i) for i in a)
for value in a:
    res_offset, _val = min(enumerate(value), key=lambda x: x[1]) # get the offset and min value
    results[res_offset].append(value) # store the original Array obj in the correct "bucket"
# print for visualization
for c, r in enumerate(results):
    print("result_%s: %s" % (c, r))
Outputs:
result_0: [array([0, 1, 3]), array([0, 1, 3]), array([0, 1, 3])]
result_1: [array([1, 0, 2]), array([1, 0, 2]), array([1, 0, 2])]
result_2: [array([3, 1, 0]), array([3, 1, 0]), array([3, 1, 0])]

I found a much easier way to do this. I hope that I am interpreting the OP correctly.
My sense is that the OP wants to create a slice of the larger array based upon some set of conditions.
Note that the code above to create the array does not seem to work, at least in Python 3.5. I generated the array as follows.
a = np.array([0., 1., 3., 0., 1., 3., 0., 1., 3., 1., 0., 2., 1., 0., 2.,1., 0., 2.,3., 1., 0.,3., 1., 0.,3., 1., 0.]).reshape([9,3])
Next, I sliced the original array into smaller arrays. Numpy has builtins to help with this.
result_0 = a[np.logical_and(a[:,0] <= a[:,1],a[:,0] <= a[:,2])]
result_1 = a[np.logical_and(a[:,1] <= a[:,0],a[:,1] <= a[:,2])]
result_2 = a[np.logical_and(a[:,2] <= a[:,0],a[:,2] <= a[:,1])]
This will generate new numpy arrays that match the given conditions.
Note: if you want each result as a list of row arrays, as in the question, you can convert them like this:
result_0 = [np.array(x) for x in result_0.tolist()]
result_1 = [np.array(x) for x in result_1.tolist()]
result_2 = [np.array(x) for x in result_2.tolist()]
This should generate the outcome requested in the OP.

Python: Resize an existing array and fill with zeros

I think that my issue should be really simple, yet I can not find any help
on the Internet whatsoever. I am very new to Python, so it is possible that
I am missing something very obvious.
I have an array, S, like this [x x x] (one-dimensional). I now create a
diagonal matrix, sigma, with np.diag(S) - so far, so good. Now, I want to
resize this new diagonal array so that I can multiply it by another array that
I have.
import numpy as np
...
shape = np.shape((6, 6)) #This will be some pre-determined size
sigma = np.diag(S) #diagonalise the matrix - this works
my_sigma = sigma.resize(shape) #Resize the matrix and fill with zeros - returns "None" - why?
However, when I print the contents of my_sigma, I get "None". Can someone please
point me in the right direction, because I can not imagine that this should be
so complicated.
Thanks in advance for any help!
Casper
Graphical:
I have this:
[x x x]
I want this:
[x 0 0]
[0 x 0]
[0 0 x]
[0 0 0]
[0 0 0]
[0 0 0] - or some similar size, but the diagonal elements are important.

NumPy 1.7.0 added numpy.pad, which can do this in one line. As in the other answers, you construct the diagonal matrix with np.diag before padding.
The tuple ((0,N),(0,0)) used in this answer gives the (before, after) pad widths for each axis: here N rows are appended after the last row and no columns are added.
import numpy as np
A = np.array([1, 2, 3])
N = A.size
B = np.pad(np.diag(A), ((0,N),(0,0)), mode='constant')
B is now equal to:
[[1 0 0]
[0 2 0]
[0 0 3]
[0 0 0]
[0 0 0]
[0 0 0]]
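When the target is a pre-determined shape such as (6, 6) rather than "N extra rows", the pad widths can be derived from that shape. A minimal sketch; the helper name diag_padded is hypothetical, and it assumes the target shape is at least as large as the diagonal matrix in both dimensions.

```python
import numpy as np

def diag_padded(values, shape):
    """Build np.diag(values) and zero-pad it at the bottom/right up to shape."""
    d = np.diag(values)
    pad_rows = shape[0] - d.shape[0]
    pad_cols = shape[1] - d.shape[1]
    if pad_rows < 0 or pad_cols < 0:
        raise ValueError("target shape is smaller than the diagonal matrix")
    # (before, after) pad widths for the row axis and the column axis.
    return np.pad(d, ((0, pad_rows), (0, pad_cols)), mode="constant")

sigma = diag_padded(np.array([1, 2, 3]), (6, 6))
print(sigma)
```

Padding only "after" on each axis keeps the diagonal entries at positions (0, 0), (1, 1) and (2, 2).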

sigma.resize() returns None because it operates in-place. np.resize(sigma, shape), on the other hand, returns the result but instead of padding with zeros, it pads with repeats of the array.
Also, the shape() function returns the shape of the input. If you just want to predefine a shape, just use a tuple.
import numpy as np
...
shape = (6, 6) #This will be some pre-determined size
sigma = np.diag(S) #diagonalise the matrix - this works
sigma.resize(shape) #Resize the matrix and fill with zeros
However, this will first flatten out your original array, and then reconstruct it into the given shape, destroying the original ordering. If you just want to "pad" with zeros, instead of using resize() you can just directly index into a generated zero-matrix.
# This assumes that you have a 2-dimensional array
zeros = np.zeros(shape, dtype=np.int32)
zeros[:sigma.shape[0], :sigma.shape[1]] = sigma

I see the edit... you do have to create the zeros first and then move some numbers into it. np.diag_indices_from might be useful for you
bigger_sigma = np.zeros(shape, dtype=sigma.dtype)
diag_ij = np.diag_indices_from(sigma)
bigger_sigma[diag_ij] = sigma[diag_ij]

This solution works with resize function
Take a sample array
S= np.ones((3))
print (S)
# [ 1. 1. 1.]
d= np.diag(S)
print(d)
"""
[[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]]
"""
This doesn't work; it just fills the extra rows with repeated values
np.resize(d,(6,3))
"""
adds a repeating value
array([[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.],
[ 1., 0., 0.],
[ 0., 1., 0.],
[ 0., 0., 1.]])
"""
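This does work: the in-place resize pads with zeros, and the existing rows stay intact here because the number of columns does not change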
d.resize((6,3),refcheck=False)
print(d)
"""
[[ 1. 0. 0.]
[ 0. 1. 0.]
[ 0. 0. 1.]
[ 0. 0. 0.]
[ 0. 0. 0.]
[ 0. 0. 0.]]
"""

Another pure python solution is
a = [1, 2, 3]
b = []
for i in range(6):
    b.append((([0] * i) + a[i:i+1] + ([0] * (len(a) - 1 - i)))[:len(a)])
b is now
[[1, 0, 0], [0, 2, 0], [0, 0, 3], [0, 0, 0], [0, 0, 0], [0, 0, 0]]
it's a hideous solution, I'll admit that.
However, it illustrates some functions of the list type that can be used.
