Add element after each element in numpy array python - python

I am just starting off with numpy and am trying to create a function that takes in an array (x), converts this into a np.array, and returns a numpy array with 0,0,0,0 added after each element.
It should look like so:
input array: [4,5,6]
output: [4,0,0,0,0,5,0,0,0,0,6,0,0,0,0]
I have tried the following:
import numpy as np
x = np.asarray([4,5,6])
y = np.array([])
for index, value in enumerate(x):
y = np.insert(x, index+1, [0,0,0,0])
print(y)
which returns:
[4 0 0 0 0 5 6]
[4 5 0 0 0 0 6]
[4 5 6 0 0 0 0]
So basically I need to combine the output into one single numpy array rather than three lists.
Would anybody know how to solve this?
Many thanks!

Use the numpy .zeros function !
import numpy as np
inputArray = [4,5,6]
newArray = np.zeros(5*len(inputArray),dtype=int)
newArray[::5] = inputArray
In fact, you 'force' all the values with indexes 0,5 and 10 to become 4,5 and 6.
so _____[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0]
becomes [4 0 0 0 0 5 0 0 0 0 6 0 0 0 0]
>>> newArray
array([4, 0, 0, 0, 0, 5, 0, 0, 0, 0, 6, 0, 0, 0 ,0])

I haven't used numpy to solve this problem,but this code seems to return your required output:
a = [4,5,6]
b = [0,0,0,0]
c = []
for x in a:
c = c + [x] + b
print(c)
I hope this helps!

Related

Numpy get secondary diagonal with offset=1 and change the values

I have this 6x6 matrix filled with 0s. I got the secondary diagonal in sec_diag. The thing I am trying to do is to change the values of above the sec_diag inside the matrix with the odds numbers from 9-1 [9,7,5,3,1]
import numpy as np
x = np.zeros((6,6), int)
sec_diag = np.diagonal(np.fliplr(x), offset=1)
The result should look like this:
[[0,0,0,0,9,0],
[0,0,0,7,0,0],
[0,0,5,0,0,0],
[0,3,0,0,0,0],
[1,0,0,0,0,0],
[0,0,0,0,0,0]]
EDIT: np.fill_diagonal isn't going to work.
You should use roll
x = np.zeros((6,6),dtype=np.int32)
np.fill_diagonal(np.fliplr(x), [9,7,5,3,1,0])
xr = np.roll(x,-1,axis=1)
print(xr)
Output
[[0 0 0 0 9 0]
[0 0 0 7 0 0]
[0 0 5 0 0 0]
[0 3 0 0 0 0]
[1 0 0 0 0 0]
[0 0 0 0 0 0]]
Maybe you should try with a double loop

Numpy / Pandas slicing based on intervals

Trying to figure out a way to slice non-contiguous and non-equal length rows of a pandas / numpy matrix so I can set the values to a common value. Has anyone come across an elegant solution for this?
import numpy as np
import pandas as pd
x = pd.DataFrame(np.arange(12).reshape(3,4))
#x is the matrix we want to index into
"""
x before:
array([[ 0, 1, 2, 3],
[ 4, 5, 6, 7],
[ 8, 9, 10, 11]])
"""
y = pd.DataFrame([[0,3],[2,2],[1,2],[0,0]])
#y is a matrix where each row contains a start idx and end idx per column of x
"""
0 1
0 0 3
1 2 3
2 1 3
3 0 1
"""
What I'm looking for is a way to effectively select different length slices of x based on the rows of y
x[y] = 0
"""
x afterwards:
array([[ 0, 1, 2, 0],
[ 0, 5, 0, 7],
[ 0, 0, 0, 11]])
Masking can still be useful, because even if a loop cannot be entirely avoided, the main dataframe x would not need to be involved in the loop, so this should speed things up:
mask = np.zeros_like(x, dtype=bool)
for i in range(len(y)):
mask[y.iloc[i, 0]:(y.iloc[i, 1] + 1), i] = True
x[mask] = 0
x
0 1 2 3
0 0 1 2 0
1 0 5 0 7
2 0 0 0 11
As a further improvement, consider defining y as a NumPy array if possible.
I customized this answer to your problem:
y_t = y.values.transpose()
y_t[1,:] = y_t[1,:] - 1 # or remove this line and change '>= r' below to '> r`
r = np.arange(x.shape[0])
mask = ((y_t[0,:,None] <= r) & (y_t[1,:,None] >= r)).transpose()
res = x.where(~mask, 0)
res
# 0 1 2 3
# 0 0 1 2 0
# 1 0 5 0 7
# 2 0 0 0 11

Index of identical rows in a NumPy array

I already asked a variation of this question, but I still have a problem regarding the runtime of my code.
Given a numpy array consisting of 15000 rows and 44 columns. My goal is to find out which rows are equal and add them to a list, like this:
1 0 0 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
1 0 0 0 0
1 2 3 4 5
Result:
equal_rows1 = [1,2,3]
equal_rows2 = [0,4]
What I did up till now is using the following code:
import numpy as np
input_data = np.load('IN.npy')
equal_inputs1 = []
equal_inputs2 = []
for i in range(len(input_data)):
for j in range(i+1,len(input_data)):
if np.array_equal(input_data[i],input_data[j]):
equal_inputs1.append(i)
equal_inputs2.append(j)
The problem is that it takes a lot of time to return the desired arrays and that this allows only 2 different "similar row lists" although there can be more. Is there any better solution for this, especially regarding the runtime?
This is pretty simple with pandas groupby:
df
A B C D E
0 1 0 0 0 0
1 0 0 0 0 0
2 0 0 0 0 0
3 0 0 0 0 0
4 1 0 0 0 0
5 1 2 3 4 5
[g.index.tolist() for _, g in df.groupby(df.columns.tolist()) if len(g.index) > 1]
# [[1, 2, 3], [0, 4]]
If you are dealing with many rows and many unique groups, this might get a bit slow. The performance depends on your data. Perhaps there is a faster NumPy alternative, but this is certainly the easiest to understand.
You can use collections.defaultdict, which retains the row values as keys:
from collections import defaultdict
dd = defaultdict(list)
for idx, row in enumerate(df.values):
dd[tuple(row)].append(idx)
print(list(dd.values()))
# [[0, 4], [1, 2, 3], [5]]
print(dd)
# defaultdict(<class 'list'>, {(1, 0, 0, 0, 0): [0, 4],
# (0, 0, 0, 0, 0): [1, 2, 3],
# (1, 2, 3, 4, 5): [5]})
You can, if you wish, filter out unique rows via a dictionary comprehension.

How to find numpy array shape in a larger array?

big_array = np.array((
[0,1,0,0,1,0,0,1],
[0,1,0,0,0,0,0,0],
[0,1,0,0,1,0,0,0],
[0,0,0,0,1,0,0,0],
[1,0,0,0,1,0,0,0]))
print(big_array)
[[0 1 0 0 1 0 0 1]
[0 1 0 0 0 0 0 0]
[0 1 0 0 1 0 0 0]
[0 0 0 0 1 0 0 0]
[1 0 0 0 1 0 0 0]]
Is there a way to iterate over this numpy array and for each 2x2 cluster of 0s, set all values within that cluster = 5? This is what the output would look like.
[[0 1 5 5 1 5 5 1]
[0 1 5 5 0 5 5 0]
[0 1 5 5 1 5 5 0]
[0 0 5 5 1 5 5 0]
[1 0 5 5 1 5 5 0]]
My thoughts are to use advanced indexing to set the 2x2 shape = to 5, but I think it would be really slow to simply iterate like:
1) check if array[x][y] is 0
2) check if adjacent array elements are 0
3) if all elements are 0, set all those values to 5.
big_array = [1, 7, 0, 0, 3]
i = 0
p = 0
while i <= len(big_array) - 1 and p <= len(big_array) - 2:
if big_array[i] == big_array[p + 1]:
big_array[i] = 5
big_array[p + 1] = 5
print(big_array)
i = i + 1
p = p + 1
Output:
[1, 7, 5, 5, 3]
It is a example, not whole correct code.
Here's a solution by viewing the array as blocks.
First you need to define this function rolling_window from here https://gist.github.com/seberg/3866040/revisions
Then break the array big, your starting array, into 2x2 blocks using this function.
Also generate an array which has indices of every element in big and break it similarly into 2x2 blocks.
Then generate a boolean mask where the 2x2 blocks of big are all zero, and use the index array to get those elements.
blks = rolling_window(big,window=(2,2)) # 2x2 blocks of original array
inds = np.indices(big.shape).transpose(1,2,0) # array of indices into big
blkinds = rolling_window(inds,window=(2,2,0)).transpose(0,1,4,3,2) # 2x2 blocks of indices into big
mask = blks == np.zeros((2,2)) # generate a mask of every 2x2 block which is all zero
mask = mask.reshape(*mask.shape[:-2],-1).all(-1) # still generating the mask
# now blks[mask] is every block which is zero..
# but you actually want the original indices in the array 'big' instead
inds = blkinds[mask].reshape(-1,2).T # indices into big where elements need replacing
big[inds[0],inds[1]] = 5 #reassign
You need to test this: I did not. But the idea is to break the array into blocks, and an array of indices into blocks, then develop a boolean condition on the blocks, use those to get the indices, and then reassign.
An alternative would be to iterate through indblks as defined here, then test the 2x2 obtained from big at each indblk element and reassign if necessary.
This is my attempt to help you solve your problem. My solution may be subject to fair criticism.
import numpy as np
from itertools import product
m = np.array((
[0,1,0,0,1,0,0,1],
[0,1,0,0,0,0,0,0],
[0,1,0,0,1,0,0,0],
[0,0,0,0,1,0,0,0],
[1,0,0,0,1,0,0,0]))
h = 2
w = 2
rr, cc = tuple(d + 1 - q for d, q in zip(m.shape, (h, w)))
slices = [(slice(r, r + h), slice(c, c + w))
for r, c in product(range(rr), range(cc))
if not m[r:r + h, c:c + w].any()]
for s in slices:
m[s] = 5
print(m)
[[0 1 5 5 1 5 5 1]
[0 1 5 5 0 5 5 5]
[0 1 5 5 1 5 5 5]
[0 5 5 5 1 5 5 5]
[1 5 5 5 1 5 5 5]]

How to convert a column or row matrix to a diagonal matrix in Python?

I have a row vector A, A = [a1 a2 a3 ..... an] and I would like to create a diagonal matrix, B = diag(a1, a2, a3, ....., an) with the elements of this row vector. How can this be done in Python?
UPDATE
This is the code to illustrate the problem:
import numpy as np
a = np.matrix([1,2,3,4])
d = np.diag(a)
print (d)
the output of this code is [1], but my desired output is:
[[1 0 0 0]
[0 2 0 0]
[0 0 3 0]
[0 0 0 4]]
You can use diag method:
import numpy as np
a = np.array([1,2,3,4])
d = np.diag(a)
# or simpler: d = np.diag([1,2,3,4])
print(d)
Results in:
[[1 0 0 0]
[0 2 0 0]
[0 0 3 0]
[0 0 0 4]]
If you have a row vector, you can do this:
a = np.array([[1, 2, 3, 4]])
d = np.diag(a[0])
Results in:
[[1 0 0 0]
[0 2 0 0]
[0 0 3 0]
[0 0 0 4]]
For the given matrix in the question:
import numpy as np
a = np.matrix([1,2,3,4])
d = np.diag(a.A1)
print (d)
Result is again:
[[1 0 0 0]
[0 2 0 0]
[0 0 3 0]
[0 0 0 4]]
I suppose you could also use diagflat:
import numpy
a = np.matrix([1,2,3,4])
d = np.diagflat(a)
print (d)
Which like the diag method results in
[[1 0 0 0]
[0 2 0 0]
[0 0 3 0]
[0 0 0 4]]
but there's no need for flattening with .A1
Another solution could be:
import numpy as np
a = np.array([1,2,3,4])
d = a * np.identity(len(a))
As for performances for the various answers here, I get with timeit on 100000 repetitions:
np.array and np.diag (Marcin's answer): 2.18E-02 s
np.array and np.identity (this answer): 6.12E-01 s
np.matrix and np.diagflat (Bokee's answer): 1.00E-00 s
Assuming you are working in numpy based on your tags, this will do it:
import numpy
def make_diag( A ):
my_diag = numpy.zeroes( ( 2, 2 ) )
for i, a in enumerate( A ):
my_diag[i,i] = a
return my_diag
enumerate( LIST ) creates an iterator over the list that returns tuples like:
( 0, 1st element),
( 1, 2nd element),
...
( N-1, Nth element )

Categories