Delete and duplicate rows in numpy array

In Python, say I have a 1366x768 NumPy array. I want to delete every second row (row 0 stays, row 1 is removed, row 2 stays, row 3 is removed, and so on) and, at the same time, fill each gap with a duplicate of the row before it (the one that was kept).
Is this possible in NumPy?

One approach -
a[::2].repeat(2,axis=0)
This returns a new array; to update a itself, assign the result back to it.
Sample run -
In [105]: a
Out[105]:
array([[2, 5, 1, 1],
[2, 0, 2, 5],
[1, 1, 5, 7],
[0, 7, 1, 8],
[8, 5, 2, 3],
[2, 1, 0, 6],
[5, 6, 1, 6],
[7, 1, 4, 7],
[3, 8, 1, 4],
[5, 8, 8, 8]])
In [106]: a[::2].repeat(2,axis=0)
Out[106]:
array([[2, 5, 1, 1],
[2, 5, 1, 1],
[1, 1, 5, 7],
[1, 1, 5, 7],
[8, 5, 2, 3],
[8, 5, 2, 3],
[5, 6, 1, 6],
[5, 6, 1, 6],
[3, 8, 1, 4],
[3, 8, 1, 4]])
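One caveat worth sketching: with an odd number of rows, a[::2] contains the final row as well, so repeat returns one row more than the input has. A minimal sketch, assuming the output should keep the input's row count (the helper name duplicate_even_rows is made up for illustration):

```python
import numpy as np

def duplicate_even_rows(a):
    """Replace each odd-indexed row with a copy of the row before it.

    Hypothetical helper: repeats the even-indexed rows, then trims the
    result so it has as many rows as the input, even for odd lengths.
    """
    return a[::2].repeat(2, axis=0)[:len(a)]

a = np.arange(15).reshape(5, 3)   # 5 rows: an odd count
out = duplicate_even_rows(a)
print(out)
```

For an even row count the trailing slice is a no-op, so the helper behaves like the one-liner above.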
If we care about performance, here's another approach using NumPy strides -
def strided_app(a):
    m0,n0 = a.strides
    m,n = a.shape
    strided = np.lib.stride_tricks.as_strided
    return strided(a,shape=(m//2,2,n),strides=(2*m0,0,n0)).reshape(-1,n)
Sample run -
In [154]: a
Out[154]:
array([[4, 8, 7, 7],
[5, 5, 1, 7],
[1, 8, 1, 3],
[6, 6, 5, 6],
[0, 2, 6, 3],
[6, 6, 8, 7],
[7, 6, 8, 1],
[7, 8, 8, 2],
[4, 0, 2, 8],
[5, 8, 1, 4]])
In [155]: strided_app(a)
Out[155]:
array([[4, 8, 7, 7],
[4, 8, 7, 7],
[1, 8, 1, 3],
[1, 8, 1, 3],
[0, 2, 6, 3],
[0, 2, 6, 3],
[7, 6, 8, 1],
[7, 6, 8, 1],
[4, 0, 2, 8],
[4, 0, 2, 8]])
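As a sanity check, the strided version can be compared against the repeat version. This sketch restates strided_app with its indentation and assumes a 2D array with an even number of rows; with an odd count, m // 2 silently drops the last row, so the two approaches no longer agree.

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def strided_app(a):
    m0, n0 = a.strides
    m, n = a.shape
    # A stride of 0 on the middle axis reads the same source row twice.
    pairs = as_strided(a, shape=(m // 2, 2, n), strides=(2 * m0, 0, n0))
    # The reshape cannot be expressed as a view here, so it makes a copy.
    return pairs.reshape(-1, n)

a = np.arange(40).reshape(10, 4)
same = np.array_equal(strided_app(a), a[::2].repeat(2, axis=0))
print(same)
```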
Timings -
In [156]: arr = np.arange(1000000).reshape(1000, 1000)
# Proposed soln-1
In [157]: %timeit arr[::2].repeat(2,axis=0)
1000 loops, best of 3: 1.26 ms per loop
# @Psidom's soln
In [158]: %timeit arr[1::2] = arr[::2]
1000 loops, best of 3: 928 µs per loop
In [159]: arr = np.arange(1000000).reshape(1000, 1000)
# Proposed soln-2
In [160]: %timeit strided_app(arr)
1000 loops, best of 3: 830 µs per loop

Looks like you have an even number of rows, in which case you can use assignment (copy each even-indexed row's values into the odd-indexed row that follows it):
arr = np.array([[1,4],[3,1],[2,3],[2,2]])
arr[1::2] = arr[::2]
arr
#array([[1, 4],
# [1, 4],
# [2, 3],
# [2, 3]])
This avoids copying the entire array, but it doesn't work if the array has an odd number of rows, because the two slices then have different numbers of rows.
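If the row count might be odd, one possible variant of the same in-place idea is to stop the source slice one row early, so both slices always have the same number of rows. This is a sketch, not part of the original answer:

```python
import numpy as np

arr = np.arange(10).reshape(5, 2)   # 5 rows: an odd count

# arr[1::2] selects rows 1 and 3; arr[:-1:2] selects rows 0 and 2.
# The two slices never share a row, so assigning in place is safe.
arr[1::2] = arr[:-1:2]
print(arr)
```

With an even row count, arr[:-1:2] selects the same rows as arr[::2], so the variant gives the same result as the assignment above.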
Timing: here is a comparison of the two approaches; the assignment does seem faster.
arr = np.arange(1000000).reshape(1000, 1000)
%timeit arr[::2].repeat(2,axis=0)
1000 loops, best of 3: 913 µs per loop
%timeit arr[1::2] = arr[::2]
1000 loops, best of 3: 655 µs per loop

This works for both an even and an odd number of rows.
for i in range(1,len(a),2):
    a[i] = a[i-1]

Related

Python - Reshape matrix by taking n consecutive rows every n rows

There is a bunch of questions regarding reshaping of matrices using NumPy here on stackoverflow. I have found one that is closely related to what I am trying to achieve. However, this answer is not general enough for my application. So here we are.
I have got a matrix with millions of lines (shape m x n) that looks like this:
[[0, 0, 0, 0],
[1, 1, 1, 1],
[2, 2, 2, 2],
[3, 3, 3, 3],
[4, 4, 4, 4],
[5, 5, 5, 5],
[6, 6, 6, 6],
[7, 7, 7, 7],
[...]]
From this I would like to go to a shape m/2 x 2n, as shown below. For that one has to take n consecutive rows every n rows (here n is the block size, n = 2 in this example, not the column count). The blocks of consecutively taken rows are then horizontally stacked to the untouched rows. In this example that would mean:
The first two rows stay like they are.
Take row two and three and horizontally concatenate them to row zero and one.
Take row six and seven and horizontally concatenate them to row four and five. This concatenated block then becomes row two and three.
...
[[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7],
[...]]
How would I most efficiently (in terms of the least computation time possible) do that using Numpy? And would it make sense to speed the process up using Numba? Or is there not much to speed up?

Assuming your array's length is divisible by 4, here is one way you can do it using numpy.hstack after creating the correct indices for selecting the rows for the "left" and "right" parts of the resulting array:
import numpy as np
# Create the array
N = 1000*4
a = np.hstack([np.arange(0, N)[:, None]]*4) #shape (4000, 4)
a
array([[ 0, 0, 0, 0],
[ 1, 1, 1, 1],
[ 2, 2, 2, 2],
...,
[3997, 3997, 3997, 3997],
[3998, 3998, 3998, 3998],
[3999, 3999, 3999, 3999]])
left_idx = np.array([np.array([0,1]) + 4*i for i in range(N//4)]).reshape(-1)
right_idx = np.array([np.array([2,3]) + 4*i for i in range(N//4)]).reshape(-1)
r = np.hstack([a[left_idx], a[right_idx]]) #shape (2000, 8)
r
array([[ 0, 0, 0, ..., 2, 2, 2],
[ 1, 1, 1, ..., 3, 3, 3],
[ 4, 4, 4, ..., 6, 6, 6],
...,
[3993, 3993, 3993, ..., 3995, 3995, 3995],
[3996, 3996, 3996, ..., 3998, 3998, 3998],
[3997, 3997, 3997, ..., 3999, 3999, 3999]])
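The index arrays can also be built without a Python loop. A sketch under the same assumption (row count divisible by 4), using a small array so the result is easy to inspect:

```python
import numpy as np

N = 8
a = np.repeat(np.arange(N)[:, None], 4, axis=1)   # shape (8, 4); row i is all i

# Group the row indices into blocks of 4, then split each block in half.
idx = np.arange(N).reshape(-1, 4)
left_idx = idx[:, :2].ravel()    # rows 0, 1, 4, 5
right_idx = idx[:, 2:].ravel()   # rows 2, 3, 6, 7

r = np.hstack([a[left_idx], a[right_idx]])        # shape (4, 8)
print(r)
```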

Here's an application of the swapaxes answer in your link.
In [11]: x=np.array([[0, 0, 0, 0],
...: [1, 1, 1, 1],
...: [2, 2, 2, 2],
...: [3, 3, 3, 3],
...: [4, 4, 4, 4],
...: [5, 5, 5, 5],
...: [6, 6, 6, 6],
...: [7, 7, 7, 7]])
Break the array into 'groups' with a reshape, keeping the number of columns (4) unchanged.
In [17]: x.reshape(2,2,2,4)
Out[17]:
array([[[[0, 0, 0, 0],
[1, 1, 1, 1]],
[[2, 2, 2, 2],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[5, 5, 5, 5]],
[[6, 6, 6, 6],
[7, 7, 7, 7]]]])
Swap the 2 middle dimensions, regrouping rows:
In [18]: x.reshape(2,2,2,4).transpose(0,2,1,3)
Out[18]:
array([[[[0, 0, 0, 0],
[2, 2, 2, 2]],
[[1, 1, 1, 1],
[3, 3, 3, 3]]],
[[[4, 4, 4, 4],
[6, 6, 6, 6]],
[[5, 5, 5, 5],
[7, 7, 7, 7]]]])
Then back to the target shape. This final step creates a copy of the original (the previous steps were views):
In [19]: x.reshape(2,2,2,4).transpose(0,2,1,3).reshape(4,8)
Out[19]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[1, 1, 1, 1, 3, 3, 3, 3],
[4, 4, 4, 4, 6, 6, 6, 6],
[5, 5, 5, 5, 7, 7, 7, 7]])
It's hard to generalize this, since there are different ways of rearranging blocks. For example my first try produced:
In [16]: x.reshape(4,2,4).transpose(1,0,2).reshape(4,8)
Out[16]:
array([[0, 0, 0, 0, 2, 2, 2, 2],
[4, 4, 4, 4, 6, 6, 6, 6],
[1, 1, 1, 1, 3, 3, 3, 3],
[5, 5, 5, 5, 7, 7, 7, 7]])
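One possible generalization, assuming the blocks are always paired as in the question (k rows kept, the next k rows moved beside them) and the row count is divisible by 2*k. The function name pair_blocks and the parameter k are made up for this sketch:

```python
import numpy as np

def pair_blocks(x, k):
    """Hypothetical helper: place each second block of k rows beside the first."""
    m, n = x.shape
    if m % (2 * k):
        raise ValueError("row count must be divisible by 2*k")
    # Axes after the reshape: (block pair, left/right half, row in half, column)
    grouped = x.reshape(-1, 2, k, n)
    # Move "row in half" ahead of "left/right" so the halves end up side by side.
    return grouped.transpose(0, 2, 1, 3).reshape(-1, 2 * n)

x = np.repeat(np.arange(8)[:, None], 4, axis=1)   # row i is all i
print(pair_blocks(x, 2))
```

With k=2 this is the same reshape/transpose/reshape chain as in the transcript above, only with the sizes derived from x.shape.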

Dot product columns by rows python - numpy

I am trying to take the dot product of each row against itself in an nx3 array. Let me explain a little better: what I need is to go from an nx3 to an nx3x3 array.
If I have the following:
A = np.array([[1, 2, 2],
[4, 2, 3]])
I would like to get what it would be:
First element:
np.dot(A[0].reshape(3,1), A[0].reshape(1,3)) = array([[1, 2, 2], [2, 4, 4], [2, 4, 4]])
Second element:
np.dot(A[1].reshape(3,1), A[1].reshape(1,3)) = array([[16, 8, 12], [8, 4, 6], [12, 6, 9]])
So my final array would be:
result = array([[[ 1, 2, 2],
[ 2, 4, 4],
[ 2, 4, 4]],
[[16, 8, 12],
[ 8, 4, 6],
[12, 6, 9]]])
result.shape = (2, 3, 3)
I know I can do this with a for loop but I guess there must be a way to do it faster and more directly. Speed is vital for what I need.
Hope I explained myself correctly enough. Thank you in advance.

In [301]: A = np.array([[1, 2, 2],
...: [4, 2, 3]])
...:
...:
This isn't a dot product; there's no summing of products. Rather it's more like an outer product, increasing the number of dimensions. numpy with broadcasting does this nicely:
In [302]: A[:,:,None]*A[:,None,:]
Out[302]:
array([[[ 1, 2, 2],
[ 2, 4, 4],
[ 2, 4, 4]],
[[16, 8, 12],
[ 8, 4, 6],
[12, 6, 9]]])
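The same per-row outer product can also be written with np.einsum, which some readers find clearer because the index pattern is spelled out. A minimal sketch:

```python
import numpy as np

A = np.array([[1, 2, 2],
              [4, 2, 3]])

# result[i, j, k] = A[i, j] * A[i, k]: one outer product per row i
result = np.einsum('ij,ik->ijk', A, A)
print(result.shape)
```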

Pandas/Numpy Get matrix from column of arrays

I have a pandas dataframe with a column of lists.
df:
inputs
0 [1, 2, 3]
1 [4, 5, 6]
2 [7, 8, 9]
3 [10, 11, 12]
I need the matrix
array([[ 1, 2, 3],
[ 4, 5, 6],
[ 7, 8, 9],
[10, 11, 12]])
An efficient way to do this?
Note: When I try df.inputs.as_matrix() the output is
array([[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]], dtype=object)
which has shape (4,), not (4,3) as desired.

You can convert the column to a list and then apply np.array. If all the lists in the column have the same length, this makes a 2D array:
arr = np.array(df.inputs.tolist())
#array([[ 1, 2, 3],
# [ 4, 5, 6],
# [ 7, 8, 9],
# [10, 11, 12]])
arr.shape
# (4, 3)
Another option, suggested in a comment by @piRSquared, is to use .values to access the underlying NumPy object first and then convert it to a list; this is marginally faster with the example given:
%timeit df.inputs.values.tolist()
# 100000 loops, best of 3: 5.52 µs per loop
%timeit df.inputs.tolist()
# 100000 loops, best of 3: 11.5 µs per loop
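To see why the object array in the question has shape (4,), here is a pandas-free sketch. The array col is a hand-built stand-in for what df.inputs.values holds (a 1D object array whose elements are Python lists); it is an assumption for illustration, not actual pandas output.

```python
import numpy as np

rows = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]

# Stand-in for df.inputs.values: a 1-D object array holding Python lists.
col = np.empty(len(rows), dtype=object)
for i, row in enumerate(rows):
    col[i] = row

print(col.shape)                 # 1-D: NumPy only sees 4 opaque objects
arr = np.array(col.tolist())     # back to nested lists, then a regular 2-D array
print(arr.shape)
```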

Numpy - create matrix with rows of vector

I have a vector [x,y,z,q] and I want to create a matrix:
[[x,y,z,q],
[x,y,z,q],
[x,y,z,q],
...
[x,y,z,q]]
with m rows. I think this could be done in some smart way, using broadcasting, but I can only think of doing it with a for loop.

Certainly possible with broadcasting, by adding the vector to a column of m zeros, like so -
np.zeros((m,1),dtype=vector.dtype) + vector
NumPy also has a built-in function, np.tile, for exactly this task -
np.tile(vector,(m,1))
Sample run -
In [496]: vector
Out[496]: array([4, 5, 8, 2])
In [497]: m = 5
In [498]: np.zeros((m,1),dtype=vector.dtype) + vector
Out[498]:
array([[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2]])
In [499]: np.tile(vector,(m,1))
Out[499]:
array([[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2]])
You can also use np.repeat after extending its dimension with np.newaxis/None for the same effect, like so -
In [510]: np.repeat(vector[None],m,axis=0)
Out[510]:
array([[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2]])
You can also use integer array indexing to get the replications, like so -
In [525]: vector[None][np.zeros(m,dtype=int)]
Out[525]:
array([[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2]])
And finally with np.broadcast_to, you can simply create a 2D view into the input vector and as such this would be virtually free and with no extra memory requirement. So, we would simply do -
In [22]: np.broadcast_to(vector,(m,len(vector)))
Out[22]:
array([[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2],
[4, 5, 8, 2]])
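One thing to keep in mind with np.broadcast_to: the result is a read-only view whose rows all alias the same memory. A short sketch of how to check this, and how to get a writable matrix when one is needed:

```python
import numpy as np

vector = np.array([4, 5, 8, 2])
m = 5

view = np.broadcast_to(vector, (m, len(vector)))
print(view.flags.writeable)   # the broadcast view cannot be written to
print(view.strides[0])        # stride 0 along rows: every row is the same memory

writable = view.copy()        # an independent (m, 4) array
writable[0, 0] = -1           # does not touch `vector`
```

The copy costs the same memory as np.tile, so the view only pays off when the matrix is read and never modified.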
Runtime test -
Here's a quick runtime test comparing the various approaches -
In [12]: vector = np.random.rand(10000)
In [13]: m = 10000
In [14]: %timeit np.broadcast_to(vector,(m,len(vector)))
100000 loops, best of 3: 3.4 µs per loop # virtually free!
In [15]: %timeit np.zeros((m,1),dtype=vector.dtype) + vector
10 loops, best of 3: 95.1 ms per loop
In [16]: %timeit np.tile(vector,(m,1))
10 loops, best of 3: 89.7 ms per loop
In [17]: %timeit np.repeat(vector[None],m,axis=0)
10 loops, best of 3: 86.2 ms per loop
In [18]: %timeit vector[None][np.zeros(m,dtype=int)]
10 loops, best of 3: 89.8 ms per loop

Sub matrix of a list of lists (without numpy)

Suppose I have a matrix composed of a list of lists like so:
>>> LoL=[list(range(10)) for i in range(10)]
>>> LoL
[[0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]]
Assume, also, that I have a numpy matrix of the same structure called LoLa:
>>> LoLa=np.array(LoL)
Using numpy, I could get a submatrix of this matrix like this:
>>> LoLa[1:4,2:5]
array([[2, 3, 4],
[2, 3, 4],
[2, 3, 4]])
I can replicate the numpy matrix slice in pure Python like so:
>>> r=(1,4)
>>> s=(2,5)
>>> [LoL[i][s[0]:s[1]] for i in range(len(LoL))][r[0]:r[1]]
[[2, 3, 4], [2, 3, 4], [2, 3, 4]]
Which is not the easiest thing in the world to read nor the most efficient :-)
Question: Is there an easier way (in pure Python) to slice an arbitrary matrix as a sub matrix?

In [74]: [row[2:5] for row in LoL[1:4]]
Out[74]: [[2, 3, 4], [2, 3, 4], [2, 3, 4]]
You could also mimic NumPy's syntax by defining a subclass of list:
class LoL(list):
    def __init__(self, *args):
        list.__init__(self, *args)
    def __getitem__(self, item):
        try:
            return list.__getitem__(self, item)
        except TypeError:
            rows, cols = item
            return [row[cols] for row in self[rows]]
lol = LoL([list(range(10)) for i in range(10)])
print(lol[1:4, 2:5])
also yields
[[2, 3, 4], [2, 3, 4], [2, 3, 4]]
Using the LoL subclass won't win any speed tests:
In [85]: %timeit [row[2:5] for row in x[1:4]]
1000000 loops, best of 3: 538 ns per loop
In [82]: %timeit lol[1:4, 2:5]
100000 loops, best of 3: 3.07 us per loop
but speed isn't everything -- sometimes readability is more important.

For one, you can use slice objects directly, which helps a bit with both the readability and performance:
r = slice(1,4)
s = slice(2,5)
[LoL[i][s] for i in range(len(LoL))[r]]
And if you just iterate over the list-of-lists directly, you can write that as:
[row[s] for row in LoL[r]]
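Wrapped as a small function, the slice-object idea might look like this. The name submatrix and its signature are hypothetical, chosen for this sketch:

```python
def submatrix(mat, rows, cols):
    """Slice a list of lists with one slice object for rows and one for columns."""
    return [row[cols] for row in mat[rows]]

LoL = [list(range(10)) for _ in range(10)]
print(submatrix(LoL, slice(1, 4), slice(2, 5)))
```

Because the arguments are ordinary slice objects, steps and negative bounds work the same way they do on a plain list.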

Do this:
submat = [[mat[i][j] for j in range(index1, index2)] for i in range(index3, index4)]
submat will be the rectangular chunk of your original matrix (square if index2 - index1 == index4 - index3).

I don't know if it's easier, but let me throw an idea on the table:
from itertools import product
r = (1, 4)  # row bounds, half-open like the slice 1:4
s = (2, 5)  # column bounds, half-open like the slice 2:5
array = [LoL[i][j] for i,j in product(range(*r), range(*s))]
This is a flattened version of the submatrix you want.
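If the nested shape is needed, the flat list can be cut back into rows. This sketch uses the half-open bounds of the NumPy slice LoLa[1:4, 2:5] from the question; the variable names are illustrative:

```python
from itertools import product

LoL = [list(range(10)) for _ in range(10)]
r = (1, 4)   # row bounds, half-open like a slice
s = (2, 5)   # column bounds, half-open like a slice

flat = [LoL[i][j] for i, j in product(range(*r), range(*s))]

# product() varies the column index fastest, so each run of `width` items is one row.
width = s[1] - s[0]
nested = [flat[k:k + width] for k in range(0, len(flat), width)]
print(nested)
```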
