I have this
rows = self.rows()
aaa = []
for r in range(0, 9, 3):
    bbb = []
    for c in range(0, 9, 3):
        ccc = []
        for s in range(3):
            ccc.extend(rows[r+s][c:c+3])
        bbb.append(ccc)
    aaa.append(bbb)
and it returns this
[
[
[5, 0, 0, 0, 6, 0, 0, 2, 9],
[3, 8, 0, 4, 9, 2, 0, 0, 6],
[0, 6, 2, 1, 0, 0, 3, 0, 4]
],
[
[0, 7, 6, 0, 0, 8, 0, 4, 0],
[0, 4, 0, 2, 0, 5, 0, 3, 1],
[0, 3, 1, 0, 4, 9, 0, 0, 0]
],
[
[4, 0, 0, 6, 0, 3, 0, 0, 1],
[0, 0, 0, 7, 0, 0, 0, 0, 0],
[0, 5, 3, 2, 0, 8, 0, 0, 6]
]
]
which is correct.
rows is just a list of 9 nested lists, each containing exactly 9 integers from 0 through 9.
When I try to use list comprehension
[[rows[r+s][c:c+3] for s in range(3) for c in range(0, 9, 3)] for r in range(0, 9, 3)]
I get this
[
[
[5, 0, 0],
[3, 8, 0],
[0, 6, 2],
[0, 6, 0],
[4, 9, 2],
[1, 0, 0],
[0, 2, 9],
[0, 0, 6],
[3, 0, 4]
],
[
[0, 7, 6],
[0, 4, 0],
[0, 3, 1],
[0, 0, 8],
[2, 0, 5],
[0, 4, 9],
[0, 4, 0],
[0, 3, 1],
[0, 0, 0]
],
[
[4, 0, 0],
[0, 0, 0],
[0, 5, 3],
[6, 0, 3],
[7, 0, 0],
[2, 0, 8],
[0, 0, 1],
[0, 0, 0],
[0, 0, 6]
]
]
Clearly I'm doing something wrong, but I can't see what. I've checked other SO questions, and they allude to structuring the LC a certain way to stop the innermost list from splitting into 9 separate lists, but so far I haven't managed to make that work.
Try:
import itertools
[[list(itertools.chain(*[rows[r+s][c:c+3] for s in range(3)])) for c in range(0, 9, 3)] for r in range(0, 9, 3)]
This will give you what you want.
This version uses plain Python with no library imports, and seems to be the most concise of the solutions that work:
[[[rows[r+s][c+i] for s in range(3) for i in range(3)] for c in range(0, 9, 3)] for r in range(0, 9, 3)]
Formatted more readably, this is just:
[[[rows[r+s][c+i] for s in range(3)
                  for i in range(3)]
  for c in range(0, 9, 3)]
 for r in range(0, 9, 3)]
This replaces the slice in the original loop nest with an inner i loop, eliminating the need to concatenate intermediate slices (or to generate them).
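As a quick check of that claim, here is a minimal sketch that runs the original loop nest and the comprehension on the same grid and compares the results. The grid is reconstructed from the output shown in the question (the real `self.rows()` is not shown), and the function names `boxes_loop` and `boxes_comprehension` are made up for this illustration.

```python
# Grid reconstructed from the outputs printed in the question; this is an
# assumption, since the original self.rows() is not shown.
rows = [
    [5, 0, 0, 3, 8, 0, 0, 6, 2],
    [0, 6, 0, 4, 9, 2, 1, 0, 0],
    [0, 2, 9, 0, 0, 6, 3, 0, 4],
    [0, 7, 6, 0, 4, 0, 0, 3, 1],
    [0, 0, 8, 2, 0, 5, 0, 4, 9],
    [0, 4, 0, 0, 3, 1, 0, 0, 0],
    [4, 0, 0, 0, 0, 0, 0, 5, 3],
    [6, 0, 3, 7, 0, 0, 2, 0, 8],
    [0, 0, 1, 0, 0, 0, 0, 0, 6],
]


def boxes_loop(rows):
    """Original nested-loop version: 3 bands of 3 boxes, 9 cells each."""
    bands = []
    for r in range(0, 9, 3):
        band = []
        for c in range(0, 9, 3):
            box = []
            for s in range(3):
                box.extend(rows[r + s][c:c + 3])
            band.append(box)
        bands.append(band)
    return bands


def boxes_comprehension(rows):
    """Comprehension version: the slice is replaced by an inner i loop."""
    return [[[rows[r + s][c + i] for s in range(3) for i in range(3)]
             for c in range(0, 9, 3)]
            for r in range(0, 9, 3)]


# Both versions should build exactly the same nested structure.
assert boxes_loop(rows) == boxes_comprehension(rows)
print(boxes_comprehension(rows)[0][0])
```

The attempt in the question produced 9 short lists per band because it placed the `s` and `c` loops in the same comprehension level; here `s` and `i` share the innermost level instead, so each box is a single flat list of 9 cells.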
You can do something like this, concatenating the three row slices of each box:
print([[sum((rows[r+s][c:c+3] for s in range(3)), []) for c in range(0, 9, 3)] for r in range(0, 9, 3)])
Suppose we have a 2D numpy array of zeros:
arr1 = np.zeros((len(Train), L))
where Train is a (dataset) numpy array of arrays of integers of fixed length.
We also have another 1D numpy array, positions, of length len(Train).
Now we wish to add elements of Train to arr1 at the positions specified by positions.
One way is to use a for loop on the Train array as:
k = len(Train[0])
for i in range(len(Train)):
    arr1[i, int(positions[i]):int(positions[i] + k)] = Train[i, 0:k]
However, going over the entire Train set using the explicit for loop is slow and I would like to optimize it.
Here is one way by generating all the indexes you want to assign to. Setup:
import numpy as np
n = 12 # Number of training samples
l = 8 # Number of columns in the output array
k = 4 # Number of columns in the training samples
arr = np.zeros((n, l), dtype=int)
train = np.random.randint(10, size=(n, k))
positions = np.random.randint(l - k, size=n)
Random example data:
>>> train
array([[3, 4, 3, 2],
[3, 6, 4, 1],
[0, 7, 9, 6],
[4, 0, 4, 8],
[2, 2, 6, 2],
[4, 5, 1, 7],
[5, 4, 4, 4],
[0, 8, 5, 3],
[2, 9, 3, 3],
[3, 3, 7, 9],
[8, 9, 4, 8],
[8, 7, 6, 4]])
>>> positions
array([3, 2, 3, 2, 0, 1, 2, 2, 3, 2, 1, 1])
Advanced indexing with broadcasting trickery:
rows = np.arange(n)[:, None] # Shape (n, 1)
cols = np.arange(k) + positions[:, None] # Shape (n, k)
arr[rows, cols] = train
output:
>>> arr
array([[0, 0, 0, 3, 4, 3, 2, 0],
[0, 0, 3, 6, 4, 1, 0, 0],
[0, 0, 0, 0, 7, 9, 6, 0],
[0, 0, 4, 0, 4, 8, 0, 0],
[2, 2, 6, 2, 0, 0, 0, 0],
[0, 4, 5, 1, 7, 0, 0, 0],
[0, 0, 5, 4, 4, 4, 0, 0],
[0, 0, 0, 8, 5, 3, 0, 0],
[0, 0, 0, 2, 9, 3, 3, 0],
[0, 0, 3, 3, 7, 9, 0, 0],
[0, 8, 9, 4, 8, 0, 0, 0],
[0, 8, 7, 6, 4, 0, 0, 0]])
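To gain some confidence that the broadcast assignment does the same thing as the explicit loop from the question, the two can be compared on the same random data. This is only a sketch: the seed and the names `expected` and `rng` are arbitrary choices made for this check, not part of the original answer.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n, l, k = 12, 8, 4              # samples, output columns, sample length

train = rng.integers(10, size=(n, k))
# Start columns 0 .. l - k, so every sample fits inside the output row.
positions = rng.integers(l - k + 1, size=n)

# Reference result: the explicit row-by-row loop from the question.
expected = np.zeros((n, l), dtype=int)
for i in range(n):
    expected[i, positions[i]:positions[i] + k] = train[i]

# Vectorized version: rows has shape (n, 1) and cols has shape (n, k),
# so they broadcast to one (row, column) index pair per element of train.
arr = np.zeros((n, l), dtype=int)
rows = np.arange(n)[:, None]
cols = np.arange(k) + positions[:, None]
arr[rows, cols] = train

assert np.array_equal(arr, expected)
```

Each row of `cols` holds the k consecutive target columns for that sample, which is why a single indexed assignment can replace the loop.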
I have a fairly large m × n numpy matrix M filled with non-zero values and an array x of length m, where each entry gives the column index from which the elements of that row should be set to zero. So for example, if n=5 and x[i]=3, then the i-th row of the matrix should be set to [M_i1, M_i2, M_i3, 0, 0].
If all entries of x had the same value k, I could simply use slicing with something like M[:,k:]=0, but I could not figure out an efficient way to do this with different values for each row without looping over all rows and slicing each row separately.
I thought about creating a matrix that looks like [[1]*x[1] + [0]*(n-x[1]),...,[1]*x[m] + [0]*(n-x[m])] and using it for boolean indexing, but I also don't know how to create it without looping.
The non-vectorized solution looks like this:
for i in range(m):
    if x[i] < n:
        M[i,x[i]:] = 0
with example input
M = np.array([[1,2,3],[4,5,6]])
m, n = 2, 3
x = np.array([1,2])
and output
array([[1, 0, 0],
[4, 5, 0]])
Does anyone have a vectorized solution for this problem?
Thank you very much!
You can use multi-dimensional boolean indexing:
M[x[:,None]<=np.arange(M.shape[1])] = 0
example:
M = [[7, 8, 4, 2, 3, 9, 1, 8, 4, 3],
[2, 1, 6, 1, 5, 2, 2, 2, 9, 2],
[6, 1, 6, 8, 4, 3, 6, 9, 2, 6],
[5, 4, 0, 8, 3, 0, 0, 1, 8, 7],
[8, 7, 8, 8, 9, 2, 0, 8, 0, 2]]
x = [4, 4, 0, 6, 2]
output:
[[7, 8, 4, 2, 0, 0, 0, 0, 0, 0],
[2, 1, 6, 1, 0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
[5, 4, 0, 8, 3, 0, 0, 0, 0, 0],
[8, 7, 0, 0, 0, 0, 0, 0, 0, 0]]
This looks like a mask-smearing exercise. At each row, you want to smear starting with the element at np.minimum(x[row], n):
mask = np.zeros(M.shape, bool)
mask[np.flatnonzero(x < n), x[x < n]] = True
M[np.cumsum(mask, axis=1, dtype=bool)] = 0
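The two vectorized answers can be checked against the loop from the question. The sketch below wraps each approach in a function that works on a copy of M; the function names are hypothetical and exist only for this comparison.

```python
import numpy as np


def zero_tail_loop(M, x):
    """Reference: the non-vectorized loop from the question."""
    out = M.copy()
    for i in range(out.shape[0]):
        if x[i] < out.shape[1]:
            out[i, x[i]:] = 0
    return out


def zero_tail_compare(M, x):
    """Boolean mask built by comparing x against the column indices."""
    out = M.copy()
    out[x[:, None] <= np.arange(out.shape[1])] = 0
    return out


def zero_tail_cumsum(M, x):
    """Mark the first column to clear in each row, then smear it rightwards."""
    out = M.copy()
    n = out.shape[1]
    valid = x < n  # rows where x points past the end are left untouched
    mask = np.zeros(out.shape, bool)
    mask[np.flatnonzero(valid), x[valid]] = True
    out[np.cumsum(mask, axis=1, dtype=bool)] = 0
    return out


M = np.array([[1, 2, 3], [4, 5, 6]])
x = np.array([1, 2])

print(zero_tail_compare(M, x))
assert np.array_equal(zero_tail_compare(M, x), zero_tail_loop(M, x))
assert np.array_equal(zero_tail_cumsum(M, x), zero_tail_loop(M, x))
```

The comparison-based mask needs no special handling for x[i] >= n, because such a row simply yields an all-False mask row; the cumsum variant filters those rows out explicitly to avoid an out-of-bounds index.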
Let's say I have this numpy array:
array([[4, 5, 6, 8, 5, 6],
[5, 1, 1, 9, 0, 5],
[7, 0, 5, 8, 0, 5],
[9, 2, 3, 8, 2, 3],
[1, 2, 2, 9, 2, 8]])
And going row by row, I would like to see, by column, the cumulative count of the number that appears. So for this array, the result would be:
array([[0, 0, 0, 0, 0, 0], # (*0)
[0, 0, 0, 0, 0, 0], # (*1)
[0, 0, 0, 1, 1, 1], # (*2)
[0, 0, 0, 2, 0, 0], # (*3)
[0, 1, 0, 1, 1, 0]] # (*4)
(*0): first time each value appears
(*1): all values are different from the previous one (in the column)
(*2): For the last 3 columns, a 1 appears because there is already 1 value repetition.
(*3): For the 4th column, a 2 appears because it's the 3rd time that a 8 appears.
(*4): In the 4th column, a 1 appears because it's the 2nd time that a 9 appears in that column. Similarly, for the second and second to last column.
Any idea how to perform this?
Thanks!
Maybe there is a faster way using numpy ufuncs, however here is a solution using standard python:
from collections import defaultdict
import numpy as np
a = np.array([[4, 5, 6, 8, 5, 6],
[5, 1, 1, 9, 0, 5],
[7, 0, 5, 8, 0, 5],
[9, 2, 3, 8, 2, 3],
[1, 2, 2, 9, 2, 8]])
# define function
def get_count(array):
    count = []
    for row in array.T:
        occurences = defaultdict(int)
        rowcount = []
        for n in row:
            occurences[n] += 1
            rowcount.append(occurences[n] - 1)
        count.append(rowcount)
    return np.array(count).T
Output:
>>> get_count(a)
array([[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 1, 1, 1],
[0, 0, 0, 2, 0, 0],
[0, 1, 0, 1, 1, 0]])
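For larger arrays, one possible loop-free variant is sketched below. It is not from the original answer: it swaps the dictionary counting for a stable sort of each column, so that equal values become contiguous runs, and then takes each element's offset within its run. The name `cumcount_columns` is hypothetical, and the sketch assumes NumPy 1.15 or later for `take_along_axis` and `put_along_axis`.

```python
import numpy as np


def cumcount_columns(a):
    """For each element, count earlier occurrences of it in the same column."""
    a = np.asarray(a)
    n_rows = a.shape[0]

    # A stable sort keeps equal values in their original top-to-bottom order.
    order = np.argsort(a, axis=0, kind="stable")
    sorted_vals = np.take_along_axis(a, order, axis=0)

    # True wherever a new run of equal values starts in a sorted column.
    new_run = np.ones(a.shape, dtype=bool)
    new_run[1:] = sorted_vals[1:] != sorted_vals[:-1]

    # Sorted position at which the current run started.
    idx = np.arange(n_rows)[:, None]
    run_start = np.maximum.accumulate(np.where(new_run, idx, 0), axis=0)

    # Offset inside the run equals the number of earlier equal values.
    counts_sorted = idx - run_start

    # Scatter the counts back to the original row order.
    out = np.empty_like(counts_sorted)
    np.put_along_axis(out, order, counts_sorted, axis=0)
    return out


a = np.array([[4, 5, 6, 8, 5, 6],
              [5, 1, 1, 9, 0, 5],
              [7, 0, 5, 8, 0, 5],
              [9, 2, 3, 8, 2, 3],
              [1, 2, 2, 9, 2, 8]])
print(cumcount_columns(a))
```

Whether this is actually faster than the dictionary version depends on the array shape, so it is worth timing both on representative data.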
So I have a very large two-dimensional numpy array such as:
array([[ 2, 4, 0, 0, 0, 5, 9, 0],
[ 2, 3, 0, 1, 0, 3, 1, 1],
[ 1, 5, 4, 3, 2, 7, 8, 3],
[ 0, 7, 0, 0, 0, 6, 4, 4],
...,
[ 6, 5, 6, 0, 0, 1, 9, 5]])
I would like to quickly remove each row of the array where np.sum(row[2:5]) == 0
The only way I can think to do this is with for loops, but that takes very long when there are millions of rows. Additionally, this needs to be constrained to Python 2.7
Boolean expressions can be used as an index. You can use them to mask the array.
inputarray = array([[ 2, 4, 0, 0, 0, 5, 9, 0],
[ 2, 3, 0, 1, 0, 3, 1, 1],
[ 1, 5, 4, 3, 2, 7, 8, 3],
[ 0, 7, 0, 0, 0, 6, 4, 4],
...,
[ 6, 5, 6, 0, 0, 1, 9, 5]])
mask = numpy.sum(inputarray[:,2:5], axis=1) != 0
result = inputarray[mask,:]
What this is doing:
inputarray[:, 2:5] selects all the columns you want to sum over
axis=1 means the sum runs across the selected columns, giving one sum per row
We want to keep the rows where the sum is not zero
The mask is used as a row index and selects the rows where the boolean expression is True
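The snippet above contains an elided row, so it cannot be run as written. A runnable sketch using only the five rows visible in the question follows. It also shows a variant based on `any`, which is not part of the original answer: the two masks agree for non-negative data, but a sum can be zero for a row whose values cancel out, so which test is right depends on what the question really means by an empty row. The names `keep_sum`, `keep_any` and `signed` are illustrative.

```python
import numpy as np

# Only the five rows visible in the question; the elided rows are omitted.
a = np.array([[2, 4, 0, 0, 0, 5, 9, 0],
              [2, 3, 0, 1, 0, 3, 1, 1],
              [1, 5, 4, 3, 2, 7, 8, 3],
              [0, 7, 0, 0, 0, 6, 4, 4],
              [6, 5, 6, 0, 0, 1, 9, 5]])

window = a[:, 2:5]                        # columns 2, 3 and 4 of every row

keep_sum = window.sum(axis=1) != 0        # the criterion from the question
keep_any = (window != 0).any(axis=1)      # "at least one non-zero entry"

# With non-negative values the two criteria select the same rows.
assert np.array_equal(keep_sum, keep_any)
result = a[keep_sum]
print(result)

# With mixed signs they can differ: 2 + (-2) + 0 sums to zero.
signed = np.array([[1, 1, 2, -2, 0, 1]])
print(signed[:, 2:5].sum(axis=1) != 0)    # sum-based mask
print((signed[:, 2:5] != 0).any(axis=1))  # any-based mask
```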
Another solution would be to use numpy.apply_along_axis to calculate the sums and cast it as a bool, and use that for your index:
my_arr = np.array([[ 2, 4, 0, 0, 0, 5, 9, 0],
[ 2, 3, 0, 1, 0, 3, 1, 1],
[ 1, 5, 4, 3, 2, 7, 8, 3],
[ 0, 7, 0, 0, 0, 6, 4, 4],])
my_arr[np.apply_along_axis(lambda x: bool(sum(x[2:5])), 1, my_arr)]
array([[2, 3, 0, 1, 0, 3, 1, 1],
[1, 5, 4, 3, 2, 7, 8, 3]])
We just cast the sum to a bool, since any number that's not 0 is going to be True.
>>> a
array([[2, 4, 0, 0, 0, 5, 9, 0],
[2, 3, 0, 1, 0, 3, 1, 1],
[1, 5, 4, 3, 2, 7, 8, 3],
[0, 7, 0, 0, 0, 6, 4, 4],
[6, 5, 6, 0, 0, 1, 9, 5]])
You are interested in columns 2 through 4 (the slice 2:5 excludes column 5)
>>> a[:,2:5]
array([[0, 0, 0],
[0, 1, 0],
[4, 3, 2],
[0, 0, 0],
[6, 0, 0]])
>>> b = a[:,2:5]
You want to find the sum of those columns in each row
>>> sum_ = b.sum(1)
>>> sum_
array([0, 1, 9, 0, 6])
These are the rows that meet your criteria
>>> sum_ != 0
array([False, True, True, False, True], dtype=bool)
>>> keep = sum_ != 0
Use boolean indexing to select those rows
>>> a[keep, :]
array([[2, 3, 0, 1, 0, 3, 1, 1],
[1, 5, 4, 3, 2, 7, 8, 3],
[6, 5, 6, 0, 0, 1, 9, 5]])
>>>
In MATLAB / GNU Octave (which I am actually using), I use this method to copy particular elements of a 2D array to another 2D array:
B(2:6, 2:6) = A
where
size(A) = (5, 5)
My question is, "How can this be achieved in python using numpy?"
Currently, for example, I am using the following nested loop in Python:
>>> import numpy as np
>>> a = np.int32(np.random.rand(5,5)*10)
>>> b = np.zeros((6,6), dtype = np.int32)
>>> print a
[[6 7 5 1 3]
[3 9 7 2 0]
[9 3 7 6 7]
[9 8 2 0 8]
[8 7 7 9 9]]
>>> print b
[[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0],
[0, 0, 0, 0, 0, 0]]
>>> for i in range(1,6):
...     for j in range(1,6):
...         b[i][j] = a[i-1][j-1]
>>> print b
[[0, 0, 0, 0, 0, 0],
[0, 6, 7, 5, 1, 3],
[0, 3, 9, 7, 2, 0],
[0, 9, 3, 7, 6, 7],
[0, 9, 8, 2, 0, 8],
[0, 8, 7, 7, 9, 9]]
Is there a better way to do this?
It's almost the same as the MATLAB:
b[1:6, 1:6] = a
The only differences are that Python uses 0-based indexing, so the second element is 1 instead of 2, and that the end of a slice is exclusive, so 1:6 covers indices 1 through 5.
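A small self-contained sketch of the slice assignment, using a deterministic array instead of the random one from the question so the result is easy to inspect:

```python
import numpy as np

# Deterministic 5x5 input (values 1..25) in place of the random array.
a = np.arange(1, 26).reshape(5, 5)
b = np.zeros((6, 6), dtype=a.dtype)

# Equivalent of Octave's B(2:6, 2:6) = A. Python slices start at 0 and
# exclude the end index, so 1:6 selects rows/columns 1 through 5.
b[1:6, 1:6] = a

print(b)
```

The shape of the target slice, here (5, 5), has to match the shape of `a` (or be broadcast-compatible with it); otherwise NumPy raises a ValueError.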