I have 50ish relatively large sparse arrays (in scipy.csr_array format but that can be changed) and I would like to insert rows and columns of zeros at certain locations. An example in dense format would look like:
A = np.asarray([[1,2,1],[2,4,5],[2,1,6]])
# A = array([[1,2,1],
# [2,4,5],
# [2,1,6]])
indices = np.asarray([-1, -1, 2, -1, 4, -1, -1, 7, -1])
# indices = array([-1, -1, 2, -1, 4, -1, -1, 7, -1])
# insert rows and columns of zeros where indices[i] == -1 to get B
B = np.asarray([[0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0],
[0,0,1,0,2,0,0,1,0],
[0,0,0,0,0,0,0,0,0],
[0,0,2,0,4,0,0,5,0],
[0,0,0,0,0,0,0,0,0],
[0,0,0,0,0,0,0,0,0],
[0,0,2,0,1,0,0,6,0],
[0,0,0,0,0,0,0,0,0]])
A is a sparse array of shape (~2000, ~2000) with ~20000 non-zero entries, and indices has shape (4096,). I can imagine doing this in dense format, but I don't know enough about how the data and indices are stored, and I cannot find a quick and efficient way to do this sort of operation on sparse arrays.
Anyone have any ideas or suggestions?
Thanks.
I would probably do this by passing the data and associated indices into a COO matrix constructor:
import numpy as np
from scipy.sparse import coo_matrix
A = np.asarray([[1,2,1],[2,4,5],[2,1,6]])
indices = np.asarray([-1, -1, 2, -1, 4, -1, -1, 7, -1])
idx = indices[indices >= 0]
col, row = np.meshgrid(idx, idx)
mat = coo_matrix((A.ravel(), (row.ravel(), col.ravel())),
                 shape=(len(indices), len(indices)))
print(mat)
# (2, 2) 1
# (2, 4) 2
# (2, 7) 1
# (4, 2) 2
# (4, 4) 4
# (4, 7) 5
# (7, 2) 2
# (7, 4) 1
# (7, 7) 6
print(mat.todense())
# [[0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0]
# [0 0 1 0 2 0 0 1 0]
# [0 0 0 0 0 0 0 0 0]
# [0 0 2 0 4 0 0 5 0]
# [0 0 0 0 0 0 0 0 0]
# [0 0 0 0 0 0 0 0 0]
# [0 0 2 0 1 0 0 6 0]
# [0 0 0 0 0 0 0 0 0]]
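When A is already sparse, the same idea can be applied to its stored entries only, so no dense array is ever built. This is a minimal sketch under two assumptions: the k-th non-negative value in indices is the new position of the k-th row/column of A (as in the example), and expand_sparse is a hypothetical helper name, not a SciPy function. It uses csr_matrix; the same calls should work with csr_array on recent SciPy versions.

```python
import numpy as np
from scipy.sparse import coo_matrix, csr_matrix

def expand_sparse(A, indices):
    """Scatter the stored entries of sparse A into a larger zero-padded matrix.

    Assumption: the k-th non-negative value in `indices` is the new
    row/column position of the k-th row/column of A.
    """
    idx = indices[indices >= 0]      # new position of each old row/column
    coo = A.tocoo()                  # exposes the .row, .col and .data arrays
    n = len(indices)
    # Remap the coordinates of the stored entries; zeros are never touched.
    out = coo_matrix((coo.data, (idx[coo.row], idx[coo.col])), shape=(n, n))
    return out.tocsr()

A = csr_matrix(np.asarray([[1, 2, 1], [2, 4, 5], [2, 1, 6]]))
indices = np.asarray([-1, -1, 2, -1, 4, -1, -1, 7, -1])
B = expand_sparse(A, indices)
print(B.toarray())
```

The work is proportional to the number of stored entries, not to the size of the padded matrix.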
You could try storing your non-zero values in one list and their respective indexes in another:
data_list = [[], [], [1, 2, 1], [], [2, 4, 5], [], [], [2, 1, 6], []]
index_list = [[], [], [2, 4, 7], [], [2, 4, 7], [], [], [2, 4, 7], []]
These two lists would then only have to store the non-zero values and their column indices, rather than one list with 4,000,000 values.
If you then wanted to grab the value in position (4, 7):
def find_value(row, col):
    # Check to see if the given column is in our index list
    if col not in index_list[row]:
        return 0
    # Otherwise return the number in the data list
    myNum = data_list[row][index_list[row].index(col)]
    return myNum
find_value(4, 7)
output: 5
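Note that both `col not in index_list[row]` and `.index(col)` scan the row's list linearly. If lookups dominate, a dictionary keyed by (row, col) is one alternative. This is a rough sketch; the names nonzero and find_value_dict are made up for illustration:

```python
# Hypothetical dictionary-of-keys layout: only non-zero entries are stored.
nonzero = {
    (2, 2): 1, (2, 4): 2, (2, 7): 1,
    (4, 2): 2, (4, 4): 4, (4, 7): 5,
    (7, 2): 2, (7, 4): 1, (7, 7): 6,
}

def find_value_dict(row, col):
    # Missing keys correspond to zeros in the full matrix.
    return nonzero.get((row, col), 0)

print(find_value_dict(4, 7))
print(find_value_dict(0, 0))
```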
Hope this helps!
I would like to get, for each row, the column indices where the column value is > 0, preferably with a vectorized approach. An example data frame:
c1 c2 c3 c4 c5 c6 c7 c8 c9
1 1 0 0 0 0 0 0 0
1 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0
1 5 5 0 0 1 0 4 6
The output is expected to be
[0, 1]
[0]
[1]
[0, 1, 2, 5, 7, 8]
One quick option is to apply numpy.flatnonzero to each row:
import numpy as np
df.apply(np.flatnonzero, axis=1)
0 [0, 1]
1 [0]
2 [1]
3 [0, 1, 2, 5, 7, 8]
dtype: object
If you care about performance, here is a pure numpy option. Caveat: a row that has no non-zero values is dropped from the result, so choose the method that fits your need:
idx, idy = np.where(df != 0)
np.split(idy, np.flatnonzero(np.diff(idx) != 0) + 1)
[array([0, 1], dtype=int32),
array([0], dtype=int32),
array([1], dtype=int32),
array([0, 1, 2, 5, 7, 8], dtype=int32)]
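If rows without any non-zero value must still appear (as empty arrays), one possible variation is to split on cumulative per-row counts instead of on changes in the row index. This is a sketch on a plain NumPy array; the array values is made-up example data, and with a data frame you would pass something like df.to_numpy() instead:

```python
import numpy as np

values = np.array([[1, 1, 0, 0],
                   [0, 0, 0, 0],   # a row without non-zero entries
                   [0, 2, 0, 3]])

mask = values != 0
# np.nonzero walks the array in row-major order, so cols is grouped by row.
_, cols = np.nonzero(mask)
# Number of non-zero entries per row, including rows with none.
counts = mask.sum(axis=1)
# Split positions are the running totals; empty rows give empty pieces.
per_row = np.split(cols, np.cumsum(counts)[:-1])
print(per_row)
```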
Not as sexy as @Psidom's answer, but still, here is a solution using numpy.argwhere:
import numpy as np
import pandas as pd
pd.DataFrame(np.argwhere(df.gt(0).values)).groupby(0)[1].apply(list)
output:
0 [0, 1]
1 [0]
2 [1]
3 [0, 1, 2, 5, 7, 8]
Just for fun, here is a pandas version:
s = df.set_axis(range(len(df.columns)), axis=1).stack()
s[s.gt(0)].reset_index(level=1)['level_1'].groupby(level=0).apply(list)
I know that you can shift a whole array with NumPy: np.roll moves it to the right or to the left. I was wondering how to move a specific set of values within the array left, right, up or down.
For example, if I wanted to move what is circled in red to the left, how would I move that and nothing else?
NumPy can use a slice to get a subarray and later assign it to a different place:
import numpy as np
x = [
[0, 1, 2, 1, 0, 0, 1, 2, 1, 0 ],
[0, 1, 2, 1, 0, 0, 1, 2, 1, 0 ]
]
arr = np.array(x)
print(arr)
subarr = arr[0:2,1:4] # get values
print(subarr)
arr[0:2,0:3] = subarr # put in new place
print(arr)
Result:
[[0 1 2 1 0 0 1 2 1 0]
[0 1 2 1 0 0 1 2 1 0]]
[[1 2 1]
[1 2 1]]
[[1 2 1 1 0 0 1 2 1 0]
[1 2 1 1 0 0 1 2 1 0]]
It keeps the original values in [0][3] and [1][3]. If you want to remove them, you could copy the subarray, set the original place to zero, and put the copy in the new place:
import numpy as np
x = [
[0, 1, 2, 1, 0, 0, 1, 2, 1, 0 ],
[0, 1, 2, 1, 0, 0, 1, 2, 1, 0 ]
]
arr = np.array(x)
print(arr)
subarr = arr[0:2,1:4].copy() # duplicate values
print(subarr)
arr[0:2,1:4] = 0 # remove original values
arr[0:2,0:3] = subarr # put in new place
print(arr)
Result
[[0 1 2 1 0 0 1 2 1 0]
[0 1 2 1 0 0 1 2 1 0]]
[[1 2 1]
[1 2 1]]
[[1 2 1 0 0 0 1 2 1 0]
[1 2 1 0 0 0 1 2 1 0]]
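The copy / clear / paste steps can be wrapped in a small helper. This is only a sketch: move_block is a hypothetical name, it assumes both slices have explicit start and stop values, and it does not check that the destination stays inside the array.

```python
import numpy as np

def move_block(arr, row_slice, col_slice, d_row, d_col, fill=0):
    """Return a copy of arr with one rectangular block shifted by (d_row, d_col).

    Assumptions: slices have explicit start/stop, and the shifted block
    still fits inside the array (no bounds checking is done here).
    """
    out = arr.copy()
    block = arr[row_slice, col_slice].copy()   # duplicate values
    out[row_slice, col_slice] = fill           # remove original values
    r0, r1 = row_slice.start + d_row, row_slice.stop + d_row
    c0, c1 = col_slice.start + d_col, col_slice.stop + d_col
    out[r0:r1, c0:c1] = block                  # put in new place
    return out

arr = np.array([[0, 1, 2, 1, 0, 0, 1, 2, 1, 0],
                [0, 1, 2, 1, 0, 0, 1, 2, 1, 0]])
moved = move_block(arr, slice(0, 2), slice(1, 4), d_row=0, d_col=-1)
print(moved)
```

Returning a copy rather than modifying arr in place is a design choice made for this sketch; drop the first copy if in-place modification is preferred.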
I use the following function in order to find the consecutive negative and positive numbers, now I also want to add a condition that gets the consecutive zeros as well.
How can I do that?
def consecutive_counts(arr):
    '''
    Returns number of consecutive negative and positive numbers
    arr = np.array
    negative = consecutive_counts()[0]
    positive = consecutive_counts()[1]
    '''
    pos = arr > 0
    # np.flatnonzero computes the indices that are non-zero in the flattened version of its argument
    idx = np.flatnonzero(pos[1:] != pos[:-1])
    count = np.concatenate(([idx[0]+1], idx[1:] - idx[:-1], [arr.size-1-idx[-1]]))
    negative = count[1::2], count[::2]
    positive = count[::2], count[1::2]
    if arr[0] < 0:
        return negative
    else:
        return positive
This is a pandas Series:
In [221]: n.temp.p['50000']
Out[221]:
name
0 0.00
1 -92.87
2 -24.01
3 -92.87
4 -92.87
5 -92.87
... ...
which I use like this:
arr = n.temp.p['50000'].values #Will be a numpy array as the input
expected output:
In [225]: consecutive_counts(a)
Out[225]: (array([30, 29, 11, ..., 2, 1, 3]), array([19, 1, 1, ..., 1, 1, 2]))
Thanks :)
Since you tagged pandas here's one approach:
# random data
np.random.seed(1)
a = np.random.choice(range(-2,3), 1000)
# np.sign: + = 1, 0 = 0, - = -1
b = pd.Series(np.sign(a))
# b.head()
# 0 1
# 1 1
# 2 -1
# 3 -1
# 4 1
# dtype: int32
# sign blocks
blks = b.diff().ne(0).cumsum()
# blks.head()
# 0 1
# 1 1
# 2 2
# 3 2
# 4 3
# dtype: int32
# number of blocks:
blks.iloc[-1]
# 654
# block counts:
blks.value_counts()
# 1 2
# 2 2
# 3 1
# 4 3
# 5 2
# ...
Here is a numpy approach:
# create example
arr = np.random.randint(-2,3,(10))
# split into negative, zero, positive
*nzp, = map(np.flatnonzero,(arr<0,arr==0,arr>0))
# find block boundaries
*bb, = (np.flatnonzero(np.diff(x,prepend=-2,append=-2)-1) for x in nzp)
# compute block sizes
*bs, = map(np.diff,bb)
# show all
for data in (arr,nzp,bb,bs): print(data)
# [-1 1 -1 1 0 0 2 -1 -2 1]
# [array([0, 2, 7, 8]), array([4, 5]), array([1, 3, 6, 9])]
# [array([0, 1, 2, 4]), array([0, 2]), array([0, 1, 2, 3, 4])]
# [array([1, 1, 2]), array([2]), array([1, 1, 1, 1])]
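A possibly more readable variant is a run-length encoding of np.sign(arr), grouped by sign afterwards. This is a sketch that assumes a non-empty 1-D array; sign_run_lengths is a hypothetical helper name:

```python
import numpy as np

def sign_run_lengths(arr):
    """Lengths of consecutive runs of negative, zero and positive values.

    Assumption: arr is a non-empty 1-D array.
    """
    signs = np.sign(arr)
    # Positions where the sign differs from the previous element.
    change = np.flatnonzero(signs[1:] != signs[:-1]) + 1
    starts = np.concatenate(([0], change))
    # Each run ends where the next one starts (or at the end of the array).
    lengths = np.diff(np.concatenate((starts, [signs.size])))
    run_signs = signs[starts]
    return {
        "negative": lengths[run_signs < 0],
        "zero": lengths[run_signs == 0],
        "positive": lengths[run_signs > 0],
    }

arr = np.array([-1, 1, -1, 1, 0, 0, 2, -1, -2, 1])
runs = sign_run_lengths(arr)
print(runs)
```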
I want to find a list of points that are within range 1 (or exactly diagonal) of a point in my numpy matrix:
For example say my matrix m is:
[[0 0 0 0 0]
[0 0 0 0 0]
[0 0 1 0 0]
[0 0 0 0 0]
[0 0 0 0 0]]
I would like to obtain a list of tuples or something representing all the coordinates of the 9 points with X's below:
[[0 0 0 0 0]
[0 X X X 0]
[0 X X X 0]
[0 X X X 0]
[0 0 0 0 0]]
Here is another example with the target point on the edge:
[[0 0 0 0 0]
[0 0 0 0 0]
[0 0 0 0 1]
[0 0 0 0 0]
[0 0 0 0 0]]
In this case there would only be 6 points within distance 1 of the target point:
[[0 0 0 0 0]
[0 0 0 X X]
[0 0 0 X X]
[0 0 0 X X]
[0 0 0 0 0]]
EDIT:
Using David Herring's answer/comment about Chebyshev distance, here is my attempt to solve example 2 above, assuming I know the coordinates of the target point:
from scipy.spatial import distance
point = [2, 4]
valid_points = []
for x in range(5):
    for y in range(5):
        if distance.chebyshev(point, [x, y]) <= 1:
            valid_points.append([x, y])
print(valid_points) # [[1, 3], [1, 4], [2, 3], [2, 4], [3, 3], [3, 4]]
This seems a little inefficient for a bigger array, as I only need to check a small set of cells, not the whole matrix.
I think you're making it a little too complicated - no need to rely on complicated functions
import numpy as np
# set up matrix
x = np.zeros((5,5))
# add a single point
x[2,-1] = 1
# get coordinates of point as array
r, c = np.where(x)
# convert to python scalars
r = r[0]
c = c[0]
# get boundaries of array
m, n = x.shape
coords = []
# loop over possible locations
for i in [-1, 0, 1]:
    for j in [-1, 0, 1]:
        # check if location is within boundary
        if 0 <= r + i < m and 0 <= c + j < n:
            coords.append((r + i, c + j))
print(coords)
>>> [(1, 3), (1, 4), (2, 3), (2, 4), (3, 3), (3, 4)]
There is no algorithm of interest here. If you don’t already know where the 1 is, first you have to find it, and you can’t do better than searching through every element. (You can get a constant-factor speedup by having numpy do this at C speed with argmax; use divmod to separate the flattened index into row and column.) Thereafter, all you do is add ±1 (or 0) to the coordinates unless it would take you outside the array bounds. You don’t ever construct coordinates only to discard them later.
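The argmax/divmod idea described above can be sketched as follows. It assumes the matrix contains exactly one non-zero entry that is also its maximum; neighbours is a hypothetical helper name. The ranges are clipped to the array bounds up front, so no out-of-range coordinate is ever generated.

```python
import numpy as np

def neighbours(m):
    """Coordinates within Chebyshev distance 1 of the single marked cell.

    Assumption: m holds exactly one non-zero entry, which is its maximum.
    """
    n_rows, n_cols = m.shape
    # argmax returns an index into the flattened array; divmod splits it.
    r, c = divmod(int(np.argmax(m)), n_cols)
    rows = range(max(r - 1, 0), min(r + 2, n_rows))
    cols = range(max(c - 1, 0), min(c + 2, n_cols))
    return [(i, j) for i in rows for j in cols]

m = np.zeros((5, 5), dtype=int)
m[2, 4] = 1
print(neighbours(m))
```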
A simple way would be to get all possible coordinates with a Cartesian product.
Set up the data:
import itertools
import numpy as np
x = np.array([[0,0,0], [0,1,0], [0,0,0]])
x
array([[0, 0, 0],
[0, 1, 0],
[0, 0, 0]])
You know that the coordinates will be +/- 1 of your location:
loc = np.argwhere(x == 1)[0] # unless already known or pre-specified
v = [loc[0], loc[0]-1, loc[0]+1]
h = [loc[1], loc[1]-1, loc[1]+1]
output = []
for i in itertools.product(v, h):
    # keep only coordinates that lie inside the array on both axes
    if np.all(np.array(i) >= 0) and np.all(np.array(i) < x.shape):
        output.append(i)
print(output)
[(1, 1), (1, 0), (1, 2), (0, 1), (0, 0), (0, 2), (2, 1), (2, 0), (2, 2)]
I have a matrix similar to this:
1 0 0
1 0 0
0 2 0
0 2 0
0 0 3
0 0 3
(Non-zero numbers denote the parts that I'm interested in. The actual numbers inside the matrix could be random.)
And I need to produce a vector like this:
[ 1 1 2 2 3 3 ].T
I can do this with a loop:
result = np.zeros([rows])
for y in range(rows):
    x = y // (rows // cols) # pick the index of the corresponding column
    result[y] = mat[y][x]
But I can't figure out how to do this in vector form.
This might be what you want.
import numpy as np
m = np.array([
[1, 0, 0],
[1, 0, 0],
[0, 2, 0],
[0, 2, 0],
[0, 0, 3],
[0, 0, 3]
])
rows, cols = m.shape
# axis1 indices
y = np.arange(rows)
# axis2 indices
x = y // (rows // cols)
result = m[y,x]
print(result)
Result:
[1 1 2 2 3 3]