I want to create the sparse matrix below from NumPy arrays in the usual way:
import numpy as np
from scipy import sparse
I = np.array([0,1,2, 0,1,2, 0,1,2])
J = np.array([0,0,0,1,1,1,2,2,2])
DataElement = np.array([2,1,2,1,0,1,2,1,2])
A = sparse.coo_matrix((DataElement,(I,J)),shape=(3,3))
print(A.toarray()) ## This is what I expect to see.
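For reference, that print produces:
[[2 1 2]
 [1 0 1]
 [2 1 2]]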
My attempt with numpy is:
import numpy as np
U = np.empty((3,3,), order = "F")
U[:] = np.nan
## Initialize
U[0,0] = 2
U[2,0] = 2
U[0,2] = 2
U[2,2] = 2
for j in range(0,3):
    ## Slice columns first:
    if (j != 0 and j != 2):
        for i in range(0,3):
            ## slice rows:
            if (i != 0 and i != 2):
                U[i,j] = 0
            else:
                U[i,j] = 1
One way using numpy.add.at:
arr = np.zeros((3,3), int)
np.add.at(arr, (I, J), DataElement)
print(arr)
Output:
[[2 1 2]
 [1 0 1]
 [2 1 2]]
There are several ways to fill the array manually.
First, you can define every entry explicitly:
U = np.array([[2,1,2],[1,0,1],[2,1,2]],order='F')
Or you can initialize an array with NaNs and then define each element by subscripting:
U = np.empty((3,3,), order = "F")
U[:] = np.nan
U[0,0],U[0,1],U[0,2]=2,1,2
U[1,0],U[1,1],U[1,2]=1,0,1
U[2,0],U[2,1],U[2,2]=2,1,2
Finally, if there is a pattern, one can slice and define multiple values at once:
U[:,0]=[2,1,2]
U[:,1]=U[:,0]-1
U[:,2]=U[:,0]
In your attempt, you simply miss some of the entries (for example U[1,0] and U[1,2]), and they remain NaN.
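For reference, a corrected version of the loop from the attempt that assigns every entry, so nothing is left as NaN:
import numpy as np

U = np.empty((3, 3), order="F")
for j in range(3):
    for i in range(3):
        if i == 1 and j == 1:
            U[i, j] = 0      # center
        elif i == 1 or j == 1:
            U[i, j] = 1      # edge midpoints
        else:
            U[i, j] = 2      # corners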
import numpy as np
A = np.empty((0, 3))
temp = np.array([1, 1, 1])
A = np.vstack([A, temp])
A = np.vstack([A, temp])
B = [A]
temp = np.array([2, 2, 0])
A = np.vstack([A, temp])
A = np.vstack([A, temp])
A = np.vstack([A, temp])
A = np.vstack([A, temp])
B = B.append(A)
So it does not work. How do I make a list of numpy arrays? The problem is that I have N types of points, every type has M points, and every point is a 3-coordinate array. Because I don't know the values of N and M up front, I need to do all of this dynamically. When I had N = 1, vstack worked perfectly, but now every type has its own M, and the array is not uniform anymore. So my guess is that I need to work with numpy/vstack just as if I had N = 1, but afterwards store these np.empty((0, 3)) arrays somewhere. Is that possible? Maybe some empty object-type dictionary?
Thank you very much in advance!
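A note on the failing line: list.append mutates the list in place and returns None, so B = B.append(A) replaces B with None. A plain Python list can hold arrays of different shapes, so a sketch along these lines should work (the M values here are made up for illustration):
import numpy as np

B = []                       # one (M_i, 3) array per point type
for M in [2, 4, 3]:          # hypothetical M values for N = 3 types
    A = np.empty((0, 3))
    for _ in range(M):
        point = np.random.rand(3)     # stand-in for a real 3-coordinate point
        A = np.vstack([A, point])
    B.append(A)              # no reassignment: append works in place

print([a.shape for a in B])  # [(2, 3), (4, 3), (3, 3)]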
I have an n row, m column numpy array, and would like to create a new k x m array by selecting k random elements from each column of the array. I wrote the following python function to do this, but would like to implement something more efficient and faster:
import random
import numpy as np

def sample_array_cols(MyMatrix, nelements):
    vmat = []
    TempMat = MyMatrix.T
    for v in TempMat:
        v = np.ndarray.tolist(v)
        subv = random.sample(v, nelements)
        vmat = vmat + [subv]
    return np.array(vmat).T
One question is whether there's a way to loop over each column without transposing the array (and then transposing back). More importantly, is there some way to map the random sample onto each column that would be faster than having a for loop over all columns? I don't have that much experience with numpy objects, but I would guess that there should be something analogous to apply/mapply in R that would work?
One alternative is to randomly generate the indices first, and then use take_along_axis to map them to the original array:
arr = np.random.randn(1000, 5000) # arbitrary
k = 10 # arbitrary
n, m = arr.shape
idx = np.random.randint(0, n, (k, m))
new = np.take_along_axis(arr, idx, axis=0)
Output (shape):
In [215]: new.shape
Out[215]: (10, 5000)  # (k x m)
To sample each column without replacement, just like your original solution, you can argsort a matrix of random values along axis 0 (which yields an independent random permutation of the row indices for each column) and keep the first k rows:
import numpy as np
matrix = np.arange(4*3).reshape(4,3)
matrix
Output
array([[ 0,  1,  2],
       [ 3,  4,  5],
       [ 6,  7,  8],
       [ 9, 10, 11]])
k = 2
np.take_along_axis(matrix, np.random.rand(*matrix.shape).argsort(axis=0)[:k], axis=0)
Output
array([[ 9,  1,  2],
       [ 3,  4, 11]])
I would:
- pre-allocate the result array and fill in the columns, and
- use NumPy index-based indexing:
import numpy as np

def sample_array_cols(matrix, n_result):
    n, m = matrix.shape
    vmat = np.empty((n_result, m), dtype=matrix.dtype)
    for c in range(m):
        random_indices = np.random.randint(0, n, n_result)
        vmat[:, c] = matrix[random_indices, c]
    return vmat
Not quite fully vectorized, but better than building up a list, and the code scans just like your description.
I am trying to find the minimum value in an N-dimensional array spanned by N parameters of varying values, and to take out a 2-dimensional array spanned by 2 of those parameters around the minimum value to make a contour plot.
I can do this by hard-coding the different cases, but it should preferably be done using a variable list of the axes to be extracted (contour_param).
Please see the code below for some clarification.
import numpy as np
np.random.seed(10) # seed random for reproducibility
#Example for a 3D input array (my_data)
param_sizes = [2, 3, 4]
#Generate a data_cube
my_data = np.random.normal(size=np.prod(param_sizes)).reshape(param_sizes)
#find minimum
min_pos = np.where(my_data == my_data.min())
#what I want:
#define a parameter with the indices of the axes to be used for the contour plot, e.g.: contour_param = [0, 1]
#for contour_param = [0, 1] I would need the 2D array:
result = my_data[:, :, min_pos[2][0]]
#for contour_param = [1, 2] I would need the 2D array:
result = my_data[min_pos[0][0], :, :]
#What I have tried is to convert min_pos to a list and change the entries to arrays:
contour_param = [0, 1]
min_pos = list(np.where(my_data == my_data.min()))
min_pos[contour_param[0]] = np.arange(param_sizes[contour_param[0]])
min_pos[contour_param[1]] = np.arange(param_sizes[contour_param[1]])
result = my_data[min_pos] #This throws an error
#In an attempt to clarify, I have included an example for a 4D array
#Example for a 4D array
param_sizes = [2, 3, 4, 3]
#Generate a data_cube
my_data = np.random.normal(size=np.prod(param_sizes)).reshape(param_sizes)
#find minimum
min_pos = np.where(my_data == my_data.min())
#for contour_param = [0, 1] I would need the 2D array:
result = my_data[:, :, min_pos[2][0], min_pos[3][0]]
#for contour_param = [1, 2] I would need the 2D array:
result = my_data[min_pos[0][0], :, :, min_pos[3][0]]
Great question!
You can make use of np.s_ for this, building up your slicer with it.
For instance, the function:
def build_slicer(contour_param, min_pos):
    # need at least one fixed axis besides the contour axes
    assert len(contour_param) < len(min_pos)
    output = []  # init an empty output list
    for main_index in range(len(min_pos)):
        if main_index in contour_param:
            output.append(np.s_[:])                       # keep the whole axis
        else:
            output.append(np.s_[min_pos[main_index][0]])  # fix the axis at the minimum
    return tuple(output)
would return:
import numpy as np
np.random.seed(10)
param_sizes = [2, 3, 4]
my_data = np.random.normal(size=np.prod(param_sizes)).reshape(param_sizes)
min_pos = np.where(my_data == my_data.min())
contour_param = [0,2]
build_slicer(contour_param, min_pos)
Output:
(slice(None, None, None), 2, slice(None, None, None))
You can then use this to slice your array (slicer is used as a name here to avoid shadowing the built-in slice):
slicer = build_slicer(contour_param, min_pos)
my_data[slicer]
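The same slicer works for the 4D example from the question, e.g.:
param_sizes = [2, 3, 4, 3]
my_data = np.random.normal(size=np.prod(param_sizes)).reshape(param_sizes)
min_pos = np.where(my_data == my_data.min())
contour_param = [0, 1]
slicer = build_slicer(contour_param, min_pos)
result = my_data[slicer]    # 2D array spanned by axes 0 and 1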
I am generating a random matrix with
np.random.randint(2, size=(5, 3))
that outputs something like
[[0, 1, 0],
 [1, 0, 0],
 [1, 1, 1],
 [1, 0, 1],
 [0, 0, 0]]
How do I create the random matrix with the condition that each row cannot contain all 1's? That is, each row can be [1,0,0] or [0,0,0] or [1,1,0] or [1,0,1] or [0,0,1] or [0,1,0] or [0,1,1] but cannot be [1,1,1].
Thanks for your answers
Here's an interesting approach:
rows = np.random.randint(7, size=(6, 1), dtype=np.uint8)
np.unpackbits(rows, axis=1)[:, -3:]
Essentially, you are choosing an integer 0-6 for each row, i.e. 000-110 in binary; 7 would be 111 (all 1s). You just need to extract the binary digits as columns and take the last 3 (your 3 columns), since the output of unpackbits is 8 digits per byte.
Output:
array([[1, 0, 1],
[1, 0, 0],
[1, 0, 0],
[1, 0, 0],
[0, 1, 1],
[0, 0, 0]], dtype=uint8)
If you always have 3 columns, one approach is to explicitly list the possible rows and then choose randomly among them until you have enough rows:
import numpy as np
# every acceptable row
choices = np.array([
[1,0,0],
[0,0,0],
[1,1,0],
[1,0,1],
[0,0,1],
[0,1,0],
[0,1,1]
])
n_rows = 5
# randomly pick which type of row to use for each row needed
idx = np.random.choice(range(len(choices)), size=n_rows)
# make an array by using the chosen rows
array = choices[idx]
If this needs to generalize to a large number of columns, it won't be practical to explicitly list all choices (even if you create the choices programmatically, the memory is still an issue; the number of possible rows grows exponentially in the number of columns). Instead, you can create an initial matrix and then just resample any unacceptable rows until there are none left. I'm assuming that a row is unacceptable if it consists only of 1s; it would be easy to adapt this to the case where the threshold is any number of 1s, though (see the sketch after the code below).
n_rows = 5
n_cols = 4
array = np.random.randint(2, size=(n_rows, n_cols))
all_1s_idx = array.sum(axis=-1) == n_cols
while all_1s_idx.any():
    array[all_1s_idx] = np.random.randint(2, size=(all_1s_idx.sum(), n_cols))
    all_1s_idx = array.sum(axis=-1) == n_cols
Here we just keep resampling all unacceptable rows until there are none left. Because all of the necessary rows are resampled at once, this should be quite efficient. Additionally, as the number of columns grows larger, the probability of a row having all 1s decreases exponentially, so efficiency shouldn't be a problem.
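For instance, a sketch of the threshold generalization mentioned above, where max_ones is a name introduced here for the allowed number of 1s per row:
import numpy as np

n_rows, n_cols = 5, 4
max_ones = 2  # hypothetical threshold: allow at most two 1s per row

array = np.random.randint(2, size=(n_rows, n_cols))
bad_idx = array.sum(axis=-1) > max_ones
while bad_idx.any():
    # resample only the offending rows until every row is acceptable
    array[bad_idx] = np.random.randint(2, size=(bad_idx.sum(), n_cols))
    bad_idx = array.sum(axis=-1) > max_ones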
@busybear beat me to it, but I'll post it anyway, as it is a bit more general:
import sys
import numpy as np

def not_all(m, k):
    if k > 64 or sys.byteorder != 'little':
        raise NotImplementedError
    # draw from [0, 2**k - 1): the all-ones value 2**k - 1 is excluded
    sample = np.random.randint(0, 2**k - 1, (m,), dtype='u8').view('u1').reshape(m, -1)
    if k % 8:
        # move the k%8 leftover bits to the top of their byte so unpackbits picks them up
        sample[:, k//8] <<= -k % 8
    return np.unpackbits(sample).reshape(m, -1)[:, :k]
For example:
>>> sample = not_all(1000000, 11)
# sanity checks
>>> unq, cnt = np.unique(sample, axis=0, return_counts=True)
>>> len(unq) == 2**11-1
True
>>> unq.sum(1).max()
10
>>> cnt.min(), cnt.max()
(403, 568)
And while I'm at it, hijacking other people's answers, here is a streamlined version of @Nathan's acceptance-rejection method:
def accrej(m, k):
    sample = np.random.randint(0, 2, (m, k), bool)
    all_ones, = np.where(sample.all(1))
    while all_ones.size:
        resample = np.random.randint(0, 2, (all_ones.size, k), bool)
        sample[all_ones] = resample
        all_ones = all_ones[resample.all(1)]
    return sample.view('u1')
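For example, a quick sanity check (the trailing view('u1') makes the boolean samples print as 0s and 1s):
sample = accrej(100000, 3)
print(sample.shape)                 # (100000, 3)
print((sample.sum(1) == 3).any())   # False: no all-ones rows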
Try this solution using sum():
import numpy as np
array = np.random.randint(2, size=(5, 3))
for i, entry in enumerate(array):
    if entry.sum() == 3:
        while True:
            new = np.random.randint(2, size=(1, 3))
            if new.sum() == 3:
                continue
            break
        array[i] = new
print(array)
Good luck my friend!
I'd like to add two numpy arrays of different shapes, but without broadcasting; rather, the "missing" values are treated as zeros. It's probably easiest to show with an example:
[1, 2, 3] + [2] -> [3, 2, 3]
or
[1, 2, 3] + [[2], [1]] -> [[3, 2, 3], [1, 0, 0]]
I do not know the shapes in advance.
I'm messing around with the output of np.shape for each, trying to find the smallest shape which holds both of them, embedding each in a zero-ed array of that shape and then adding them. But it seems rather a lot of work, is there an easier way?
Thanks in advance!
edit: by "a lot of work" I meant "a lot of work for me" rather than for the machine; I seek elegance rather than efficiency. My effort at getting the smallest shape holding them both is
def pad(a, b):
    sa, sb = map(np.shape, [a, b])
    N = np.max([len(sa), len(sb)])
    sap, sbp = map(lambda x: x + (1,)*(N - len(x)), [sa, sb])
    sp = np.amax(np.array([tuple(sap), tuple(sbp)]), 0)  # per-axis maximum
    return sp
not pretty :-/
I'm messing around with the output of np.shape for each, trying to find the smallest shape which holds both of them, embedding each in a zero-ed array of that shape and then adding them. But it seems rather a lot of work, is there an easier way?
Getting the np.shape is trivial, finding the smallest shape that holds both is very easy, and of course adding is trivial, so the only "a lot of work" part is the "embedding each in a zero-ed array of that shape".
And yes, you can eliminate that by just calling the resize method (or the resize function, if you want to make copies instead of changing the arrays in place). As the docs explain:
Enlarging an array: … missing entries are filled with zeros
For example, if you know the dimensionality statically:
>>> a1 = np.array([[1, 2, 3], [4, 5, 6]])
>>> a2 = np.array([[2], [2]])
>>> shape = [max(a.shape[axis] for a in (a1, a2)) for axis in range(2)]
>>> a1.resize(shape)
>>> a2.resize(shape)
>>> a1 + a2
array([[3, 4, 3],
       [4, 5, 6]])
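One caveat worth noting: ndarray.resize pads from the flattened data in C order, so existing values can shift position. Above, both of a2's 2s end up in the first row, which is why the result is not [[3, 2, 3], [6, 5, 6]]. If values should stay at their original indices, here is a sketch using np.pad instead (assuming both arrays already have the same number of dimensions):
import numpy as np

a1 = np.array([[1, 2, 3], [4, 5, 6]])
a2 = np.array([[2], [2]])

# smallest shape holding both arrays
shape = np.maximum(a1.shape, a2.shape)

# zero-pad each array on the bottom/right up to that shape
p1 = np.pad(a1, [(0, t - s) for s, t in zip(a1.shape, shape)])
p2 = np.pad(a2, [(0, t - s) for s, t in zip(a2.shape, shape)])

print(p1 + p2)
# [[3 2 3]
#  [6 5 6]]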
This is the best I could come up with:
import numpy as np

def magic_add(*args):
    n = max(a.ndim for a in args)
    # prepend length-1 axes so all arrays have the same number of dimensions
    args = [a.reshape((n - a.ndim)*(1,) + a.shape) for a in args]
    shape = np.max([a.shape for a in args], 0)
    result = np.zeros(shape)
    for a in args:
        idx = tuple(slice(i) for i in a.shape)
        result[idx] += a
    return result
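A quick check against the examples from the question:
print(magic_add(np.array([1, 2, 3]), np.array([2])))
# [3. 2. 3.]
print(magic_add(np.array([1, 2, 3]), np.array([[2], [1]])))
# [[3. 2. 3.]
#  [1. 0. 0.]]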
You can clean up the for loop a little if you know how many dimensions you expect on result, something like:
for a in args:
    i, j = a.shape
    result[:i, :j] += a
You may try my solution: for 1-dimensional arrays you have to expand them to dimension 2 (as shown in the example below) before passing them to the function.
import numpy as np
import timeit
matrix1 = np.array([[0,10],
[1,20],
[2,30]])
matrix2 = np.array([[0,10],
[1,20],
[2,30],
[3,40]])
matrix3 = np.arange(0,0,dtype=int) # empty numpy-array
matrix3.shape = (0,2) # reshape to 0 rows
matrix4 = np.array([[0,10,100,1000],
[1,20,200,2000]])
matrix5 = np.arange(0,4000,1)
matrix5 = np.reshape(matrix5,(4,1000))
matrix6 = np.arange(0.0,4000,0.5)
matrix6 = np.reshape(matrix6,(20,400))
matrix1 = np.array([1,2,3])
matrix1 = np.expand_dims(matrix1, axis=0)
matrix2 = np.array([2,1])
matrix2 = np.expand_dims(matrix2, axis=0)
def add_2d_matrices(m1, m2, pos=(0,0), filler=None):
    """
    Add two 2d matrices of different sizes or shapes,
    offset by xy coordinates, where x runs "from left to right" (= axis 1)
    and y runs "from top to bottom" (= axis 0).
    Parameters:
    - m1: first matrix
    - m2: second matrix
    - pos: tuple (x,y) containing the coordinates of the m2 offset
    - filler: gaps are filled with the value of filler (or zeros)
    Returns:
    - 2d array (float):
      containing filler values, m1 values, m2 values,
      or the sum of m1 and m2 (at overlapping areas)
    Author:
    Reinhard Daemon, Austria
    """
    # determine the shape of the final array:
    _m1 = np.copy(m1)
    _m2 = np.copy(m2)
    x, y = pos
    y1, x1 = _m1.shape
    y2, x2 = _m2.shape
    xmax = max(x1, x2 + x)
    ymax = max(y1, y2 + y)
    # fill up the _m1 array with zeros:
    y1, x1 = _m1.shape
    diff = xmax - x1
    _z = np.zeros((y1, diff))
    _m1 = np.hstack((_m1, _z))
    y1, x1 = _m1.shape
    diff = ymax - y1
    _z = np.zeros((diff, x1))
    _m1 = np.vstack((_m1, _z))
    # shift the _m2 array by 'pos' and fill up with zeros:
    y2, x2 = _m2.shape
    _z = np.zeros((y2, x))
    _m2 = np.hstack((_z, _m2))
    y2, x2 = _m2.shape
    diff = xmax - x2
    _z = np.zeros((y2, diff))
    _m2 = np.hstack((_m2, _z))
    y2, x2 = _m2.shape
    _z = np.zeros((y, x2))
    _m2 = np.vstack((_z, _m2))
    y2, x2 = _m2.shape
    diff = ymax - y2
    _z = np.zeros((diff, x2))
    _m2 = np.vstack((_m2, _z))
    # add the 2 arrays:
    _m3 = _m1 + _m2
    # find and fill the "unused" positions within the summed array:
    if filler not in (None, 0, 0.0):
        y1, x1 = m1.shape
        y2, x2 = m2.shape
        x1min, x1max = 0, x1 - 1
        y1min, y1max = 0, y1 - 1
        x2min, x2max = x, x + x2 - 1
        y2min, y2max = y, y + y2 - 1
        for xx in range(xmax):
            for yy in range(ymax):
                if x1min <= xx <= x1max and y1min <= yy <= y1max:
                    continue
                if x2min <= xx <= x2max and y2min <= yy <= y2max:
                    continue
                _m3[yy, xx] = filler
    return _m3
t1 = timeit.Timer("add_2d_matrices(matrix5, matrix6, pos=(1,1), filler=111.111)",
                  "from __main__ import add_2d_matrices, matrix5, matrix6")
print("ran:", t1.timeit(number=10), "seconds")
print("\n\n")
my_res = add_2d_matrices(matrix1, matrix2, pos=(1,1), filler=99.99)
print(my_res)