I want to create a NumPy array of np.ndarray from an iterable. This is because I have a function that will return np.ndarray of some constant shape, and I need to create an array of results from this function, something like this:
import numpy as np
OUTPUT_SHAPE = some_constant
def foo(input) -> np.ndarray:
    # processing
    # generates `output`, an np.ndarray of shape OUTPUT_SHAPE
    return output
inputs = [i for i in range(100000)]
iterable = (foo(input) for input in inputs)
arr = np.fromiter(iterable, np.ndarray)
This gives an error:
cannot create object arrays from iterator
I cannot first create a list and then convert it to an array, because that would create a copy of every output array, so for a while almost double the memory would be occupied, and I have very limited memory.
Can anyone help me?
You probably shouldn't make an object array. You should probably make an ordinary 2D array of non-object dtype. As long as you know the number of results the iterator will give in advance, you can avoid most of the copying you're worried about by doing it like this:
arr = numpy.empty((num_iterator_outputs, *OUTPUT_SHAPE), dtype=whatever_appropriate_dtype)  # OUTPUT_SHAPE is a shape tuple
for i, output in enumerate(iterable):
    arr[i] = output  # copy this output into row i; the temporary can then be freed
This only needs to hold arr and a single output in memory at once, instead of arr and every output.
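A minimal runnable sketch of this preallocation pattern follows. The body of `foo`, the `(3,)` shape, and the float dtype are placeholder assumptions standing in for the real processing; the point is only that the result is allocated once and filled row by row.

```python
import numpy as np

OUTPUT_SHAPE = (3,)  # assumed constant output shape (placeholder)

def foo(x):
    # Hypothetical stand-in for the real processing: returns a fresh array per call
    return np.full(OUTPUT_SHAPE, x, dtype=float)

inputs = range(100000)
iterable = (foo(x) for x in inputs)

# Allocate the full result once; only `arr` plus one temporary output are alive at a time
arr = np.empty((len(inputs), *OUTPUT_SHAPE), dtype=float)
for i, output in enumerate(iterable):
    arr[i] = output  # copies the values into row i
```

Because `inputs` is a `range`, `len(inputs)` gives the row count up front; with a generator of unknown length, the count has to come from somewhere else.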
If you really want an object array, you can get one. The simplest way would be to go through a list, which will not perform the copying you're worried about as long as you do it right:
outputs = list(iterable)
arr = numpy.empty(len(outputs), dtype=object)
arr[:] = outputs
Note that if you just try to call numpy.array on outputs, it will try to build a 2D array, which will cause the copying you're worried about. This is true even if you specify dtype=object; it'll try to build a 2D array of object dtype, and that'll be even worse, for both usability and memory.
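A small sketch contrasting the two constructions, using three equal-length arrays as illustrative stand-ins for the generator's outputs:

```python
import numpy as np

# Three equally shaped arrays standing in for the generator's outputs
outputs = [np.arange(3), np.ones(3), np.zeros(3)]

# np.array with dtype=object still sees equal-length sequences and builds a 2D array
naive = np.array(outputs, dtype=object)

# Preallocating a 1D object array and slice-assigning stores references instead
arr = np.empty(len(outputs), dtype=object)
arr[:] = outputs

print(naive.shape)           # (3, 3)
print(arr.shape)             # (3,)
print(arr[1] is outputs[1])  # True: the element is the same array object, no data copied
```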
An object dtype array contains references, just like a list.
Define 3 arrays:
In [589]: a,b,c = np.arange(3), np.ones(3), np.zeros(3)
put them in a list:
In [590]: alist = [a,b,c]
and in an object dtype array:
In [591]: arr = np.empty(3,object)
In [592]: arr[:] = alist
In [593]: arr
Out[593]:
array([array([0, 1, 2]), array([1., 1., 1.]), array([0., 0., 0.])],
dtype=object)
In [594]: alist
Out[594]: [array([0, 1, 2]), array([1., 1., 1.]), array([0., 0., 0.])]
Modify one, and see the change in the list and array:
In [595]: b[:] = [1,2,3]
In [596]: b
Out[596]: array([1., 2., 3.])
In [597]: alist
Out[597]: [array([0, 1, 2]), array([1., 2., 3.]), array([0., 0., 0.])]
In [598]: arr
Out[598]:
array([array([0, 1, 2]), array([1., 2., 3.]), array([0., 0., 0.])],
dtype=object)
A numeric dtype array created from these copies all values:
In [599]: arr1 = np.stack(arr)
In [600]: arr1
Out[600]:
array([[0., 1., 2.],
[1., 2., 3.],
[0., 0., 0.]])
So even if your use of fromiter worked, it wouldn't be any different, memory-wise, from accumulating a list:
alist = []
for i in range(n):
    alist.append(constant_array)
I am trying to use broadcasting to speed up my NumPy code. The real code has much larger arrays and loops through multiple times, but I think this snippet illustrates the issue.
import numpy as np
row = np.array([0,0,1,1,4])
dl_ddk = np.array([0,8,29,112,11])
change1 = np.zeros(5)
change2 = np.zeros(5)
for k in range(0, row.shape[0]):
    i = row[k]
    change1[i] += dl_ddk[k]
change2[row] += dl_ddk
print(change1)
print(change2)
change1 = [8, 141, 0, 0, 11]
change2 = [8, 112, 0, 0, 11]
I thought these two arrays would be equal; however, it seems that the += operation is overwriting rather than adding values. Is there a way to vectorize a loop with index referencing like this in NumPy that gives the same result as change1?
You can use np.bincount() and use dl_ddk as the weights:
import numpy as np
row = np.array([0,0,1,1,4])
dl_ddk = np.array([0,8,29,112,11])
change1 = np.bincount(row, weights=dl_ddk)
print(change1)
# [ 8. 141. 0. 0. 11.]
This part of the docs describes using it almost exactly like your problem:
If weights is specified the input array is weighted by it, i.e. if a value n is found at position i, out[n] += weight[i] instead of out[n] += 1.
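One detail worth knowing: the length of the bincount result is max(row) + 1, which happens to be 5 here. If the target has a fixed length that `row` might not reach, the `minlength` parameter pads the result with zeros; the length 7 below is an arbitrary example.

```python
import numpy as np

row = np.array([0, 0, 1, 1, 4])
dl_ddk = np.array([0, 8, 29, 112, 11])

# Without minlength the result would have length max(row) + 1 == 5;
# minlength=7 extends it to match a preallocated target such as np.zeros(7)
change = np.bincount(row, weights=dl_ddk, minlength=7)
# change holds 8, 141, 0, 0, 11, 0, 0 (float64, because weights were given)
```

Note that bincount only accepts a 1D array of non-negative integers as its first argument.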
In [1]: row = np.array([0,0,1,1,4])
...: dl_ddk = np.array([0,8,29,112,11])
...: change1 = np.zeros(5)
...: change2 = np.zeros(5)
...: for k in range(0, row.shape[0]):
...:     i = row[k]
...:     change1[i] += dl_ddk[k]
...: change2[row] += dl_ddk
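change2 does not match because of buffering: change2[row] += dl_ddk is evaluated like change2[row] = change2[row] + dl_ddk, so for a repeated index every update starts from the same original value and only one of them survives. ufuncs have an at method to address this: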
Performs unbuffered in place operation on operand 'a' for elements specified by 'indices'.
In [3]: change3 = np.zeros(5)
In [4]: np.add.at(change3, row, dl_ddk)
In [5]: change1
Out[5]: array([ 8., 141., 0., 0., 11.])
In [6]: change2
Out[6]: array([ 8., 112., 0., 0., 11.])
In [7]: change3
Out[7]: array([ 8., 141., 0., 0., 11.])
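To check this on larger data, here is a sketch comparing the explicit loop, buffered fancy-index +=, np.add.at, and np.bincount on random indices. The sizes and the random seed are arbitrary choices, not from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n_bins = 10
row = rng.integers(0, n_bins, size=1000)  # many repeated indices
vals = rng.random(1000)

# Reference result: explicit Python loop, one accumulation per element
loop = np.zeros(n_bins)
for i, v in zip(row, vals):
    loop[i] += v

# Buffered fancy-index +=: each repeated index keeps only one of its updates
buffered = np.zeros(n_bins)
buffered[row] += vals

# Unbuffered: np.add.at applies every occurrence
unbuffered = np.zeros(n_bins)
np.add.at(unbuffered, row, vals)

binned = np.bincount(row, weights=vals, minlength=n_bins)

print(np.allclose(loop, unbuffered))  # True
print(np.allclose(loop, binned))      # True
print(np.allclose(loop, buffered))    # False
```

allclose is used rather than exact equality because the three methods may sum in different orders.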
I am having a problem parsing a string into a NumPy array.
The string looks like this:
input = '{{13,1},{2,1},{4,4},{1,7},{9,1}}'
The string represents a sparse vector, where the vector itself is delimited by curly brackets. Each inner pair, itself delimited by curly brackets, gives an index and the value stored at that index. The first pair in the list encodes the dimensions of the vector.
In the above example, the vector has a length of 13 and 4 nonzero entries:
output = np.array([0,7,1,0,4,0,0,0,0,1,0,0,0])
After parsing it into an array, I have to convert it back to a string in its dense format:
stringoutput = '{0,7,1,0,4,0,0,0,0,1,0,0,0}'
While I managed to convert the NumPy array to a string, I ran into the problem of having the wrong brackets (i.e. the built-in array2string function uses [], while I need {}).
I am open to any suggestions that help solve this efficiently (even for large sparse vectors).
Thank you.
Edit: The given vector is always one-dimensional, i.e. the second number within the first {} will always be 1 (and you only need 1 index to locate the position of elements).
Here is a numpythonic way:
In [132]: inp = '{{13,1},{2,1},{4,4},{1,7},{9,1}}'
# Replace the braces with parentheses in order to convert the string to a valid Python literal (requires `import ast`).
In [133]: inp = ast.literal_eval(inp.replace('{', '(').replace('}', ')'))
# Unpack the dimension and the rest of the values from the input object
In [134]: dim, *rest = inp
# Create the zero array based on the extracted dimension
In [135]: arr = np.zeros(dim)
# Use `zip` to collect the indices and values separately in order to use them in `np.put`
In [136]: indices, values = zip(*rest)
In [137]: np.put(arr, indices, values)
In [138]: arr
Out[138]:
array([[ 0.],
[ 7.],
[ 1.],
[ 0.],
[ 4.],
[ 0.],
[ 0.],
[ 0.],
[ 0.],
[ 1.],
[ 0.],
[ 0.],
[ 0.]])
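To finish the round trip the question asks for, here is a sketch that rebuilds the dense array with the same steps and then formats it with curly braces. The int cast assumes the stored values are integers, as in the example.

```python
import ast
import numpy as np

inp = '{{13,1},{2,1},{4,4},{1,7},{9,1}}'
dim, *rest = ast.literal_eval(inp.replace('{', '(').replace('}', ')'))
arr = np.zeros(dim)               # shape (13, 1)
indices, values = zip(*rest)
np.put(arr, indices, values)      # np.put uses flat indices

# Flatten the column, cast the floats to int, and wrap the
# comma-joined values in {} instead of array2string's []
stringoutput = '{' + ','.join(map(str, arr.ravel().astype(int))) + '}'
print(stringoutput)  # {0,7,1,0,4,0,0,0,0,1,0,0,0}
```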
I like Kasramvd's approach, but figured I'd put this one out there as well:
In [116]: r = (list(map(int, a.split(','))) for a in input[2:-2].split('},{'))
In [118]: l = np.zeros(next(r)[0], int)  # np.int was removed in NumPy 1.24; plain int works
In [119]: for a in r:
     ...:     l[a[0]] = a[1]
...:
In [122]: s = '{' + ','.join(map(str, l)) + '}'
In [123]: s
Out[123]: '{0,7,1,0,4,0,0,0,0,1,0,0,0}'
This is based on Kasramvd's answer. I adjusted how the other values are populated.
From Kasramvd's answer:
import numpy as np
import ast
inp = '{{13,1},{2,1},{4,4},{1,7},{9,1}}'
inp = ast.literal_eval(inp.replace('{', '(').replace('}', ')'))
dim, *rest = inp
My adjustments:
a = np.zeros(dim, dtype=int)
r = np.array(rest)
a[r[:, 0], 0] = r[:, 1]
a
array([[0],
[7],
[1],
[0],
[4],
[0],
[0],
[0],
[0],
[1],
[0],
[0],
[0]])
In one dimension:
a = np.zeros(dim[0], dtype=int)
r = np.array(rest)
a[r[:, 0]] = r[:, 1]
a
array([0, 7, 1, 0, 4, 0, 0, 0, 0, 1, 0, 0, 0])
Here's the problem. Let's say I have a matrix A =
array([[ 1., 0., 2.],
[ 0., 0., 2.],
[ 0., -1., 3.]])
and a vector of indices p = array([0, 2, 1]). I want to turn a 3x3 matrix A to an array of length 3 (call it v) where v[j] = A[j, p[j]] for j = 0, 1, 2. I can do it the following way:
v = list(map(lambda pair: pair[0][pair[1]], zip(A, p)))  # Python 3 no longer allows tuple parameters in lambdas
So for the above matrix A and vector of indices p, I expect to get array([1, 2, -1]) (i.e. element 0 of row 0, element 2 of row 1, element 1 of row 2).
But can I achieve the same result using native NumPy (i.e. without explicitly zipping and then mapping)? Thanks.
I don't think a dedicated function for this exists. To achieve what you want, I can think of two easy ways. You could do:
np.diag(A[:, p])
Here the array p is applied as a column index for every row such that on the diagonal you will have the elements that you are looking for.
Alternatively, you can avoid producing all the unnecessary off-diagonal entries by using:
A[np.arange(A.shape[0]), p]
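A quick sketch checking both expressions on the question's data. It also shows np.take_along_axis (available since NumPy 1.15), an alternative not mentioned above that performs the same per-row pick.

```python
import numpy as np

A = np.array([[1., 0., 2.],
              [0., 0., 2.],
              [0., -1., 3.]])
p = np.array([0, 2, 1])

v_diag = np.diag(A[:, p])              # builds the full 3x3 A[:, p], then keeps its diagonal
v_fancy = A[np.arange(A.shape[0]), p]  # pairs row j with column p[j] directly
# take_along_axis needs p with a trailing axis; [:, 0] drops it again
v_take = np.take_along_axis(A, p[:, None], axis=1)[:, 0]
```

The fancy-indexing version only touches the n requested elements, while np.diag(A[:, p]) materializes an n x n intermediate first.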
I'm trying to move a few Matlab libraries that I've built to the Python environment. So far, the biggest issue I've faced is the dynamic allocation of arrays based on index specification. For example, using Matlab, typing the following:
x = [1 2];
x(5) = 3;
would result in:
x = [ 1 2 0 0 3]
In other words, I didn't know the size of x or its content beforehand. The array must be defined on the fly, based on the indices that I'm providing.
In Python, trying the following:
from numpy import *
x = array([1,2])
x[4] = 3
results in the following error: IndexError: index out of bounds. One workaround is to grow the array in a loop and then assign the desired value:
from numpy import *
x = array([1,2])
idx = 4
for i in range(size(x),idx+1):
    x = append(x,0)
x[idx] = 3
print(x)
It works, but it's not very convenient and it might become very cumbersome for n-dimensional arrays. I thought about subclassing ndarray to achieve my goal, but I'm not sure if it would work. Does anybody know of a better approach?
Thanks for the quick reply. I didn't know about the __setitem__ method (I'm fairly new to Python). I simply subclassed ndarray as follows:
import numpy as np
class marray(np.ndarray):
    def __setitem__(self, key, value):
        # Array properties
        nDim = np.ndim(self)
        dims = list(np.shape(self))
        # Requested Index
        if type(key)==int: key=key,
        nDim_rq = len(key)
        dims_rq = list(key)
        for i in range(nDim_rq): dims_rq[i]+=1
        # Provided indices match current array number of dimensions
        if nDim_rq==nDim:
            # Define new dimensions
            newdims = []
            for iDim in range(nDim):
                v = max([dims[iDim],dims_rq[iDim]])
                newdims.append(v)
            # Resize if necessary
            if newdims != dims:
                self.resize(newdims,refcheck=False)
        return super(marray, self).__setitem__(key, value)
And it works like a charm! However, I need to modify the above code so that __setitem__ allows changing the number of dimensions, following this request:
a = marray([0,0])
a[3,1,0] = 0
Unfortunately, when I try to use numpy functions such as
self = np.expand_dims(self,2)
the returned type is numpy.ndarray instead of __main__.marray. Any idea how I could make numpy functions output a marray when a marray is provided as input? I think it should be doable using __array_wrap__, but I could never find out exactly how. Any help would be appreciated.
I took the liberty of updating my old answer from "Dynamic list that automatically expands". I think this should do most of what you need/want:
class matlab_list(list):
    def __init__(self, *args):
        super().__init__(*args)
        def zero():
            while True:
                yield 0
        self._num_gen = zero()
    def __setitem__(self, index, value):
        if isinstance(index, int):
            self.expandfor(index)
            return super().__setitem__(index, value)
        elif isinstance(index, slice):
            # Open-ended or reversed slices are passed through unchanged
            if index.start is None or index.stop is None or index.stop < index.start:
                return super().__setitem__(index, value)
            else:
                self.expandfor(index.stop if abs(index.stop) > abs(index.start) else index.start)
                return super().__setitem__(index, value)
        return super().__setitem__(index, value)
    def expandfor(self, index):
        # Pad with zeros so that `index` becomes a valid position
        if abs(index) > len(self) - 1:
            if index < 0:
                for _ in range(abs(index) - len(self)):
                    self.insert(0, next(self._num_gen))
            else:
                for _ in range(abs(index) - len(self) + 1):
                    self.append(next(self._num_gen))
# Usage
spec_list = matlab_list()
spec_list[5] = 14  # spec_list is now [0, 0, 0, 0, 0, 14]
This isn't quite what you want, but...
x = np.array([1, 2])
try:
    x[index] = value
except IndexError:
    oldsize = len(x)  # will be trickier for multidimensional arrays; you'll need to use x.shape or something and take advantage of numpy's advanced slicing ability
    x = np.resize(x, index+1)  # Python uses C-style 0-based indices
    x[oldsize:index] = 0  # You could also do x[oldsize:] = 0, but that would mean you'd be assigning to the final position twice.
    x[index] = value
>>> x = np.array([1, 2])
>>> x = np.resize(x, 5)
>>> x[2:5] = 0
>>> x[4] = 3
>>> x
array([1, 2, 0, 0, 3])
Due to how numpy stores the data linearly under the hood (though whether it stores as row-major or column-major can be specified when creating arrays), multidimensional arrays are pretty tricky here.
>>> x = np.array([[1, 2, 3], [4, 5, 6]])
>>> np.resize(x, (6, 4))
array([[1, 2, 3, 4],
[5, 6, 1, 2],
[3, 4, 5, 6],
[1, 2, 3, 4],
[5, 6, 1, 2],
[3, 4, 5, 6]])
You'd need to do this or something similar:
>>> y = np.zeros((6, 4))
>>> y[:x.shape[0], :x.shape[1]] = x
>>> y
array([[ 1., 2., 3., 0.],
[ 4., 5., 6., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.],
[ 0., 0., 0., 0.]])
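The zero-padding idea above can be generalized to n dimensions with np.pad. Here is a sketch of a hypothetical helper, set_grow (not a NumPy function), assuming non-negative integer indices with exactly one index per existing axis; it does not add new dimensions.

```python
import numpy as np

def set_grow(x, index, value):
    """Hypothetical helper: set x[index] = value, zero-padding x first if needed.

    Assumes non-negative integer indices, one per axis. Returns the (possibly
    new) array; if no padding is needed, x itself is modified in place.
    """
    index = (index,) if isinstance(index, int) else tuple(index)
    if len(index) != x.ndim:
        raise ValueError("need exactly one index per dimension")
    # Number of zeros to append at the end of each axis (0 if the index already fits)
    pad = [(0, max(0, i + 1 - n)) for i, n in zip(index, x.shape)]
    if any(after for _, after in pad):
        x = np.pad(x, pad, mode='constant')  # constant mode pads with zeros by default
    x[index] = value
    return x

x = set_grow(np.array([1, 2]), 4, 3)
print(x)  # [1 2 0 0 3]

y = set_grow(np.array([[1, 2, 3], [4, 5, 6]]), (5, 3), 9)
print(y.shape)  # (6, 4)
```

Unlike np.resize, np.pad keeps existing elements at their original (row, column) positions.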
A Python dict will work well as a sparse array. The main issue is that the syntax for initializing the sparse array will not be as pretty:
listarray = [100,200,300]
dictarray = {0:100, 1:200, 2:300}
But after that, the syntax for inserting or retrieving elements is the same:
dictarray[5] = 2345
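If a dense array is eventually needed (for example, to match MATLAB's zero-filled result), the dict can be converted at the end. A sketch, assuming non-negative integer keys and integer values:

```python
import numpy as np

dictarray = {0: 100, 1: 200, 2: 300}
dictarray[5] = 2345  # the "auto-growing" write is just a new key

# Reading a missing position as zero, MATLAB-style
missing = dictarray.get(3, 0)

# Size the dense array from the largest key; positions without a key stay 0
dense = np.zeros(max(dictarray) + 1, dtype=int)
dense[list(dictarray.keys())] = list(dictarray.values())
```

dict.keys() and dict.values() iterate in the same order, so the fancy-index assignment pairs each key with its own value.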