I have a NumPy array, something like [ a b c ], and I want to concatenate it with another NumPy array (just as we would create a list of lists). How do we create a NumPy array containing NumPy arrays?
I tried the following, without any luck:
>>> M = np.array([])
>>> M
array([], dtype=float64)
>>> M.append(a,axis=0)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'numpy.ndarray' object has no attribute 'append'
>>> a
array([1, 2, 3])
In [1]: import numpy as np
In [2]: a = np.array([[1, 2, 3], [4, 5, 6]])
In [3]: b = np.array([[9, 8, 7], [6, 5, 4]])
In [4]: np.concatenate((a, b))
Out[4]:
array([[1, 2, 3],
       [4, 5, 6],
       [9, 8, 7],
       [6, 5, 4]])
or this:
In [1]: a = np.array([1, 2, 3])
In [2]: b = np.array([4, 5, 6])
In [3]: np.vstack((a, b))
Out[3]:
array([[1, 2, 3],
       [4, 5, 6]])
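Note that for 1-D inputs np.concatenate joins end to end instead of stacking rows; a minimal sketch of the difference:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.concatenate((a, b))  # array([1, 2, 3, 4, 5, 6]) -- stays 1-D
np.vstack((a, b))       # shape (2, 3) -- rows stacked into 2-D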
Well, the error message says it all: NumPy arrays do not have an append() method. There's a free function numpy.append() however:
numpy.append(M, a)
This will create a new array instead of mutating M in place. Note that numpy.append() copies both arrays into the result. You will get better-performing code if you preallocate fixed-size NumPy arrays and fill them in place.
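A minimal sketch of the fixed-size approach (the size 1000 and the fill values are made up for illustration):
import numpy as np
# preallocate once, then fill in place -- no per-append copying
M = np.empty(1000)
for i in range(1000):
    M[i] = i * 0.5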
You may use numpy.append()...
import numpy
B = numpy.array([3])
A = numpy.array([1, 2, 2])
B = numpy.append(B, A)
print(B)
# [3 1 2 2]
This does not keep the two arrays separate; it appends them into a single one-dimensional array.
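Note that when no axis argument is given, np.append flattens its inputs first, which is why the result above is one-dimensional. A short sketch:
import numpy as np
a = np.array([[1, 2], [3, 4]])
np.append(a, [[5, 6]])          # array([1, 2, 3, 4, 5, 6]) -- flattened
np.append(a, [[5, 6]], axis=0)  # shape (3, 2) -- appended as a row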
Sven said it all; just be very cautious about the automatic type promotion that happens when append() is called.
In [2]: import numpy as np
In [3]: a = np.array([1,2,3])
In [4]: b = np.array([1.,2.,3.])
In [5]: c = np.array(['a','b','c'])
In [6]: np.append(a,b)
Out[6]: array([ 1., 2., 3., 1., 2., 3.])
In [7]: a.dtype
Out[7]: dtype('int64')
In [8]: np.append(a,c)
Out[8]:
array(['1', '2', '3', 'a', 'b', 'c'],
      dtype='|S1')
As you can see, based on the contents the dtype went from int64 to float64, and then to |S1.
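If the promotion is unwanted, one workaround (a sketch; note that float values are truncated) is to cast explicitly before appending:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([1.9, 2.9, 3.9])
np.append(a, b.astype(a.dtype))  # array([1, 2, 3, 1, 2, 3]) -- keeps a's integer dtype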
I found this page while looking for something slightly different: how to start appending array objects to an empty NumPy array. I tried all the solutions here to no avail.
Then I found this question and answer: How to add a new row to an empty numpy array
The gist here:
The way to "start" the array that you want is:
arr = np.empty((0,3), int)
Then you can use concatenate to add rows like so:
arr = np.concatenate((arr, [[x, y, z]]), axis=0)
See also https://docs.scipy.org/doc/numpy/reference/generated/numpy.concatenate.html
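If many rows arrive one at a time, a common alternative is to collect them in a plain list and build the array once at the end, which avoids the repeated copying of per-row concatenate; a minimal sketch with made-up rows:
import numpy as np
rows = []
for x, y, z in [(1, 2, 3), (4, 5, 6)]:
    rows.append([x, y, z])
arr = np.array(rows)  # shape (2, 3)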
Actually, one can always create an ordinary list of NumPy arrays and convert it later.
In [1]: import numpy as np
In [2]: a = np.array([[1,2],[3,4]])
In [3]: b = np.array([[1,2],[3,4]])
In [4]: l = [a]
In [5]: l.append(b)
In [6]: l = np.array(l)
In [7]: l.shape
Out[7]: (2, 2, 2)
In [8]: l
Out[8]:
array([[[1, 2],
        [3, 4]],

       [[1, 2],
        [3, 4]]])
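If all the arrays share a shape, np.stack makes that intent explicit (and raises on a mismatch); a minimal sketch:
import numpy as np
a = np.array([[1, 2], [3, 4]])
b = np.array([[1, 2], [3, 4]])
l = np.stack([a, b])  # shape (2, 2, 2), same result as np.array([a, b])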
I had the same issue, and I couldn't comment on @Sven Marnach's answer (not enough rep; gosh, I remember when Stack Overflow first started...). Anyway:
Adding a list of random numbers to a 10 X 10 matrix.
myNpArray = np.zeros([1, 10])
for x in range(1, 11, 1):
    randomList = [list(np.random.randint(99, size=10))]
    myNpArray = np.vstack((myNpArray, randomList))
myNpArray = myNpArray[1:]
Using np.zeros(), a 1 x 10 array of zeros is created.
array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
Then a list of 10 random numbers is created using np.random and assigned to randomList.
The loop stacks rows 10 high. We just have to remember to remove the initial row of zeros.
myNpArray
array([[31., 10., 19., 78., 95., 58.,  3., 47., 30., 56.],
       [51., 97.,  5., 80., 28., 76., 92., 50., 22., 93.],
       [64., 79.,  7., 12., 68., 13., 59., 96., 32., 34.],
       [44., 22., 46., 56., 73., 42., 62.,  4., 62., 83.],
       [91., 28., 54., 69., 60., 95.,  5., 13., 60., 88.],
       [71., 90., 76., 53., 13., 53., 31.,  3., 96., 57.],
       [33., 87., 81.,  7., 53., 46.,  5.,  8., 20., 71.],
       [46., 71., 14., 66., 68., 65., 68., 32.,  9., 30.],
       [ 1., 35., 96., 92., 72., 52., 88., 86., 94., 88.],
       [13., 36., 43., 45., 90., 17., 38.,  1., 41., 33.]])
So in a function:
def array_matrix(random_range, array_size):
    myNpArray = np.zeros([1, array_size])
    for x in range(1, array_size + 1, 1):
        randomList = [list(np.random.randint(random_range, size=array_size))]
        myNpArray = np.vstack((myNpArray, randomList))
    return myNpArray[1:]
A 7 x 7 array using random numbers 0 - 999 (np.random.randint's upper bound is exclusive):
array_matrix(1000, 7)
array([[621., 377., 931., 180., 964., 885., 723.],
       [298., 382., 148., 952., 430., 333., 956.],
       [398., 596., 732., 422., 656., 348., 470.],
       [735., 251., 314., 182., 966., 261., 523.],
       [373., 616., 389.,  90., 884., 957., 826.],
       [587., 963.,  66., 154., 111., 529., 945.],
       [950., 413., 539., 860., 634., 195., 915.]])
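For what it's worth, the same shape of matrix can be generated in a single call with no stacking loop at all (a sketch; note the result has an integer dtype rather than the float rows vstack produced):
import numpy as np

def array_matrix(random_range, array_size):
    # one call: an array_size x array_size block of random ints in [0, random_range)
    return np.random.randint(random_range, size=(array_size, array_size))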
Try this code:
import numpy as np
a1 = np.array([])
n = int(input(""))
for i in range(0, n):
    a = int(input(""))
    a1 = np.append(a1, a)  # append the new value at the end
print(a1)
You can also append a whole array in place of the single value a.
If I understand your question, here's one way. Say you have:
a = [4.1, 6.21, 1.0]
so here's some code...
def array_in_array(scalarlist):
    return [(x,) for x in scalarlist]
Which leads to:
In [72]: a = [4.1, 6.21, 1.0]
In [73]: a
Out[73]: [4.1, 6.21, 1.0]
In [74]: def array_in_array(scalarlist):
   ....:     return [(x,) for x in scalarlist]
   ....:
In [75]: b = array_in_array(a)
In [76]: b
Out[76]: [(4.1,), (6.21,), (1.0,)]
As you want to concatenate along an existing axis (row-wise), np.vstack or np.concatenate will work for you.
For a detailed list of concatenation operations, refer to the official docs.
This is for people working with NumPy's ndarrays. The function numpy.concatenate() works here as well.
>>> a = np.random.randint(0, 9, size=(10, 1, 5, 4))
>>> a.shape
(10, 1, 5, 4)
>>> b = np.random.randint(0, 9, size=(15, 1, 5, 4))
>>> b.shape
(15, 1, 5, 4)
>>> X = np.concatenate((a, b))
>>> X.shape
(25, 1, 5, 4)
It works much the same way as vstack():
>>> Y = np.vstack((a, b))
>>> Y.shape
(25, 1, 5, 4)
There is a handful of methods to stack arrays together, depending on the direction of the stack.
You may consider np.stack() (doc), np.vstack() (doc) and np.hstack() (doc) for example.
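A minimal sketch of how the three differ on a pair of 1-D arrays:
import numpy as np
a = np.array([1, 2, 3])
b = np.array([4, 5, 6])
np.stack((a, b))   # shape (2, 3): joined along a new leading axis
np.vstack((a, b))  # shape (2, 3): treated as rows and stacked vertically
np.hstack((a, b))  # shape (6,):   joined end to end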
Related
How can I stack the elements at the same respective index from each array in a list of arrays?
arrays = [np.array([1, 2, 3, 4, 5]),
          np.array([6, 7, 8, 9]),
          np.array([11, 22, 33, 44, 55]),
          np.array([2, 4])]
output = [[1, 6, 11, 2],
          [2, 7, 22, 4],
          [3, 8, 33],
          [4, 9, 44],
          [5, 55]]
arrays is a list of arrays of uneven lengths. The output has a first array (don't mind if it's a list too) that contains all possible index 0s from each array. The next array within output contains all possible index 1s and so on...
Closest thing I can find (but requires same shape arrays) is:
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
np.stack((a, b), axis=-1)
# which gives
array([[1, 2],
       [2, 3],
       [3, 4]])
Thanks.
This gets you close. You can't really have a ragged 2D array with rows of different lengths, as shown in your example output.
import numpy as np
arrays = [np.array([1, 2, 3, 4, 5]),
          np.array([6, 7, 8, 9]),
          np.array([11, 22, 33, 44, 55]),
          np.array([2, 4])]
maxx = max(x.shape[0] for x in arrays)
for x in arrays:
    x.resize(maxx, refcheck=False)
output = np.stack(arrays, axis=1)
print(output)
C:\tmp>python x.py
[[ 1  6 11  2]
 [ 2  7 22  4]
 [ 3  8 33  0]
 [ 4  9 44  0]
 [ 5  0 55  0]]
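If mutating the input arrays with resize is undesirable, zero-padded copies can be built with np.pad instead (a sketch; pad's mode defaults to zero-filled 'constant' in NumPy >= 1.17):
import numpy as np

arrays = [np.array([1, 2, 3, 4, 5]),
          np.array([6, 7, 8, 9]),
          np.array([11, 22, 33, 44, 55]),
          np.array([2, 4])]
maxx = max(x.shape[0] for x in arrays)
padded = [np.pad(x, (0, maxx - x.shape[0])) for x in arrays]
output = np.stack(padded, axis=1)  # same result, originals untouched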
You could just wrap it in a DataFrame first:
import pandas as pd
arr = pd.DataFrame(arrays).values.T
Output:
array([[ 1.,  6., 11.,  2.],
       [ 2.,  7., 22.,  4.],
       [ 3.,  8., 33., nan],
       [ 4.,  9., 44., nan],
       [ 5., nan, 55., nan]])
Though if you really want it with different sizes, go with:
arr = [x.dropna().values for _, x in pd.DataFrame(arrays).items()]  # .items() iterates columns (iteritems was removed in pandas 2.0)
Output:
[array([ 1,  6, 11,  2]),
 array([ 2,  7, 22,  4]),
 array([ 3.,  8., 33.]),
 array([ 4.,  9., 44.]),
 array([ 5., 55.])]
I currently have code that allows one to take a combinatorial (cartesian) product across a particular axis. This is in numpy, and originated from a previous question Efficient axis-wise cartesian product of multiple 2D matrices with Numpy or TensorFlow
A = np.array([[1, 2],
              [3, 4]])
B = np.array([[10, 20],
              [5, 6]])
C = np.array([[50, 0],
              [60, 8]])
cartesian_product([A, B, C], axis=1)
>> np.array([[1*10*50, 1*10*0, 1*20*50, 1*20*0, 2*10*50, 2*10*0, 2*20*50, 2*20*0],
             [3*5*60,  3*5*8,  3*6*60,  3*6*8,  4*5*60,  4*5*8,  4*6*60,  4*6*8]])
and to reiterate the solution:
L = [A, B, C]  # list of arrays
n = L[0].shape[0]
out = (L[1][:, None] * L[0][:, :, None]).reshape(n, -1)
for i in L[2:]:
    out = (i[:, None] * out[:, :, None]).reshape(n, -1)
Is there an existing method to perform this with broadcasting in tensorflow - without a for loop?
OK, so I managed to find a pure-TF-based (partial) answer for two arrays. It's not currently generalizable like the NumPy solution for M arrays, but that's for another question (perhaps a tf.while_loop). For those that are curious, the solution adapts from Evaluate all pair combinations of rows of two tensors in tensorflow.
a = np.array([[0, 1, 2, 3],
              [4, 5, 6, 7],
              [4, 5, 6, 7]])
b = np.array([[0, 1],
              [2, 3],
              [2, 3]])
N = a.shape[0]
A = tf.constant(a, dtype=tf.float64)
B = tf.constant(b, dtype=tf.float64)
A_ = tf.expand_dims(A, axis=1)
B_ = tf.expand_dims(B, axis=2)
z = tf.reshape(tf.multiply(A_, B_), [N, -1])
>> tf_result  # z evaluated, e.g. via sess.run(z)
Out[1]:
array([[ 0.,  0.,  0.,  0.,  0.,  1.,  2.,  3.],
       [ 8., 10., 12., 14., 12., 15., 18., 21.],
       [ 8., 10., 12., 14., 12., 15., 18., 21.]])
Solutions for the multiple-array case are welcome.
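For what it's worth, here is an untested sketch of one way this might generalize: since the number of arrays is known before graph construction, a plain Python loop over graph ops can mirror the NumPy solution (tf_cartesian_product is a hypothetical helper name):
import tensorflow as tf

def tf_cartesian_product(tensors, n):
    # tensors: list of 2-D tensors that all have n rows
    out = tensors[0]
    for t in tensors[1:]:
        # (n, k, 1) * (n, 1, m) broadcasts to (n, k, m); flatten back to (n, k*m)
        out = tf.reshape(tf.expand_dims(out, 2) * tf.expand_dims(t, 1), [n, -1])
    return out

# usage mirroring the two-array example above: z = tf_cartesian_product([A, B], N)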
Given the following numpy arrays:
import numpy
a = numpy.array([[1, 1, 1], [1, 1, 1], [1, 1, 1]])
b = numpy.array([[2, 2, 2], [2, 2, 2], [2, 2, 2]])
c = numpy.array([[3, 3, 3], [3, 3, 3], [3, 3, 3]])
and this dictionary containing them all:
mydict = {0: a, 1: b, 2: c}
What is the most efficient way of iterating through mydict so to compute the average numpy array that has (1+2+3)/3=2 as values?
My attempt fails as I am giving it too many values to unpack. It is also extremely inefficient as it has an O(n^3) time complexity:
aver = numpy.empty([a.shape[0], a.shape[1]])
for c, v in mydict.values():
    for i in range(0, a.shape[0]):
        for j in range(0, a.shape[1]):
            aver[i][j] = mydict[c][i][j]  # <- too many values to unpack
The final result should be:
In [17]: aver
Out[17]:
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
EDIT
I am not looking for an average value for each NumPy array. I am looking for an average value for each element of my collection of NumPy arrays. This is a minimal example, but the real thing I am working on has over 120,000 elements per array, and for the same position the values change from array to array.
I think you're making this harder than it needs to be. Either sum them and divide by the number of terms:
In [42]: v = mydict.values()
In [43]: sum(v) / len(v)
Out[43]:
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
Or stack them into one big array -- which it sounds like is the format they probably should have been in to start with -- and take the mean over the stacked axis:
In [44]: np.array(list(v)).mean(axis=0)
Out[44]:
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
You really shouldn't be using a dict of numpy.arrays. Just use a multi-dimensional array:
>>> bigarray = numpy.array([arr.tolist() for arr in mydict.values()])
>>> bigarray
array([[[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]],

       [[2, 2, 2],
        [2, 2, 2],
        [2, 2, 2]],

       [[3, 3, 3],
        [3, 3, 3],
        [3, 3, 3]]])
>>> bigarray.mean(axis=0)
array([[ 2.,  2.,  2.],
       [ 2.,  2.,  2.],
       [ 2.,  2.,  2.]])
You should modify your code to not even work with a dict. Especially not a dict with integer keys...
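If the key order matters, a minimal sketch that stacks in sorted-key order (assuming the integer keys above):
import numpy as np

bigarray = np.stack([mydict[k] for k in sorted(mydict)])
aver = bigarray.mean(axis=0)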
Given an ndarray x and a one dimensional array containing the length of contiguous slices of a dimension of x, I want to compute a new array that contains the sum of all of the slices. For example, in two dimensions summing over dimension one:
>>> lens = np.array([1, 3, 2])
array([1, 3, 2])
>>> x = np.arange(4 * lens.sum()).reshape((4, lens.sum())).astype(float)
array([[  0.,   1.,   2.,   3.,   4.,   5.],
       [  6.,   7.,   8.,   9.,  10.,  11.],
       [ 12.,  13.,  14.,  15.,  16.,  17.],
       [ 18.,  19.,  20.,  21.,  22.,  23.]])
# I want to compute:
>>> result
array([[  0.,   6.,   9.],
       [  6.,  24.,  21.],
       [ 12.,  42.,  33.],
       [ 18.,  60.,  45.]])
# 0 = 0
# 6 = 1 + 2 + 3
# ...
# 45 = 22 + 23
The two ways that come to mind are:
a) Use cumsum and fancy indexing:
def cumsum_method(x, lens):
    xc = x.cumsum(1)
    lc = lens.cumsum() - 1
    res = xc[:, lc]
    res[:, 1:] -= xc[:, lc[:-1]]
    return res
b) Use bincount and intelligently generate the appropriate bins:
def bincount_method(x, lens):
    bins = np.arange(lens.size).repeat(lens) + \
           np.arange(x.shape[0])[:, None] * lens.size
    return np.bincount(bins.flat, weights=x.flat).reshape((-1, lens.size))
Timing these two on large input had the cumsum method performing slightly better:
>>> lens = np.random.randint(1, 100, 100)
>>> x = np.random.random((100000, lens.sum()))
>>> %timeit cumsum_method(x, lens)
1 loops, best of 3: 3 s per loop
>>> %timeit bincount_method(x, lens)
1 loops, best of 3: 3.9 s per loop
Is there an obviously more efficient way that I'm missing? It seems like a native c call would be faster because it wouldn't require allocating the cumsum or the bins array. A numpy builtin function that does something close to this could likely be better than (a) or (b). I couldn't find anything through searching and looking through the documentation.
Note, this is similar to this question, but the summation intervals aren't regular.
You can use np.add.reduceat:
>>> np.add.reduceat(x, [0, 1, 4], axis=1)
array([[  0.,   6.,   9.],
       [  6.,  24.,  21.],
       [ 12.,  42.,  33.],
       [ 18.,  60.,  45.]])
The list of indices [0, 1, 4] means: "sum the slices 0:1, 1:4 and 4:". You could generate these values from lens using np.hstack(([0], lens[:-1])).cumsum().
Even factoring in the calculation of the indices from lens, a reduceat method is likely to be significantly faster than alternative methods:
def reduceat_method(x, lens):
    i = np.hstack(([0], lens[:-1])).cumsum()
    return np.add.reduceat(x, i, axis=1)
lens = np.random.randint(1, 100, 100)
x = np.random.random((1000, lens.sum()))
%timeit reduceat_method(x, lens)
# 100 loops, best of 3: 4.89 ms per loop
%timeit cumsum_method(x, lens)
# 10 loops, best of 3: 35.8 ms per loop
%timeit bincount_method(x, lens)
# 10 loops, best of 3: 43.6 ms per loop
Assume three arrays in numpy:
a = np.zeros(5)
b = np.array([3,3,3,0,0])
c = np.array([1,5,10,50,100])
b can now be used as an index for a and c. For example:
In [142]: c[b]
Out[142]: array([50, 50, 50, 1, 1])
Is there any way to add up the values connected to the duplicate indexes with this kind of slicing? With
a[b] = c
Only the last values are stored:
array([ 100., 0., 0., 10., 0.])
I would like something like this:
a[b] += c
which would give
array([ 150., 0., 0., 16., 0.])
I'm mapping very large vectors onto 2D matrices and would really like to avoid loops...
The += operator for NumPy arrays simply doesn't work the way you are hoping, and I'm not aware of a way of making it work that way. As a work-around I suggest using numpy.bincount():
>>> numpy.bincount(b, c)
array([ 150., 0., 0., 16.])
Just append zeros as needed.
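For what it's worth, bincount's minlength argument does that zero-padding for you, and NumPy 1.8+ also provides np.add.at for exactly this unbuffered accumulation; a minimal sketch of both:
import numpy as np

a = np.zeros(5)
b = np.array([3, 3, 3, 0, 0])
c = np.array([1, 5, 10, 50, 100])

np.bincount(b, weights=c, minlength=a.size)  # array([150., 0., 0., 16., 0.])

np.add.at(a, b, c)  # accumulates duplicate indexes in place
# a is now array([150., 0., 0., 16., 0.])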
You could do something like:
def sum_unique(label, weight):
    order = np.lexsort(label.T)
    label = label[order]
    weight = weight[order]
    unique = np.ones(len(label), 'bool')
    unique[:-1] = (label[1:] != label[:-1]).any(-1)
    totals = weight.cumsum()
    totals = totals[unique]
    totals[1:] = totals[1:] - totals[:-1]
    return label[unique], totals
And use it like this:
In [110]: coord = np.random.randint(0, 3, (10, 2))
In [111]: coord
Out[111]:
array([[0, 2],
       [0, 2],
       [2, 1],
       [1, 2],
       [1, 0],
       [0, 2],
       [0, 0],
       [2, 1],
       [1, 2],
       [1, 2]])
In [112]: weights = np.ones(10)
In [113]: uniq_coord, sums = sum_unique(coord, weights)
In [114]: uniq_coord
Out[114]:
array([[0, 0],
       [1, 0],
       [2, 1],
       [0, 2],
       [1, 2]])
In [115]: sums
Out[115]: array([ 1., 1., 2., 3., 3.])
In [116]: a = np.zeros((3,3))
In [117]: x, y = uniq_coord.T
In [118]: a[x, y] = sums
In [119]: a
Out[119]:
array([[ 1.,  0.,  3.],
       [ 1.,  0.,  3.],
       [ 0.,  2.,  0.]])
I just thought of this, it might be easier:
In [120]: flat_coord = np.ravel_multi_index(coord.T, (3,3))
In [121]: sums = np.bincount(flat_coord, weights)
In [122]: a = np.zeros((3,3))
In [123]: a.flat[:len(sums)] = sums
In [124]: a
Out[124]:
array([[ 1.,  0.,  3.],
       [ 1.,  0.,  3.],
       [ 0.,  2.,  0.]])
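For what it's worth, bincount's minlength argument removes the need for the flat-slice assignment (a sketch reusing coord and weights from above):
flat_coord = np.ravel_multi_index(coord.T, (3, 3))
# minlength pads the trailing bins with zeros so the result reshapes directly
a = np.bincount(flat_coord, weights, minlength=9).reshape(3, 3)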