Adding n columns to a numpy array [duplicate] - python

This question already has answers here: Concatenate a NumPy array to another NumPy array (12 answers). Closed 7 years ago.
I'm making a program where I need to make a matrix looking like this:
A = np.array([[ 1., 2., 3.],
              [ 1., 2., 3.],
              [ 1., 2., 3.],
              [ 1., 2., 3.]])
So I started thinking about np.arange(1, 4).
But how do I append n rows of np.arange(1, 4) to build A?

As mentioned in the docs, you can use np.concatenate:
>>> a = np.array([[1, 2], [3, 4]])
>>> b = np.array([[5, 6]])
>>> np.concatenate((a, b), axis=0)
array([[1, 2],
       [3, 4],
       [5, 6]])
>>> np.concatenate((a, b.T), axis=1)
array([[1, 2, 5],
       [3, 4, 6]])
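Applied to the question, a minimal sketch is to make the row 2-D first and then concatenate n copies of it (concatenate needs the dimensions to match, hence the reshape):
>>> row = np.arange(1, 4).reshape(1, -1)
>>> np.concatenate([row] * 4, axis=0)
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])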

Here's another way, using broadcasting:
In [69]: np.arange(1,4)*np.ones((4,1))
Out[69]:
array([[ 1., 2., 3.],
       [ 1., 2., 3.],
       [ 1., 2., 3.],
       [ 1., 2., 3.]])
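If a read-only result is acceptable, np.broadcast_to gives the same shape without materializing the repeated rows (a small sketch; the returned view is not writable):
In [70]: np.broadcast_to(np.arange(1,4), (4, 3))
Out[70]:
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])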

You can get something like what you typed in your question with:
N = 3
A = np.tile(np.arange(1, N+1), (N, 1))
I'm assuming you want a square array?
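If not, np.tile takes the row and column repetition counts separately, so the question's 4x3 example is the same call with non-square counts:
A = np.tile(np.arange(1, 4), (4, 1))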

>>> np.repeat([np.arange(1, 4)], 4, 0)
array([[1, 2, 3],
       [1, 2, 3],
       [1, 2, 3],
       [1, 2, 3]])


find infinity values and replace with maximum per vector in a numpy array

Suppose I have the following array with shape (3, 5):
array = np.array([[1, 2, 3, np.inf, 5],
                  [10, 9, 8, 7, 6],
                  [4, np.inf, 2, 6, np.inf]])
Now I want to find the infinity values per vector and replace them with the maximum of that vector, with a lower limit of 1.
So the output for this example should be:
array_solved = np.array([[1, 2, 3, 5, 5],
                         [10, 9, 8, 7, 6],
                         [4, 6, 2, 6, 6]])
I could do this by looping over every vector of the array and applying:
idx_inf = np.isinf(array_vector)
max_value = np.max(np.append(array_vector[~idx_inf], 1.0))
array_vector[idx_inf] = max_value
But I guess there is a faster way.
Does anyone have an idea?
One way is to first convert the infs to NaNs with np.isinf masking, and then replace each NaN with its own row's maximum via np.nanmax; keepdims keeps the row maxima as a column, so they broadcast back row by row:
array[np.isinf(array)] = np.nan
array = np.where(np.isnan(array), np.nanmax(array, axis=1, keepdims=True), array)
to get
>>> array
array([[ 1., 2., 3., 5., 5.],
       [10., 9., 8., 7., 6.],
       [ 4., 6., 2., 6., 6.]])
import numpy as np
array = np.array([[1, 2, 3, np.inf, 5],
                  [10, 9, 8, 7, 6],
                  [4, np.inf, 2, 6, np.inf]])
n, m = array.shape
array[np.isinf(array)] = -np.inf
mx_array = np.repeat(np.max(array, axis=1), m).reshape(n, m)
ind = np.where(np.isinf(array))
array[ind] = mx_array[ind]
Output array:
array([[ 1., 2., 3., 5., 5.],
       [10., 9., 8., 7., 6.],
       [ 4., 6., 2., 6., 6.]])
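Both snippets skip the lower limit of 1 that the question asks for (the questioner's own loop appends 1.0 before taking the max). In the second snippet that floor can be folded into the final assignment, for example:
array[ind] = np.maximum(mx_array[ind], 1.0)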

How to stack uneven numpy arrays?

How can I stack the elements from the same respective index from each array in a list of arrays?
arrays = [np.array([1,2,3,4,5]),
          np.array([6,7,8,9]),
          np.array([11,22,33,44,55]),
          np.array([2,4])]
output = [[1,6,11,2],
          [2,7,22,4],
          [3,8,33],
          [4,9,44],
          [5,55]]
arrays is a list of arrays of uneven lengths. The output has a first array (don't mind if it's a list too) that contains all possible index 0s from each array. The next array within output contains all possible index 1s and so on...
Closest thing I can find (but requires same shape arrays) is:
a = np.array([1, 2, 3])
b = np.array([2, 3, 4])
np.stack((a, b), axis=-1)
# which gives
array([[1, 2],
       [2, 3],
       [3, 4]])
Thanks.
This gets you close. You can't really have a 2D NumPy array with ragged rows like your example output.
import numpy as np
arrays = [np.array([1,2,3,4,5]),
          np.array([6,7,8,9]),
          np.array([11,22,33,44,55]),
          np.array([2,4])]
maxx = max(x.shape[0] for x in arrays)
for x in arrays:
    x.resize(maxx, refcheck=False)  # pads the shorter arrays with zeros, in place
output = np.stack(arrays, axis=1)
print(output)
C:\tmp>python x.py
[[ 1  6 11  2]
 [ 2  7 22  4]
 [ 3  8 33  0]
 [ 4  9 44  0]
 [ 5  0 55  0]]
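If you do want the ragged output exactly as in the question, plain itertools avoids the zero padding entirely (a sketch, assuming the original un-resized arrays):
from itertools import zip_longest
output = [[v for v in col if v is not None]
          for col in zip_longest(*arrays)]
# [[1, 6, 11, 2], [2, 7, 22, 4], [3, 8, 33], [4, 9, 44], [5, 55]]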
You could just wrap it in a DataFrame first:
import pandas as pd
arr = pd.DataFrame(arrays).values.T
Output:
array([[ 1., 6., 11., 2.],
       [ 2., 7., 22., 4.],
       [ 3., 8., 33., nan],
       [ 4., 9., 44., nan],
       [ 5., nan, 55., nan]])
Though if you really want it with different sizes, go with:
arr = [x.dropna().values for _, x in pd.DataFrame(arrays).items()]
Output:
[array([ 1, 6, 11, 2]),
 array([ 2, 7, 22, 4]),
 array([ 3., 8., 33.]),
 array([ 4., 9., 44.]),
 array([ 5., 55.])]

Create a matrix from a vector where each row is a shifted version of the vector

I have a numpy array like this
import numpy as np
ar = np.array([1, 2, 3, 4])
and I want to create an array that looks like this:
array([[4, 1, 2, 3],
       [3, 4, 1, 2],
       [2, 3, 4, 1],
       [1, 2, 3, 4]])
Here, each row is ar rolled to the right by the row index + 1.
A straightforward implementation could look like this:
ar_roll = np.tile(ar, ar.shape[0]).reshape(ar.shape[0], ar.shape[0])
for indi, ri in enumerate(ar_roll):
    ar_roll[indi, :] = np.roll(ri, indi + 1)
which gives me the desired output.
My question is whether there is a smarter way of doing this which avoids the loop.
Here's one approach using NumPy strides: we pad the array with the leftover elements, and the strides then build each shifted version very efficiently -
def strided_method(ar):
    a = np.concatenate(( ar, ar[:-1] ))
    L = len(ar)
    n = a.strides[0]
    return np.lib.stride_tricks.as_strided(a[L-1:], (L,L), (-n,n))
Sample runs -
In [42]: ar = np.array([1, 2, 3, 4])
In [43]: strided_method(ar)
Out[43]:
array([[4, 1, 2, 3],
       [3, 4, 1, 2],
       [2, 3, 4, 1],
       [1, 2, 3, 4]])
In [44]: ar = np.array([4,9,3,6,1,2])
In [45]: strided_method(ar)
Out[45]:
array([[2, 4, 9, 3, 6, 1],
       [1, 2, 4, 9, 3, 6],
       [6, 1, 2, 4, 9, 3],
       [3, 6, 1, 2, 4, 9],
       [9, 3, 6, 1, 2, 4],
       [4, 9, 3, 6, 1, 2]])
Runtime test -
In [5]: a = np.random.randint(0,9,(1000))
# @Eric's soln
In [6]: %timeit roll_matrix(a)
100 loops, best of 3: 3.39 ms per loop
# @Warren Weckesser's soln
In [8]: %timeit circulant(a[::-1])
100 loops, best of 3: 2.03 ms per loop
# Strides method
In [18]: %timeit strided_method(a)
100000 loops, best of 3: 6.7 µs per loop
Making a copy (if you want to make changes, and not just use it as a read-only array) won't hurt us too badly for the strides method -
In [19]: %timeit strided_method(a).copy()
1000 loops, best of 3: 381 µs per loop
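On NumPy 1.20+, np.lib.stride_tricks.sliding_window_view wraps the same strides idea in a safer interface and returns a read-only view; a hedged equivalent (sliding_method is just an illustrative name):
from numpy.lib.stride_tricks import sliding_window_view
def sliding_method(ar):
    a = np.concatenate((ar, ar[:-1]))
    return sliding_window_view(a, len(ar))[::-1]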
Both of the existing answers are fine; this answer is probably only of interest if you are already using scipy.
The matrix that you describe is known as a circulant matrix. If you don't mind the dependency on scipy, you can use scipy.linalg.circulant to create one:
In [136]: from scipy.linalg import circulant
In [137]: ar = np.array([1, 2, 3, 4])
In [138]: circulant(ar[::-1])
Out[138]:
array([[4, 1, 2, 3],
       [3, 4, 1, 2],
       [2, 3, 4, 1],
       [1, 2, 3, 4]])
Here's one approach
def roll_matrix(vec):
    N = len(vec)
    buffer = np.empty((N, N*2 - 1))
    # generate a wider array that we want a slice into
    buffer[:,:N] = vec
    buffer[:,N:] = vec[:-1]
    rolled = buffer.reshape(-1)[N-1:-1].reshape(N, -1)
    return rolled[:,:N]
In your case, we build buffer to be
array([[ 1., 2., 3., 4., 1., 2., 3.],
       [ 1., 2., 3., 4., 1., 2., 3.],
       [ 1., 2., 3., 4., 1., 2., 3.],
       [ 1., 2., 3., 4., 1., 2., 3.]])
Then flatten it, trim it, reshape it to get rolled:
array([[ 4., 1., 2., 3., 1., 2.],
       [ 3., 4., 1., 2., 3., 1.],
       [ 2., 3., 4., 1., 2., 3.],
       [ 1., 2., 3., 4., 1., 2.]])
And finally, slice off the garbage in the last columns.
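Taking [:, :N] on the array above then reproduces the target (reconstructed here from the steps; the dtype is float because buffer came from np.empty):
array([[ 4., 1., 2., 3.],
       [ 3., 4., 1., 2.],
       [ 2., 3., 4., 1.],
       [ 1., 2., 3., 4.]])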

Python: properly iterating through a dictionary of numpy arrays

Given the following numpy arrays:
import numpy
a=numpy.array([[1,1,1],[1,1,1],[1,1,1]])
b=numpy.array([[2,2,2],[2,2,2],[2,2,2]])
c=numpy.array([[3,3,3],[3,3,3],[3,3,3]])
and this dictionary containing them all:
mydict={0:a,1:b,2:c}
What is the most efficient way of iterating through mydict so to compute the average numpy array that has (1+2+3)/3=2 as values?
My attempt fails as I am giving it too many values to unpack. It is also extremely inefficient as it has an O(n^3) time complexity:
aver=numpy.empty([a.shape[0],a.shape[1]])
for c,v in mydict.values():
    for i in range(0,a.shape[0]):
        for j in range(0,a.shape[1]):
            aver[i][j]=mydict[c][i][j] #<-too many values to unpack
The final result should be:
In[17]: aver
Out[17]:
array([[ 2., 2., 2.],
       [ 2., 2., 2.],
       [ 2., 2., 2.]])
EDIT
I am not looking for an average value for each numpy array. I am looking for an average value for each element of my collection of numpy arrays. This is a minimal example, but the real thing I am working on has over 120,000 elements per array, and for the same position the values change from array to array.
I think you're making this harder than it needs to be. Either sum them and divide by the number of terms:
In [42]: v = mydict.values()
In [43]: sum(v) / len(v)
Out[43]:
array([[ 2., 2., 2.],
       [ 2., 2., 2.],
       [ 2., 2., 2.]])
Or stack them into one big array -- which it sounds like is the format they probably should have been in to start with -- and take the mean over the stacked axis:
In [44]: np.array(list(v)).mean(axis=0)
Out[44]:
array([[ 2., 2., 2.],
       [ 2., 2., 2.],
       [ 2., 2., 2.]])
You really shouldn't be using a dict of numpy.arrays. Just use a multi-dimensional array:
>>> bigarray = numpy.array([arr.tolist() for arr in mydict.values()])
>>> bigarray
array([[[1, 1, 1],
        [1, 1, 1],
        [1, 1, 1]],
       [[2, 2, 2],
        [2, 2, 2],
        [2, 2, 2]],
       [[3, 3, 3],
        [3, 3, 3],
        [3, 3, 3]]])
>>> bigarray.mean(axis=0)
array([[ 2., 2., 2.],
       [ 2., 2., 2.],
       [ 2., 2., 2.]])
You should modify your code to not even work with a dict. Especially not a dict with integer keys...
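As a hedged aside, numpy.stack builds the same 3-D array straight from the dict values, without the tolist() round trip:
>>> bigarray = numpy.stack(list(mydict.values()))
>>> bigarray.mean(axis=0)
array([[ 2., 2., 2.],
       [ 2., 2., 2.],
       [ 2., 2., 2.]])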

Map arrays with duplicate indexes?

Assume three arrays in numpy:
a = np.zeros(5)
b = np.array([3,3,3,0,0])
c = np.array([1,5,10,50,100])
b can now be used as an index for a and c. For example:
In [142]: c[b]
Out[142]: array([50, 50, 50, 1, 1])
Is there any way to add up the values connected to the duplicate indexes with this kind of slicing? With
a[b] = c
Only the last values are stored:
array([ 100., 0., 0., 10., 0.])
I would like something like this:
a[b] += c
which would give
array([ 150., 0., 0., 16., 0.])
I'm mapping very large vectors onto 2D matrices and would really like to avoid loops...
The += operator for NumPy arrays simply doesn't work the way you are hoping, and I'm not aware of a way of making it work that way. As a work-around I suggest using numpy.bincount():
>>> numpy.bincount(b, c)
array([ 150., 0., 0., 16.])
Just append zeros as needed.
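For reference, later NumPy versions cover both points directly: bincount accepts a minlength argument, and numpy.add.at performs exactly the unbuffered a[b] += c the question asks for (starting from the zeroed a):
>>> numpy.bincount(b, c, minlength=len(a))
array([ 150., 0., 0., 16., 0.])
>>> numpy.add.at(a, b, c)
>>> a
array([ 150., 0., 0., 16., 0.])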
You could do something like:
def sum_unique(label, weight):
    order = np.lexsort(label.T)
    label = label[order]
    weight = weight[order]
    unique = np.ones(len(label), 'bool')
    unique[:-1] = (label[1:] != label[:-1]).any(-1)
    totals = weight.cumsum()
    totals = totals[unique]
    totals[1:] = totals[1:] - totals[:-1]
    return label[unique], totals
And use it like this:
In [110]: coord = np.random.randint(0, 3, (10, 2))
In [111]: coord
Out[111]:
array([[0, 2],
       [0, 2],
       [2, 1],
       [1, 2],
       [1, 0],
       [0, 2],
       [0, 0],
       [2, 1],
       [1, 2],
       [1, 2]])
In [112]: weights = np.ones(10)
In [113]: uniq_coord, sums = sum_unique(coord, weights)
In [114]: uniq_coord
Out[114]:
array([[0, 0],
       [1, 0],
       [2, 1],
       [0, 2],
       [1, 2]])
In [115]: sums
Out[115]: array([ 1., 1., 2., 3., 3.])
In [116]: a = np.zeros((3,3))
In [117]: x, y = uniq_coord.T
In [118]: a[x, y] = sums
In [119]: a
Out[119]:
array([[ 1., 0., 3.],
       [ 1., 0., 3.],
       [ 0., 2., 0.]])
I just thought of this, it might be easier:
In [120]: flat_coord = np.ravel_multi_index(coord.T, (3,3))
In [121]: sums = np.bincount(flat_coord, weights)
In [122]: a = np.zeros((3,3))
In [123]: a.flat[:len(sums)] = sums
In [124]: a
Out[124]:
array([[ 1., 0., 3.],
       [ 1., 0., 3.],
       [ 0., 2., 0.]])
