Repeating numpy values and specifying dtype - python

I want to generate a numpy array of the form:
0.5*[[0, 0], [1, 1], [2, 2], ...]
I want the final array to have a dtype of numpy.float32.
Here is my attempt:
>>> import numpy as np
>>> N = 5
>>> x = np.array(np.repeat(0.5*np.arange(N), 2), np.float32)
>>> x
array([ 0. , 0. , 0.5, 0.5, 1. , 1. , 1.5, 1.5, 2. , 2. ], dtype=float32)
Is this a good way? Can I avoid the copy (if it is indeed copying) just for type conversion?

You only have to reshape your final result to obtain what you want:
x = x.reshape(-1, 2)
You could also run arange passing the dtype:
x = np.repeat(0.5*np.arange(N, dtype=np.float32), 2).reshape(-1, 2)
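For N = 5 this gives the target layout directly:
>>> np.repeat(0.5*np.arange(N, dtype=np.float32), 2).reshape(-1, 2)
array([[0. , 0. ],
       [0.5, 0.5],
       [1. , 1. ],
       [1.5, 1.5],
       [2. , 2. ]], dtype=float32)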
You can easily cast the array to another type using the astype method, which accepts an argument copy:
x.astype(np.int8, copy=False)
But, as explained in the documentation, numpy checks for some requirements in order to return a view. If those requirements are not satisfied, a copy is returned.
You can check if a given array is a copy or a view from another by checking the OWNDATA attribute, accessible through the flags property of the ndarray.
EDIT: more on checking if a given array is a copy...
Is there a way to check if numpy arrays share the same data?
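For example, a minimal sketch of both checks (np.shares_memory is numpy's own helper for the linked question):
>>> import numpy as np
>>> x = np.repeat(0.5*np.arange(5, dtype=np.float32), 2)
>>> same = x.astype(np.float32, copy=False)  # dtype already matches: no copy
>>> diff = x.astype(np.float64, copy=False)  # dtype differs: a copy is made
>>> same is x                                # the input array itself comes back
True
>>> diff.flags.owndata                       # diff owns a freshly allocated buffer
True
>>> np.shares_memory(x, diff)                # no bytes shared with x
False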

An alternative:
np.array([0.5*np.arange(N, dtype=np.float32)]*2)
Gives:
array([[ 0. , 0.5, 1. , 1.5, 2. ],
       [ 0. , 0.5, 1. , 1.5, 2. ]], dtype=float32)
You might want to rotate it:
np.rot90(np.array([0.5*np.arange(N, dtype=np.float32)]*2),3)
Giving:
array([[ 0. , 0. ],
       [ 0.5, 0.5],
       [ 1. , 1. ],
       [ 1.5, 1.5],
       [ 2. , 2. ]], dtype=float32)
Note, this is slower than @Saullo_Castro's answer:
np.rot90(np.array([0.5*np.arange(N, dtype=np.float32)]*2),3)
10000 loops, best of 3: 24.3 us per loop
np.repeat(0.5*np.arange(N, dtype=np.float32), 2).reshape(-1, 2)
10000 loops, best of 3: 9.23 us per loop
np.array(np.repeat(0.5*np.arange(N), 2), np.float32).reshape(-1, 2)
10000 loops, best of 3: 10.4 us per loop
(using %%timeit in IPython)

Related

Numpy append to empty array

I want to append a numpy array to an empty numpy array, but it's not working.
reconstructed = numpy.empty((4096,))
to_append = reconstruct(p, e_faces, weights, mu, i)
# to_append=array([129.47776809, 129.30775937, 128.90932868, ..., 103.64777681, 104.99912816, 105.93984307]); its shape is (4096,)
numpy.append(reconstructed, to_append, axis=0)
# The axis argument is not working anyway.
Please help me. I want to put that long array into the empty one. The result is just empty.
Look at what empty produces:
In [140]: x = np.empty((5,))
In [141]: x
Out[141]: array([0. , 0.25, 0.5 , 0.75, 1. ])
append makes a new array; it does not change x
In [142]: np.append(x, [1,2,3,4,5], axis=0)
Out[142]: array([0. , 0.25, 0.5 , 0.75, 1. , 1. , 2. , 3. , 4. , 5. ])
In [143]: x
Out[143]: array([0. , 0.25, 0.5 , 0.75, 1. ])
We have to assign it to a new variable:
In [144]: y = np.append(x, [1,2,3,4,5], axis=0)
In [145]: y
Out[145]: array([0. , 0.25, 0.5 , 0.75, 1. , 1. , 2. , 3. , 4. , 5. ])
Look at that y - those random values that were in x are also in y!
Contrast that with a list:
In [146]: alist = []
In [147]: alist
Out[147]: []
In [148]: alist.append([1,2,3,4,5])
In [149]: alist
Out[149]: [[1, 2, 3, 4, 5]]
The results are very different. Don't use this as a model for creating arrays.
If you need to build an array row by row, use the list append to collect the rows in one list, and then make the array from that.
In [150]: z = np.array(alist)
In [151]: z
Out[151]: array([[1, 2, 3, 4, 5]])
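Applied back to the question, a minimal sketch (num_images is a hypothetical loop bound; reconstruct and its arguments are from the post):
rows = []
for i in range(num_images):
    rows.append(reconstruct(p, e_faces, weights, mu, i))  # each: shape (4096,)
reconstructed = np.array(rows)  # shape (num_images, 4096)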

turning a list of numpy.ndarray to a matrix in order to perform multiplication

I have vectors of this form:
test = np.linspace(0, 1, 10)
I want to stack them horizontally in order to make a matrix.
The problem is that I define them in a loop, so the first stack is between an empty matrix and the first column vector, which gives the following error:
ValueError: all the input arrays must have same number of dimensions
Bottom line: I have a for loop that creates a vector p1 on every iteration, and I want to add it to a final matrix of the form
[p1 p2 p3 p4], which I could then do matrix operations on, such as multiplying by the transpose, etc.
If you've got a list of 1D arrays that you want horizontally stacked, you could convert them all to columns first, but it's probably easier to just vertically stack them and then transpose:
In [6]: vector_list = [np.linspace(0, 1, 10) for _ in range(3)]
In [7]: np.vstack(vector_list).T
Out[7]:
array([[0. , 0. , 0. ],
       [0.11111111, 0.11111111, 0.11111111],
       [0.22222222, 0.22222222, 0.22222222],
       [0.33333333, 0.33333333, 0.33333333],
       [0.44444444, 0.44444444, 0.44444444],
       [0.55555556, 0.55555556, 0.55555556],
       [0.66666667, 0.66666667, 0.66666667],
       [0.77777778, 0.77777778, 0.77777778],
       [0.88888889, 0.88888889, 0.88888889],
       [1. , 1. , 1. ]])
How did you get this dimension error? What does an empty array have to do with it?
A list of arrays of the same length:
In [610]: alist = [np.linspace(0,1,6), np.linspace(10,11,6)]
In [611]: alist
Out[611]:
[array([0. , 0.2, 0.4, 0.6, 0.8, 1. ]),
 array([10. , 10.2, 10.4, 10.6, 10.8, 11. ])]
Several ways of making an array from them:
In [612]: np.array(alist)
Out[612]:
array([[ 0. , 0.2, 0.4, 0.6, 0.8, 1. ],
       [10. , 10.2, 10.4, 10.6, 10.8, 11. ]])
In [614]: np.stack(alist)
Out[614]:
array([[ 0. , 0.2, 0.4, 0.6, 0.8, 1. ],
       [10. , 10.2, 10.4, 10.6, 10.8, 11. ]])
If you want to join them in columns, you can transpose one of the above, or use:
In [615]: np.stack(alist, axis=1)
Out[615]:
array([[ 0. , 10. ],
       [ 0.2, 10.2],
       [ 0.4, 10.4],
       [ 0.6, 10.6],
       [ 0.8, 10.8],
       [ 1. , 11. ]])
np.column_stack is also handy.
In newer numpy versions you can do:
In [617]: np.linspace((0,10),(1,11),6)
Out[617]:
array([[ 0. , 10. ],
       [ 0.2, 10.2],
       [ 0.4, 10.4],
       [ 0.6, 10.6],
       [ 0.8, 10.8],
       [ 1. , 11. ]])
You don't specify how you create the 'empty array' or how you attempt to stack. I can't exactly recreate the error message (a full traceback would have helped). But given that message, did you check the number of dimensions of the inputs? Did they match?
Array stacking in a loop is tricky. You have to pay close attention to the shapes, especially of the initial 'empty' array. There isn't a close analog to the empty list []. np.array([]) is 1d with shape (0,); np.empty((0,6)) is 2d with shape (0,6). Also, all the stacking functions create a new array with each call (none operates in-place), so repeated stacking is inefficient compared to list append (see the sketch below).
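A minimal sketch of both approaches, assuming each iteration produces a length-10 vector as in the question:
import numpy as np

# Preferred: collect the vectors in a list, build the matrix once.
cols = []
for _ in range(4):
    p1 = np.linspace(0, 1, 10)       # stands in for the per-iteration vector
    cols.append(p1)
result = np.stack(cols, axis=1)      # shape (10, 4); columns are the vectors

# Stacking inside the loop also works, but reallocates every iteration:
# start from a 2d (10, 0) array so dimensions match, and add (10, 1) columns.
acc = np.empty((10, 0))
for _ in range(4):
    p1 = np.linspace(0, 1, 10)
    acc = np.hstack([acc, p1[:, None]])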

How to use arrays to access matrix elements?

I need to change all nans of a matrix to a different value. I can easily get the nan positions using argwhere, but then I am not sure how to access those positions programmatically. Here is my nonworking code:
myMatrix = np.array([[3.2,2,float('NaN'),3],[3,1,2,float('NaN')],[3,3,3,3]])
nanPositions = np.argwhere(np.isnan(myMatrix))
maxVal = np.nanmax(abs(myMatrix))
for pos in nanPositions:
    myMatrix[pos] = maxVal
The problem is that myMatrix[pos] does not accept pos as an array.
The more-efficient way of generating your output has already been covered by sacul. However, you're incorrectly indexing your 2D matrix in the case where you want to use an array.
At least to me, it's a bit unintuitive, but you need to use:
myMatrix[[all_row_indices], [all_column_indices]]
The following will give you what you expect:
import numpy as np
myMatrix = np.array([[3.2,2,float('NaN'),3],[3,1,2,float('NaN')],[3,3,3,3]])
nanPositions = np.argwhere(np.isnan(myMatrix))
maxVal = np.nanmax(abs(myMatrix))
print(myMatrix[nanPositions[:, 0], nanPositions[:, 1]])
You can see more about advanced indexing in the documentation.
In [54]: arr = np.array([[3.2,2,float('NaN'),3],[3,1,2,float('NaN')],[3,3,3,3]])
...:
In [55]: arr
Out[55]:
array([[3.2, 2. , nan, 3. ],
       [3. , 1. , 2. , nan],
       [3. , 3. , 3. , 3. ]])
Location of the nan:
In [56]: np.where(np.isnan(arr))
Out[56]: (array([0, 1]), array([2, 3]))
In [57]: np.argwhere(np.isnan(arr))
Out[57]:
array([[0, 2],
       [1, 3]])
where produces a tuple of arrays; argwhere returns the same values, but as a 2d array.
In [58]: arr[Out[56]]
Out[58]: array([nan, nan])
In [59]: arr[Out[56]] = [100,200]
In [60]: arr
Out[60]:
array([[ 3.2, 2. , 100. , 3. ],
       [ 3. , 1. , 2. , 200. ],
       [ 3. , 3. , 3. , 3. ]])
The argwhere can be used to index individual items:
In [72]: for ij in Out[57]:
    ...:     print(arr[tuple(ij)])
100.0
200.0
The tuple() is needed here because np.array([1,3]) is interpreted as indexing 2 elements on the first dimension.
Another way to get that indexing tuple is to use unpacking:
In [74]: [arr[i,j] for i,j in Out[57]]
Out[74]: [100.0, 200.0]
So while argwhere looks useful, it is trickier to use than plain where.
You could, as noted in the other answers, use boolean indexing (I've already modified arr so the isnan test no longer works):
In [75]: arr[arr>10]
Out[75]: array([100., 200.])
More on indexing with a list or array, and indexing with a tuple:
In [77]: arr[[0,0]] # two copies of row 0
Out[77]:
array([[ 3.2, 2. , 100. , 3. ],
       [ 3.2, 2. , 100. , 3. ]])
In [78]: arr[(0,0)] # one element
Out[78]: 3.2
In [79]: arr[np.array([0,0])] # same as list
Out[79]:
array([[ 3.2, 2. , 100. , 3. ],
       [ 3.2, 2. , 100. , 3. ]])
In [80]: arr[np.array([0,0]),:] # making the trailing : explicit
Out[80]:
array([[ 3.2, 2. , 100. , 3. ],
       [ 3.2, 2. , 100. , 3. ]])
You can do this instead (IIUC):
myMatrix[np.isnan(myMatrix)] = np.nanmax(abs(myMatrix))
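Putting that one-liner together with the original arrays, a minimal end-to-end sketch:
import numpy as np

myMatrix = np.array([[3.2, 2, np.nan, 3],
                     [3, 1, 2, np.nan],
                     [3, 3, 3, 3]])
myMatrix[np.isnan(myMatrix)] = np.nanmax(abs(myMatrix))  # both nans become 3.2
print(myMatrix)
# [[3.2 2.  3.2 3. ]
#  [3.  1.  2.  3.2]
#  [3.  3.  3.  3. ]]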

Call functions with varying parameters to modify a numpy array efficiently

I want to eliminate the inefficient for loop from this code:
import numpy as np
x = np.zeros((5,5))
for i in range(5):
    x[i] = np.random.choice(i+1, 5)
while maintaining the kind of output given:
[[0. 0. 0. 0. 0.]
 [0. 0. 1. 0. 0.]
 [0. 2. 2. 1. 0.]
 [1. 2. 3. 1. 0.]
 [1. 0. 3. 3. 1.]]
I have tried this
i = np.arange(5)
x[i] = np.random.choice(i+1, 5)
But it outputs
[[0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]
 [0. 1. 1. 3. 3.]]
Is it possible to remove the loop? If not, which is the most efficient way to proceed for a big array and a lot of repetitions?
Create a random int array whose values run up to the number of columns, using np.random.randint with its high arg set to the number of columns. Then take a modulus to impose on each row a different limit defined by the row number. (Note that the modulus skews the distribution slightly whenever i+1 does not evenly divide n; a fully uniform alternative is sketched after the timings.) Thus, we have a vectorized implementation like so -
def create_rand_limited_per_row(m,n):
    s = np.arange(1,m+1)
    return np.random.randint(low=0,high=n,size=(m,n))%s[:,None]
Sample run -
In [45]: create_rand_limited_per_row(m=5,n=5)
Out[45]:
array([[0, 0, 0, 0, 0],
       [0, 0, 0, 0, 0],
       [1, 2, 0, 2, 1],
       [0, 0, 1, 3, 0],
       [1, 2, 3, 3, 2]])
To leverage multiple cores with the numexpr module for large data -
import numexpr as ne
def create_rand_limited_per_row_numexpr(m,n):
    s = np.arange(1,m+1)[:,None]
    a = np.random.randint(0,n,(m,n))
    return ne.evaluate('a%s')
Benchmarking
# Original approach
def create_rand_limited_per_row_loopy(m,n):
    x = np.empty((m,n),dtype=int)
    for i in range(m):
        x[i] = np.random.choice(i+1, n)
    return x
Timings on 1k x 1k data -
In [71]: %timeit create_rand_limited_per_row_loopy(m=1000,n=1000)
10 loops, best of 3: 20.6 ms per loop
In [72]: %timeit create_rand_limited_per_row(m=1000,n=1000)
100 loops, best of 3: 14.3 ms per loop
In [73]: %timeit create_rand_limited_per_row_numexpr(m=1000,n=1000)
100 loops, best of 3: 6.98 ms per loop
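If exact per-row uniformity matters, here is a minimal alternative sketch (create_rand_uniform_per_row is a hypothetical name): scale uniform floats by each row's limit instead of taking a modulus.
import numpy as np

def create_rand_uniform_per_row(m, n):
    # row i should draw uniformly from {0, ..., i}, i.e. limit i+1
    limits = np.arange(1, m + 1)[:, None]
    # floor of uniform [0, 1) floats times the limit is uniform over 0..limit-1
    return (np.random.random((m, n)) * limits).astype(int)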

Correlate one set of vectors to another in numpy?

Let's say I have a set of vectors (readings from sensor 1, readings from sensor 2, readings from sensor 3 -- indexed first by timestamp and then by sensor id) that I'd like to correlate to a separate set of vectors (temperature, humidity, etc -- also all indexed first by timestamp and secondly by type).
What is the cleanest way in numpy to do this? It seems like it should be a rather simple function...
In other words, I'd like to see:
> a.shape
(365,20)
> b.shape
(365, 5)
> correlations = magic_correlation_function(a,b)
> correlations.shape
(20, 5)
Cheers,
/YGA
P.S. I've been asked to add an example.
Here's what I would like to see:
In [27]: x
Out[27]:
array([[ 0. ,  0. ,  0. ],
       [-1. ,  0. , -1. ],
       [-2. ,  0. , -2. ],
       [-3. ,  0. , -3. ],
       [-4. ,  0.1, -4. ]])
In [28]: y
Out[28]:
array([[0. , 0. ],
       [1. , 0. ],
       [2. , 0. ],
       [3. , 0. ],
       [4. , 0.1]])
In [29]: magical_correlation_function(x, y)
Out[29]:
array([[-1.        ,  0.70710678,  1.        ],
       [-0.70710678,  1.        ,  0.70710678]])
P.P.S.: whoops, I mis-transcribed my example. Sorry all. Fixed now.
The simplest thing that I could find was using the scipy.stats package:
In [8]: x
Out[8]:
array([[ 0. , 0. , 0. ],
       [-1. , 0. , -1. ],
       [-2. , 0. , -2. ],
       [-3. , 0. , -3. ],
       [-4. , 0.1, -4. ]])
In [9]: y
Out[9]:
array([[0. , 0. ],
       [1. , 0. ],
       [2. , 0. ],
       [3. , 0. ],
       [4. , 0.1]])
In [10]: import scipy.stats
In [27]: (scipy.stats.cov(y,x)
    ...:  /(numpy.sqrt(scipy.stats.var(y,axis=0)[:,numpy.newaxis]))
    ...:  /(numpy.sqrt(scipy.stats.var(x,axis=0))))
Out[27]:
array([[-1. , 0.70710678, -1. ],
       [-0.70710678, 1. , -0.70710678]])
These aren't the numbers you got, but you've mixed up your rows. (Element [0,0] should be 1.)
A more complicated, but purely numpy solution is
In [40]: numpy.corrcoef(x.T,y.T)[numpy.arange(x.shape[1])[numpy.newaxis,:],
    ...:                         numpy.arange(y.shape[1])[:,numpy.newaxis]]
Out[40]:
array([[-1. , 0.70710678, -1. ],
       [-0.70710678, 1. , -0.70710678]])
This will be slower because it computes the correlation of each element in x with each other element in x, which you don't want. Also, the advanced indexing techniques used to get the subset of the array you desire can make your head hurt.
If you're going to use numpy intensely, get familiar with the rules on broadcasting and indexing. They will help you push as much down to the C-level as possible.
Will this do what you want?
correlations = dot(transpose(a), b)
Note: if you do this, you'll probably want to standardize or whiten a and b first, e.g. something equivalent to this:
a = (a - mean(a)) / sqrt(var(a))
b = (b - mean(b)) / sqrt(var(b))
As David said, you should define the correlation you're using. I don't know of any definition of correlation that gives sensible numbers when correlating empty and non-empty signals.
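For completeness, a minimal numpy-only sketch of the magic_correlation_function asked for above (the name is kept from the question; it assumes every column has nonzero variance):
import numpy as np

def magic_correlation_function(a, b):
    # Pearson correlation of each column of a against each column of b.
    # a: shape (n, p), b: shape (n, q)  ->  result: shape (p, q)
    a = (a - a.mean(axis=0)) / a.std(axis=0)   # standardize columns
    b = (b - b.mean(axis=0)) / b.std(axis=0)
    return a.T @ b / a.shape[0]                # (p, q) correlation matrix
With a of shape (365, 20) and b of shape (365, 5), this returns the desired (20, 5) array.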
