How do I stack column-wise n vectors of shape (x,) where x could be any number?
For example,
from numpy import *
a = ones((3,))
b = ones((2,))
c = vstack((a,b)) # <-- gives an error
c = vstack((a[:,newaxis],b[:,newaxis])) #<-- also gives an error
hstack works fine but concatenates along the wrong dimension.
Short answer: you can't. NumPy does not support jagged arrays natively.
Long answer:
>>> a = ones((3,))
>>> b = ones((2,))
>>> c = array([a, b])
>>> c
array([array([1., 1., 1.]), array([1., 1.])], dtype=object)
gives an object array that may or may not behave as you expect. (On recent NumPy versions you must pass dtype=object explicitly, or the ragged input raises an error.) It doesn't support basic methods like sum or reshape, and you should treat it much as you'd treat the ordinary Python list [a, b]: iterate over it to perform operations instead of using vectorized idioms.
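For illustration, here is a minimal sketch of that iterate-don't-vectorize approach (assuming a recent NumPy, where dtype=object must be explicit for ragged input):
import numpy as np

a = np.ones((3,))
b = np.ones((2,))
c = np.array([a, b], dtype=object)  # explicit dtype needed on recent NumPy

# vectorized idioms won't work on c; iterate instead, e.g. to sum each row:
row_sums = [row.sum() for row in c]
print(row_sums)  # [3.0, 2.0]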
Several possible workarounds exist; the easiest is to coerce a and b to a common length, perhaps using masked arrays or NaN to signal that some indices are invalid in some rows. E.g. here's b as a masked array:
>>> ma.array(np.resize(b, a.shape[0]), mask=[False, False, True])
masked_array(data = [1.0 1.0 --],
mask = [False False True],
fill_value = 1e+20)
This can be stacked with a as follows:
>>> ma.vstack([a, ma.array(np.resize(b, a.shape[0]), mask=[False, False, True])])
masked_array(data =
[[1.0 1.0 1.0]
[1.0 1.0 --]],
mask =
[[False False False]
[False False True]],
fill_value = 1e+20)
(For some purposes, scipy.sparse may also be interesting.)
In general, there is an ambiguity in putting together arrays of different lengths, because the alignment of data might matter. Pandas has various more advanced solutions to deal with that, e.g. merging Series into DataFrames.
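For instance, here's a minimal sketch of the pandas route: wrapping each vector in a Series and collecting them in a DataFrame aligns on the index and pads the shorter one with NaN:
import numpy as np
import pandas as pd

a = np.ones((3,))
b = np.ones((2,))

# each Series carries its own index; the DataFrame aligns them
# and fills the missing positions with NaN
df = pd.DataFrame({'a': pd.Series(a), 'b': pd.Series(b)})
print(df)
#      a    b
# 0  1.0  1.0
# 1  1.0  1.0
# 2  1.0  NaN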
If you just want to populate columns starting from the first element, what I usually do is build a matrix and populate its columns. Of course, you need to fill the empty spaces in the matrix with a null value (in this case np.nan):
import numpy as np

a = np.ones((3,))
b = np.ones((2,))
arraylist = [a, b]

# define an empty (NaN-filled) array
outarr = np.ones((np.max([len(ps) for ps in arraylist]), len(arraylist))) * np.nan

# populate the columns
for i, c in enumerate(arraylist):
    outarr[:len(c), i] = c
In [108]: outarr
Out[108]:
array([[ 1., 1.],
[ 1., 1.],
[ 1., nan]])
There is a newer library for efficiently handling this type of array: Awkward Array (https://github.com/scikit-hep/awkward-array).
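A rough sketch of what that looks like with the current awkward package (the API has changed across versions, so treat this as an assumption about recent releases):
import awkward as ak

ragged = ak.Array([[1.0, 1.0, 1.0], [1.0, 1.0]])
# reductions work per row despite the ragged shape
print(ak.sum(ragged, axis=1))  # [3, 2]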
I know this is a really old post and there may be a better way of doing this, but why not just use np.append for such an operation:
import numpy as np
a = np.ones((3,))
b = np.ones((2,))
c = np.append(a, b)
print(c)
output:
[1. 1. 1. 1. 1.]
If you definitely want to use NumPy, you can match the shapes with np.nan and then "unpack" the NaN-filled array later. Here is an example using a helper function.
import numpy as np

a = np.array([[3, 3, 3]]).astype(float)
b = np.array([[2, 2]]).astype(float)
# Extend each vector in the list with NaN to reach the same shape
def Pack_Matrices_with_NaN(List_of_matrices, Matrix_size):
    Matrix_with_nan = np.arange(Matrix_size)  # placeholder first row
    for array in List_of_matrices:
        start_position = len(array[0])
        # pad each row with NaN until it is Matrix_size long
        for x in range(start_position, Matrix_size):
            array = np.insert(array, x, np.nan, axis=1)
        Matrix_with_nan = np.vstack([Matrix_with_nan, array])
    # drop the placeholder row again
    Matrix_with_nan = Matrix_with_nan[1:]
    return Matrix_with_nan
arrays = [a,b]
packed_matrices = Pack_Matrices_with_NaN(arrays, 5)
print(packed_matrices)
Output:
[[ 3. 3. 3. nan nan]
[ 2. 2. nan nan nan]]
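The "unpacking" step mentioned above isn't shown; a minimal sketch of such a helper (a hypothetical name, and it assumes NaN never occurs in the real data) could be:
def Unpack_Matrices_with_NaN(Matrix_with_nan):
    # hypothetical counterpart to the packing function:
    # strip the NaN padding from each row again
    return [row[~np.isnan(row)] for row in Matrix_with_nan]

print(Unpack_Matrices_with_NaN(packed_matrices))
# [array([3., 3., 3.]), array([2., 2.])]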
However, the easiest way would be to append the arrays to a list:
import numpy as np
a = np.array([3,3,3])
b = np.array([2,2])
c = []
c.append(a)
c.append(b)
print(c)
Output:
[array([3, 3, 3]), array([2, 2])]
I used the following code to combine lists of different lengths into a numpy array, keeping the length information in a second array:
import numpy as np

# create an example list (the number can be increased):
my_list = [np.ones(i) for i in np.arange(1000)]

# measure and store the lengths and find the max:
dlc = np.array([len(i) for i in my_list])  # contains the data length code
max_length = max(dlc)

# now we allocate an empty array
# (note: entries beyond dlc[i] in each row stay uninitialized)
result = np.empty(max_length * len(my_list)).reshape(len(my_list), max_length)

# populate:
for i in np.arange(len(dlc)):
    result[i][np.arange(dlc[i])] = my_list[i]

# check what the 10th element looks like
print(result[10], dlc[10])
I'm sure the loop could still be improved upon, but this already runs quite quickly because the memory is preallocated by the empty array.
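As one possible improvement, here is a sketch of a vectorized variant of the populate loop using a broadcast mask (same variables as above):
# mask[i, j] is True exactly where row i has valid data
mask = np.arange(max_length) < dlc[:, None]
result2 = np.zeros((len(my_list), max_length))
# boolean assignment fills row by row, matching the concatenation order
result2[mask] = np.concatenate(my_list)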
I have two 4D matrices, which I would like to add. The matrices have the exact same dimension and number of elements, but they both contain randomly distributed NaN values.
I would prefer to add them using numpy.nansum, so that:
(1) if two values are added I want the sum to be a value,
(2) if a value and a NaN are added I want the sum to be the value and
(3) if two NaN are added I want the sum to be NaN.
Here is what I tried:
a  # shape (6, 7, 180, 360)
b  # shape (6, 7, 180, 360)
C = np.nansum[(a,b)]
C = np.nansum(np.dstack((a,b)), 2)
But I am unable to get a resultant matrix with the same dimensions as the inputs; C should also have shape (6, 7, 180, 360).
Can anyone help? Thank you in advance.
You could use np.stack((a,b)) to stack along a new 0-axis, then call nansum to sum along that 0-axis:
C = np.nansum(np.stack((a,b)), axis=0)
For example,
In [34]: a = np.random.choice([1,2,3,np.nan], size=(6,7,180,360))
In [35]: b = np.random.choice([1,2,3,np.nan], size=(6,7,180,360))
In [36]: np.stack((a,b)).shape
Out[36]: (2, 6, 7, 180, 360)
In [37]: np.nansum(np.stack((a,b)), axis=0).shape
Out[37]: (6, 7, 180, 360)
You had the right idea, but np.dstack stacks along the third axis, which is not desirable here since you already have 4 axes:
In [31]: np.dstack((a,b)).shape
Out[31]: (6, 7, 360, 360)
Regarding your point (3):
Note that the behavior of np.nansum depends on the NumPy version:
In NumPy versions <= 1.8.0 Nan is returned for slices that are all-NaN or
empty. In later versions zero is returned.
If you are using NumPy version > 1.8.0, then you may have to use a solution such as
Maarten Fabré's to address this issue.
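A quick way to see this behavior on a recent NumPy:
import numpy as np

# an all-NaN slice sums to 0.0 on NumPy > 1.8.0, not NaN:
print(np.nansum(np.stack((np.array([np.nan]), np.array([np.nan]))), axis=0))
# [0.]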
I believe the function np.nansum is not appropriate in your case. If I understand your question correctly, you wish to do an element-wise addition of two matrices with a little logic regarding the NaN values.
Here is the full example on how to do it:
import numpy as np
a = np.array([[np.nan, 2],
              [3, np.nan]])
b = np.array([[3, np.nan],
              [1, np.nan]])
result = np.add(a,b)
a_is_nan = np.isnan(a)
b_is_nan = np.isnan(b)
result_is_nan = np.isnan(result)
mask_a = np.logical_and(result_is_nan, np.logical_not(a_is_nan))
result[mask_a] = a[mask_a]
mask_b = np.logical_and(result_is_nan, np.logical_not(b_is_nan))
result[mask_b] = b[mask_b]
print(result)
A little bit of explanation:
The first operation is np.add(a,b). This adds both matrices; any NaN element will produce NaN in the result.
To select the NaN values from either array, we use a logical mask:
# result_is_nan is a boolean array containing True wherever the result is np.nan. This occurs when either of the two elements was NaN
result_is_nan = np.isnan(result)
# mask_a is a boolean array which 'flags' elements that are NaN in result but were not NaN in a!
mask_a = np.logical_and(result_is_nan, np.logical_not(a_is_nan))
# Using that mask, we assign those values to result
result[mask_a] = a[mask_a]
There you have it!
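For the example arrays above, this should print (exact formatting depends on the NumPy version):
[[ 3.  2.]
 [ 4. nan]]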
I think the easiest way is to use np.where
result = np.where(
np.isnan(a+b),
np.where(np.isnan(a), b, a),
a+b
)
This reads as:
if a+b is not nan, use a+b; else use a, unless a is itself nan, then use b. Whether or not b is nan is of little consequence then.
Alternatively, you can use it like this:
result2 = np.where(
np.isnan(a) & np.isnan(b),
np.nan,
np.nansum(np.stack((a,b)), axis=0)
)
np.testing.assert_equal(result, result2) passes
I have imported data that has the format of a numpy masked array of incrementing integers. The masked elements are irregular and non-repeating; e.g. printing it yields:
masked = [0,1,--,3,--,5,6,--,--,9,--]
And I have another list of incrementing numbers that doesn't start from zero, has irregular gaps, and is a different size from masked:
data = [1,3,4,6,7,9,10]
I want to remove any element of data if its value is a masked element in masked, so that I get:
result = [1,3,6,9]
As 4, 7 and 10 were masked values in masked.
I think my pseudocode should look something like:
for i in len(masked):
    if masked[i] = 'masked' && data[i] == [i]:
        del data[i]
But I'm having trouble reconciling the different lengths and mismatched indices of the two arrays.
Thanks for any help!
Make sure data is an array:
data = np.asarray(data)
Then:
data[~masked.mask[data]]
This will be extremely fast, though it does assume that your masked array contains all numbers from 0 to at least max(data).
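A quick check with the data from the question:
import numpy as np
import numpy.ma as ma

masked = ma.masked_array(np.arange(11), mask=[0,0,1,0,1,0,0,1,1,0,1])
data = np.asarray([1, 3, 4, 6, 7, 9, 10])

# masked.mask[data] looks up the mask at each value in data;
# keep only the values whose positions are unmasked
print(data[~masked.mask[data]])  # [1 3 6 9]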
You can use the set function to get sets of the lists and take their intersection. Here's a demo:
>>> import numpy as np
>>> import numpy.ma as ma
>>> arr = np.array([x for x in range(11)])
>>> masked = ma.masked_array(arr, mask=[0,0,1,0,1,0,0,1,1,0,1])
>>> masked
masked_array(data = [0 1 -- 3 -- 5 6 -- -- 9 --],
mask = [False False True False True False False True True False
True],
fill_value = 999999)
>>> data = np.array([1,3,4,6,7,9,10])
>>> result = list(set(data) & set(masked[~masked.mask]))
>>> result
[1, 3, 6, 9]
Simple Version:
If I do this:
import numpy as np
a = np.zeros(2)
a[[1, 1]] += np.array([1, 1])
I get [0, 1] as the output, but I would like [0, 2]. Is that possible somehow, using implicit numpy looping instead of looping over it myself?
What-I-actually-need-to-do version:
I have a structured array that contains an index, a value, and some boolean value. I would like to sum those values at those indices, based on the boolean. Clearly that can be done with a simple loop, but it seems like it should be possible with clever numpy indexing (as above).
For example, I have an array with 5 elements that I want to populate from the array with values, indices, and conditions:
import numpy as np
size = 5
nvalues = 10
np.random.seed(1)
a = np.zeros(nvalues, dtype=[('val', float), ('ix', int), ('cond', bool)])
a = np.rec.array(a)
a.val = np.random.rand(nvalues)
a.cond = (np.random.rand(nvalues) > 0.3)
a.ix = np.random.randint(size, size=nvalues)
# obvious solution
obvssum = np.zeros(size)
for i in a:
    if i.cond:
        obvssum[i.ix] += i.val

# is something like this possible?
doesntwork = np.zeros(size)
doesntwork[a[a.cond].ix] += a[a.cond].val
print(doesntwork)
print(obvssum)
Output:
[ 0. 0. 0.61927097 0.02592623 0.29965467]
[ 0. 0. 1.05459336 0.02592623 1.27063303]
I think what's happening here is that if a[a.cond].ix were guaranteed to be unique, my method would work just fine, as in the simple example.
This is what the at method of NumPy ufuncs is for:
output = numpy.zeros(size)
numpy.add.at(output, a[a.cond].ix, a[a.cond].val)
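Unlike the fancy-indexed +=, ufunc.at applies the operation unbuffered, so repeated indices accumulate. A quick check against the simple version from the question:
out = numpy.zeros(2)
numpy.add.at(out, [1, 1], [1, 1])
print(out)  # [0. 2.]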
I have three .txt files which I have successfully made into numpy arrays. If you are curious, these files are Level 2 data from the Advanced Composition Explorer (ACE). The particular files are found in the MAG and SWEPAM sections and are 16-second and 64-second averages, respectively. In a nutshell, the data represent the z-component magnetic field of an inbound particle field, its constituents by measure of counts per area, and its velocity. Currently the focus of the study is on inbound hydrogen, but I digress. The code I use to read and save the files (as well as fix any errors) is provided below:
Bz = np.loadtxt(r"/home/ary/Desktop/Arya/Project/Data/AC/MAG/ACE_MAG_Data_SEPT_18_2015.txt", dtype = bytes).astype(float)
SWEPAM_HV = np.loadtxt(r"/home/ary/Desktop/Arya/Project/Data/ACE/SWEPAM/Proton_Density/ACE_SWEPAM_H_Density_20150918.txt", dtype = bytes).astype(float)
SWEPAM_HD = np.loadtxt(r"/home/ary/Desktop/Arya/Project/Data/ACE/SWEPAM/Proton_Speed/ACE_SWEPAM_H_Velocity_20150918.txt",dtype = bytes).astype(float)
Bz = np.ma.masked_array(Bz, Bz <= -999, fill_value = 0)
SWEPAM_HD = np.ma.masked_array(SWEPAM_HD, SWEPAM_HD <= -999, fill_value = 0)
SWEPAM_HV = np.ma.masked_array(SWEPAM_HV, SWEPAM_HV <= -999, fill_value = 0)
Mag_time = np.arange(0,86400, 16, dtype = float)
SWEPAM_time = np.arange(0,86400,64, dtype = float)
However, within these arrays I am particularly interested only in the 1349th through the 2024th positions. These numbers are of interest because of my investigation into an anomaly which happened between these two points. So I figured the following would lead me to success, but it hasn't, and many variations have failed too. Here is the most recent script I have:
Mag_time_prime = np.array([])
Bz_prime = np.array([])
for i in range(1349, 2024):
    append(Mag_time_prime, Mag_time[i]).astype(float)
    append(Bz_prime, Bz[i]).astype(float)
print(Mag_time_prime.shape)
print(Bz_prime.shape)
I had figured that by making empty arrays (I did try np.empty(0) for the primes and couldn't get that to work either) I could just make a for loop to locate and append the i-th position from Bz and Mag_time to the empty 'prime' arrays within the specified range. However, the 'prime' arrays have consistently come out empty. So my question: where have I gone wrong, and how should I fix it?
List append acts on the list itself:
In [1]: alist = []
In [2]: alist.append(5)
In [3]: alist.append(3)
In [4]: alist
Out[4]: [5, 3]
np.append does not change its arguments:
In [5]: arr = np.array([])
In [6]: np.append(arr,1)
Out[6]: array([ 1.])
In [7]: np.append(arr,2)
Out[7]: array([ 2.])
In [8]: arr
Out[8]: array([], dtype=float64)
You have to assign the value of append back to arr to get the list equivalent behavior:
In [9]: arr=np.append(arr,1)
In [10]: arr=np.append(arr,2)
In [11]: arr
Out[11]: array([ 1., 2.])
Each time you use np.append you create a new copy (it uses np.concatenate). For one or two times that's ok, but if done repeatedly it is inefficient.
The preferred way is to use list append to build a list, and then make an array from that:
In [12]: np.array(alist)
Out[12]: array([5, 3])
You have to understand np.concatenate before you can use np.append properly. It is a poor substitute for list append.
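As an aside, since the indices of interest in the question form one contiguous range, plain slicing avoids the append pattern entirely (a sketch using the question's own variables):
# slices are views into the original arrays; no copying or appending needed
Mag_time_prime = Mag_time[1349:2024]
Bz_prime = Bz[1349:2024]
print(Mag_time_prime.shape)  # (675,)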
I have two numpy arrays with the dimensions (120, 360), one of the arrays consists of integers and zeros the second consists of floats. I want to replace the values of the second array with nans everywhere there is an integer in the first array. Is there an easy and efficient way to do this?
Also I'd like to replace the integers in the first array with nans and change zeros to ones. Thanks in advance.
You can achieve this easily with logical indexing into the array:
arr2[ arr1 != 0 ] = numpy.NaN
However, integer arrays don't support NaNs, so you'd have to convert your first array to a float array, i.e.
arr1 = arr1.astype(float)
arr1[arr1 != 0.0] = numpy.NaN
arr1[arr1 == 0.0] = 1.0
Setup the arrays:
>>> import numpy as np
>>> x = np.array([[1,0],[0,4]], dtype=int)
>>> y = np.array([[1.1, 2.2],[3.3, 4.4]], dtype=float)
You can easily set the second array to nan where you want, like this:
>>> y[x != 0] = np.nan
>>> y
array([[ nan, 2.2],
[ 3.3, nan]])
Then convert the first array to floats (since NaN is not an integer) and set the values you want:
>>> x = x.astype(float)
>>> x[x != 0] = np.nan
>>> x[x == 0] = 1
>>> x
array([[ nan, 1.],
[ 1., nan]])
As a comment on the previous answers, I don't think comparing floats with == is such a good idea, and I think some operations are wasted. What about creating a temporary array mask = (X != 0) and using it as an index?
>>> mask = (X != 0)
>>> X = X.astype(float)
>>> X[mask] = np.nan
>>> X[~mask] = 1
I don't know your purpose for replacing values with NaNs, but you may want to consider using numpy's masked arrays instead (similar to Pierre's answer, but numpy has built-in mask support!):
import numpy.ma
# mask out values when there is a non-zero integer in arr1
arr2 = numpy.ma.masked_array(arr2, mask=arr1)
# mask out values in arr1 for non-zero integers, and set all remaining values (the zeros) to 1
arr1 = numpy.ma.masked_array(arr1, mask=(arr1 != 0))
arr1[~arr1.mask] = 1
No integer-to-float conversion is needed, and this allows you to use a lot of numpy's functionality without running into problems. E.g., calculating the mean of an array with NaNs is certainly a bad idea; with a masked array, this is no problem.
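For example, the mean of a masked array simply skips the masked entries:
import numpy.ma as ma

arr = ma.masked_array([1.0, 2.0, 3.0], mask=[False, True, False])
print(arr.mean())  # 2.0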