Create matrix in a loop with numpy - python

I would like to build up a numpy matrix using rows I get in a loop. But how do I initialize the matrix? If I write
A = []
A = numpy.vstack((A, [1, 2]))
I get
ValueError: all the input array dimensions except for the concatenation axis must match exactly
What's the best practice for this?
NOTE: I do not know the number of rows in advance. The number of columns is known.

Unknown number of rows
One way is to form a list of lists, and then convert to a numpy array in one operation:
final = []
# x is some generator
for item in x:
    final.append(item)
A = np.array(final)
Or, more elegantly, given a generator x:
A = np.array(list(x))
This solution is time-efficient but memory-inefficient, since the rows are held in a Python list before being copied into the array.
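As a minimal runnable sketch of the list-then-convert approach: the generator `rows` below is a hypothetical stand-in for whatever produces your rows, and the names `collected` and `A` are chosen only for illustration.

```python
import numpy as np

def rows():
    # Hypothetical row source: yields a number of 2-column rows
    # that the consumer does not know in advance.
    for i in range(5):
        yield [i, i * i]

collected = []
for row in rows():
    collected.append(row)   # cheap list append, no array copy per row

A = np.array(collected)     # single conversion once all rows are known
print(A.shape)
```

The array is allocated once, at the end, so the cost of growing it row by row never arises.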
Known number of rows
Append operations on numpy arrays are expensive and not recommended. If you know the size of the final array in advance, you can instantiate an empty (or zero) array of your desired size, and then fill it with values. For example:
A = np.zeros((10, 2))
A[0] = [1, 2]
Or in a loop, with a trivial assignment to demonstrate syntax:
A = np.zeros((2, 2))
# in reality, x will be some generator whose length you know in advance
x = [[1, 2], [3, 4]]
for idx, item in enumerate(x):
    A[idx] = item
print(A)
array([[ 1., 2.],
       [ 3., 4.]])

Related

Convert numpy.ndarray to a list

I'm trying to convert this numpy.ndarray to a list
[[105.53518731]
[106.45317529]
[107.37373843]
[108.00632646]
[108.56373502]
[109.28813113]
[109.75593207]
[110.57458371]
[111.47960639]]
I'm using this function to convert it.
conver = conver.tolist()
The output is this. I'm not sure whether it's a list and, if so, whether I can access its elements with conver[0], etc.
[[105.5351873125], [106.45317529411764], [107.37373843478261], [108.00632645652173], [108.56373502040816], [109.28813113157895], [109.75593206666666], [110.57458370833334], [111.47960639393939]]
Finally, after I convert it to a list, I try to multiply the list members by 1.05 and get this error:
TypeError: can't multiply sequence by non-int of type 'float'
You start with a 2d array, with shape (n,1), like this:
In [342]: arr = np.random.rand(5,1)*100
In [343]: arr
Out[343]:
array([[95.39049043],
[19.09502087],
[85.45215423],
[94.77657561],
[32.7869103 ]])
tolist produces a list - but it contains lists; each [] layer denotes a list. Notice that the [] nesting matches the array's:
In [344]: arr.tolist()
Out[344]:
[[95.39049043424225],
[19.095020872584335],
[85.4521542296349],
[94.77657561477125],
[32.786910295446425]]
To get a number you have to index through each list layer:
In [345]: arr.tolist()[0]
Out[345]: [95.39049043424225]
In [346]: arr.tolist()[0][0]
Out[346]: 95.39049043424225
In [347]: arr.tolist()[0][0]*1.05
Out[347]: 100.16001495595437
If you first turn the array into a 1d one, the list indexing is simpler:
In [348]: arr.ravel()
Out[348]: array([95.39049043, 19.09502087, 85.45215423, 94.77657561, 32.7869103 ])
In [349]: arr.ravel().tolist()
Out[349]:
[95.39049043424225,
19.095020872584335,
85.4521542296349,
94.77657561477125,
32.786910295446425]
In [350]: arr.ravel().tolist()[0]
Out[350]: 95.39049043424225
But if your primary goal is to multiply the elements, doing it with the array is simpler:
In [351]: arr * 1.05
Out[351]:
array([[100.16001496],
[ 20.04977192],
[ 89.72476194],
[ 99.5154044 ],
[ 34.42625581]])
You can access elements of the array with:
In [352]: arr[0,0]
Out[352]: 95.39049043424225
But if you do need to iterate, the tolist() option is good to know. Iterating on lists is usually faster than iterating on an array. With an array you should try to use the fast whole-array methods.
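If a flat Python list of the scaled values is what you want in the end, one possible sketch is to multiply while the data is still an array and convert afterwards. The three sample values are taken from the question; the name `scaled` is only illustrative.

```python
import numpy as np

# Shape (3, 1), like the array in the question (shortened to three rows).
arr = np.array([[105.53518731], [106.45317529], [107.37373843]])

# Multiply as an array, flatten to 1d, then convert to a plain list of floats.
scaled = (arr * 1.05).ravel().tolist()
print(type(scaled), len(scaled))
```

This avoids the TypeError because the multiplication happens on the array, not on the list.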
You converted the array to a list of lists, and lists do not support broadcasting. Convert it back to an array first:
import numpy as np
x = [[105.53518731],
[106.45317529],
[107.37373843],
[108.00632646],
[108.56373502],
[109.28813113],
[109.75593207],
[110.57458371],
[111.47960639],]
x = np.hstack(x)
x * 1.05
array([110.81194668, 111.77583405, 112.74242535, 113.40664278,
113.99192177, 114.75253769, 115.24372867, 116.1033129 ,
117.05358671])
Yes, it's a list. You can check the type of a variable:
type(a)
To multiply each element by 1.05, run the code below:
x = [float(i[0]) * 1.05 for i in a]
print(x)
Try this:
import numpy as np
a = [[105.53518731],
[106.45317529],
[107.37373843],
[108.00632646],
[108.56373502],
[109.28813113],
[109.75593207],
[110.57458371],
[111.47960639]]
b = [elem[0] for elem in a]
b = np.array(b)
print(b*1.05)

create a list of list of N-dimensional numpy arrays

I want to create a list of list of 2x2 numpy arrays
array([[0, 0],
[1, 1]])
for example I want to fill a list with 8 of these arrays.
x = []
for j in range(9):
    for i in np.random.randint(2, size=(2, 2)):
        x.append([i])
this gives me a 1x1 array
z = iter(x)
next(z)
[array([0, 1])]
what am I missing here ?
You missed that you are iterating over a 2x2 array 9 times. Each iteration yields a row of the array, which is what you see when you look at the first element: the first row of the first matrix. Not only that, you append this row wrapped in a list, so you actually have 18 lists with a single element each. What you want to do is append the matrix directly, with no inner loop and definitely no additional [] around it, or better yet:
x = [np.random.randint(2, size=(2, 2)) for _ in range(9)]
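For comparison, here is a sketch of the explicit-loop version with the matrix appended directly, plus one possible alternative that keeps everything in a single 3-D array. The name `stacked` is introduced here for illustration and is not part of the question.

```python
import numpy as np

x = []
for _ in range(9):
    # Append the whole 2x2 matrix: no inner loop, no extra [] around it.
    x.append(np.random.randint(2, size=(2, 2)))

# Alternative: one array of shape (9, 2, 2) instead of a list of arrays.
stacked = np.random.randint(2, size=(9, 2, 2))

print(len(x), x[0].shape, stacked.shape)
```

With the 3-D array, `stacked[k]` plays the role of the k-th matrix in the list.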

Passing elements to function efficiently

I have an array of size m x n.
I want to pass each of the m rows individually to a function and save the result in the same row.
What would be an efficient way of doing this using numpy?
Currently I am using for loops to achieve this:
X : size(m x n)
p : size(m x n)
for i in np.arange(X.shape[0]):
    X[i] = some_func(X[i], p[i])
Since you are modifying the row of X, you can skip the indexing and use zip to iterate on the rows:
In [833]: X=np.ones((2,3)); p=np.arange(6).reshape(2,3)
In [834]: for x,y in zip(X,p):
     ...:     x[:] = x + y
     ...:
In [835]: X
Out[835]:
array([[1., 2., 3.],
[4., 5., 6.]])
If you still needed the index you could add enumerate:
for i,(x,y) in enumerate(zip(X,p)):...
There isn't much difference in efficiency in these alternatives. You still have to call your function m times. You still have to select rows, either by index or by iteration. Both are a bit slower on arrays than on the equivalent list.
The best thing is to write your function so it works directly with the 2d arrays, and doesn't need iteration.
X+p
But if the function is too complex for that, then its evaluation time is likely to be relatively high (compared to the iteration mechanism).
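To illustrate what "works directly with the 2d arrays" can look like for something slightly less trivial than X+p, here is a sketch. Both functions are hypothetical examples invented for this illustration (the question does not say what some_func does): each row of X is scaled by the sum of the matching row of p.

```python
import numpy as np

def some_func_rowwise(x_row, p_row):
    # Hypothetical per-row function: scale the row by the sum of its parameters.
    return x_row * p_row.sum()

def some_func_2d(X, p):
    # Same computation on whole arrays. keepdims=True keeps the row sums
    # as shape (m, 1) so they broadcast across the n columns of X.
    return X * p.sum(axis=1, keepdims=True)

X = np.ones((2, 3))
p = np.arange(6).reshape(2, 3)

looped = np.array([some_func_rowwise(x, y) for x, y in zip(X, p)])
vectorized = some_func_2d(X, p)
print(np.array_equal(looped, vectorized))
```

Whether such a rewrite is possible depends entirely on what the real function computes; reductions with an axis argument and broadcasting cover many row-wise cases.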
You can collect the first element of each row of X and p using list comprehensions, as shown below. To pass the rows themselves to your some_func, loop over the row indices:
import numpy as np

def some_func(x_row, p_row):
    # placeholder standing in for your own function; here it just adds the rows
    return x_row + p_row

X = np.random.randint(9, size=(3, 3))
p = np.random.randint(9, size=(3, 3))
print(X.shape, p.shape)
XList = [i[0] for i in X]  # first element of each row of X
pList = [j[0] for j in p]  # first element of each row of p
print(XList)
print(pList)
for i in range(X.shape[0]):
    X[i] = some_func(X[i], p[i])

How to append a selection of a numpy array to an empty numpy array

I have three .txt files which I have successfully read into numpy arrays. If you are curious, these files are Level 2 data from the Advanced Composition Explorer (ACE). The particular files are found in the MAG and SWEPAM sections and are 16 second averages and 64 second averages, respectively. In a nutshell, the data represent the z-component of the magnetic field of an inbound particle field, its constituents measured in counts per area, and its velocity. Currently the focus of the study is on inbound hydrogen, but I digress. The code I use to read and save the files (as well as fix any errors) is provided below:
Bz = np.loadtxt(r"/home/ary/Desktop/Arya/Project/Data/AC/MAG/ACE_MAG_Data_SEPT_18_2015.txt", dtype = bytes).astype(float)
SWEPAM_HV = np.loadtxt(r"/home/ary/Desktop/Arya/Project/Data/ACE/SWEPAM/Proton_Density/ACE_SWEPAM_H_Density_20150918.txt", dtype = bytes).astype(float)
SWEPAM_HD = np.loadtxt(r"/home/ary/Desktop/Arya/Project/Data/ACE/SWEPAM/Proton_Speed/ACE_SWEPAM_H_Velocity_20150918.txt",dtype = bytes).astype(float)
Bz = np.ma.masked_array(Bz, Bz <= -999, fill_value = 0)
SWEPAM_HD = np.ma.masked_array(SWEPAM_HD, SWEPAM_HD <= -999, fill_value = 0)
SWEPAM_HV = np.ma.masked_array(SWEPAM_HV, SWEPAM_HV <= -999, fill_value = 0)
Mag_time = np.arange(0,86400, 16, dtype = float)
SWEPAM_time = np.arange(0,86400,64, dtype = float)
However, within these arrays I am only interested in the 1349th through 2024th positions. These numbers are of interest because I am investigating an anomaly which happened between these two points. I figured the following would lead me to success, but it hasn't, and many variations have failed too. Here is the most recent script I have:
Mag_time_prime = np.array([])
Bz_prime = np.array([])
for i in range(1349,2024):
    append(Mag_time_prime,Mag_time[i]).astype(float)
    append(Bz_prime,Bz[i]).astype(float)
print(Mag_time_prime.shape)
print(Bz_prime.shape)
I had figured that by making empty arrays (I did try np.empty(0) for the primes and couldn't get that to work for me) that I could just make a for loop to locate and append the i_th position from the Bz and Mag_time to the empty 'prime' arrays within the specified range. However the 'prime' arrays have continuously popped out empty arrays. So my question, where have I gone wrong and how should I fix it?
List append acts on the list itself:
In [1]: alist = []
In [2]: alist.append(5)
In [3]: alist.append(3)
In [4]: alist
Out[4]: [5, 3]
np.append does not change its arguments:
In [5]: arr = np.array([])
In [6]: np.append(arr,1)
Out[6]: array([ 1.])
In [7]: np.append(arr,2)
Out[7]: array([ 2.])
In [8]: arr
Out[8]: array([], dtype=float64)
You have to assign the value of append back to arr to get the list equivalent behavior:
In [9]: arr=np.append(arr,1)
In [10]: arr=np.append(arr,2)
In [11]: arr
Out[11]: array([ 1., 2.])
Each time you use np.append you create a new copy (it uses np.concatenate). For one or two times that's ok, but if done repeatedly it is inefficient.
The preferred way is to use list append to build a list, and then make an array from that:
In [12]: np.array(alist)
Out[12]: array([5, 3])
You have to understand np.concatenate before you can use np.append properly. It is a poor substitute for list append.
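For the specific task in the question, a loop may not be needed at all: basic slicing selects the same positions that range(1349, 2024) iterates over. The sketch below reuses Mag_time from the question; the Bz array here is a zero-filled placeholder (an assumption, since the real data comes from the text files).

```python
import numpy as np

Mag_time = np.arange(0, 86400, 16, dtype=float)  # as defined in the question
Bz = np.zeros_like(Mag_time)                     # placeholder for the loaded data

# Slice positions 1349 up to (but not including) 2024, like range(1349, 2024).
Mag_time_prime = Mag_time[1349:2024]
Bz_prime = Bz[1349:2024]

print(Mag_time_prime.shape, Bz_prime.shape)
```

Note that a basic slice is a view onto the original array; call .copy() on it if the selection needs to be modified independently.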

Creating index array in numpy - eliminating double for loop

I have some physical simulation code, written in python and using numpy/scipy. Profiling the code shows that 38% of the CPU time is spent in a single doubly nested for loop - this seems excessive, so I've been trying to cut it down.
The goal of the loop is to create an array of indices, showing which elements of a 1D array the elements of a 2D array are equal to.
indices[i,j] = where(1D_array == 2D_array[i,j])
As an example, if 1D_array = [7.2, 2.5, 3.9] and
2D_array = [[7.2, 2.5]
[3.9, 7.2]]
We should have
indices = [[0, 1]
[2, 0]]
I currently have this implemented as
for i in range(ni):
    for j in range(nj):
        out[i, j] = abs(1D_array - 2D_array[i, j]).argmin()
The argmin is needed as I'm dealing with floating point numbers, and so the equality is not necessarily exact. I know that every number in the 1D array is unique, and that every element in the 2D array has a match, so this approach gives the correct result.
Is there any way of eliminating the double for loop?
Note:
I need the index array to perform the following operation:
f = complex_function(1D_array)
output = f[indices]
This is faster than the alternative, as the 2D array has a size of NxN compared with 1xN for the 1D array, and the 2D array has many repeated values. If anyone can suggest a different way of arriving at the same output without going through an index array, that could also be a solution
In pure Python you can do this using a dictionary in O(N) time, the only time penalty is going to be the Python loop involved:
>>> arr1 = np.array([7.2, 2.5, 3.9])
>>> arr2 = np.array([[7.2, 2.5], [3.9, 7.2]])
>>> indices = dict(np.hstack((arr1[:, None], np.arange(3)[:, None])))
>>> np.fromiter((indices[item] for item in arr2.ravel()), dtype=arr2.dtype).reshape(arr2.shape)
array([[ 0., 1.],
[ 2., 0.]])
The dictionary method that some others have suggested might work, but it requires that you know ahead of time that every element in your target array (the 2d array) has an exact match in your search array (your 1d array). Even when this should be true in principle, you still have to deal with floating point precision issues; for example, try evaluating .1 * 3 == .3.
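The floating point caveat can be checked directly. This small sketch contrasts an exact comparison with a tolerance-based one using np.isclose (default tolerances); the variable names are only illustrative.

```python
import numpy as np

exact = (0.1 * 3 == 0.3)                # exact comparison of floats
close = bool(np.isclose(0.1 * 3, 0.3))  # comparison within a tolerance

print(exact, close)  # False True
```

A dictionary lookup behaves like the exact comparison, which is why it can miss values that are equal only up to rounding.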
Another approach is to use numpy's searchsorted function. searchsorted takes a sorted 1d search array and any target array, then finds, for every item in the target array, the index at which it would be inserted into the search array to keep it sorted. The find_closest function below uses those indices to pick the nearest element. I've adapted this answer for your situation; take a look at it for a description of how the find_closest function works.
import numpy as np
def find_closest(A, target):
    order = A.argsort()
    A = A[order]
    idx = A.searchsorted(target)
    idx = np.clip(idx, 1, len(A)-1)
    left = A[idx-1]
    right = A[idx]
    idx -= target - left < right - target
    return order[idx]
array1d = np.array([7.2, 2.5, 3.9])
array2d = np.array([[7.2, 2.5],
                    [3.9, 7.2]])
indices = find_closest(array1d, array2d)
print(indices)
# [[0 1]
# [2 0]]
To get rid of the two Python for loops, you can do all of the equality comparisons "in one go" by adding new axes to the arrays (making them broadcastable with each other).
Bear in mind that this produces a new array containing arr1.size * arr2.size values. If this is a very big number, this approach could be infeasible depending on the limitations of your memory. Otherwise, it should be reasonably quick:
>>> (arr1[:,np.newaxis] == arr2[:,np.newaxis]).argmax(axis=1)
array([[0, 1],
[2, 0]], dtype=int32)
If you need to get the index of the closest matching value in arr1 instead, use:
np.abs(arr1[:,np.newaxis] - arr2[:,np.newaxis]).argmin(axis=1)
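Put together as a self-contained sketch, using the example values from the question. The `arr1 ** 2` line is only a stand-in for the question's complex_function, and the names `diff`, `f` and `output` are chosen for illustration.

```python
import numpy as np

arr1 = np.array([7.2, 2.5, 3.9])
arr2 = np.array([[7.2, 2.5],
                 [3.9, 7.2]])

# arr1[:, np.newaxis] has shape (3, 1); arr2[:, np.newaxis] has shape (2, 1, 2).
# Broadcasting gives shape (2, 3, 2): diff[i, k, j] = |arr1[k] - arr2[i, j]|.
diff = np.abs(arr1[:, np.newaxis] - arr2[:, np.newaxis])

# Minimising over axis 1 (the arr1 axis) yields the closest index per element.
indices = diff.argmin(axis=1)

f = arr1 ** 2        # stand-in for complex_function(1D_array)
output = f[indices]  # same shape as arr2

print(indices)
```

Because argmin picks the nearest value rather than an exact match, this variant tolerates small floating point differences between the two arrays.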
