Using Python to change dimensions of a numpy array

I have data in an array.
The first column is time; the second, latitude; the third, longitude; the fourth, precipitation.
Sample:
2 70 100 5.6
2 70 110 5.9
2 80 100 6.2
2 80 110 5.0
3 70 100 2.3
3 70 110 1.1
3 80 100 0.0
3 80 110 7.9
I would like to convert this into an array where the y axis is longitude, the z axis is latitude, and the x axis is time. Precipitation amounts will be located at each 3d grid point.
For instance, in the following image (not reproduced here), the sizes of the bubbles represent different precipitation amounts (ignore the colors).
How can I use python to do this?
So far I have:
import numpy as np
a = open('time.dat')           # original file
b = open('three.dat', 'w+')
dif = np.loadtxt(a)            # parses whitespace-separated text like the sample above
tim = dif[:, 0]
lat = dif[:, 1]
lon = dif[:, 2]
pre = dif[:, 3]
c = np.empty((780, 360, 720))  # 780 time steps, 360 latitudes, 720 longitudes

So you want a 2-dimensional array with the inner dimension containing all of the data and the outer dimension ordered by lon, lat, time.
You can read the file in as a flat array of values and reshape it into a 2-D array so that each row groups one 4-tuple. Then reorder the columns of the inner dimension. Finally, sort the outer dimension on the inner one.
>>> data = np.array([2, 70, 100, 5.6, 2, 70, 110, 5.9, 2, 80, 100, 6.2, 2, 80, 110, 5.0, 3, 70, 100, 2.3, 3, 70, 110, 1.1, 3, 80, 100, 0.0, 3, 80, 110, 7.9])
>>> data2 = data.reshape((8, 4))
>>> data2
array([[  2. ,  70. , 100. ,   5.6],
       [  2. ,  70. , 110. ,   5.9],
       [  2. ,  80. , 100. ,   6.2],
       [  2. ,  80. , 110. ,   5. ],
       [  3. ,  70. , 100. ,   2.3],
       [  3. ,  70. , 110. ,   1.1],
       [  3. ,  80. , 100. ,   0. ],
       [  3. ,  80. , 110. ,   7.9]])
>>> data2 = data2[:,[1,2,0,3]]
>>> data2
array([[ 70. , 100. ,   2. ,   5.6],
       [ 70. , 110. ,   2. ,   5.9],
       [ 80. , 100. ,   2. ,   6.2],
       [ 80. , 110. ,   2. ,   5. ],
       [ 70. , 100. ,   3. ,   2.3],
       [ 70. , 110. ,   3. ,   1.1],
       [ 80. , 100. ,   3. ,   0. ],
       [ 80. , 110. ,   3. ,   7.9]])
The goofiness with view and sort can be avoided with np.lexsort, shown below.
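A minimal sketch of that final sort with np.lexsort (note that np.lexsort treats its last key as the primary sort key, so lat goes last):
>>> order = np.lexsort((data2[:, 2], data2[:, 1], data2[:, 0]))
>>> data2[order]
array([[ 70. , 100. ,   2. ,   5.6],
       [ 70. , 100. ,   3. ,   2.3],
       [ 70. , 110. ,   2. ,   5.9],
       [ 70. , 110. ,   3. ,   1.1],
       [ 80. , 100. ,   2. ,   6.2],
       [ 80. , 100. ,   3. ,   0. ],
       [ 80. , 110. ,   2. ,   5. ],
       [ 80. , 110. ,   3. ,   7.9]])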

You can't use numpy's reshape here, for a simple reason: your original array contains duplicated coordinate data (time and positions) that is absent from the result you want, and the number of elements must be the same before and after a reshape.
You have to loop over your initial array to fill your new array; see the sketch below.
Hope this helps.
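For example, a minimal sketch of that fill loop, assuming the grid is regular so each coordinate maps to an index (the 0.5-degree spacing and grid origins are assumptions, chosen to match 360 latitudes and 720 longitudes):
import numpy as np

dif = np.loadtxt('time.dat')           # columns: time, lat, lon, precipitation
c = np.full((780, 360, 720), np.nan)   # NaN marks grid points with no data
t0 = dif[:, 0].min()                   # assumes time steps are consecutive integers
for t, lat, lon, pre in dif:
    i = int(t - t0)
    j = int((lat + 90.0) / 0.5)        # assumed 0.5-degree latitude grid from -90
    k = int(lon / 0.5)                 # assumed 0.5-degree longitude grid from 0
    c[i, j, k] = pre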

Related

How can I simply filter a 3d numpy array by its 1st column values?

Suppose I have a 3D numpy array like this:
data = np.array([[[  1,   2,   3,   4], [ 1, 2.5,   3,   5]],
                 [[116, 230, 450, 430], [80, 100, 300, 320]],
                 [[ 60, 100, 120,  80], [50,  80, 100,  90]]])
How can I simply extract from it a 3D numpy array of same shape with a condition on axis 0, for example selecting those "rows" for which axis 0 < 3? A naïve way would be
data[data[0]<3]
But this fails:
IndexError: boolean index did not match indexed array along dimension 0; dimension is 3 but corresponding boolean dimension is 2
From your data I am guessing you want the rows with any values less than 3. If so you could do:
data[(data<3).any(axis=2)]
>>> array([[1. , 2. , 3. , 4. ],
           [1. , 2.5, 3. , 5. ]])
EDIT1:
A solution can be achieved using transposition to match up the axis dimensions:
data.T[(data[0]<3).any(axis=0).T].T
>>> array([[[  1. ,   2. ],
            [  1. ,   2.5]],
           [[116. , 230. ],
            [ 80. , 100. ]],
           [[ 60. , 100. ],
            [ 50. ,  80. ]]])
EDIT2:
Another method that does not involve transposing: to apply the mask (data[0]<3).any(axis=0) to the original data array, the axis shapes must match. The mask has shape (4,) and data.shape is (3, 2, 4), so we need to apply the mask to the last axis, as:
data[..., (data[0]<3).any(axis=0)]
>>> array([[[  1. ,   2. ],
            [  1. ,   2.5]],
           [[116. , 230. ],
            [ 80. , 100. ]],
           [[ 60. , 100. ],
            [ 50. ,  80. ]]])
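For reference, the mask itself evaluates to:
>>> (data[0]<3).any(axis=0)
array([ True,  True, False, False])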

Minimum sum route through numpy array

This is a two part question.
Part 1
Given the following Numpy array:
foo = array([[22.5, 20. ,  0. , 20. ],
             [24. , 40. ,  0. ,  8. ],
             [ 0. ,  0. , 50. ,  9.9],
             [ 0. ,  0. ,  0. ,  9. ],
             [ 0. ,  0. ,  0. ,  2.5]])
what is the most efficient way to (i) find the two smallest possible sums of values across columns (taking into account only cell values greater than zero), where exactly one row is used per column, and (ii) keep track of the array index locations visited on that route?
In the example above, this would be: minimum_bar = 22.5 + 20 + 50 + 2.5 = 95 at indices [0,0], [0,1], [2,2], [4,3], and next_best_bar = 22.5 + 20 + 50 + 8 = 100.5 at indices [0,0], [0,1], [2,2], [1,3].
Part 2
Similar to Part 1, but now with the constraint that the row-wise sums of foo (for any row used in the solution) must be greater than the values in an array (for example np.array([10, 10, 10, 10, 10])). In other words, sum(row[0]) > array[0] is 62.5 > 10 = True, but sum(row[4]) > array[4] is 2.5 > 10 = False.
In which case the result is: minimum_bar = 22.5 + 20 + 50 + 9.9 = 102.4 at indices [0,0], [0,1], [2,2], [2,3] and next_best_bar = 22.5 + 20 + 50 + 20 = 112.5 at indices [0,0], [0,1], [2,2], [0,3].
My initial approach was to find all possible routes (combinations of indices using itertools), but this solution does not scale well for large matrix sizes (e.g., m×n = 500×500).
Here's one solution that I came up with (hopefully I didn't misunderstand anything in your question):
def minimum_routes(foo):
    assert len(foo) >= 2
    assert np.all(np.any(foo > 0, axis=0))
    foo = foo.astype(float)
    foo[foo <= 0] = np.inf       # ignore non-positive cells
    foo.sort(0)                  # sort each column; row 0 becomes the per-column minima
    minimum_bar = foo[0]
    next_best_bar = minimum_bar.copy()
    # swap in the second-smallest value in the column where it is closest to the minimum
    c = np.argmin(np.abs(foo[0] - foo[1]))
    next_best_bar[c] = foo[1, c]
    return minimum_bar, next_best_bar
Let's test it:
foo = np.array([[22.5, 20. ,  0. , 20. ],
                [24. , 40. ,  0. ,  8. ],
                [ 0. ,  0. , 50. ,  9.9],
                [ 0. ,  0. ,  0. ,  9. ],
                [ 0. ,  0. ,  0. ,  2.5]])
# PART 1
minimum_bar, next_best_bar = minimum_routes(foo)
# (array([22.5, 20. , 50. , 2.5]), array([24. , 20. , 50. , 2.5]))
# PART 2
constraint = np.array([10, 10, 10, 10, 10])
minimum_bar, next_best_bar = minimum_routes(foo[foo.sum(1) > constraint])
# (array([22.5, 20. , 50. , 8. ]), array([24., 20., 50., 8.]))
To find the indices:
np.where(foo == minimum_bar)
np.where(foo == next_best_bar)
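With the Part 1 result this recovers the index pairs from the question (row indices first, then column indices); note it can return extra matches if a bar value happens to occur more than once in its column:
>>> np.where(foo == minimum_bar)
(array([0, 0, 2, 4]), array([0, 1, 2, 3]))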

Interpolate NaN values in a big matrix (not just a list) in python

I'm searching for a simple method to interpolate a matrix in which about 10% of the values are NaN. For instance:
matrix = np.array([[np.nan, np.nan,  2.,  3.,  4.],
                   [np.nan,  6.,     7.,  8.,  9.],
                   [ 10.,   11.,    12., 13., 14.],
                   [ 15.,   16.,    17., 18., 19.],
                   [np.nan, np.nan, 22., 23., np.nan]])
I found a solution that uses griddata from scipy.interpolate, but it takes a lot of time. (My matrix has about 50 columns and 200,000 rows, and the rate of NaN values is not higher than 10%.)
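For reference, a minimal sketch of the griddata approach the question mentions, assuming the values vary smoothly enough for linear interpolation over the row/column grid (interp_nan is a hypothetical helper name):
import numpy as np
from scipy.interpolate import griddata

def interp_nan(m):
    # integer grid coordinates of every cell
    rows, cols = np.mgrid[0:m.shape[0], 0:m.shape[1]]
    valid = ~np.isnan(m)
    filled = m.copy()
    # fill the NaN cells from the valid ones; cells outside the convex hull
    # of the valid points stay NaN (rerun with method='nearest' to fill them)
    filled[~valid] = griddata((rows[valid], cols[valid]), m[valid],
                              (rows[~valid], cols[~valid]), method='linear')
    return filled
Since the 2-D triangulation is what makes this slow, interpolating each of the ~50 columns separately in 1-D (e.g., with np.interp over the non-NaN row positions) is likely to be much faster on a 200,000-row matrix.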

Delete columns based on repeat value in one row in numpy array

I'm hoping to delete columns in my arrays that have repeat entries in row 1, as shown below (row 1 has repeats of values 1 and 2.5, so one of each of those values has been deleted, together with the column each deleted value lies within).
initial_array =
row 0 [[ 1, 1, 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2.5, 2, 1, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 3, 2.5, 1.5, 4,]
row 3 [228, 314, 173, 452, 168, 351, 300, 396]]
final_array =
row 0 [[ 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 2.5, 4,]
row 3 [228, 314, 173, 452, 351, 396]]
Ways I was thinking of included using some function that checked for repeats, giving a True response for the second (or later) time a value turned up in the dataset, then using that response to delete the column. That, or possibly using the return-indices option of numpy.unique. I just can't quite find a way through it or find the right function, though.
If I could find a way to return in row 3 the mean value of the retained repeat and the deleted one, that would be even better (see below).
final_array_averaged =
row 0 [[ 1, 1, 1, 1, 1, 1,]
row 1 [0.5, 1, 2.5, 4, 2, 3.5,]
row 2 [ 1, 1.5, 3, 4.5, 2.5, 4,]
row 3 [228, 307, 170.5, 452, 351, 396]]
Thanks in advance for any help you can give to a beginner who is stumped!
You can use the optional arguments of np.unique and then np.bincount with the last row as weights to get the final averaged output, like so -
_, unqID, tag, C = np.unique(arr[1], return_index=1, return_inverse=1, return_counts=1)
out = arr[:, unqID]                     # keep the first occurrence of each unique value
out[-1] = np.bincount(tag, arr[3]) / C  # group-wise sum of row 3 over counts = group mean
Sample run -
In [212]: arr
Out[212]:
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2.5,    4. ,    2.5,    2. ,    1. ,    3.5],
       [   1. ,    1.5,    3. ,    4.5,    3. ,    2.5,    1.5,    4. ],
       [ 228. ,  314. ,  173. ,  452. ,  168. ,  351. ,  300. ,  396. ]])
In [213]: out
Out[213]:
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2. ,    2.5,    3.5,    4. ],
       [   1. ,    1.5,    2.5,    3. ,    4. ,    4.5],
       [ 228. ,  307. ,  351. ,  170.5,  396. ,  452. ]])
As can be seen, the output is now ordered with the second row sorted. If you are looking to keep the order as it was originally, use np.argsort of unqID, like so -
In [221]: out[:,unqID.argsort()]
Out[221]:
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2.5,    4. ,    2. ,    3.5],
       [   1. ,    1.5,    3. ,    4.5,    2.5,    4. ],
       [ 228. ,  307. ,  170.5,  452. ,  351. ,  396. ]])
You can find the indices of the wanted columns using unique:
>>> indices = np.sort(np.unique(A[1], return_index=True)[1])
Then use simple indexing to get the desired columns:
>>> A[:,indices]
array([[   1. ,    1. ,    1. ,    1. ,    1. ,    1. ],
       [   0.5,    1. ,    2.5,    4. ,    2. ,    3.5],
       [   1. ,    1.5,    3. ,    4.5,    2.5,    4. ],
       [ 228. ,  314. ,  173. ,  452. ,  351. ,  396. ]])
This is a typical grouping problem, which can be solved elegantly and efficiently using the numpy_indexed package (disclaimer: I am its author):
import numpy_indexed as npi
unique, final_array = npi.group_by(initial_array[1]).mean(initial_array, axis=1)
Note that there are many other reductions besides mean; if you want the original behavior you described, you could replace 'mean' with 'first', for instance.

Python converting lists into 2D numpy array

I have some lists that I want to convert into a 2D numpy array.
list1 = [ 2, 7 , 8 , 5]
list2 = [18 ,29, 44,33]
list3 = [2.3, 4.6, 8.9, 7.7]
The numpy array I want is:
[[ 2.  18.   2.3]
 [ 7.  29.   4.6]
 [ 8.  44.   8.9]
 [ 5.  33.   7.7]]
which I can get by typing the individual items from the list directly into the numpy array expression as np.array(([2,18,2.3], [7,29, 4.6], [8,44,8.9], [5,33,7.7]), dtype=float).
But I want to be able to convert the lists into the desired numpy array.
One way to do it would be to create your numpy array and then use the transpose function to convert it to your desired output:
import numpy as np
list1 = [ 2, 7 , 8 , 5]
list2 = [18 ,29, 44,33]
list3 = [2.3, 4.6, 8.9, 7.7]
arr = np.array([list1, list2, list3])
arr = arr.T
print(arr)
Output
[[ 2.  18.   2.3]
 [ 7.  29.   4.6]
 [ 8.  44.   8.9]
 [ 5.  33.   7.7]]
You could use np.transpose directly:
np.transpose([list1, list2, list3])
This converts your list of lists to a numpy array and then transposes it (rows become columns and columns become rows):
array([[ 2. , 18. ,  2.3],
       [ 7. , 29. ,  4.6],
       [ 8. , 44. ,  8.9],
       [ 5. , 33. ,  7.7]])
You can also use the zip function like this (note this is a Python 2 session; in Python 3, zip returns an iterator, so you would write np.array(list(zip(list1, list2, list3)))):
In [1]: import numpy as np
In [2]: list1 = [ 2, 7 , 8 , 5]
In [3]: list2 = [18 ,29, 44,33]
In [4]: list3 = [2.3, 4.6, 8.9, 7.7]
In [5]: np.array(zip(list1,list2,list3))
Out[5]:
array([[ 2. , 18. ,  2.3],
       [ 7. , 29. ,  4.6],
       [ 8. , 44. ,  8.9],
       [ 5. , 33. ,  7.7]])
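For what it's worth, another common idiom here is np.column_stack, which stacks the 1-D lists as columns directly:
>>> np.column_stack((list1, list2, list3))
array([[ 2. , 18. ,  2.3],
       [ 7. , 29. ,  4.6],
       [ 8. , 44. ,  8.9],
       [ 5. , 33. ,  7.7]])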
