Retrieve indexes of min and max values in np.ndarray - python

I am working with some TIFF files and have to plot the dependency between temperature and a vegetation index based on a .tif file. That was just FYI; now for my programming problem.
I'm using Python 2.7 (x64).
I have a big ndarray (from NumPy) containing temperature values and a second one (same size) with a vegetation index. mergedmask is my mask (same size as the other arrays), where a False value means the data is valid.
maxTS = np.amax(toa[mergedmask==False])
minTS = np.amin(toa[mergedmask==False])
maxVI = np.amax(ndvi1[mergedmask==False])
minVi = np.amin(ndvi1[mergedmask==False])
These variables now hold the minimum and maximum values of TS (temperature) and VI (vegetation index). Everything is OK; I am happy. Now I have to find the coordinates in the toa and ndvi1 arrays, so I am using this:
ax,ay = np.unravel_index(ndvi1[mergedmask==False].argmin(),ndvi1.shape)
To simplify my message I will focus only on minVi. The line above returns two indices. Then:
newMinVi = ndvi1[ax][ay]
should assign the same value as minVi to newMinVi, but it doesn't. I checked nearby indices like ax-1, ax+1, ay-1, ay+1 and none of them are even close to my minVi value. Do you have any ideas how to get the coordinates of my minVi value?

ndvi1[mergedmask==False].argmin() will give you the index of the minimum in ndvi1[mergedmask==False], i.e., an index into a new array built from the places where mergedmask is False.
The problem here is that ndvi1[mergedmask==False] isn't really a mask. It selects the values of ndvi1 which meet the condition and assembles them into a new 1D array. For instance, check what ndvi1[mergedmask==False].size is, and compare it to ndvi1.size.
What you probably want to do is create a real masked array:
ndvi1_masked = np.ma.masked_array(ndvi1, (mergedmask==False))
ax, ay = np.unravel_index(ndvi1_masked.argmin(), ndvi1.shape)
Hope this helps!
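To make the difference concrete, here is a small runnable sketch with illustrative values (not the OP's data); note that np.ma treats True in the mask argument as an entry to hide:

```python
import numpy as np

a = np.array([[5.0, 1.0],
              [0.5, 3.0]])
invalid = np.array([[False, False],
                    [True, False]])  # True marks entries to ignore

# Boolean indexing flattens: a[invalid == False] is a new 1D array,
# so its argmin cannot be unraveled against a.shape.
flat = a[invalid == False]           # array([5.0, 1.0, 3.0])

# A masked array keeps the original shape, so argmin indexes into a itself.
am = np.ma.masked_array(a, mask=invalid)
ax, ay = np.unravel_index(am.argmin(), a.shape)
# (ax, ay) == (0, 1) and a[ax, ay] == 1.0; the masked 0.5 is skipped
```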

Almost what I want.
ndvi1_masked = np.ma.masked_array(ndvi1, (mergedmask==False))
It masks pretty well, but not the values I want: np.ma treats True in the mask as an entry to hide, so I had to invert the condition, and finally I got:
myNdviMasked = np.ma.masked_array(ndvi1,(mergedmask!=False))
bx, by = np.unravel_index(myNdviMasked.argmin(), myNdviMasked.shape)
Thank you for the help :)

Related

Can anyone explain why the maximum value of a concatenation between two arrays is so much higher than the max value in either single array?

I have the following two datasets - both from netCDF files:
ds1 = observed_1979_01
ds2 = observed_1979_02
I want to extract the variable labelled 'swvl1' from both datasets, and I do this by:
m = ds1.variables['swvl1'][0,:,:]
n = ds2.variables['swvl1'][0,:,:]
I want to concatenate these two arrays together, which I do using np.dstack (though the same problem outlined here occurs with np.concatenate as well), like so:
d = np.dstack((m,n))
Now if I look at the maximum value in either array, I get that:
max_m = 0.76293164
max_n = 0.76335037
However, the max value of the concatenated arrays is:
max_d = 9.96921e+36
Why is this happening? I believe something must be going massively wrong in the concatenating of the two arrays to give a different maximum value, but I can't figure out what it is. Does anyone have any ideas?
The maximum value 9.96921e+36 is identical to the default _FillValue, which could indicate that your arrays contain uninitialized values before (and after) they are concatenated. Be sure all values are initialized to valid values before computing the maximum, and/or give the routine that computes the maximum the value 9.96921e+36 as the missing value to ignore.
Responding to question in comment below:
Yes. Uninitialized in this context means that the variable was defined and space allocated on disk to hold its values, however, no values were ever written. By default in netCDF, unwritten values appear as 9.96921e+36 when read.
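A minimal sketch of ignoring the fill value with a masked array (illustrative values; when reading via the netCDF4 library, fill values are usually masked automatically):

```python
import numpy as np

FILL = 9.96921e+36  # the netCDF default _FillValue for floats

# Illustrative stand-ins for the two swvl1 slices; the second one contains
# an unwritten entry that reads back as the fill value.
m = np.array([0.5, 0.76293164])
n = np.array([0.3, FILL])

d = np.dstack((m, n))

# Mask the fill value before computing the maximum.
d_masked = np.ma.masked_equal(d, FILL)
max_d = d_masked.max()   # 0.76293164, not 9.96921e+36
```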

check if subarray is in array of arrays

I've got an array of arrays where I store x,y,z coordinates and a measurement at that coordinate like:
measurements = [[x1,y1,z1,val1],[x2,y2,z2,val2],[...]]
Now, before adding a measurement for a certain coordinate, I want to check if there is already a measurement for that coordinate, so that I only keep the measurement with the maximum val.
So the question is:
Is [xn, yn, zn, ...] already in measurements?
My approach so far would be to iterate over the array and compare with a sliced entry, like:
for measurement in measurements:
    if measurement_new[:3] == measurement[:3]:
        measurement[3] = measurement_new[3] if measurement_new[3] > measurement[3] else measurement[3]
But with the measurements array getting bigger, this is very inefficient.
Another approach would be two separate arrays coords = [[x1,y1,z1], [x2,y2,z2], [...]] and vals = [val1, val2, ...]
This would allow checking for existing coordinates efficiently with [x,y,z] in coords, but I would have to merge the arrays later on.
Can you suggest a more efficient method for solving this problem?
If you want to stick to built-in types (if not, see the last point in the Notes below), I suggest using a dict for the measurements:
measurements = {(x1,y1,z1): val1,
                (x2,y2,z2): val2}
Then adding a new value (x,y,z,val) can simply be:
measurements[(x,y,z)] = max(measurements.get((x,y,z), 0), val)
Notes:
The value 0 in measurements.get is supposed to be the lower bound of the values you are expecting. If you have values below 0, change it to an appropriate lower bound, such that whenever (x,y,z) is not present in your measurements, get returns the lower bound and thus max returns val. You can also avoid having to specify the lower bound and write:
measurements[(x,y,z)] = max(measurements.get((x,y,z), val), val)
You need to use tuples as the type for your keys, hence the (x,y,z). This is because lists cannot be hashed and so are not permitted as keys.
Finally, depending on the complexity of the task you are performing, consider using more complex data types. I would recommend having a look at pandas DataFrames they are ideal to deal with such kind of things.
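Put together, a small runnable sketch of the dict approach (the helper name is just for illustration):

```python
def add_measurement(measurements, x, y, z, val):
    """Record (x, y, z) -> val, keeping only the maximum value per coordinate."""
    key = (x, y, z)
    # Using val itself as the get default avoids picking a global lower bound.
    measurements[key] = max(measurements.get(key, val), val)

measurements = {}
add_measurement(measurements, 1, 2, 3, 10)
add_measurement(measurements, 1, 2, 3, 7)    # ignored: 7 < 10
add_measurement(measurements, 4, 5, 6, -2)   # negative values are fine too
# measurements == {(1, 2, 3): 10, (4, 5, 6): -2}
```

Each lookup and update is an O(1) hash operation, so this stays fast as the number of measurements grows.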

looping through complicated nested dictionary

I have a rather complex list of dictionaries with nested dictionaries and arrays. I am trying to figure out a way to either:
1. make the list of data less complicated and then loop through the raster points, or
2. find a way to loop through the array of raster points as is.
What I am ultimately trying to do is loop through all raster points within each polygon, perform a simple greater than or less than on the value assigned to that raster point (values are elevation values). If greater than a given value assign 1, if less than given value assign 0. I would then create a separate array of these 1s and 0s of which I can then get an average value.
I have found all these points (allpoints within pts), but they are in arrays within a dictionary within another dictionary within a list (of all polygons), at least I think; I could be wrong about the organization, as dictionaries are rather new to me.
The following is my code:
import numpy as np
from rasterstats import zonal_stats

def mystat(x):
    mystat = dict()
    mystat['allpoints'] = x
    return mystat

stats = zonal_stats('acp.shp','myGeoTIFF.tif')
pts = zonal_stats('acp.shp','myGeoTIFF.tif', add_stats={'mystat':mystat})
Link to my documents. Any help or direction would be greatly appreciated!
I assume you are using the rasterstats package. You could try something like this:
threshold_value = 15  # You may change this threshold value to yours
for o_idx in range(0, len(pts)):
    data = pts[o_idx]['mystat']['allpoints'].data
    for d_idx in range(0, len(data)):
        for p_idx in range(0, len(data[d_idx])):
            # You may change the conditions below as you want
            if data[d_idx][p_idx] > threshold_value:
                data[d_idx][p_idx] = 1
            elif data[d_idx][p_idx] <= threshold_value:
                data[d_idx][p_idx] = 0
It is going to update the data within the pts list.
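Since allpoints is a NumPy (masked) array, the explicit index loops can likely be replaced by one vectorized comparison; a sketch with an illustrative array standing in for pts[o_idx]['mystat']['allpoints']:

```python
import numpy as np

threshold_value = 15
data = np.array([[12.0, 18.5],
                 [16.2, 9.9]])   # stand-in for one polygon's raster points

binary = (data > threshold_value).astype(int)  # 1 where above threshold, else 0
average = binary.mean()                        # fraction of points above it
# binary == [[0, 1], [1, 0]], average == 0.5
```

This also gives the averaged 1s-and-0s array the question asks for without building it element by element.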

Change entries in large arrays at specific indices without for-loops

I am trying to set specific entries in a 2-dimensional array to the maximum of themselves and another value.
In the following example, img represents an image read by cv2.imread. msk is a boolean array of the same dimensions. I want to set every entry of img where msk is True to value, if value > that entry. The following example always sets it to value, even if value is smaller than the previous entry.
img[msk] = value
Something like
img[msk] = max(img[msk], value)
doesn't work.
Solving it with two for loops takes much more time, as img represents a huge image (about 20000x10000 pixels).
Thanks a lot for your help!
Here's your problem: I will assume that value is a list, as otherwise it would not be 2-dimensional.
With value as a list, you cannot compare max(img[msk], value), since both are lists. If value is not a list, what you may have meant is max(img[msk]); if value is a list, max(img[msk] + value); or finally, if value is an int, max(img[msk] + [value]).
P.S. I am assuming a lot of things, as you are a bit unclear. If I said something wrong, please correct me in the comments.
EDIT: Based on the OP's comment below, the problem is similar: you cannot take the max of an array this way. Here's a solution; I'm not sure whether it is more expensive than iterating, but it avoids explicit loops.
First do the first max
img[msk] = value
Then create a second mask, which is True only where the entry is still less than the value.
msk2 = img<value
Finally, apply the second mask:
img[msk2] = value
Here's an example:
import numpy as np
img = np.array([[1,20],[3,4]])
msk = np.array([[True,False],[False,True]])
value = 7
img[msk]= value
# [[7,20],[3,7]]
msk2 = img < value
img[msk2] = value
#[[7,20],[7,7]]
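If value is a scalar and only the entries where msk is True should change, np.maximum on the selected entries does this in one vectorized step without touching unmasked pixels (a sketch under that assumption):

```python
import numpy as np

img = np.array([[1, 20], [3, 4]])
msk = np.array([[True, False], [False, True]])
value = 7

# Raise only the masked entries to at least `value`.
img[msk] = np.maximum(img[msk], value)
# img == [[7, 20], [3, 7]]; the unmasked 3 stays 3
```

Note the difference from the two-mask approach above, which in this example also raises the unmasked 3 to 7.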

Python - Original data matrix gets modified after for i, v in enumerate(): statement

Please see the code snippet below:
import numpy as np
# Load the .txt file in
myData = np.loadtxt('data.txt')
# Extract the time and acceleration columns
time = myData[:,0]
# Extract the linear acceleration columns
xLinearAcc = myData[:,4]
yLinearAcc = myData[:,5]
zLinearAcc = myData[:,6]
# Find the linear accelerations
xLinearAccSqr = myData[:,0]
for i, v in enumerate(xLinearAcc):
    xLinearAccSqr[i] = pow(v,2)
myData is my 2D data matrix. What I am trying to do is extract the 4th column into a new array xLinearAcc, then square every single term in xLinearAcc and store the results in another new array xLinearAccSqr.
(The reason I have xLinearAccSqr = myData[:,0] is that without that line, the interpreter always tells me that xLinearAccSqr is undefined. So I just arbitrarily make it equal to the 1st column, because all the values get overwritten later anyway. I don't know whether this line causes trouble or not.)
Then comes the problem.
The first column of myData gets strangely modified. I do not want this.
Can anyone help??
I would really appreciate the help!!~~
==========================UPDATES=======================================
Problem solved.
Post the solution here may help others.
Use
xLinearAccSqr = np.copy(myData[:,0])
It seems NumPy slicing returns references (views) instead of copies of the values.
Thus, just make a copy.
NumPy arrays behave differently from regular Python lists. In NumPy, basic slicing returns a view on the original array. That's why your original array gets modified when you modify the slice.
Create a new array using any of the array creation routines.
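A minimal sketch of the view-versus-copy difference (illustrative values):

```python
import numpy as np

myData = np.array([[1.0, 2.0],
                   [3.0, 4.0]])

view = myData[:, 0]          # basic slicing returns a view
view[0] = 99.0               # this writes through to myData
# myData[0, 0] is now 99.0

col = myData[:, 1].copy()    # .copy() (or np.copy) makes an independent array
col[0] = -1.0
# myData[0, 1] is still 2.0
```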