Change entries in large arrays at specific indices without for-loops

I am trying to set specific entries in a two-dimensional array to the maximum of the entry itself and another value.
In the following example, img represents an image read by cv2.imread, and msk is a boolean array of the same dimensions. I want to set every entry of img where msk is True to value, but only if value is greater than that entry. The following line always sets the entry to value, even if value is smaller than the previous entry.
img[msk] = value
Something like
img[msk] = max(img[msk], value)
doesn't work.
Solving it with two for-loops takes much more time, as img represents a huge image (about 20000x10000 pixels).
Thanks a lot for your help!

Here's your problem. I will assume that value is a list, as otherwise it would not be 2 dimensional.
With value as a list, you cannot call max(img[msk], value), because both arguments are lists (or, if value is not a list, one of them is). In that case, what you may have meant is max(img[msk]); or, if value is a list, max(img[msk]+value); or finally, if value is an int, max(img[msk]+[value]).
P.S. I am assuming a lot of things, as the question is a bit unclear. If I said something wrong, please correct me in the comments.
EDIT: Based on the OP's comment below, the problem is similar: the built-in max does not work elementwise on an array. Here's a solution. I'm not sure whether it is more expensive than iterating, but it does not use explicit loops.
First, create a second mask, which is True only where msk is True and the entry is less than value.
msk2 = msk & (img < value)
Then apply the second mask
img[msk2] = value
Here's an example
import numpy as np
img = np.array([[1,20],[3,4]])
msk = np.array([[True,False],[False,True]])
value = 7
msk2 = msk & (img < value)
# [[True,False],[False,True]]
img[msk2] = value
# [[7,20],[3,7]]
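The elementwise maximum the question asks for can also be written with np.maximum. The following is a minimal sketch on a small made-up array (the values and the mask are illustrative, not the OP's image); it assumes value is a scalar that fits the dtype of img.

```python
import numpy as np

# Illustrative data only; a real image from cv2.imread would typically be uint8.
img = np.array([[1, 20], [3, 4]])
msk = np.array([[True, True], [False, True]])
value = 7

# Variant 1: take the elementwise maximum of the masked entries only.
img[msk] = np.maximum(img[msk], value)

# Variant 2: same result, written in place through the ufunc's
# `out` and `where` arguments (entries where msk is False are left untouched).
img2 = np.array([[1, 20], [3, 4]])
np.maximum(img2, value, out=img2, where=msk)
```

In both variants the masked entry 20 stays 20, the masked entries 1 and 4 become 7, and the unmasked 3 is unchanged. The second variant avoids building the temporary img[msk] array, which may matter for an image of the size mentioned in the question.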

Related

Can anyone explain why the maximum value of a concatenation between two arrays is so much higher than the max value in either single array?

I have the following two datasets - both from netCDF files:
ds1 = observed_1979_01
ds2 = observed_1979_02
I want to extract the variable labelled 'swvl1' from both datasets, and I do this by:
m = ds1.variables['swvl1'][0,:,:]
n = ds2.variables['swvl1'][0,:,:]
I want to concatenate these two arrays, which I do using np.dstack (though the same problem outlined here occurs with np.concatenate as well), like so:
d = np.dstack((m,n))
Now if I look at the maximum value in either array, I get that:
max_m = 0.76293164
max_n = 0.76335037
However, the max value of the concatenated arrays is:
max_d = 9.96921e+36
Why is this happening? I believe something must be going massively wrong in the concatenating of the two arrays to give a different maximum value, but I can't figure out what it is. Does anyone have any ideas?

The maximum value 9.96921e+36 is identical to the default _FillValue, which could indicate that your arrays contain uninitialized values before (and after) they are concatenated. Be sure all values are initialized to valid values before computing the maximum, and/or give the routine that computes the maximum the value 9.96921e+36 as the missing value to ignore.
Responding to question in comment below:
Yes. Uninitialized in this context means that the variable was defined and space allocated on disk to hold its values, however, no values were ever written. By default in netCDF, unwritten values appear as 9.96921e+36 when read.
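As a rough illustration of telling the maximum routine to ignore the fill value, here is a sketch on small made-up arrays (not the netCDF data from the question). The constant FILL is simply the number quoted above, and np.ma.masked_values is one possible way to mask it.

```python
import numpy as np

FILL = 9.96921e+36  # fill value quoted in the answer above

# Made-up stand-ins for m and n, each with one unwritten (fill) entry.
m = np.array([[0.10, FILL], [0.76, 0.30]])
n = np.array([[FILL, 0.20], [0.75, 0.40]])

d = np.dstack((m, n))          # plain stacked array, shape (2, 2, 2)
raw_max = d.max()              # dominated by the fill value

# Mask every entry equal to the fill value, then take the maximum
# over the remaining valid entries.
d_masked = np.ma.masked_values(d, FILL)
valid_max = d_masked.max()
```

Here raw_max equals the fill value, while valid_max is the largest valid entry (0.76 in this sample).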

looping through complicated nested dictionary

I have a rather complex list of dictionaries with nested dictionaries and arrays. I am trying to figure out a way to either make the list of data less complicated and then loop through the raster points, or find a way to loop through the array of raster points as is.
What I am ultimately trying to do is loop through all raster points within each polygon, perform a simple greater than or less than on the value assigned to that raster point (values are elevation values). If greater than a given value assign 1, if less than given value assign 0. I would then create a separate array of these 1s and 0s of which I can then get an average value.
I have found all these points (allpoints within pts), but they are in arrays within a dictionary within another dictionary within a list (of all polygons). At least I think so; I could be wrong about the organization, as dictionaries are rather new to me.
The following is my code:
import numpy as np
def mystat(x):
    mystat = dict()
    mystat['allpoints'] = x
    return mystat
stats = zonal_stats('acp.shp','myGeoTIFF.tif')
pts = zonal_stats('acp.shp','myGeoTIFF.tif', add_stats={'mystat':mystat})
Link to my documents. Any help or direction would be greatly appreciated!

I assume you are using the rasterstats package. You could try something like this:
threshold_value = 15 # You may change this threshold value to yours
for o_idx in range(0, len(pts)):
    data = pts[o_idx]['mystat']['allpoints'].data
    for d_idx in range(0, len(data)):
        for p_idx in range(0, len(data[d_idx])):
            # You may change the conditions below as you want
            if data[d_idx][p_idx] > threshold_value:
                data[d_idx][p_idx] = 1
            elif data[d_idx][p_idx] <= threshold_value:
                data[d_idx][p_idx] = 0
This updates the data within the pts list in place.
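A vectorized variant of the same idea might look like the sketch below. It assumes each allpoints entry is a NumPy masked array (which the .data access above suggests); the pts list here is a hand-built stand-in, not real zonal_stats output, and the threshold is arbitrary.

```python
import numpy as np

threshold_value = 15  # arbitrary example threshold

# Hand-built stand-in for the list returned by zonal_stats(..., add_stats=...).
pts = [
    {"mystat": {"allpoints": np.ma.masked_array(
        [[10.0, 20.0], [30.0, 5.0]],
        mask=[[False, False], [False, True]])}},
    {"mystat": {"allpoints": np.ma.masked_array(
        [[16.0, 14.0], [15.0, 40.0]],
        mask=[[False, False], [False, False]])}},
]

fractions = []
for polygon in pts:
    elev = polygon["mystat"]["allpoints"]
    valid = elev.compressed()               # 1D array of unmasked values only
    above = valid > threshold_value         # True (1) above threshold, else False (0)
    fractions.append(float(above.mean()))   # average of the 1s and 0s
```

Each entry of fractions is the share of valid raster points in that polygon whose elevation exceeds the threshold. Unlike the loop above, this leaves the original data untouched and skips masked cells.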

Wrong values with np.mean()?

I'm quite new to programming (in Python) and so I don't understand what is going on here. I have an image (35x64) as a 3D-array and with np.mean(), I attempted to extract the mean of one color channel of one row:
print(np.mean(img[30][:][0]))
For comparison, I also wrote a for-loop to append the exact same values in a list and calculating the mean with that:
img_list = []
for i in range(64):
    img_list.append(img[30][i][0])
print(np.mean(img_list))
Now, for a strange reason, it gives different values:
First output: 117.1
Second output: 65.7
By looking at the list, I discovered that the second one is correct. Can somebody with more experience explain to me why this is exactly happening and how to fix that? I don't want to use the second, longer code chunk in my programs but am searching for a 1-line solution that gives a correct value.

There's a subtle difference between img[30][:][0] and img[30,:,0] (the one you were expecting).
Let's see with an example:
img = np.arange(35*64*3).reshape(35,64,3)
img[30][:][0]
# array([5760, 5761, 5762])
img[30,:,0]
# array([5760, 5763, ... 5946, 5949])
So you simply need to:
print(np.mean(img[30,:,0]))
(which is more efficient anyway).
Some details: in your original syntax, the [:] just creates a new view of the same array, with the same shape and values:
xx = img[30]
yy = img[30][:]
print(xx is yy, xx.shape, yy.shape, np.all(xx==yy))
# False (64, 3) (64, 3) True # i.e. both arrays are equal
So when you take img[30][:][0], you're actually getting the 3 colors of the first pixel of row 30.
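To check this against the loop from the question, here is a small sketch using the same np.arange test image as above (not the OP's actual image):

```python
import numpy as np

img = np.arange(35 * 64 * 3).reshape(35, 64, 3)

chained = img[30][:][0]   # first pixel of row 30, all 3 channels
sliced = img[30, :, 0]    # channel 0 of every pixel in row 30

# The explicit loop from the question, for comparison.
loop_values = [img[30][i][0] for i in range(64)]

same_as_loop = np.array_equal(sliced, loop_values)
mean_sliced = np.mean(sliced)
mean_loop = np.mean(loop_values)
```

chained has shape (3,) while sliced has shape (64,), and only the sliced version matches the values collected by the loop.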

Subtracting a number from an array if condition is met python

I am facing a very basic problem with an if condition in Python.
The array current_img_list has dimension (500L,1). If the number 82459 is in this array, I want to subtract 1 from it.
index = np.random.randint(0,num_train-1,500)
# shape of current_img_list is (500L,1)
# values in index array is randomized
current_img_list = train_data['img_list'][index]
# shape of current_img_list is (500L,1)
if (any(current_img_list) == 82459):
    i = np.where(current_img_list == 82459)
    final_list = i-1
Explanation of variables - train_data is of type dict. It has 4 elements in it. img_list is one of the elements with size (215375L,1). The value of num_train is 215375 and size of index is (500L,1)
Firstly, I don't know whether this condition is working or not. I tried the all() function and the numpy.where() function, but without success. Secondly, I can't think of a way to subtract 1 from 82459 directly at the index where it is stored without affecting the rest of the values in this array.
Thanks

Looping over the array in Python will be much slower than letting numpy's vectorized operators do their thing:
import numpy as np
num_train = 90000 # for example
index = np.random.randint(0,num_train-1,500)
index[index == 82459] -= 1

current_img_list = np.array([1,2,3,82459,4,5,6])
i = np.where(current_img_list == 82459)
current_img_list[i] -= 1
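For the (N, 1) shape described in the question, a boolean mask works the same way. The sketch below uses a small made-up array in place of train_data['img_list'][index]. Note that the built-in any(...) returns a single boolean, so comparing its result with 82459 is not expected to ever match; testing (array == target).any() is the usual alternative.

```python
import numpy as np

# Made-up stand-in for train_data['img_list'][index], shape (6, 1).
current_img_list = np.array([[10], [82459], [7], [82459], [3], [82460]])

target = 82459
hits = current_img_list == target      # boolean array, same shape as the data

contains_target = bool(hits.any())     # True if the value occurs at least once
if contains_target:
    current_img_list[hits] -= 1        # only the matching entries change
```

The explicit check is optional: if no entry matches, current_img_list[hits] -= 1 simply changes nothing.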

I'm a little confused by what's trying to be achieved here, but I'll give it a go:
If you're trying to subtract 1 from every entry in your array that is equal to 82459, then maybe iterate through the array with a for loop. Each time the value at the current index is equal to 82459, just subtract 1 from the number at that index.
If you need more help, please post the rest of the relevant code so I can debug.

Retrieve indexes of min and max values in np.ndarray

I am working on some .tif files and have to plot dependencies between temperature and vegetation index based on them. That was just FYI; now my programming problem.
I'm using Python 2.7 (x64).
I have a big NumPy ndarray containing temperature values and a second one (same size) with the vegetation index. mergedmask is my mask (same size as the other arrays), where a False value means the data is valid.
maxTS = np.amax(toa[mergedmask==False])
minTS = np.amin(toa[mergedmask==False])
maxVI = np.amax(ndvi1[mergedmask==False])
minVi = np.amin(ndvi1[mergedmask==False])
The variables above hold the minimum and maximum values of TS (temperature) and VI (vegetation index). Everything is OK so far. Now I have to find the coordinates of these values in the toa and ndvi1 arrays, so I am using this:
ax,ay = np.unravel_index(ndvi1[mergedmask==False].argmin(),ndvi1.shape)
To simplify, I will focus only on minVi. The line above returns 2 indices. Then:
newMinVi = ndvi1[ax][ay]
should assign to newMinVi the same value as minVi, but it doesn't. I checked nearby indices like ax-1, ax+1, ay-1, ay+1, and none of them is even close to my minVi value. Do you have any ideas how to get the coordinates of my minVi value?

ndvi1[mergedmask==False].argmin() will give you the index of the minimum in ndvi1[mergedmask==False], i.e., the index into a new array, corresponding to the places where mergedmask is False.
The problem here is that ndvi1[mergedmask==False] isn't really a masked view of ndvi1. It selects those values of ndvi1 that meet the condition and assembles them into a new 1D array, so positions in it no longer correspond to positions in ndvi1. For instance, check what ndvi1[mergedmask==False].size is, and compare it to ndvi1.size.
What you probably want to be doing is to create a real masked array:
ndvi1_masked = np.ma.masked_array(ndvi1, (mergedmask==False))
ax, ay = np.unravel_index(ndvi1_masked.argmin(), ndvi1.shape)
Hope this helps!

Almost what I want.
ndvi1_masked = np.ma.masked_array(ndvi1, (mergedmask==False))
This masks the array, but not the values I want masked. I just had to change the condition mergedmask==False, and finally I got:
myNdviMasked = np.ma.masked_array(ndvi1,(mergedmask!=False))
bx, by = np.unravel_index(myNdviMasked.argmin(), myNdviMasked.shape)
Thank You for help :)
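Putting the thread together: in np.ma, a True entry in the mask marks a value as invalid, which matches the meaning of mergedmask in the question. A minimal sketch on made-up data (the arrays below are illustrative, not the OP's rasters):

```python
import numpy as np

# Illustrative data; True in mergedmask marks an invalid cell, as in the question.
ndvi1 = np.array([[0.5, -0.9, 0.3],
                  [0.2,  0.1, 0.4]])
mergedmask = np.array([[False, True, False],
                       [False, False, False]])

# Minimum over valid cells only, as computed in the question.
minVi = np.amin(ndvi1[mergedmask == False])

# Masked array keeps the original 2D shape, so the flat index from
# argmin() can be unravelled against ndvi1.shape.
ndvi1_masked = np.ma.masked_array(ndvi1, mask=mergedmask)
ax, ay = np.unravel_index(ndvi1_masked.argmin(), ndvi1.shape)
newMinVi = ndvi1[ax, ay]
```

In this sample the smallest raw value (-0.9) is masked out, so the indices point at the smallest valid value instead, and newMinVi agrees with minVi.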
