Wrong values with np.mean()? - python

I'm quite new to programming (in Python) and so I don't understand what is going on here. I have an image (35x64) as a 3D-array and with np.mean(), I attempted to extract the mean of one color channel of one row:
print(np.mean(img[30][:][0]))
For comparison, I also wrote a for-loop that appends the exact same values to a list and then calculates the mean of that list:
img_list = []
for i in range(64):
    img_list.append(img[30][i][0])
print(np.mean(img_list))
Now, for some strange reason, the two give different values:
First output: 117.1
Second output: 65.7
By looking at the list, I discovered that the second one is correct. Can somebody with more experience explain to me why exactly this is happening and how to fix it? I don't want to use the second, longer code chunk in my programs; I'm looking for a one-line solution that gives the correct value.

There's a subtle difference between img[30][:][0] and img[30,:,0] (the one you were expecting).
Let's see with an example:
img = np.arange(35*64*3).reshape(35,64,3)
img[30][:][0]
# array([5760, 5761, 5762])
img[30,:,0]
# array([5760, 5763, ... 5946, 5949])
So you simply need to:
print(np.mean(img[30,:,0]))
(which is more efficient anyway).
Some details: in your original syntax, the [:] does not select anything new. It just returns a new view of the same array (a different Python object over the same data):
xx = img[30]
yy = img[30][:]
print(xx is yy, xx.shape, yy.shape, np.all(xx==yy))
# False (64, 3) (64, 3) True  # i.e. both arrays are equal
So when you take img[30][:][0], you're actually getting the 3 colors of the first pixel of row 30.
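A quick way to check both points, as a sketch that uses the synthetic np.arange image from above rather than your actual image. Chained indexing only ever applies to the first remaining axis, `[:]` returns a view that shares memory with `img[30]`, and the single tuple index gives the same mean as the loop:

```python
import numpy as np

# Synthetic 35x64 image with 3 channels, same as the example above
img = np.arange(35 * 64 * 3).reshape(35, 64, 3)

row = img[30]          # shape (64, 3)
row_all = img[30][:]   # a new view object over the same data, not a copy
print(row_all is row, np.shares_memory(row, row_all))  # False True

# Chained indexing: [0] picks the first pixel of row 30 (all 3 channels)
print(img[30][:][0])   # [5760 5761 5762]

# Tuple indexing: channel 0 of every pixel in row 30
channel0 = img[30, :, 0]
print(channel0.shape)  # (64,)

# Same result as the explicit loop from the question
img_list = [img[30][i][0] for i in range(64)]
print(np.mean(channel0) == np.mean(img_list))  # True
```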

Related

Correcting for index error with "pass" but still getting "list index out of range" error?

I'm fairly new to python and I'm working on a bit of a complicated (but fun!) project. I won't go into too many details, but I've noticed that previous questions similar to this tend to be for very specific situations, which makes them more difficult for me to get anything out of, but surely easier for people to try to answer.
I'm essentially working with a 36x36 matrix (simply called "array" in my code), where the bottom-left corner is entry 0, the entry above that is 1, and so on until we reach the 36th entry at the top-left corner and come back down to the 37th entry being to the right of entry 0, etc. For a given entry, I want to know the values of the entries to the right, left, up, and down of the argument. Of course, there are several edge cases that won't have 4 neighbors, so I've made an effort to take those into account, but I am getting seemingly inconsistent errors.
Here's the latest thing I've tried:
def neighbors(i):
    neighbors_array = []
    if array[i+1] in array:
        neighbors_array.append(array[i+1])
    if array[i-1] in array:
        neighbors_array.append(array[i-1])
    if array[i+36] in array:
        neighbors_array.append(array[i+36])
    if array[i-36] not in array:
        neighbors_array.append(array[i-36])
    return neighbors_array
For an argument in the middle of the array, say 800, I get 4 values that perfectly match what I would get if I individually printed array[i+1], array[i-1], etc. But for an argument I know is in the leftmost column of the matrix, I get an output array with 4 entries, even though I'm only expecting 3, since the argument doesn't have an [i-36] element associated with it. For an argument I know is in the rightmost column, I get IndexError: list index out of range.
I'm confused as to why both edge cases don't have the same problem, and I want to fix both issues. As for entries in the top and bottom rows, I also get an unexpected 4-entry output, with the exceptions of the top-right and bottom-right corners, for which I get the index error.
I've tried converting the array to an actual matrix, but the neighbors function output is more complicated and I get similar problems anyways, so I think I'd rather stick with the array.
This doesn't work as a bounds check:
if array[i-1] not in array:
    pass
because as soon as you say array[i-1] you're going to raise an exception, before your if predicate is even done evaluating. You need to check that i is a valid index before you use it as an array index.
This would be the idiomatic way of doing this type of bounds check:
if i not in range(len(array)):
For your specific use case where you want to "wrap" values that are out of bounds, I'd suggest using the modulo (%) operator rather than using bounds checks to individually wrap in one direction or another. Just let math do the work for you. :)
for neighbor in [i-1, i+1]:
    neighbors_array.append(array[neighbor % len(array)])
since 36 % 36 == 0, -1 % 36 == 35, etc.
The actual problem you see is in the code array[something] in array (originally array[i+36] not in array and so on).
If we try to evaluate this step by step we get the following:
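1. First, array[something] is evaluated. That might lead to an IndexError, but only for an index past the end: a negative index (such as i-36 for an entry in the leftmost column) does not raise, because Python counts it from the end of the list. If it succeeds, we have some value.
2. Now value in array is evaluated, which in our case will always be True, as we just got the value from the array itself.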
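A small demonstration of both edge cases, using a hypothetical flat list of 36*36 numbers in place of your array. A positive index past the end raises IndexError, while a negative index such as i-36 for an entry in the leftmost column counts from the end of the list, so the lookup returns a value from the rightmost column and the in check passes:

```python
# Hypothetical stand-in for the 36x36 grid stored as a flat, column-major list
array = list(range(36 * 36))

i = 5  # an entry in the leftmost column (indices 0..35)
print(array[i - 36])           # no error: index -31 counts from the end -> 1265
print(array[i - 36] in array)  # True, the value came from the list itself

j = 36 * 35 + 5  # an entry in the rightmost column
try:
    array[j + 36]
except IndexError as exc:
    print("IndexError:", exc)  # IndexError: list index out of range
```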
Solution:
For lists you should check the length of the list to prevent IndexError:
if len(array) > i+36:
    do_whatever_with(array[i+36])
Pythonic approach:
In python it is considered good style to "ask for forgiveness instead of permission", so instead of the if-in-range-do pattern you could use a try-access-except-cleanup pattern:
try:
    do_whatever_with(array[i+36])
except IndexError:
    pass  # because you just skipped the case in your original code as well
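Neither the length check nor try/except catches every edge, though. A negative index such as i-36 never raises, and for an entry in the top row, i+1 is a perfectly valid index that simply belongs to the bottom of the next column. Here is a sketch that does the bounds check in (column, row) coordinates instead, assuming the layout described in the question (column-major, entry 0 at the bottom-left, entry 36 to its right). The function signature and the SIZE constant are hypothetical:

```python
SIZE = 36  # hypothetical: side length of the square grid


def neighbors(array, i, size=SIZE):
    """Return the values above, below, right of and left of flat index i.

    Assumes column-major layout: i = column * size + row, with row 0 at the bottom.
    """
    column, row = divmod(i, size)
    result = []
    if row < size - 1:     # up: same column, one row higher
        result.append(array[i + 1])
    if row > 0:            # down: same column, one row lower
        result.append(array[i - 1])
    if column < size - 1:  # right: next column, same row
        result.append(array[i + size])
    if column > 0:         # left: previous column, same row
        result.append(array[i - size])
    return result


grid = list(range(SIZE * SIZE))
print(len(neighbors(grid, 800)))  # 4 (interior entry)
print(neighbors(grid, 0))         # [1, 36] (bottom-left corner)
print(neighbors(grid, 35))        # [34, 71] (top-left corner)
```

Because every check is done on the row and column rather than on the flat index, no lookup ever goes out of range or wraps around, so no try/except is needed.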

Change entries in large arrays at specific indices without for-loops

I'm trying to set specific entries in a 2-dimensional array to the maximum of themselves and another value.
In the following example, img represents an image read by cv2.imread, and msk is a boolean array of the same shape. I want to set all entries of img where msk is True to value, but only if value is greater than the existing entry. The following always sets them to value, even if value is smaller than the previous entry:
img[msk] = value
Something like
img[msk] = max(img[msk], value)
doesn't work.
Solving it with two for-loops takes much more time, as img represents a huge image (about 20000x10000 pixels).
Thanks a lot for your help!
Here's your problem. I will assume that value is a list, as otherwise it would not be 2-dimensional.
With value as a list, you cannot compute max(img[msk], value), as both are lists (and if value is not a list, you are comparing a list with a single number). What you may have meant is max(img[msk]); or, if value is a list, max(img[msk]+value); or finally, if value is an int, max(img[msk]+[value]) (these concatenations assume plain Python lists).
P.S. I am assuming a lot of things, as your question is a bit unclear. If I said something wrong, please correct me in the comments.
EDIT: Based on the OP's comment below, the problem is similar: Python's built-in max() cannot compare a whole array with a number element by element (for an array with more than one element it raises "The truth value of an array with more than one element is ambiguous"). Here's a solution that avoids Python-level loops.
Don't set every masked entry to value first, since that would also overwrite masked entries that are already larger than value. Instead, create a second mask that is True only where the original mask is True and the entry is less than value:
msk2 = msk & (img < value)
Then apply the second mask:
img[msk2] = value
Here's an example:
import numpy as np
img = np.array([[1,20],[3,4]])
msk = np.array([[True,False],[False,True]])
value = 7
msk2 = msk & (img < value)
img[msk2] = value
# [[7,20],[3,7]]
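If the goal is literally "the maximum of each masked entry and value", NumPy's element-wise np.maximum does exactly that, whereas the built-in max() compares whole objects. Here is a sketch with the toy arrays from the example, except that the masked entry at [1, 1] is changed to 9 (an assumption for illustration) so that it is larger than value and must be kept:

```python
import numpy as np

img = np.array([[1, 20], [3, 9]])
msk = np.array([[True, False], [False, True]])
value = 7

# Option 1: compute the element-wise max only for the masked entries
out1 = img.copy()
out1[msk] = np.maximum(out1[msk], value)

# Option 2: same thing in place, using the ufunc's where= argument;
# entries where msk is False keep their existing values in out2
out2 = img.copy()
np.maximum(out2, value, out=out2, where=msk)

print(out1.tolist())               # [[7, 20], [3, 9]]
print(np.array_equal(out1, out2))  # True
```

Option 2 avoids the temporary copy of img[msk], which may matter for a 20000x10000 image.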

Retrieve indexes of min and max values in np.ndarray

I am working on some TIF files, and I have to plot the dependency between temperature and vegetation index based on a .tif file. That was just FYI; now my programming problem.
I'm using Python 2.7 (x64).
I have a big NumPy ndarray containing temperature values and a second one (same size) with vegetation index values. mergedmask is my mask (same size as the other arrays), where False means the data is valid.
maxTS = np.amax(toa[mergedmask==False])
minTS = np.amin(toa[mergedmask==False])
maxVI = np.amax(ndvi1[mergedmask==False])
minVi = np.amin(ndvi1[mergedmask==False])
In the variables above I have the minimum and maximum values of TS (temperature) and VI (vegetation index). Everything is OK. I am happy. Now I have to find their coordinates in the toa and ndvi1 arrays, so I am using this:
ax,ay = np.unravel_index(ndvi1[mergedmask==False].argmin(),ndvi1.shape)
To keep my message simple, I'll focus only on minVi. The line above returns 2 indices. Then:
newMinVi = ndvi1[ax][ay]
should assign the same value as minVi to newMinVi, but it doesn't. I checked nearby indices like ax-1, ax+1, ay-1 and ay+1, and none of them is even close to my minVi value. Do you have any idea how to get the coordinates of my minVi value?
ndvi1[mergedmask==False].argmin() will give you the index of the minimum in ndvi1[mergedmask==False], i.e., the index into a new array, corresponding to the places where mergedmask is False.
The problem here is that ndvi1[mergedmask==False] isn't really a mask. It selects those values of ndvi1 which meet the condition and assembles them into a new 1D array. For instance, check what ndvi1[mergedmask==False].size is, and compare it to ndvi1.size.
What you probably want to be doing is to create a real masked array:
ndvi1_masked = np.ma.masked_array(ndvi1, (mergedmask==False))
ax, ay = np.unravel_index(ndvi1_masked.argmin(), ndvi1.shape)
Hope this helps!
Almost what I want.
ndvi1_masked = np.ma.masked_array(ndvi1, (mergedmask==False))
It masked pretty well, but not the values I wanted. I just had to change mergedmask==False to mergedmask!=False, and finally I got:
myNdviMasked = np.ma.masked_array(ndvi1,(mergedmask!=False))
bx, by = np.unravel_index(myNdviMasked.argmin(), myNdviMasked.shape)
Thank You for help :)
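An alternative that keeps the boolean indexing from the question is to map the position in the compressed 1D array back to a position in the full array. Here is a sketch with small made-up data (True in mergedmask marks invalid pixels, as in the question). np.flatnonzero(~mergedmask) lists the flat indices of the valid pixels in the same C order that boolean indexing uses, so indexing it with the argmin gives the flat index into ndvi1:

```python
import numpy as np

# Made-up example data; True in mergedmask means "invalid"
ndvi1 = np.array([[0.5, -0.9, 0.2],
                  [0.1, 0.3, -0.4]])
mergedmask = np.array([[False, True, False],
                       [False, False, False]])

valid = ~mergedmask
# Position of the minimum inside the compressed array ndvi1[valid]
pos = ndvi1[valid].argmin()
# Translate it back to a flat index into ndvi1, then to (row, col)
flat = np.flatnonzero(valid)[pos]
ax, ay = np.unravel_index(flat, ndvi1.shape)
print(ax, ay, ndvi1[ax, ay])  # 1 2 -0.4

# Same answer with a masked array: the mask must be True where data is INVALID
masked = np.ma.masked_array(ndvi1, mask=mergedmask)
print(np.unravel_index(masked.argmin(), ndvi1.shape))
```

Note that the -0.9 at [0, 1] is ignored in both cases because it is masked out.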

Replace loop with broadcasting in numpy -> memory error

I have a 2D array (array1) with an arbitrary number of rows. In the first column I have strictly monotonically increasing numbers (but not linearly), which represent a position in my system, while the second column gives me a value, which represents the state of my system at and around the position in the first column.
Now I have a second array (array2); its range should usually be the same as that of the first column of the first array, but that does not matter too much, as you will see below.
For every element in array2, I am now interested in:
1. What is the argument in array1[:,0], which has the closest value to the current element in array2?
2. What is the value (array1[:,1]) of those elements?
Since array2 will usually be longer than the number of rows in array1, it is perfectly fine if I get the same argument from array1 more than once. In fact, this is what I expect.
The value from 2. is written in the second and third column, as you will see below.
My stripped-down code looks like this:
from numpy import arange, zeros, absolute, argmin, mod, newaxis, ones
ysize1 = 50
array1 = zeros((ysize1+1,2))
array1[:,0] = arange(ysize1+1)**2
# can be any strictly monotonic increasing array
array1[:,1] = mod(arange(ysize1+1),2)
# in my current case, but could also be something else
ysize2 = (ysize1)**2
array2 = zeros((ysize2+1,3))
array2[:,0] = arange(0,ysize2+1)
# is currently uniformly distributed over the whole range, but does not necessarily have to be
a = 0
for i, array2element in enumerate(array2[:,0]):
    a = argmin(absolute(array1[:,0]-array2element))
    array2[i,1] = array1[a,1]
It works, but takes quite a lot time to process large arrays. I then tried to implement broadcasting, which seems to work with the following code:
indexarray = argmin(absolute(ones(array2[:,0].shape[0])[:,newaxis]*array1[:,0]-array2[:,0][:,newaxis]),1)
array2[:,2]=array1[indexarray,1] # just to compare the results
Unfortunately now I seem to run into a different problem: I get a memory error on the sizes of arrays I am using in the line of code with the broadcasting.
For small sizes it works, but for larger ones, where len(array2[:,0]) is something like 2**17 (and could be even larger) and len(array1[:,0]) is about 2**14, I get an error saying that the array is bigger than the available memory. Is there an elegant way around that, or a way to speed up the loop?
I do not need to store the intermediate array(s), I am just interested in the result.
Thanks!
First, let's simplify this line:
argmin(absolute(ones(array2[:,0].shape[0])[:,newaxis]*array1[:,0]-array2[:,0][:,newaxis]),1)
it should be:
a = array1[:, 0]
b = array2[:, 0]
argmin(abs(a - b[:, newaxis]), 1)
But even when simplified, you're creating two large temporary arrays. If a and b have sizes M and N, a - b[:, newaxis] and abs(...) each create a temporary array of shape (N, M); with N = 2**17 and M = 2**14, that is 2**31 float64 values, or 16 GiB, per temporary. Because you've said that a is monotonically increasing, you can avoid the issue altogether by using a binary search (sorted search), which is much faster anyway. Take a look at the answer I wrote to this question a while back. Using the function from this answer, try this:
closest = find_closest(array1[:, 0], array2[:, 0])
array2[:, 2] = array1[closest, 1]
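The linked answer isn't reproduced here, so the following is a hypothetical find_closest written from scratch with np.searchsorted. It is not necessarily the function from that answer. It assumes a is sorted in increasing order and has at least two elements, uses O(N) extra memory instead of O(N*M), and, like the argmin loop, picks the lower index on an exact tie:

```python
import numpy as np


def find_closest(a, targets):
    """Hypothetical helper: index of the element of sorted array `a`
    closest to each value in `targets` (ties go to the lower index)."""
    idx = np.searchsorted(a, targets)  # insertion points, 0..len(a)
    idx = np.clip(idx, 1, len(a) - 1)  # keep both idx-1 and idx valid
    left = a[idx - 1]
    right = a[idx]
    # Step back to the left neighbour when it is at least as close
    return np.where(targets - left <= right - targets, idx - 1, idx)


# Data from the question
ysize1 = 50
array1 = np.zeros((ysize1 + 1, 2))
array1[:, 0] = np.arange(ysize1 + 1) ** 2
array1[:, 1] = np.mod(np.arange(ysize1 + 1), 2)
array2 = np.zeros((ysize1 ** 2 + 1, 3))
array2[:, 0] = np.arange(ysize1 ** 2 + 1)

closest = find_closest(array1[:, 0], array2[:, 0])
array2[:, 2] = array1[closest, 1]

# Cross-check against the brute-force argmin (fine at this small size)
brute = np.argmin(np.abs(array1[:, 0] - array2[:, 0][:, np.newaxis]), axis=1)
print(np.array_equal(closest, brute))  # True
```

The clip to [1, len(a) - 1] handles targets below the first or above the last element: the comparison then always chooses the end point.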

What does python slicing syntax [o:,:] mean

Just a small and probably very simple question. Someone gave me the following line of code:
im = axs[0,i].pcolormesh(imgX[o:,:], imgY[o:,:], img.mean(-1)[o:,:], cmap='Greys')
I know ":" means everything in that column or row (or array depth, depending on how you look at it). But what does "o:" mean?
The following is not related to the usage, but shows how the operation "is parsed".
class X:
    def __getitem__(self, index):
        return index
X()[:,:]
>> (slice(None, None, None), slice(None, None, None))
And with different values for clarity:
X()[0, 1:, 3:4, 5:6:7]
>> (0, slice(1, None, None), slice(3, 4, None), slice(5, 6, 7))
So, with that in mind, img[o:,:] is the same as img[o:, :], which is equivalent to
img.__getitem__( (slice(o,None,None), slice(None,None,None)) )
o is a variable like any other (but with a very bad name, as it can be confused with a zero).
[o:, :] means "all the elements along the first axis starting at element o, and all the elements along the second axis". In your particular case, the image will show only the rows from o onward.
I want to add that in this case, you are getting a view, i.e., a reference to the original array, so the data is not actually copied.
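A small sketch of both points, with a made-up array and o = 2. The slice keeps rows o onward along the first axis, it matches the explicit __getitem__ call, and because it is a view, writing through it changes the original:

```python
import numpy as np

img = np.arange(20).reshape(5, 4)  # made-up 5x4 "image"
o = 2                              # starting row, as in the question

sub = img[o:, :]                   # rows 2, 3, 4; all columns
print(sub.shape)                   # (3, 4)
print(np.shares_memory(img, sub))  # True: a view, not a copy

# Explicit equivalent of the slice syntax
same = img.__getitem__((slice(o, None, None), slice(None, None, None)))
print(np.array_equal(sub, same))   # True

sub[0, 0] = -1                     # write through the view...
print(img[2, 0])                   # -1: the original array changed
```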
