looping through complicated nested dictionary

I have a rather complex list of dictionaries with nested dictionaries and arrays. I am trying to figure out a way to either
1. make the list of data less complicated and then loop through the raster points, or
2. find a way to loop through the array of raster points as is.
What I am ultimately trying to do is loop through all raster points within each polygon and compare the value assigned to each raster point (the values are elevations) against a given threshold: if the value is greater than the threshold, assign 1; if it is less, assign 0. I would then collect these 1s and 0s in a separate array and take its average.
I have found all these points (allpoints within pts), but they are in arrays within a dictionary within another dictionary within a list (of all polygons). At least I think so; I could be wrong about the organization, as dictionaries are rather new to me.
The following is my code:
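import numpy as np
from rasterstats import zonal_stats  # assuming zonal_stats comes from the rasterstats package
def mystat(x):
    # x holds the raster values that fall inside one polygon
    result = dict()
    result['allpoints'] = x
    return result
stats = zonal_stats('acp.shp', 'myGeoTIFF.tif')
pts = zonal_stats('acp.shp', 'myGeoTIFF.tif', add_stats={'mystat': mystat})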
Link to my documents. Any help or direction would be greatly appreciated!

I assume you are using the rasterstats package. You could try something like this:
threshold_value = 15  # change this threshold to your own value
for o_idx in range(len(pts)):
    data = pts[o_idx]['mystat']['allpoints'].data
    for d_idx in range(len(data)):
        for p_idx in range(len(data[d_idx])):
            # adjust the conditions below as needed
            if data[d_idx][p_idx] > threshold_value:
                data[d_idx][p_idx] = 1
            elif data[d_idx][p_idx] <= threshold_value:
                data[d_idx][p_idx] = 0
This updates the data in place within the pts list.
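The same thresholding can also be written without explicit loops. The following is only a minimal sketch, assuming each 'allpoints' entry is a NumPy masked array in which cells outside the polygon are masked. The function name fraction_above and the sample elevations are made up for illustration.

```python
import numpy as np

def fraction_above(values, threshold):
    """Return the share of unmasked cells whose value exceeds threshold."""
    # compressed() drops masked cells (assumed here to lie outside the polygon)
    valid = np.ma.masked_array(values).compressed()
    flags = (valid > threshold).astype(int)  # 1 if above threshold, else 0
    return flags.mean()

# Hypothetical elevations for one polygon; the value 30 is masked out
elevations = np.ma.masked_array([[10, 20], [30, 5]],
                                mask=[[False, False], [True, False]])
share = fraction_above(elevations, 15)
```

With the pts list from the question, this could be called once per polygon, for example fraction_above(p['mystat']['allpoints'], threshold_value) for each p in pts. Unlike the loop above, it leaves the original data untouched and ignores masked cells when averaging.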

Related

check if subarray is in array of arrays

I've got an array of arrays where I store x,y,z coordinates and a measurement at that coordinate like:
measurements = [[x1,y1,z1,val1],[x2,y2,z2,val2],[...]]
Now, before adding a measurement for a certain coordinate, I want to check whether there is already a measurement for that coordinate, so that I keep only the maximum val for each coordinate.
So the question is:
Is [xn, yn, zn, ...] already in measurements?
My approach so far would be to iterate over the array and compare with a sliced entry, like
for measurement in measurements:
    if measurement_new[:3] == measurement[:3]:
        measurement[3] = measurement_new[3] if measurement_new[3] > measurement[3] else measurement[3]
But as the measurements array gets bigger, this becomes very inefficient.
Another approach would be two separate arrays coords = [[x1,y1,z1], [x2,y2,z2], [...]] and vals = [val1, val2, ...]
This would allow me to check for existing coordinates with [x,y,z] in coords, but I would have to merge the arrays later on.
Can you suggest a more efficient method for solving this problem?

If you want to stick to built-in types (if not, see the last point in the Notes below), I suggest using a dict for the measurements:
measurements = {(x1,y1,z1): val1,
                (x2,y2,z2): val2}
Then adding a new value (x,y,z,val) can simply be:
measurements[(x,y,z)] = max(measurements.get((x,y,z), 0), val)
Notes:
The value 0 in measurements.get is supposed to be the lower bound of the values you are expecting. If you have values below 0, change it to an appropriate lower bound, so that whenever (x,y,z) is not present in measurements, get returns the lower bound and max therefore returns val. You can also avoid having to specify the lower bound and write:
measurements[(x,y,z)] = max(measurements.get((x,y,z), val), val)
You need to use tuples as keys, hence the (x,y,z). This is because lists cannot be hashed and so are not permitted as keys.
Finally, depending on the complexity of the task you are performing, consider using more complex data types. I would recommend having a look at pandas DataFrames; they are ideal for this kind of thing.
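As a small illustration of the dict approach, here is a sketch that keeps only the largest value per coordinate. The helper name record and the sample readings are hypothetical.

```python
def record(measurements, x, y, z, val):
    """Store val under (x, y, z), keeping the larger value if one exists."""
    key = (x, y, z)  # tuples are hashable, so they can be dict keys
    measurements[key] = max(measurements.get(key, val), val)

measurements = {}
readings = [(0, 0, 0, 1.5), (1, 0, 0, 2.0), (0, 0, 0, 3.0), (1, 0, 0, 0.5)]
for x, y, z, val in readings:
    record(measurements, x, y, z, val)

# Membership tests on a dict are hash lookups rather than a scan of all entries
already_measured = (0, 0, 0) in measurements
```

If the original list-of-lists layout is needed later, it can be rebuilt with [[*key, val] for key, val in measurements.items()].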

Python: Is there a way to get the average of the n newest numbers in an array?

I am trying to build a sort of battery meter, where I have one program that collects voltage samples and adds them to an array. My idea is to collect a lot of data when the battery is full, and then build a function that compares this data with the average of the last 100 or so voltage readings, as new readings are added every few seconds for as long as I don't interrupt the process.
I am using matplotlib to show the voltage output and so far it is working fine: I posted an answer here on live changing graphs
The voltage function looks like this:
pullData = open("dynamicgraph.txt","r").read()  # values are stored here by another function
dataArray = pullData.split('\n')
xar = []
yar = []
averagevoltage = 0
for eachLine in dataArray:
    if len(eachLine)>=19:
        x,y = eachLine.split(',')
        xar.append(np.int64(x))  # a datetime value
        yar.append(float(y))  # the reading
ax1.clear()
ax1.plot(xar,yar)
ax1.set_ylim(25, 29)
if len(yar) > 1:
    plt.title("Voltage: " + str(yar[-1]) + " Average voltage: " + str(np.mean(yar)))
I am just wondering what the syntax for getting the average of the last x numbers of the array should look like?
if len(yar) > 100:
    # get average of last 100 values only

It's a rather simple problem, assuming you're using numpy, which provides easy functions for averaging.
array = np.random.rand(200, 1)
last100 = array[-100:] # Retrieve last 100 entries
print(np.average(last100)) # Get the average of them
If you want to cast your normal array to a numpy array you can do it with:
np.array(<your-array-goes-here>)

Use slice notation with negative index to get the n last items in a list.
yar[-100:]
If the slice is larger than the list, the entire list will be returned.

I don't think you even need to use numpy. You can access the last 100 elements by slicing your array as follows:
l = yar[-100:]
This returns all elements at indices from -100 (the 100th-from-last element) to -1 (the last element). Then you can just use native Python functions as follows.
mean = sum(l) / len(l)
sum(l) returns the sum of all values within the list, and len(l) returns the length of the list.

You could use the statistics module from the Python standard library:
import statistics
statistics.mean(your_data_list[-n:]) # n = number of newest values to average
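Putting the slicing and averaging together, a minimal sketch might look like this. The function name mean_of_last and the sample voltages are made up for illustration.

```python
import statistics

def mean_of_last(values, n):
    """Mean of the n most recent values; uses all values if fewer than n exist."""
    recent = values[-n:]  # a slice never raises, even if len(values) < n
    return statistics.mean(recent)

voltages = [27.9, 27.8, 27.6, 27.5, 27.1]  # hypothetical readings
short_window = mean_of_last(voltages, 2)
whole_list = mean_of_last(voltages, 100)
```

Because the slice simply returns the whole list when it is shorter than n, no len(yar) > 100 check is needed, although n should be at least 1 and the list must not be empty.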

Change entries in large arrays at specific indices without for-loops

I am trying to set specific entries in a 2-dimensional array to the maximum of themselves and another value.
In the following example img represents an image read by cv2.imread. msk is a boolean array of the same dimensions. I want to set every entry of img where msk is True to value, but only if value is greater than that entry. The following example always sets it to value, even if value is smaller than the previous entry.
img[msk] = value
Something like
img[msk] = max(img[msk], value)
doesn't work.
Solving it with two for-loops takes much more time, as img represents a huge image (about 20000x10000 pixels).
Thanks a lot for your help!

Here's your problem. I will assume that value is a list, as otherwise it would not be 2-dimensional.
With value as a list, you cannot call max(img[msk], value), because both arguments are lists (or, if value is not a list, only one of them is). In that case, what you may have meant is max(img[msk]); or, if value is a list, max(img[msk]+value); or finally, if value is an int, max(img[msk]+[value]).
P.S. I am assuming a lot of things, as your question is a bit unclear. If I said something wrong, please correct me in the comments.
EDIT: Based on the OP's comment below, the problem is similar: the built-in max cannot compare an array with a scalar element by element. Here's a solution that avoids explicit iteration; I'm not sure whether it is cheaper than iterating.
Plain img[msk] = value would also overwrite entries that are already larger than value, so the mask has to be narrowed first.
Create a second mask, which is True only where the original mask is True and the entry is less than value.
msk2 = msk & (img < value)
Finally, apply the second mask.
img[msk2] = value
Here's an example
import numpy as np
img = np.array([[1,20],[3,4]])
msk = np.array([[True,False],[False,True]])
value = 7
msk2 = msk & (img < value)
# [[True,False],[False,True]]
img[msk2] = value
# [[7,20],[3,7]]
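A compact way to express "maximum of the entry and value, but only where msk is True" is NumPy's element-wise np.maximum. The following is a sketch, assuming img is a NumPy array, msk a boolean array of the same shape, and value a scalar. The function name and sample data are made up for illustration.

```python
import numpy as np

def raise_masked_to_at_least(img, msk, value):
    """Set img[msk] to the element-wise maximum of img[msk] and value, in place."""
    # np.maximum compares element by element, unlike the built-in max
    img[msk] = np.maximum(img[msk], value)
    return img

img = np.array([[1, 20], [3, 4]])
msk = np.array([[True, True], [False, True]])
raise_masked_to_at_least(img, msk, 7)
```

Here the 20 is inside the mask but already larger than 7, so it is kept, and the 3 lies outside the mask, so it is left alone.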

python normal distribution

I have a list of numbers, along with the sample mean and SD of these numbers. Right now I am trying to find the numbers that fall outside mean +/- SD, mean +/- 2SD and mean +/- 3SD.
For example, for the mean +/- SD part, I wrote the code like this:
ND1 = [np.mean(l)+np.std(l,ddof=1)]
ND2 = [np.mean(l)-np.std(l,ddof=1)]
m=sorted(l)
print(m)
ND68 = []
if ND2 > m and m< ND1:
    ND68.append(m<ND2 and m>ND1)
print (ND68)
Here is my question:
1. Can these numbers be computed from the sorted list? If so, which part am I doing wrong? Or is there a package I can use to solve this?

This might help. We will use numpy to grab the values you are looking for. In my example, I create a normally distributed array and then use boolean indexing to return the elements that are outside of +/- 1, 2, or 3 standard deviations.
import numpy as np
# create a random normally distributed integer array
my_array = np.random.normal(loc=30, scale=10, size=100).astype(int)
# find the mean and standard dev
my_mean = my_array.mean()
my_std = my_array.std()
# find numbers outside of 1, 2, and 3 standard dev
# the expression inside the square brackets returns an
# array of True and False values. Indexing my_array
# with the boolean array returns only the values at
# positions that are True
out_std_1 = my_array[np.abs(my_array-my_mean) > my_std]
out_std_2 = my_array[np.abs(my_array-my_mean) > 2*my_std]
out_std_3 = my_array[np.abs(my_array-my_mean) > 3*my_std]

You are on the right track there. You know the mean and standard deviation of your list l, though I'm going to call it something a little less ambiguous, say, samplePopulation.
Because you want to do this for several intervals of standard deviation, I recommend crafting a small function. You can call it multiple times without too much extra work. Also, I'm going to use a list comprehension, which is just a for loop in one line.
import numpy as np
def filter_by_n_std_devs(samplePopulation, numStdDevs):
    # you mostly got this part right, no need to put them in lists though
    mean = np.mean(samplePopulation) # no brackets needed here
    std = np.std(samplePopulation) # or here
    band = numStdDevs * std
    # this is the list comprehension
    filteredPop = [x for x in samplePopulation if x < mean - band or x > mean + band]
    return filteredPop
# now call your function with however many std devs you want
filteredPopulation = filter_by_n_std_devs(samplePopulation, 1)
print(filteredPopulation)
Here's a translation of the list comprehension (based on your use of append, it looks like you may not know what these are; otherwise feel free to ignore).
# remember that you provide the variable samplePopulation
# the above list comprehension
filteredPop = [x for x in samplePopulation if x < mean - band or x > mean + band]
# is equivalent to this:
filteredPop = []
for x in samplePopulation:
    if x < mean - band or x > mean + band:
        filteredPop.append(x)
So to recap:
You don't need to make a list object out of your mean and std calculations.
The function call lets you plug in your samplePopulation and any number of standard deviations you want without having to go in and manually change the value.
List comprehensions are one-line for loops, more or less, and you can even do the filtering you want right inside them!
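For completeness, here is a sketch that returns both the values inside and outside the band, using the sample standard deviation (ddof=1) as in the question. The function name split_by_std_band and the sample data are hypothetical.

```python
import numpy as np

def split_by_std_band(values, num_std_devs, ddof=1):
    """Split values into those within and those outside mean +/- k * SD."""
    arr = np.asarray(values, dtype=float)
    mean = arr.mean()
    band = num_std_devs * arr.std(ddof=ddof)  # ddof=1 gives the sample SD
    inside = np.abs(arr - mean) <= band       # boolean array, one flag per value
    return arr[inside], arr[~inside]

within, outside = split_by_std_band([1, 2, 3, 4, 100], 1)
```

Calling it with 1, 2 and 3 in turn covers the three intervals from the question.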

Retrieve indexes of min and max values in np.ndarray

I am working on some .tif files and I have to plot dependencies between temperature and vegetation index based on a .tif file. That was just FYI; now for my programming problem.
I'm using Python 2.7 (x64).
I have a big ndarray from the NumPy library containing temperature values, and a second one (same size) with the vegetation index. mergedmask is my mask (same size as the other arrays), where a False value means the data is valid.
maxTS = np.amax(toa[mergedmask==False])
minTS = np.amin(toa[mergedmask==False])
maxVI = np.amax(ndvi1[mergedmask==False])
minVi = np.amin(ndvi1[mergedmask==False])
The variables above hold the minimum and maximum values of TS (temperature) and VI (vegetation index). Everything is OK so far. Now I have to find the coordinates of these values in the toa and ndvi1 arrays, so I am using this:
ax,ay = np.unravel_index(ndvi1[mergedmask==False].argmin(),ndvi1.shape)
To keep this short, I will focus only on minVi. The line above returns 2 indexes. Then:
newMinVi = ndvi1[ax][ay]
should assign to newMinVi the same value as minVi. But it doesn't. I checked neighboring indexes like ax-1, ax+1, ay-1, ay+1, and none of them is even close to my minVi value. Do you have any ideas on how to get the coordinates of my minVi value?

ndvi1[mergedmask==False].argmin() will give you the index of the minimum in ndvi1[mergedmask==False], i.e., the index into a new array, corresponding to the places where mergedmask is False.
The problem here is that ndvi1[mergedmask==False] isn't really a mask. It selects those values of ndvi1 that meet the condition, and assembles those values into a new 1D array. For instance, check what ndvi1[mergedmask==False].size is, and compare it to ndvi1.size.
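A tiny made-up example shows the mismatch. The array contents are invented for illustration, and ~mergedmask is used as the boolean-array equivalent of mergedmask==False.

```python
import numpy as np

ndvi1 = np.array([[0.9, 0.1], [0.4, 0.7]])
mergedmask = np.array([[False, True], [False, False]])  # True marks invalid data

valid = ndvi1[~mergedmask]   # boolean indexing yields a new 1-D array
flat_idx = valid.argmin()    # position within valid, not within ndvi1
wrong = np.unravel_index(flat_idx, ndvi1.shape)  # misread as a position in ndvi1

sizes = (valid.size, ndvi1.size)
```

Here valid has 3 elements while ndvi1 has 4, and the unravelled index lands on the invalid cell holding 0.1 instead of on the smallest valid value, 0.4.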
What you probably want to be doing is to create a real masked array:
ndvi1_masked = np.ma.masked_array(ndvi1, (mergedmask==False))
ax, ay = np.unravel_index(ndvi1_masked.argmin(), ndvi1.shape)
Hope this helps!

Almost what I want.
ndvi1_masked = np.ma.masked_array(ndvi1, (mergedmask==False))
This masks well, but not the values I want. I just had to change the condition mergedmask==False to mergedmask!=False, and finally I got:
myNdviMasked = np.ma.masked_array(ndvi1,(mergedmask!=False))
bx, by = np.unravel_index(myNdviMasked.argmin(), myNdviMasked.shape)
Thank you for the help :)
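The working version can be wrapped in a small helper. This is a sketch, assuming a mask in which True marks invalid data (np.ma.masked_array hides the entries where its mask is True); the name argmin_valid and the sample arrays are hypothetical.

```python
import numpy as np

def argmin_valid(values, invalid_mask):
    """Return (row, col) of the smallest value where invalid_mask is False."""
    masked = np.ma.masked_array(values, mask=invalid_mask)
    # argmin on a masked array skips masked entries and returns a flat index
    return np.unravel_index(masked.argmin(), masked.shape)

ndvi = np.array([[0.9, 0.1], [0.4, 0.7]])
invalid = np.array([[False, True], [False, False]])
row, col = argmin_valid(ndvi, invalid)
```

Because the masked array keeps the original shape, the flat index from argmin maps back to the right cell, and ndvi[row, col] equals the minimum of the valid values.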
