python efficiently applying function over multiple arrays

(New to Python, so I apologize if this question is basic.)
Say I create a function that calculates some equation:
import numpy as np

def plot_ev(accuracy,tranChance,numChoices,reward):
    ev=(reward-numChoices)*(1-np.power((1-accuracy),numChoices)*tranChance)
    return ev
accuracy, tranChance, and numChoices are each float arrays
e.g.
accuracy=np.array([.6,.7,.8])
tranChance=np.array([.6,.7,.8])
numChoices=np.array([2,3,4])
How would I run and plot plot_ev over my 3 arrays so that I end up with an output that has all combinations of elements (ideally without writing 3 nested for loops)?
Ideally I would have a single plot showing the output of all combinations (1st element from accuracy with all elements from tranChance and numChoices, 2nd element from accuracy with all elements from tranChance and numChoices, and so on).
Thanks in advance!

Use numpy.meshgrid to make an array of all the combinations of values of the three variables.
products = np.array(np.meshgrid(accuracy, tranChance, numChoices)).T.reshape(-1, 3)
Then transpose this again and extract three longer arrays with the values of the three variables in every combination:
accuracy_, tranChance_, numChoices_ = products.T
Your function contains only operations that can be carried out on numpy arrays, so you can then simply feed these arrays as parameters into the function:
reward = ?? # you need to set the reward value
results = plot_ev(accuracy_, tranChance_, numChoices_, reward)
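Putting these steps together, here is a minimal runnable sketch. Two things in it are assumptions rather than facts from the question: the reward value (10 is purely hypothetical) and the parenthesization of the formula, since the expression as first posted has an unbalanced parenthesis.

```python
import numpy as np

def plot_ev(accuracy, tranChance, numChoices, reward):
    # Parenthesization is an assumption: the posted formula was missing a "("
    return (reward - numChoices) * (1 - np.power(1 - accuracy, numChoices) * tranChance)

accuracy = np.array([.6, .7, .8])
tranChance = np.array([.6, .7, .8])
numChoices = np.array([2, 3, 4])
reward = 10  # hypothetical value; the question does not specify one

# One row per (accuracy, tranChance, numChoices) combination: 3 * 3 * 3 = 27 rows
products = np.array(np.meshgrid(accuracy, tranChance, numChoices)).T.reshape(-1, 3)
accuracy_, tranChance_, numChoices_ = products.T

# The function only uses element-wise NumPy operations, so it runs on all rows at once
results = plot_ev(accuracy_, tranChance_, numChoices_, reward)
print(products.shape, results.shape)  # (27, 3) (27,)
```

The 27 values in results can then be drawn in a single plot, for example against their row index with matplotlib.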
Alternatively, consider using a pandas DataFrame, which provides clearer labeling of the columns.
import pandas as pd
df = pd.DataFrame(products, columns=["accuracy", "tranChance", "numChoices"])
df["ev"] = plot_ev(df["accuracy"], df["tranChance"], df["numChoices"], reward)

Related

Enumerating through a list of data to find averages, but the lines aren't just numbers

I am new to Python. I am enumerating through a large list of data, as shown below, and would like to find the mean of every line.
for index, line in enumerate(data):
    # calculate the mean
    ...
However, the lines of this particular set of data are as such:
[array([[2.3325655e-10, 2.4973504e-10],
        [1.3025138e-10, 1.3025231e-10]], dtype=float32)]
I would like to find the mean of both 2x1s separately, then the average of both means, so it outputs a single number.
Thanks in advance.
You probably do not need to enumerate through the list to achieve what you want. You can do it in two steps using list comprehensions.
For example,
data = [[2.3325655e-10, 2.4973504e-10],
[1.3025138e-10, 1.3025231e-10]]
# Calculate the average for 2X1s or each row
avgs_along_x = [sum(line)/len(line) for line in data]
# Calculate the average along y
avg_along_y = sum(avgs_along_x)/len(avgs_along_x)
There are other ways to calculate the mean of a list in Python, for example statistics.mean from the standard library.
If you are using NumPy, this can be done in one line.
import numpy as np
np.average(data, 1)  # mean of each row (averaging along axis 1)
# To get what you want, pass a tuple of axes: averaging over both axes gives a single number
np.average(data, (1,0))
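In the question, each line is actually a 2x2 float32 NumPy array rather than a nested list. A minimal sketch for that layout, with a second, hypothetical array added so the list has more than one line: because every row has the same length, the average of the row means equals the mean of the whole array, so all lines can also be reduced at once after stacking them.

```python
import numpy as np

# Layout from the question: a list of 2x2 float32 arrays
data = [
    np.array([[2.3325655e-10, 2.4973504e-10],
              [1.3025138e-10, 1.3025231e-10]], dtype=np.float32),
    np.array([[1.0, 3.0],
              [5.0, 7.0]], dtype=np.float32),  # hypothetical second line
]

per_line = []
for index, line in enumerate(data):
    row_means = line.mean(axis=1)        # mean of each row
    per_line.append(row_means.mean())    # average of the row means: one number per line

# Same result without a loop: stack to shape (n_lines, 2, 2) and average the last two axes
means = np.stack(data).mean(axis=(1, 2))
```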

Calculating intermittent average

I have a huge DataFrame with a lot of zero values, and I want to calculate the average of the numbers between the zeros. To keep it simple: the data show, for example, 10 consecutive values, then zeros, then values again. I just want Python to calculate the average of each patch of data.
The picture shows an example.
First of all, I'm a little confused about why you are using a DataFrame. This data would more naturally be stored in a pd.Series, although I would suggest storing numeric data in a NumPy array. Assuming you have a pd.Series in front of you and are trying to calculate the moving average between each pair of consecutive points, there are two approaches you can follow:
zero-padding after the last value (the last value is averaged with 0);
assuming circularity, i.e. averaging the last value with the first one.
Here is the expected code:
import numpy as np
import pandas as pd
data_series = pd.Series([0,0,0.76231, 0.77669,0,0,0,0,0,0,0,0,0.66772, 1.37964, 2.11833, 2.29178, 0,0,0,0,0])
np_array = np.array(data_series)
# assuming zero padding: append a 0 so the last value is averaged with it
np_array_zero_pad = np.hstack((np_array, 0))
mvavrg_zeropad = [np.mean([np_array_zero_pad[i], np_array_zero_pad[i+1]]) for i in range(len(np_array_zero_pad)-1)]
# assuming circularity: append the FIRST value so the last one is averaged with it
np_array_circ_arr = np.hstack((np_array, np_array[0]))
mvavrg_circ = [np.mean([np_array_circ_arr[i], np_array_circ_arr[i+1]]) for i in range(len(np_array_circ_arr)-1)]
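The list comprehensions above can be vectorized, and since the question asks for the mean of each block of non-zero values, a per-patch average may be closer to what was wanted. A minimal sketch of both, using the same example series; the grouping trick (a new group id every time the series switches between zero and non-zero) is a common pandas idiom and not part of the answer above.

```python
import numpy as np
import pandas as pd

data_series = pd.Series([0, 0, 0.76231, 0.77669, 0, 0, 0, 0, 0, 0, 0, 0,
                         0.66772, 1.37964, 2.11833, 2.29178, 0, 0, 0, 0, 0])
values = data_series.to_numpy()

# Vectorized zero-padding version: mean of each value and the next one
padded = np.append(values, 0)
pair_means = (padded[:-1] + padded[1:]) / 2

# Mean of each run of consecutive non-zero values ("patch")
nonzero = data_series != 0
# The id increases whenever the series switches between zero and non-zero
patch_id = (nonzero != nonzero.shift(fill_value=False)).cumsum()
patch_means = data_series[nonzero].groupby(patch_id[nonzero]).mean()
print(patch_means.tolist())  # approximately [0.7695, 1.6143675]
```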

python average of multidimensional array netcdf plot

I read a multidimensional array from a netCDF file.
The variable that I need to plot is named "em", and it has 4 dimensions: `em(years, group, lat, lon)`.
The "group" variable has 2 values; I am only interested in the first one.
So the only variable that I need to manage is "years", which has 17 values. For the first plot I need to average the first 5 years, and for the second plot I have to average from the 6th year to the last year.
from netCDF4 import Dataset
data = Dataset(r'D:\Users\file.nc')
lat = data.variables['lat'][:]
lon = data.variables['lon'][:]
year = data.variables['label'][:]
group = data.variables['group'][:]
em= data.variables['em'][:]
How can I create a 2-dimensional array by averaging over this array?
First one:
em = data.variables['em'][0:4][0][:][:]
Second one:
em = data.variables['em'][5:16][0][:][:]
I created a simple loop:
nyear=(2005-2000)+1
for i in range(nyear):
    em_1= data.variables['em'][i][0][:][:]
    em_1+=em_1
em_2000_2005=em_1/nyear
but I think there could be a more elegant and easier way to do this in Python.
I would highly recommend using xarray for working with NetCDF files. Rather than keeping track of indices positionally, you can refer to dimensions by name, which greatly improves code readability. In your example, all you would need to do is:
import xarray as xr
ds = xr.open_dataset(r'D:\Users\file.nc')
# first 5 years of the first group, averaged over the years only -> 2-D (lat, lon)
em_mean1 = ds.em.isel(label=slice(0, 5), group=0).mean(dim='label')
# 6th year to the last year of the first group
em_mean2 = ds.em.isel(label=slice(5, None), group=0).mean(dim='label')
The .isel method selects by integer position along the named dimensions (label and group here; adjust the names to the ones in your file), and .mean(dim='label') averages over the selected years only, leaving a 2-D (lat, lon) array.
You can use NumPy:
em = data.variables['em'][:]
em_mean = np.mean(em, axis=0)  # if you want to average over the whole first dimension
If the data contain NaNs, just use NumPy's nanmean.
As you want to average the first 5 years in the first case, use:
em_mean1 = np.mean(em[0:5], axis=0)
and for the plot take the first group:
em_mean1 = em_mean1[0, :, :]
You can do the same for the second case, using em[5:].
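A self-contained sketch of the NumPy approach, using a random array as a hypothetical stand-in for data.variables['em'][:] with shape (years, group, lat, lon) = (17, 2, 4, 5). Selecting the first group before averaging gives the 2-D (lat, lon) field directly, and np.nanmean is the drop-in replacement when the data contain NaNs.

```python
import numpy as np

# Hypothetical stand-in for the netCDF variable: (years, group, lat, lon)
rng = np.random.default_rng(0)
em = rng.random((17, 2, 4, 5))

# First plot: years 1-5 (indices 0-4), first group only -> shape (lat, lon)
em_mean1 = em[:5, 0].mean(axis=0)

# Second plot: 6th year to the last (indices 5-16), first group only
em_mean2 = em[5:, 0].mean(axis=0)

# Same average, but skipping NaNs if the file contains any
em_mean2_nan = np.nanmean(em[5:, 0], axis=0)
```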

how to multiply two matrices iteratively with numpy for a given range in python

I want to calculate a resultant State Matrix by multiplying initial state matrix and transition matrix for given amount of time.
For example if period is 1 month, then State1 [matrix] will be State[]*Transition[]
If period is 2 then State2[] = State1[]*Transition
3 then State3[]=State2[]* Transition
...and so on
I'm having trouble iterating over the resultant matrix using loops:
I don't know how to repeat the multiplication in Python.
Here's my code:
import numpy as np
statevector=np.array([0.2,0.8])
transition=np.array([[0.9,0.1],[0.7,0.3]])
for product in range(0,1):
    product=statevector
    product=np.dot(statevector,transition)
    product=product+1
r=np.dot(product,transition)
print(r)
If I understand you correctly, you want to repeatedly multiply the statevector with the transition matrix. One way to do this is in a for loop like this:
import numpy as np
statevector=np.array([0.2,0.8])
transition=np.array([[0.9,0.1],[0.7,0.3]])
states = [statevector]
for i in range(10):
    statevector=np.dot(statevector,transition)
    states.append(statevector)
print(states)
In every iteration I add the new state to the list states. The end result is:
[array([0.2, 0.8]), array([0.74, 0.26]), array([0.848, 0.152]), array([0.8696, 0.1304]), array([0.87392, 0.12608]), array([0.874784, 0.125216]), array([0.8749568, 0.1250432]), array([0.87499136, 0.12500864]), array([0.87499827, 0.12500173]), array([0.87499965, 0.12500035]), array([0.87499993, 0.12500007])]
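If only the state after a given number of periods is needed, the loop can be replaced by a matrix power, since multiplying by the transition matrix n times is the same as multiplying once by transition raised to the n-th power. A minimal sketch using numpy.linalg.matrix_power; the helper name state_after is hypothetical.

```python
import numpy as np
from numpy.linalg import matrix_power

statevector = np.array([0.2, 0.8])
transition = np.array([[0.9, 0.1], [0.7, 0.3]])

def state_after(n):
    """State after n periods: statevector @ transition**n (matrix power)."""
    return statevector @ matrix_power(transition, n)

print(state_after(1))  # [0.74 0.26]
print(state_after(2))  # [0.848 0.152]
```

The loop output above converges toward [0.875, 0.125], which is the stationary distribution of this transition matrix (the vector left unchanged by one more multiplication).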

calculating means of many matrices in numpy

I have many csv files which each contain roughly identical matrices. Each matrix is 11 columns by either 5 or 6 rows. The columns are variables and the rows are test conditions. Some of the matrices do not contain data for the last test condition, which is why there are 5 rows in some matrices and six rows in other matrices.
My application is in Python 2.6 using numpy and scipy.
My question is this:
How can I most efficiently create a summary matrix that contains the means of each cell across all of the identical matrices?
The summary matrix would have the same structure as all of the other matrices, except that the value in each cell in the summary matrix would be the mean of the values stored in the identical cell across all of the other matrices. If one matrix does not contain data for the last test condition, I want to make sure that its contents are not treated as zeros when the averaging is done. In other words, I want the means of all the non-zero values.
Can anyone show me a brief, flexible way of organizing this code so that it does everything I want to do with as little code as possible and also remain as flexible as possible in case I want to re-use this later with other data structures?
I know how to pull all the csv files in and how to write output. I just don't know the most efficient way to structure flow of data in the script, including whether to use python arrays or numpy arrays, and how to structure the operations, etc.
I have tried coding this in a number of different ways, but they all seem to be rather code intensive and inflexible if I later want to use this code for other data structures.
You could use masked arrays. Say N is the number of csv files. You can store all your data in a masked array A, of shape (N,11,6).
import numpy as np
N = 3  # number of csv files (example value)
A = np.ma.zeros((N, 11, 6))
A.mask = np.zeros(A.shape, dtype=bool)  # explicit all-False mask: nothing is masked
A.mask = (A.data == 0)  # another way of masking: mask all entries equal to zero
A.mask[0, 0, 0] = True  # mask a single value
A[1, 2, 3] = 12.  # fill a value like a usual array (this also unmasks it)
Then the mean values along the first axis, ignoring masked values, are given by:
np.ma.mean(A, axis=0)  # the returned shape is (11, 6)
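Masking every zero also hides measurements that are genuinely zero. An alternative sketch, under the assumption that each csv file has already been loaded into a (rows, 11) array (the matrices below are hypothetical): start from a fully masked array and copy each matrix in, so only the missing last test condition stays masked. Here the layout is (N, 6, 11), i.e. rows by columns as in the files, rather than (N, 11, 6).

```python
import numpy as np

# Hypothetical loaded matrices: 11 columns, 5 or 6 rows (test conditions)
matrices = [
    np.full((6, 11), 1.0),
    np.full((5, 11), 3.0),  # missing the last test condition
    np.full((6, 11), 8.0),
]

n_rows = max(m.shape[0] for m in matrices)
n_cols = matrices[0].shape[1]

# Everything starts masked; assigning real data unmasks exactly those cells
A = np.ma.masked_all((len(matrices), n_rows, n_cols))
for i, m in enumerate(matrices):
    A[i, :m.shape[0], :] = m

# Cell-wise mean across files; missing cells are left out of each cell's mean
summary = A.mean(axis=0)
```

With these example values, the first five rows of summary average three files (4.0) and the last row averages only the two files that have it (4.5).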
