I have been able to successfully use scipy's interpolate griddata function on several different datasets. However, I have now reached the point where I want to extract the same region from a large grid and interpolate over different sets of data points. In other words, I can do the following:
# Get one sample of data
sample_data = alldata[0,:,:]
small_data = griddata((largelats.flatten(),largelons.flatten()),sample_data.flatten(),(smalllats,smalllons),'nearest')
Now, if I want to loop over the first axis of alldata:
final_data = np.zeros([len(alldata), smalllats.shape[0], smalllons.shape[1]])
for i in range(0, len(alldata)):
    sample_data = alldata[i, :, :]
    small_data = griddata((largelats.flatten(), largelons.flatten()), sample_data.flatten(), (smalllats, smalllons), 'nearest')
    final_data[i, :, :] = small_data
The problem with the above method is that the griddata call recomputes the extraction indices on every iteration of the loop. Is there a way to return those indices, so that I could instead do something like the following:
xind = []
yind = []
final_data = np.zeros([len(alldata), smalllats.shape[0], smalllons.shape[1]])
for i in range(0, len(alldata)):
    sample_data = alldata[i, :, :]
    if len(xind) == 0:
        small_data, return_x_inds_of_large_grid, return_y_inds_of_large_grid = griddata((largelats.flatten(), largelons.flatten()), sample_data.flatten(), (smalllats, smalllons), 'nearest')
        xind = return_x_inds_of_large_grid
        yind = return_y_inds_of_large_grid
    final_data[i, xind, yind] = sample_data[xind, yind]
Basically, I would like to avoid calling griddata inside the loop: compute the indices once and reuse them on later iterations. Can something like this be done?
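For what it's worth, with 'nearest' this is possible, because nearest-neighbour interpolation is purely an indexing problem. Below is a minimal sketch (not part of the original question) using scipy.spatial.cKDTree: the tree is built and queried once, and the resulting flat indices are reused for every iteration. It assumes the same largelats, largelons, smalllats, smalllons and alldata as above.

import numpy as np
from scipy.spatial import cKDTree

# Build a KD-tree on the large grid once and query it once: for every point
# of the small grid we get the flat index of its nearest large-grid point.
tree = cKDTree(np.column_stack((largelats.ravel(), largelons.ravel())))
_, flat_ind = tree.query(np.column_stack((smalllats.ravel(), smalllons.ravel())))

# Reuse the indices on every iteration; fancy indexing replaces griddata.
final_data = np.empty((len(alldata),) + smalllats.shape)
for i in range(len(alldata)):
    final_data[i, :, :] = alldata[i, :, :].ravel()[flat_ind].reshape(smalllats.shape)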
Hey all, I have a set of seemingly random 2D data that I want to reorder. This is really for an image with specific values at each pixel, but the concept is the same.
I have a large 2D array that looks very random, say:
x = 100
y = 120
np.random.random((x,y))
and I want to redistribute the 2D matrix so that the maximum value is in the center and the values fall off around that maximum, giving it a sort of Gaussian fall-off from the center.
A small example:
output = [[0.0, 0.5, 1.0, 1.0, 1.0, 0.5, 0.0],
          [0.0, 1.0, 1.0, 1.5, 1.0, 0.5, 0.0],
          [0.5, 1.0, 1.5, 2.0, 1.5, 1.0, 0.5],
          [0.0, 1.0, 1.0, 1.5, 1.0, 0.5, 0.0],
          [0.0, 0.5, 1.0, 1.0, 1.0, 0.5, 0.0]]
I know it won't really be a Gaussian; I am just trying to give a visualization of what I would like. I was thinking of sorting the 2D array into a list from max to min and then using that to create a new 2D array, but I'm not sure how to distribute the values back into the matrix the way I want.
Thank you very much!
If anyone looks at this in the future and needs help, here is some advice on how to do this efficiently for a lot of data. The code is posted below.
import itertools
import numpy as np

def datasort(inputarray, spot_in_x, spot_in_y):
    # get the data ready
    center_of_y = spot_in_y
    center_of_x = spot_in_x
    M = len(inputarray[0])
    N = len(inputarray)
    l_list = list(itertools.chain(*inputarray))    # flattened data
    l_sorted = sorted(l_list, reverse=True)        # flattened data, sorted descending
    # Reorder
    to_reorder = list(np.arange(0, len(l_sorted), 1))
    x = np.linspace(-1, 1, M)
    y = np.linspace(-1, 1, N)
    centerx = int(M/2 - center_of_x) * 0.01
    centery = int(N/2 - center_of_y) * 0.01
    [X, Y] = np.meshgrid(x, y)
    R = np.sqrt((X + centerx)**2 + (Y + centery)**2)   # distance from the (offset) center
    R_list = list(itertools.chain(*R))
    values = zip(R_list, to_reorder)
    sortedvalues = sorted(values)
    unzip = list(zip(*sortedvalues))
    unzip2 = unzip[1]
    l_reorder = zip(unzip2, l_sorted)
    l_reorder = sorted(l_reorder)
    l_unzip = list(zip(*l_reorder))
    l_unzip2 = l_unzip[1]
    sorted_list = np.reshape(l_unzip2, (N, M))
    return sorted_list
This code flattens your data and sorts it into a descending list. It then zips that list together with a ranking based on a radial (circular) distance from the chosen center. Using the zip and sorted commands, you can then lay the data back out according to whatever distribution function you want; in my case it is a circle whose center can be offset.
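A hypothetical usage example (the array size and the spot coordinates are made up for illustration):

data = np.random.random((100, 120))
# place the peak near column 60, row 50 and let values fall off radially
sorted_image = datasort(data, spot_in_x=60, spot_in_y=50)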
I am trying to write a data-smoothing function for a set of data. I am using a Savitzky-Golay filter to do that: I collect an array of data and call the SciPy function on it. But since I am looping over a specific element across different frames, I have neither spatial nor temporal locality.
dataobj.body.data[j][0][i]
holds (x,y) and I am only collecting the ys.
Here is the loop:
def smooth_data(dataobj):
    number_of_frames = len(dataobj.body.data)
    for i in range(0, 137):
        arr = []
        for j in range(0, number_of_frames):
            arr.append(dataobj.body.data[j][0][i][1])
        newdata = scipy.signal.savgol_filter(arr, 25, 3)
        for k in range(0, number_of_frames):
            dataobj.body.data[k][0][i][1] = newdata[k]
    return dataobj
I'd like to make it run faster; right now, when the number of frames is over 1000, it takes a considerable amount of time, something like 30 seconds.
Thanks a lot to all of the helpers!
If the input data is a multi-dimensional numpy array, you can pass a slice of the array to the scipy method and then insert the resulting array back into the original data object:
def smooth_data(dataobj):
    number_of_frames = len(dataobj[:, 0, 0, 1])
    number_of_records = len(dataobj[0, 0, :, 1])
    for i in range(0, number_of_records):
        newdata = scipy.signal.savgol_filter(dataobj[:, 0, i, 1], 3, 1)
        dataobj[:, 0, i, 1] = newdata
    return dataobj
What about training a Kriging model (or just a polynomial interpolation) with 50% of your x and y data, and then taking the model's prediction ŷ over your whole set x?
Example of a Kriging model (using the smt module):
from smt.surrogate_models import KRG

t = KRG(theta0=[1e-2]*ndim, print_prediction=False)
t.set_training_values(xt, yt)   # training inputs, outputs
t.train()

# Prediction at the other points
y = t.predict_values(xtest)
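The snippet leaves ndim, xt, yt and xtest undefined; a hypothetical way to produce the suggested 50% split (using scikit-learn, which is not part of the original suggestion) could look like:

from sklearn.model_selection import train_test_split

# x: (n_samples, ndim) inputs, y: (n_samples,) outputs -- assumed shapes
ndim = x.shape[1]
xt, _, yt, _ = train_test_split(x, y, train_size=0.5)  # 50% for training
xtest = x  # evaluate the model over the whole input set, as suggested above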
Let refVar denote a variable of interest that contains experimental data.
For the simulation study, I would like to generate other variables V0.05, V0.10, V0.15, and so on up to V0.95.
Note that in each variable name, the value following V represents the correlation between that variable and refVar (to make them quick to track in the final dataframe).
My readings led me to multivariate_normal() from numpy. However, this function generates two 1D arrays that are both filled with random numbers. What I want is to always keep refVar as it is and generate other arrays of random numbers that meet the specified correlation with it.
Please find below my code. In short, I have no clue how to generate the other variables relative to my experimental variable refVar. Ideally, I would like to build a data frame containing the following columns: refVar, V0.05, V0.10, ..., V0.95. I hope you get my point, and thank you in advance for your time.
import numpy as np
import pandas as pd
from numpy.random import multivariate_normal as mvn

refVar = [75.25,77.93,78.2,61.77,80.88,71.95,79.88,65.53,85.03,61.72,60.96,56.36,23.16,73.36,64.18,83.07,63.25,49.3,78.2,30.96]
mean_refVar = np.mean(refVar)
for r in np.arange(0, 1, 0.05):
    var1 = 1
    var2 = 1
    cov = r
    cov_matrix = [[var1, cov],
                  [cov, var2]]
    data = mvn([mean_refVar, mean_refVar], cov_matrix, size=len(refVar))
    output = 'corr_' + str(r.round(2)) + '.txt'
    df = pd.DataFrame(data, columns=['refVar', 'v' + str(r.round(2))])
    df.to_csv(output, sep='\t', index=False)  # Ideally, instead of an output file per correlation, I would like one DataFrame with refVar and all the newly created series
Following this answer, we can generate the sequence as follows:
def rand_with_corr(refVar, corr):
    # center and normalize refVar
    X = np.array(refVar) - np.mean(refVar)
    X = X/np.linalg.norm(X)
    # random sampling Y
    Y = np.random.rand(len(X))
    # center Y
    Y = Y - Y.mean()
    # remove the component of Y along X, leaving the part orthogonal to X
    Y = Y - Y.dot(X) * X
    # normalize Y
    Y = Y/np.linalg.norm(Y)
    # output: adding cot(arccos(corr)) * X makes the correlation with X equal to corr
    return Y + (1/np.tan(np.arccos(corr))) * X
# test
out = rand_with_corr(refVar, 0.05)
pd.Series(out).corr(pd.Series(refVar))
# out
# 0.050000000000000086
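To get the single DataFrame the question asks for, with one column per target correlation (named V0.05 ... V0.95, following the question's convention), a sketch could be:

df = pd.DataFrame({'refVar': refVar})
for r in np.arange(0.05, 1.0, 0.05):
    df['V' + format(r, '.2f')] = rand_with_corr(refVar, round(r, 2))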
I've been using psd() to compute the power spectral density of a .wav file. I showed it to my supervisor, and he doesn't want the segments averaged to compute Pxx:
The |FFT(i)|^2 of each segment are averaged to compute Pxx
He suggested I use psd but do the overlapping manually, one frame at a time, instead of passing in the whole array of data. I've attempted it, and it looks like this:
def spec_draw(imag_array):
    overlap_step = len(imag_array) / 128
    temp = []
    values = []
    for x in range(0, len(imag_array), overlap_step - overlap_step/2):
        try:
            for i in range(0, overlap_step):
                temp.append(imag_array[x+i])
        except:
            pass
        values.append(psd(temp, sides='onesided'))
        temp = []
    print values
where imag_array is an array of data from a wave file. I've sent it to him, but he doesn't understand Python very well, and since he can't run it, he can't debug it. Does this look correct?
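As a point of comparison (not from the original thread), scipy can produce per-segment PSDs directly, without the averaging step that psd()/Welch applies. A sketch, assuming data holds the samples and fs the sample rate of the .wav file:

from scipy import signal

# One PSD estimate per segment: Sxx has shape (n_frequencies, n_segments).
# Averaging Sxx over its last axis would reproduce the Welch/psd() result.
f, t, Sxx = signal.spectrogram(data, fs=fs, nperseg=128, noverlap=64, mode='psd')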
I have an array where discrete sine-wave values are recorded and stored. I want to find the max and min of the waveform. Since the sine-wave data is voltage recorded with a DAQ, there will be some noise, so I want to do a weighted average. Assuming self.yArray contains my sine-wave values, here is my code so far:
filterarray = []
filtersize = 2
length = len(self.yArray)
for x in range(0, length - (filtersize+1)):
    for y in range(0, filtersize):
        summation = sum(self.yArray[x+y])
    ave = summation/filtersize
    filterarray.append(ave)
My issue seems to be in the second for loop, where depending on my averaging window size (filtersize), I want to sum up the values in the window to take the average of them. I receive an error saying:
summation = sum(self.yArray[x+y])
TypeError: 'float' object is not iterable
I am an EE with very little experience in programming, so any help would be greatly appreciated!
The other answers correctly describe your error, but this type of problem really calls out for numpy, which will run faster, be more memory-efficient, and is more expressive and convenient here. Here's an example:
import numpy as np
import matplotlib.pyplot as plt
# make a sine wave with noise
times = np.arange(0, 10*np.pi, .01)
noise = .1*np.random.ranf(len(times))
wfm = np.sin(times) + noise
# smoothing it with a running average in one line using a convolution
# using a convolution, you could also easily smooth with other filters
# like a Gaussian, etc.
n_ave = 20
smoothed = np.convolve(wfm, np.ones(n_ave)/n_ave, mode='same')
plt.plot(times, wfm, times, -.5+smoothed)
plt.show()
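To then pull out the extrema the original question is after, under the same assumptions:

# indices and values of the extrema of the smoothed waveform
imax, imin = smoothed.argmax(), smoothed.argmin()
print(times[imax], smoothed[imax])  # location and value of the maximum
print(times[imin], smoothed[imin])  # location and value of the minimum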
If you don't want to use numpy, it should also be noted that there's a logical error in your program that produces the TypeError. The problem is that in the line

summation = sum(self.yArray[x+y])

you're calling sum inside the very loop that is supposed to accumulate the sum. Either apply sum to a slice and drop the inner loop, or loop through the array and add the elements up yourself, but not both; doing both, i.e. applying sum to a single indexed element (a float), is exactly what raises the error. That is, here are two solutions:
filterarray = []
filtersize = 2
length = len(self.yArray)
for x in range(0, length - (filtersize+1)):
    summation = sum(self.yArray[x:x+filtersize])  # sum over a slice of the array
    ave = summation/filtersize
    filterarray.append(ave)
or
filterarray = []
filtersize = 2
length = len(self.yArray)
for x in range(0, length - (filtersize+1)):
    summation = 0.
    for y in range(0, filtersize):
        summation += self.yArray[x+y]  # accumulate element by element
    ave = summation/filtersize
    filterarray.append(ave)
self.yArray[x+y] returns a single item from the self.yArray list. If you are trying to get a subset of yArray, use the slice operator instead:

summation = sum(self.yArray[x:x+filtersize])

which returns an iterable that the sum builtin can consume.
A bit more information about python slices can be found here (scroll down to the "Sequences" section): http://docs.python.org/2/reference/datamodel.html#the-standard-type-hierarchy
You could use numpy, like:
import numpy

filtersize = 2
ysums = numpy.cumsum(numpy.array(self.yArray, dtype=float))  # running totals
ylags = numpy.roll(ysums, filtersize)                        # totals shifted by the window size
ylags[0:filtersize] = 0.0                                    # zero the wrapped-around entries
moving_avg = (ysums - ylags) / filtersize                    # windowed sums divided by the window size
Your original code attempts to call sum on the float value stored at yArray[x+y], where x+y evaluates to an integer index of a single element.
Try:

summation = sum(self.yArray[x:x+filtersize])
Indeed, numpy is the way to go. One of the nice features of Python is list comprehensions, which let you do away with the typical nested for-loop constructs. Here's an example for your particular problem:
import numpy as np

step = 2
# myarr is the input sequence, e.g. self.yArray from the question
res = [np.sum(myarr[i:i+step], dtype=float)/step for i in range(len(myarr) - step + 1)]