Confused by output from Polyval - python

I have some data to which I have fitted a 2nd-order polynomial using numpy.polynomial.polynomial.polyfit:
import numpy.polynomial.polynomial as poly
data_fit = poly.polyfit(length_spline_a, temp_spline_b, 2)
I am examining changes in length, and have a list of 10% changes in length:
len_steps = 0.0, -0.012573565669572757, -0.025147131339145513, -0.03772069700871827...
print(len(len_steps))
>>>>11
My assumption was that polyval would evaluate the polynomial (solve for y) at each of the x values in the len_steps list:
y_data = poly.polyval(data_fit, len_steps)
However, this returns an array with only 3 values rather than the 11 I expected.
print(y_data)
>>>>[-5.34112443e+21 -2.50395581e+28 -6.75169134e+28]
Have I misunderstood the purpose of polyval, or have I done something wrong?

It works if I reverse the argument order to y_data = poly.polyval(len_steps, data_fit), which is actually clear now that I re-read the docs. I think I was using the np.polyval syntax, which expects the coefficients first.
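For reference, a minimal sketch of the two signatures side by side (using the arrays from the question; the comments state what I would expect, not verified output):

import numpy as np
import numpy.polynomial.polynomial as poly

coefs = poly.polyfit(length_spline_a, temp_spline_b, 2)  # lowest power first

# numpy.polynomial.polynomial.polyval takes the sample points first:
y_data = poly.polyval(len_steps, coefs)        # 11 values, one per x in len_steps

# the legacy np.polyval takes the coefficients first, highest power first:
y_legacy = np.polyval(coefs[::-1], len_steps)  # the same 11 values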

Incorrect results for simple 2D transformation

I'm attempting a 2D transformation using the nudged package.
The code is really simple:
import numpy as np
import nudged
# Domain data
x_d = [2538.87, 1294.42, 3002.49, 2591.56, 2881.37, 891.906, 1041.24, 2740.13, 1928.55, 3335.12, 3771.76, 1655.0, 696.772, 583.242, 2313.95, 2422.2]
y_d = [2501.89, 4072.37, 2732.65, 2897.21, 808.969, 1760.97, 992.531, 1647.57, 2407.18, 2868.68, 724.832, 1938.11, 1487.66, 1219.14, 672.898, 145.059]
# Range data
x_r = [3.86551776277075, 3.69693290266126, 3.929110096606081, 3.8731112887391532, 3.9115924127798536, 3.6388068074815862, 3.6590261077461577, 3.892482104449016, 3.781816183438835, 3.97464058821231, 4.033173444601999, 3.743901522907265, 3.6117470568340906, 3.5959585708147728, 3.8338853650390945, 3.8487836817639334]
y_r = [1.6816478101135388, 1.8732008327428353, 1.7089144628920678, 1.729386055302033, 1.4767657611559102, 1.5933812675900505, 1.5003232598807479, 1.5781629182153942, 1.670867507106891, 1.7248363641300841, 1.4654588884234485, 1.6143557610354264, 1.5603626129237362, 1.5278835570641824, 1.4609066190929916, 1.397111300807424]
# Random domain data
x, y = np.random.uniform(0., 4000., (2, 1000))
# Define domain and range points
dom, ran = (x_d, y_d), (x_r, y_r)
# Obtain transformation dom --> ran
trans = nudged.estimate(dom, ran)
# Apply the transformation to the (x, y) points
x_t, y_t = trans.transform((x, y))
where (x_d, y_d) and (x_r, y_r) are the one-to-one correlated "domain" and "range" points, and (x, y) are all the points in the (x_d, y_d) (domain) system that I want to transform into the (x_r, y_r) (range) system.
This is the result I get (plot not shown), where:
trans.get_matrix()
[[-0.0006459232439068067, -0.0007947429558548157, 6.534164085946009], [0.0007947429558548157, -0.0006459232439068067, 2.515279819707991], [0, 0, 1]]
trans.get_rotation()
2.2532603497070713
trans.get_scale()
0.0010241255796531702
trans.get_translation()
[6.534164085946009, 2.515279819707991]
This is the final plot of the transformed dom values with the original ran points overlaid (image not shown). This is clearly not right, and I can't figure out what I'm doing wrong.
I was able to figure out your issue. It is simply that nudged has somewhat problematic notation, which is poorly documented.
The estimate function accepts a list of coordinate pairs. You effectively have to transpose dom and ran to get this to work. I suggest either switching to numpy arrays, or using list(map(list, zip(...))) to do the transpose.
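For instance, a minimal sketch of that transpose with numpy (x_d, y_d, x_r, y_r as defined in the question):

import numpy as np

# each row becomes one [x, y] coordinate pair, which is the layout estimate expects
dom = np.column_stack([x_d, y_d]).tolist()
ran = np.column_stack([x_r, y_r]).tolist()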
The Transform.transform method is extremely restrictive, and requires that the inner pairs be of type list. Not tuple, not any other sequence, but specifically list. Your attempt to call trans.transform((x, y)) only happened to work by pure luck: transform assessed that the first element is not a list, and attempted to transform (x, y) as a single coordinate pair. Luckily for you, numpy operators are vectorized, so you can process an entire array as a single unit.
Here is a working version of your code that generates the correct plots using mostly plain Python:
import numpy as np
import matplotlib.pyplot as plt
from nudged import estimate

# Domain data
x_d = [2538.87, 1294.42, 3002.49, 2591.56, 2881.37, 891.906, 1041.24, 2740.13, 1928.55, 3335.12, 3771.76, 1655.0, 696.772, 583.242, 2313.95, 2422.2]
y_d = [2501.89, 4072.37, 2732.65, 2897.21, 808.969, 1760.97, 992.531, 1647.57, 2407.18, 2868.68, 724.832, 1938.11, 1487.66, 1219.14, 672.898, 145.059]
# Range data
x_r = [3.86551776277075, 3.69693290266126, 3.929110096606081, 3.8731112887391532, 3.9115924127798536, 3.6388068074815862, 3.6590261077461577, 3.892482104449016, 3.781816183438835, 3.97464058821231, 4.033173444601999, 3.743901522907265, 3.6117470568340906, 3.5959585708147728, 3.8338853650390945, 3.8487836817639334]
y_r = [1.6816478101135388, 1.8732008327428353, 1.7089144628920678, 1.729386055302033, 1.4767657611559102, 1.5933812675900505, 1.5003232598807479, 1.5781629182153942, 1.670867507106891, 1.7248363641300841, 1.4654588884234485, 1.6143557610354264, 1.5603626129237362, 1.5278835570641824, 1.4609066190929916, 1.397111300807424]
# Random domain data
uni = np.random.uniform(0., 4000., (2, 1000))
# Define domain and range points
dom = list(map(list, zip(x_d, y_d)))
ran = list(map(list, zip(x_r, y_r)))
# Obtain transformation dom --> ran
trans = estimate(dom, ran)
# Apply the transformation to the (x, y) points
tra = trans.transform(uni)
fig, ax = plt.subplots(2, 2)
ax[0][0].scatter(x_d, y_d)
ax[0][0].set_title('dom')
ax[0][1].scatter(x_r, y_r)
ax[0][1].set_title('ran')
ax[1][0].scatter(*uni)
ax[1][1].scatter(*tra)
plt.show()
I left in your hack with uni, since I did not feel like converting the array of random values to a nested list. (The resulting plot is not reproduced here.)
My overall recommendation is to submit a number of bug reports to the nudged library based on these findings.

Which dtype would be correct to prevent numpy.arange() from getting the wrong length?

I am trying to get a shifting array containing 200 values covering a range of width 40.
Therefore I am using numpy.arange(a, b, 0.2) with starting values a=0 and b=40 and moving upwards (a=0.2, b=40.2; a=0.4, b=40.4; and so on).
When I reach numpy.arange(25.4, 65.4, 0.2), however, I suddenly get an array with 201 values:
x = numpy.arange(25.2, 65.2, 0.2)
print(len(x))
Returns 200
x = numpy.arange(25.4, 65.4, 0.2)
print(len(x))
Returns 201
I have gotten far enough to notice that this probably happens due to rounding issues caused by the floating-point data type...
I know there is an option 'dtype' in numpy.arange():
numpy.arange(start, stop, step, dtype)
The question is: which data type would fit this problem, and why? (I am not so confident with data types yet, and https://docs.scipy.org/doc/numpy/reference/generated/numpy.dtype.html#numpy.dtype hasn't helped me get this issue resolved.) Please help!
np.arange is most useful when you want to precisely control the difference between adjacent elements. np.linspace, on the other hand, gives you precise control over the total number of elements. It sounds like you want to use np.linspace instead:
import numpy as np
offset = 25.4
x = np.linspace(offset, offset + 40, 200, endpoint=False)  # 200 points with a step of exactly 0.2
print(x)
print(len(x))
Here's the documentation page for np.linspace: https://docs.scipy.org/doc/numpy/reference/generated/numpy.linspace.html
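For completeness, a quick check of the difference (the arange lengths are what I would expect on a typical 64-bit build; they can vary with rounding):

import numpy as np

# arange's length depends on floating-point rounding of (stop - start) / step
print(np.arange(25.2, 65.2, 0.2).size)  # 200
print(np.arange(25.4, 65.4, 0.2).size)  # 201

# linspace pins the count; endpoint=False mimics arange's half-open interval
print(np.linspace(25.4, 65.4, 200, endpoint=False).size)  # always 200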

`ValueError: A value in x_new is above the interpolation range.` - what reasons other than non-ascending values?

I receive this error from scipy's interp1d function. Normally, this error would be generated if the x values were not monotonically increasing.
import numpy as np
import scipy.interpolate as spi

def refine(coarsex, coarsey, step):
    finex = np.arange(min(coarsex), max(coarsex) + step, step)
    intfunc = spi.interp1d(coarsex, coarsey, axis=0)
    finey = intfunc(finex)
    return finex, finey
for num, tfile in enumerate(files):
    tfile = tfile.dropna(how='any')
    x = np.array(tfile['col1'])
    y = np.array(tfile['col2'])
    finex, finey = refine(x, y, 0.01)
The code is correct, because it successfully worked on 6 data files and threw the error for the 7th. So there must be something wrong with the data. But as far as I can tell, the data increase all the way down.
I am sorry for not providing an example; I am not able to reproduce the error on a small example.
There are two things that could help me:
1. Some brainstorming: if the data are indeed monotonically increasing, what else could produce this error? Another hint, regarding the decimals, could be in this question, but I think my solution (the min and max of x) is robust enough to avoid it. Or isn't it?
2. Is it possible (and how?) to return the value of x_new and its index when the ValueError: A value in x_new is above the interpolation range. is thrown, so that I could actually see where in the file the problem is?
UPDATE
So the problem is that, for some reason, max(finex) is larger than max(coarsex) (one is .x39 and the other is .x4). I hoped rounding the original values to 2 significant digits would solve the problem, but it didn't: it displays fewer digits but still computes with the undisplayed ones. What can I do about it?
If you are running SciPy v0.17.0 or newer, then you can pass fill_value='extrapolate' to spi.interp1d, and it will extrapolate to accommodate those values of yours that lie outside the interpolation range. So define your interpolation function like so:
intfunc = spi.interp1d(coarsex, coarsey,axis=0, fill_value="extrapolate")
Be forewarned, however!
Depending on what your data look like and the type of interpolation you are performing, the extrapolated values can be erroneous. This is especially true if you have noisy or non-monotonic data. In your case you might be OK, because your x_new value is only slightly beyond your interpolation range.
Here's a simple demonstration of how this feature can work nicely but also give erroneous results.
import numpy as np
import scipy.interpolate as spi
import matplotlib.pyplot as plt

x = np.linspace(0, 1, 100)
y = x + np.random.randint(-1, 1, 100)/100  # add a little noise
x_new = np.linspace(0, 1.1, 100)
intfunc = spi.interp1d(x, y, fill_value="extrapolate")
y_interp = intfunc(x_new)

plt.plot(x_new, y_interp, 'r', label='interp/extrap')
plt.plot(x, y, 'b--', label='data')
plt.legend()
plt.show()
So the interpolated portion (in red) worked well, but the extrapolated portion clearly fails to follow the otherwise linear trend in this data because of the noise. So have some understanding of your data and proceed with caution.
A quick test of your finex calculation shows that it can (always?) get into the extrapolation region.
In [124]: coarsex = np.random.rand(100)
In [125]: max(coarsex)
Out[125]: 0.97393109991816473
In [126]: step = .01; finex = np.arange(min(coarsex), max(coarsex)+step, step); (max(finex), max(coarsex))
Out[126]: (0.98273730602114795, 0.97393109991816473)
In [127]: step = .001; finex = np.arange(min(coarsex), max(coarsex)+step, step); (max(finex), max(coarsex))
Out[127]: (0.97473730602114794, 0.97393109991816473)
Again it is a quick test, and may be missing some critical step or value.
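If extrapolation isn't wanted, a low-tech alternative is to clip finex back to the data range before interpolating. A minimal sketch (refine_clipped is a hypothetical variant of the question's refine):

import numpy as np
import scipy.interpolate as spi

def refine_clipped(coarsex, coarsey, step):
    finex = np.arange(min(coarsex), max(coarsex) + step, step)
    finex = finex[finex <= max(coarsex)]  # drop points that overshot the data
    intfunc = spi.interp1d(coarsex, coarsey, axis=0)
    return finex, intfunc(finex)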

Moving average of an array in Python

I have an array where discrete sine-wave values are recorded and stored. I want to find the max and min of the waveform. Since the sine-wave data are voltages recorded with a DAQ, there will be some noise, so I want to do a weighted average. Assuming self.yArray contains my sine-wave values, here is my code so far:
filterarray = []
filtersize = 2
length = len(self.yArray)
for x in range(0, length - (filtersize+1)):
    for y in range(0, filtersize):
        summation = sum(self.yArray[x+y])
    ave = summation/filtersize
    filterarray.append(ave)
My issue seems to be in the second for loop, where depending on my averaging window size (filtersize), I want to sum up the values in the window to take the average of them. I receive an error saying:
summation = sum(self.yArray[x+y])
TypeError: 'float' object is not iterable
I am an EE with very little experience in programming, so any help would be greatly appreciated!
The other answers correctly describe your error, but this type of problem really calls out for using numpy. Numpy will run faster, be more memory efficient, and is more expressive and convenient for this type of problem. Here's an example:
import numpy as np
import matplotlib.pyplot as plt
# make a sine wave with noise
times = np.arange(0, 10*np.pi, .01)
noise = .1*np.random.ranf(len(times))
wfm = np.sin(times) + noise
# smoothing it with a running average in one line using a convolution
# using a convolution, you could also easily smooth with other filters
# like a Gaussian, etc.
n_ave = 20
smoothed = np.convolve(wfm, np.ones(n_ave)/n_ave, mode='same')
plt.plot(times, wfm, times, -.5+smoothed)
plt.show()
If you don't want to use numpy, it should also be noted that there's a logical error in your program that results in the TypeError. The problem is that in the line
summation = sum(self.yArray[x+y])
you're using sum within the loop where you're also accumulating the sum. So either you need to use sum without the loop, or loop through the array and add up all the elements, but not both (and it's doing both, i.e., applying sum to the indexed array element, that leads to the error in the first place). That is, here are two solutions:
filterarray = []
filtersize = 2
length = len(self.yArray)
for x in range(0, length - (filtersize+1)):
    summation = sum(self.yArray[x:x+filtersize])  # sum over a section of the array
    ave = summation/filtersize
    filterarray.append(ave)
or
filterarray = []
filtersize = 2
length = len(self.yArray)
for x in range(0, length - (filtersize+1)):
    summation = 0.
    for y in range(0, filtersize):
        summation += self.yArray[x+y]  # accumulate; the original overwrote here
    ave = summation/filtersize
    filterarray.append(ave)
self.yArray[x+y] returns a single item from the self.yArray list. If you are trying to get a subset of yArray, you can use the slice operator instead:
summation = sum(self.yArray[x:x+filtersize])
which returns an iterable that the sum builtin can use.
A bit more information about python slices can be found here (scroll down to the "Sequences" section): http://docs.python.org/2/reference/datamodel.html#the-standard-type-hierarchy
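For example, a quick illustration of summing a slice versus a single element:

a = [1.0, 2.0, 3.0, 4.0]
print(sum(a[1:3]))  # 5.0 -- a[1:3] is the sub-list [2.0, 3.0]
# sum(a[1])         # TypeError: 'float' object is not iterable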
You could use numpy, like:
import numpy
filtersize = 2
ysums = numpy.cumsum(numpy.array(self.yArray, dtype=float))
ylags = numpy.roll(ysums, filtersize)
ylags[0:filtersize] = 0.0
moving_avg = (ysums - ylags) / filtersize
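As a sanity check, here's the cumsum trick on a small made-up list; note that the first filtersize-1 entries are partial sums, which you may want to discard:

import numpy as np

yArray = [1.0, 2.0, 3.0, 4.0, 5.0]  # illustrative values
filtersize = 2
ysums = np.cumsum(np.array(yArray, dtype=float))
ylags = np.roll(ysums, filtersize)
ylags[0:filtersize] = 0.0
moving_avg = (ysums - ylags) / filtersize
print(moving_avg)  # [0.5 1.5 2.5 3.5 4.5]; entries from index 1 on are true window averages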
Your original code attempts to call sum on the float value stored at yArray[x+y], where x+y evaluates to an integer index into the list.
Try:
summation = sum(self.yArray[x:x+filtersize])
Indeed, numpy is the way to go. One of the nice features of Python is list comprehensions, which let you do away with the typical nested for-loop constructs. Here's an example for your particular problem:
import numpy as np
step = 2
res = [np.sum(myarr[i:i+step], dtype=float)/step for i in range(len(myarr) - step + 1)]

Nx3 column data to 2d matrix for image processing

I am trying to find local maxima and contours in Nx3 data in the format ('x', 'y', 'value') that I read from a text file; 'x' and 'y' form an evenly spaced grid, and there is a single value for every combination of 'x' and 'y'. It looks like this:
3.0, -0.4, 56.94369888305664
3.0, -0.3, 56.97200012207031
3.0, -0.2, 56.77149963378906
3.0, -0.1, 56.41230010986328
3.0, 0, 55.8302001953125
3.0, 0.1, 55.81560134887695
3.0, 0.2, 55.600399017333984
3.0, 0.3, 55.51969909667969
3.0, 0.4, 55.18550109863281
3.2, -0.4, 56.26380157470703
3.2, -0.3, 56.228599548339844
...
The problem is that the image code I am trying to use (link) requires the data to be in a different 2d matrix format for image processing. This is the relevant part of the code:
from skimage import measure
import numpy as np

# Construct some test data
x, y = np.ogrid[-np.pi:np.pi:100j, -np.pi:np.pi:100j]
r = np.sin(np.exp((np.sin(x)**3 + np.cos(y)**2)))
# Find contours at a constant value of 0.8
contours = measure.find_contours(r, 0.8)
Can somebody help me transform my data to the required 'gridded' format?
EDIT: I finally went with pandas, but I find the chosen answer better in the general case. This is what I did:
from pandas import read_csv
data = read_csv(filename, names=['x', 'y', 'values']).pivot(index='x', columns='y', values='values')
After this, data.values holds the table in the 2d 'image form' I wanted:
y -0.4 -0.3 -0.2 -0.1
x
3.0 86.9423 87.6398 87.5256 89.5779
3.2 76.9414 77.7743 78.8633 76.8955
3.4 71.4146 72.8257 71.7210 71.5232
The best solution really depends on details you're not giving. By the way, you should really show your code, or at least the np.loadtxt instruction.
In the following, data is the structured array loaded from the file using:
data = np.loadtxt('file.txt', dtype=[('x', float), ('y', float), ('value', float)], delimiter=',')
1) Direct reshape:
Following on what @tom10 said: if you know that your (x, y, value) data are stored in the specific order
[(x0,y0,v00), (x0,y1,v01), ..., (x1,y0,v10), (x1,y1,v11), ..., (xN,yM,vNM)]
and that the values for all (x, y) pairs are given, then the best approach is to take the 1D array of values and reshape it:
x = np.unique(data['x'])
y = np.unique(data['y'])
r = data['value'].reshape((x.size,y.size))
2) General cases:
see Populate arrays in python (numpy)? for a similar question and an other solution using dictionaries
If you cannot guarantee anything more than having (x, y, value) tuples:
# indexing: list of x and y coordinates, and functions that map them to index
x = np.unique(data['x']).tolist()
y = np.unique(data['y']).tolist()
ix = np.vectorize(lambda i: x.index(i), otypes='i')
iy = np.vectorize(lambda j: y.index(j), otypes='i')
# create output array
r = np.zeros((len(x), len(y)), float)  # default value is 0
r[ix(data['x']), iy(data['y'])] = data['value']
Note: in the reference given above, another approach using dictionaries is given. I think it is more readable, but I did not test their relative speed; a sketch follows.
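A minimal sketch of such a dictionary-based mapping, as I read the referenced answer (using the same structured data array as above):

import numpy as np

xs = np.unique(data['x'])
ys = np.unique(data['y'])
# map each coordinate value to its row/column index
ix = {v: i for i, v in enumerate(xs)}
iy = {v: j for j, v in enumerate(ys)}

r = np.full((xs.size, ys.size), np.nan)  # NaN marks missing (x, y) pairs
for xv, yv, v in zip(data['x'], data['y'], data['value']):
    r[ix[xv], iy[yv]] = v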
3) Intermediate cases?
You might have an intermediate case, between regular grid coordinates given in a specific order and no constraints at all. Since the general case can be very slow, you should design your algorithm to take advantage of any rules your data follow.
One example is if you know that the x-y indexing follows a specific rule but is not necessarily given in order. For instance, if you know that the x and y are equally spaced "grid" coordinates, of the form:
coordinate = min_coordinate + i*step
Then find min_coordinate and step (for both x and y), and find i by solving this equation. This way, you avoid the costly index mapping np.vectorize(... list.index(...)):
x = np.unique(data['x'])
y = np.unique(data['y'])
ix = (data['x']-x.min())/(x[1]-x[0])
iy = (data['y']-y.min())/(y[1]-y[0])
# create output array
r = np.ones((x.size,y.size), float)*np.nan # default value is NaN
r[ix.astype(int), iy.astype(int)] = data['value']
For the program you're using, you just need the data to be a rectangular array of z values (in the example they give, they only use x and y to construct z, and then never use them again). It looks like you have an array that's 9 by N (where N is something you don't show). One easy way to get this is to read the data in as a flat collection of z values, skipping the x, y values, then reshape to the shape you'd like, as sketched below.
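A minimal sketch of that idea, assuming a comma-delimited file with 9 y-values per x, as in the sample shown:

import numpy as np

# read only the third column (the z values), in file order
z = np.loadtxt('file.txt', delimiter=',', usecols=[2])
r = z.reshape(-1, 9)  # one row per x value, 9 y-values each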
