I have an array of shape (201, 201). I would like to plot some cross sections through the data, but I am having trouble accessing the relevant points. For example, say I want to plot the cross section given by the line in the figure produced by:
from pylab import *
Z = randn(201,201)
x = linspace(-1,1,201)
X,Y = meshgrid(x,x)
pcolormesh(X,Y,Z)
plot(x,x*.5)
I'd like to plot these at various orientations, but they will always pass through the origin, if that helps.
Basically, you want to interpolate a 2D grid along a line (or an arbitrary path).
First off, you should decide if you want to interpolate the grid or just do nearest-neighbor sampling. If you'd like to do the latter, you can just use indexing.
If you'd like to interpolate, have a look at scipy.ndimage.map_coordinates. It's a bit hard to wrap your head around at first, but it's perfect for this. (It's much, much more efficient than using an interpolation routine that assumes that the data points are randomly distributed.)
I'll give an example of both. These are adapted from an answer I gave to another question. However, in those examples, everything is plotted in "pixel" (i.e. row, column) coordinates.
In your case, you're working in a different coordinate system than the "pixel" coordinates, so you'll need to convert from "world" (i.e. x, y) coordinates to "pixel" coordinates for the interpolation.
First off, here's an example of using cubic interpolation with map_coordinates:
import numpy as np
import scipy.ndimage
import matplotlib.pyplot as plt
# Generate some data...
x, y = np.mgrid[-5:5:0.1, -5:5:0.1]
z = np.sqrt(x**2 + y**2) + np.sin(x**2 + y**2)
# Coordinates of the line we'd like to sample along
line = [(-3, -1), (4, 3)]
# Convert the line to pixel/index coordinates
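# (list() is needed on Python 3, where zip returns an iterator)
x_world, y_world = np.array(list(zip(*line)))
# np.ptp(a) works on all NumPy versions; the a.ptp() method was removed in NumPy 2.0
col = z.shape[1] * (x_world - x.min()) / np.ptp(x)
row = z.shape[0] * (y_world - y.min()) / np.ptp(y)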
# Interpolate the line at "num" points...
num = 1000
row, col = [np.linspace(item[0], item[1], num) for item in [row, col]]
# Extract the values along the line, using cubic interpolation
zi = scipy.ndimage.map_coordinates(z, np.vstack((row, col)))
# Plot...
fig, axes = plt.subplots(nrows=2)
axes[0].pcolormesh(x, y, z)
axes[0].plot(x_world, y_world, 'ro-')
axes[0].axis('image')
axes[1].plot(zi)
plt.show()
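For the setup in the question (a 201x201 array on linspace(-1, 1, 201), with lines that pass through the origin), the same idea can be wrapped in a small helper. This is only a sketch: cross_section is a made-up name, not part of SciPy, and it assumes Z was built from meshgrid(x, x), so that rows follow y and columns follow x. It uses len(x) - 1 as the index span because linspace includes both endpoints, and order=1 (linear interpolation) is just a default chosen for the illustration.

```python
import numpy as np
from scipy import ndimage


def cross_section(Z, x, angle_deg, num=201, order=1):
    """Sample Z along a line through the origin (hypothetical helper).

    Assumes Z[i, j] is the value at world coordinates (x[j], x[i]) and that
    x is evenly spaced and symmetric about zero.
    """
    theta = np.deg2rad(angle_deg)
    # Longest segment through the origin that stays inside the square grid
    half = min(abs(x[0]), abs(x[-1]))
    r = half / max(abs(np.cos(theta)), abs(np.sin(theta)))
    t = np.linspace(-r, r, num)          # signed distance from the origin
    x_world = t * np.cos(theta)
    y_world = t * np.sin(theta)
    # World -> fractional index coordinates
    scale = (len(x) - 1) / (x[-1] - x[0])
    col = (x_world - x[0]) * scale
    row = (y_world - x[0]) * scale
    values = ndimage.map_coordinates(Z, np.vstack((row, col)),
                                     order=order, mode='nearest')
    return t, values


x = np.linspace(-1, 1, 201)
X, Y = np.meshgrid(x, x)
Z = X + 2 * Y                            # a plane, so the result is easy to check
angle = np.degrees(np.arctan(0.5))       # the line y = 0.5 * x from the question
t, profile = cross_section(Z, x, angle)
```

Plotting profile against t then gives the cross section with distance from the origin on the horizontal axis; calling the helper with other angles gives the other orientations.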
Alternately, we could use nearest-neighbor interpolation. One way to do this would be to pass order=0 to map_coordinates in the example above. Instead, I'll use indexing just to show another approach. If we just change the line
# Extract the values along the line, using cubic interpolation
zi = scipy.ndimage.map_coordinates(z, np.vstack((row, col)))
To:
# Extract the values along the line, using nearest-neighbor interpolation
zi = z[row.astype(int), col.astype(int)]
We'll get:
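One detail of the indexing approach: astype(int) truncates toward zero rather than rounding, so a fractional index of 0.6 picks element 0 instead of its true nearest neighbor, element 1. A small sketch with made-up index values shows the difference; the np.clip calls are an extra precaution I am adding, in case a coordinate lands just past the last row or column.

```python
import numpy as np

z = np.arange(12.0).reshape(3, 4)

# Made-up fractional (row, col) coordinates along some path
row = np.array([0.0, 0.6, 1.4, 2.2])
col = np.array([0.2, 1.7, 2.4, 3.4])

# Plain truncation, as in z[row.astype(int), col.astype(int)]
zi_trunc = z[row.astype(int), col.astype(int)]

# Round to the nearest index, then clip so every index stays inside the array
r = np.clip(np.rint(row).astype(int), 0, z.shape[0] - 1)
c = np.clip(np.rint(col).astype(int), 0, z.shape[1] - 1)
zi_nearest = z[r, c]
```

The two results differ only at the second sample, where truncation picks z[0, 1] and rounding picks z[1, 2].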
I am plotting a 1d array (x-axis) against a 2d array (y-axis) in matplotlib, so there are multiple y values for each x value. I want to plot a straight line of best fit (linear regression), not just a line joining the points. How can I do this?
All the other examples seem to have only one y value per x value. When I use 'from sklearn.linear_model import LinearRegression' I get as many best fit lines as there are y values per x value.
EDIT: here is the code I have tried:
model = LinearRegression()
x_axis2 = np.arange(0,len(av_rsq3))
x_axis2 = x_axis2.reshape(-1,1)
model.fit(x_axis2, av_rsq3)
pt.figure()
pt.plot(x_axis2,av_rsq3, 'rx')
pt.plot(x_axis2, model.predict(x_axis2))
note: x_axis2 is a 1d array and av_rsq3 is a 2d array.
You just need to add these points with matching x-values as normal points, then you can add a line of best fit as follows:
import numpy as np
from numpy.polynomial.polynomial import polyfit
import matplotlib.pyplot as plt
x = np.array([1,2,3,4,5,6,6,6,7,7,8])
y = np.array([1,2,4,8,16,32,34,30,61,65,120])
# Fit with polyfit
b, m = polyfit(x, y, 1)
plt.plot(x, y, '.')
plt.plot(x, b + m * x, '-')
plt.show()
which produces a plot of the points with the fitted line drawn through them (figure not shown).
Note, a straight line doesn't fit my example data, but I didn't think about that when writing it :) With polyfit you are also able to change the degree of the fit, as well as obtain error margins in gradients* and offsets.
* (or other polynomial coefficients)
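As a rough sketch of the last point: the older np.polyfit function (not the numpy.polynomial version imported above) accepts cov=True and returns the covariance matrix of the coefficients, whose diagonal gives squared standard errors. Note that np.polyfit returns the highest-degree coefficient first, so the order is (m, b) rather than (b, m). The data are the same example values as above.

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 6, 6, 7, 7, 8], dtype=float)
y = np.array([1, 2, 4, 8, 16, 32, 34, 30, 61, 65, 120], dtype=float)

# Degree-1 fit; np.polyfit orders coefficients from highest degree down
(m, b), cov = np.polyfit(x, y, 1, cov=True)

# Standard errors of the gradient and the offset
m_err, b_err = np.sqrt(np.diag(cov))

# Changing the degree only means changing the third argument
coeffs_quadratic = np.polyfit(x, y, 2)

print("gradient = %.2f +/- %.2f" % (m, m_err))
print("offset   = %.2f +/- %.2f" % (b, b_err))
```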
What you need to do is provide a one to one mapping. The order the points appear in does not matter. So if you have something like this
X: [1,2,3,4]
Y1: [4,6,2,7]
Y2: [2,3,6,8]
you would get this
X: [1,2,3,4,1,2,3,4]
Y: [4,6,2,7,2,3,6,8]
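In NumPy this flattening can be sketched with np.tile and ravel. I am assuming here that the 2d array holds one row per series, as in the X/Y1/Y2 layout above; if your array instead has one row per x value (as av_rsq3 seems to in the question), np.repeat(x, Y.shape[1]) together with Y.ravel() would be the matching pair.

```python
import numpy as np
from numpy.polynomial.polynomial import polyfit

x = np.array([1, 2, 3, 4])
Y = np.array([[4, 6, 2, 7],     # Y1
              [2, 3, 6, 8]])    # Y2

# Repeat x once per series and flatten Y row by row
x_flat = np.tile(x, Y.shape[0])     # [1 2 3 4 1 2 3 4]
y_flat = Y.ravel()                  # [4 6 2 7 2 3 6 8]

# A single straight line through all points (intercept b, slope m)
b, m = polyfit(x_flat, y_flat, 1)
```

Because every x value appears the same number of times, this line is the same one you would get by fitting the per-x averages of the series.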
If you just want to plot the y values and a line averaging between them, this is possible. Borrowing the dummy data from another answer:
x = [1,2,3,4]
y = [4,6,2,7]
y1 = [2,3,6,8]
plt.scatter(x,y)
plt.scatter(x,y1)
plt.plot(x,[((y[i]+y1[i])/2) for i in range(len(y))])
I'm trying to draw the best fitting line for given (x,y) data points.
The image shows the data points (red pixels) and the estimated line (green), which I obtained using the following library call.
import numpy as np
m, c = np.linalg.lstsq(A, y)[0]
Documentation for the library module used
We can see that the data points are roughly symmetrically distributed. Why does this line not have a gradient similar to the long symmetric axis through the data points? Can this result be correct? If so, how does it give the minimum error? (The line is drawn correctly using the gradient returned by the lstsq method.) Thank you.
EDIT
Here is the code I'm trying. The input image can be downloaded from here. In this code I have not forced the line to pass through the center of the pixel distribution. (Note: here I've used polyfit instead of lstsq. Both give the same results.)
import numpy as np
import cv2
import math
img = cv2.imread('points.jpg',1);
h, w = img.shape[:2]
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
points = np.argwhere(gray>10) # get (row, col), i.e. (y, x), pairs where red pixels exist
y = points[:,0]
x = points[:,1]
m, c = np.polyfit(x, y, 1) # calculate least square fit line
# calculate two coordinates (x1,y1),(x2,y2) on the line
angle = np.arctan(m)
x1, y1, length = 0, int(c), 500
x2 = int(round(math.ceil(x1 + length * np.cos(angle)),0))
y2 = int(round(math.ceil(y1 + length * np.sin(angle)),0))
# draw line on the color image
cv2.line(img, (x1, y1), (x2, y2), (0,255,0), 1, cv2.LINE_8)
# show output the image
cv2.namedWindow("Display window", cv2.WINDOW_AUTOSIZE);
cv2.imshow("Display window", img);
cv2.waitKey(0);
cv2.destroyAllWindows()
How can I have the line pass through the longest symmetric axis of the pixel distribution? Can I use principal component analysis?
It's hard to say why this would be the case. The bottom line is that I can't see the data you're using, and I can't see what the calculated slope and y intercept are for the data you're using.
Here are a couple of things that could explain what we're seeing:
(1) The density of data points is actually quite different than it appears to a casual glance and everything is working properly.
(2) You're sending the wrong arguments to the least squares function and you've got a GIGO situation. (I haven't used numpy's least squares algorithm, so I can't check this.)
(3) The scatter plot and the line plot don't agree on the scale of the axes.
(4) The least squares function in question is broken.
(5) You're not passing the same data to the least squares algorithm as you're passing to the plotting routine.
(6) The data formatting is funky so that the scatter plot and least squares routines are interpreting your data differently.
I can't know which of these is the problem, and unless it's (3), I expect we'd need more data to be able to distinguish between these possibilities.
Here's how I'd proceed if I were you: (1) Create a small artificial data set that sits on a line and pass it to the least squares function and see if it spits out the right numbers. See if these look right when plotted or not. (2) If this looks okay, record the output of the least squares algorithm, see if you can find another least squares program to calculate the slope and y intercept and compare them. If they're the same, it's probably not the routine, it's probably something to do with plotting.
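A minimal sketch of step (1), assuming NumPy is the library in question: the points sit exactly on y = 3x + 2 (values I picked for the illustration), so both routines should return a slope of 3 and an intercept of 2. The last lines show what happens if the column of ones is left out of the design matrix, which forces the fitted line through the origin.

```python
import numpy as np

# Artificial data that sit exactly on a known line
x = np.arange(10.0)
y = 3.0 * x + 2.0

# Routine 1: polyfit
m_pf, c_pf = np.polyfit(x, y, 1)

# Routine 2: lstsq with an explicit column of ones for the intercept
A = np.vstack((x, np.ones_like(x))).T
m_ls, c_ls = np.linalg.lstsq(A, y, rcond=None)[0]

# Same call without the ones column: the line is forced through the origin
m_origin = np.linalg.lstsq(x[:, np.newaxis], y, rcond=None)[0][0]

print(m_pf, c_pf)    # slope and intercept from polyfit
print(m_ls, c_ls)    # slope and intercept from lstsq
print(m_origin)      # slope when no intercept is estimated
```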
If you get this far and it's still a mystery, let us know what you've found and maybe we can make another suggestion.
Good luck.
If the red dots truly represent your data, you are probably applying your linear regression function in a way that forces the line through the origin. How do I know? When using linear regression on two variables x and y, the fitted line passes through a few specific points: for example, the point given by the average of x and the average of y, and, depending on your specification, a calculated or specified intercept on the y axis. If all values of x and y are positive, you will get a line that looks like yours if the line is forced through the origin. Not much more can be said before you provide some reproducible data and code.
EDIT:
I didn't have much luck with the reproducible sample provided, so I built an example with random numbers to elaborate on my original answer. I think statsmodels is a decent library for linear regression analysis. First, I'll address this earlier comment:
If all variables of x and y are positive, you will have a line that looks like yours if the line is forced through the origin.
You'll see an increasing effect of this the larger your numbers are (the further away from the origin your numbers are). Using sm.OLS(y,sm.add_constant(x)).fit() and sm.OLS(y,x).fit() for two different sets of numbers will show you exactly what I mean. First, I'll run a regression on the dataset below without an estimated constant (the line goes through the origin). This will give us a plot that resembles your original plot:
# Libraries
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
# Data
np.random.seed(123)
x = np.random.normal(size=2500) + 100
y = x * 2 + np.random.normal(size=2500) + 100
# Regression
results1 = sm.OLS(y,x).fit()
regLine_origin = x*results1.params[0]
# Plot
fig, ax = plt.subplots()
ax.scatter(x, y, c='red', s=4)
ax.scatter(x, regLine_origin, c = 'green', s = 1)
ax.patch.set_facecolor('black')
plt.show()
Next, I'll include a constant in the regression. Now, the yellow line will represent what I think you were after in your question:
# Libraries
import statsmodels.api as sm
import numpy as np
import matplotlib.pyplot as plt
# Data
np.random.seed(123)
x = np.random.normal(size=2500) + 100
y = x * 2 + np.random.normal(size=2500) + 100
# Regression
results1 = sm.OLS(y,x).fit()
results2 = sm.OLS(y,sm.add_constant(x)).fit()
regLine_origin = x*results1.params[0]
regLine_constant = results2.params[0] + x*results2.params[1]
# Plot
fig, ax = plt.subplots()
ax.scatter(x, y, c='red', s=4)
ax.scatter(x, regLine_origin, c = 'green', s = 1)
ax.scatter(x, regLine_constant, c = 'yellow', s = 1)
ax.patch.set_facecolor('black')
plt.show()
And lastly, we can take a look at what happens when the numbers are closer to the origin, so to speak. Here, I'll remove the +100 part when the numbers are generated:
# The following is changed in the snippet above:
# Data
x = np.random.normal(size=2500)
y = x * 2 + np.random.normal(size=2500)
And that's why I think your original regression line is set to go through the origin. Have a look at the statsmodels package. Here you can study the details of the estimate by running print(results2.summary()):
And as you've already seen in the snippets above, you'll have direct access to the regression coefficients by using results2.params.
Edit2: My explanation still isn't 100% valid. The x and y values will have to differ a bit in size to see this effect. You'll certainly find situations where the line goes through the origin no matter the size of the numbers.
Have a look at the different x labels, and you'll see what I mean.
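On the question's closing idea of using principal component analysis: ordinary least squares minimizes vertical distances only, so for a noisy, elongated cloud its slope comes out shallower than the long axis, while the first principal component follows that axis. Below is a minimal sketch on synthetic data. The 60 degree orientation and the noise levels are made up for the illustration, and the points merely stand in for the pixel coordinates from the image.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic elongated cloud, rotated 60 degrees from the x axis
true_angle = np.deg2rad(60)
along = rng.normal(scale=50.0, size=2000)     # spread along the long axis
across = rng.normal(scale=15.0, size=2000)    # spread across it
x = along * np.cos(true_angle) - across * np.sin(true_angle)
y = along * np.sin(true_angle) + across * np.cos(true_angle)

# Ordinary least squares of y on x (vertical distances)
m_ols, c_ols = np.polyfit(x, y, 1)

# First principal component of the mean-centered points (orthogonal distances)
pts = np.column_stack((x, y))
center = pts.mean(axis=0)
_, _, vt = np.linalg.svd(pts - center, full_matrices=False)
direction = vt[0]
m_pca = direction[1] / direction[0]

print("OLS slope:", m_ols)
print("PCA slope:", m_pca, "(true slope:", np.tan(true_angle), ")")
```

The principal-axis line passes through center with slope m_pca, so it can be drawn with the same cv2.line call as before once two points on it are computed.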
I want to make a streamplot in Basemap module, but I get a blank sphere. Please help me resolve this problem. I use matplotlib 1.3 and ordinary streamplot is working fine.
import matplotlib.pyplot as plt
import numpy as np
from mpl_toolkits.basemap import Basemap
map = Basemap(projection='ortho',lat_0=45,lon_0=-100,resolution='l')
# draw lat/lon grid lines every 30 degrees.
map.drawmeridians(np.arange(0,360,30))
map.drawparallels(np.arange(-90,90,30))
# prepare grids
lons = np.linspace(0, 2*np.pi, 100)
lats = np.linspace(-np.pi/2, np.pi/2, 100)
lons, lats = np.meshgrid(lons, lats)
# parameters for vector field
beta = 0.0
alpha = 1.0
u = -np.cos(lats)*(beta - alpha*np.cos(2.0*lons))
v = alpha*(1.0 - np.cos(lats)**2)*np.sin(2.0*lons)
speed = np.sqrt(u*u + v*v)
# compute native map projection coordinates of lat/lon grid.
x, y = map(lons*180./np.pi, lats*180./np.pi)
# contour data over the map.
cs = map.streamplot(x, y, u, v, latlon = True, color = speed, cmap=plt.cm.autumn, linewidth=0.5)
plt.show()
I can't exactly tell you what's wrong, but from the matplotlib.streamplot manual:
matplotlib.pyplot.streamplot(x, y, u, v, density=1, linewidth=None,
color=None, cmap=None, norm=None, arrowsize=1, arrowstyle=u'-|>',
minlength=0.1, transform=None, zorder=1, hold=None)
Draws streamlines of a vector flow.
x, y : 1d arrays
an evenly spaced grid.
u, v : 2d arrays
x and y-velocities. Number of rows should match length of y, and the number of columns should match x.
Additionally from matplotlib.basemap.streamplot you can read that
If latlon keyword is set to True, x,y are interpreted as longitude and latitude in degrees.
Which corresponds to the fact that x and y should be 1D arrays (lat, lon). However in your example x and y are
>>> np.shape(x)
(100, 100)
>>> np.shape(y)
(100, 100)
Then again, you call the method map() "to compute native map projection coordinates of lat/lon grid", which is coincidentally the same as the name of your basemap.map. So it depends on which one you want, because both will return a value (or, better to say, both will return an error).
Additionally, check the values you have in your u array. They are on the order of 1e-17, while other values are easily on the order of 1e+30. IIRC, streamlines are obtained by solving a differential equation in which the values you pass are used as parameters at the coordinates you pass. It's not hard to imagine that a floating-point round-off occurs while calculating with these numbers and you suddenly start getting NaN or 0 values.
Try to scale your example better, or, if you want to pursue the solution to the end, you can use np.seterr to get a more detailed idea of where it fails.
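As a sketch of that last suggestion: np.errstate is the context-manager form of np.seterr, and asking it to raise turns silent NaN-producing operations into exceptions that name the failing operation. The normalize function below is a made-up stand-in for whatever computation you suspect; it is not what streamplot does internally.

```python
import numpy as np


def normalize(u, v):
    """Hypothetical computation under suspicion: unit vectors of a field."""
    speed = np.sqrt(u * u + v * v)
    return u / speed, v / speed


u = np.array([1e-17, 0.0])
v = np.array([2e-17, 0.0])      # the second point has zero speed

# Raise FloatingPointError instead of quietly producing NaN or inf
with np.errstate(divide='raise', invalid='raise', over='raise'):
    try:
        normalize(u, v)
        message = None
    except FloatingPointError as exc:
        message = str(exc)

print(message)
```

Here the zero-speed point triggers a 0/0 division, which NumPy reports as an invalid value rather than returning NaN.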
Sorry I couldn't be of more help.
I have some data over a 2D range that I am interested in analyzing. These data were originally in lists x,y, and z where z[i] was the value for the point located at (x[i],y[i]). I then interpolated this data onto a regular grid using
x=np.array(x)
y=np.array(y)
z=np.array(z)
xi=np.linspace(minx,maxx,100)
yi=np.linspace(miny,maxy,100)
zi=griddata(x,y,z,xi,yi)
I then plotted the xi,yi,zi data using
plt.contour(xi,yi,zi)
plt.pcolormesh(xi,yi,zi,cmap=plt.get_cmap('PRGn'),norm=plt.Normalize(-10,10),vmin=-10,vmax=10)
This produced this plot:
In this plot you can see the S-like curve where the values are equal to zero (aside: the data doesn't vary as rapidly as shown in the colorbar -- that's simply a result of me normalizing the data to -10-10 when it actually extends far beyond that range; I did this to make the zero-valued region show up better -- maybe there's a better way of doing this too...).
The scattered dots are simply the points at which I have original data (yes, in this case my data was already on a regular grid). What I'm curious about is whether there is a good way for me to extract the values for which the curve is zero and obtain x,y pairs that, if plotted as a line, would trace that zero-region in the colormesh. I could interpolate to a really fine grid and then just brute force search for the values which are closest to zero. But is there a more automatic way of doing this, or a more automatic way of plotting this "zero-line"?
And a secondary question: I am using griddata correctly, right? I have these simple 1D arrays although elsewhere people use various meshgrids, loading texts, etc., before calling griddata.
Here is a full example:
import numpy as np
import matplotlib.pyplot as plt
y, x = np.ogrid[-1.5:1.5:200j, -1.5:1.5:200j]
f = (x**2 + y**2)**4 - (x**2 - y**2)**2
plt.figure(figsize=(9,4))
plt.subplot(121)
extent = [np.min(x), np.max(x), np.min(y), np.max(y)]
cs = plt.contour(f, extent=extent, levels=[0.1],
colors=["b", "r"], linestyles=["solid", "dashed"], linewidths=[2, 2])
plt.subplot(122)
# get the points on the lines
for c in cs.collections:
    data = c.get_paths()[0].vertices
    plt.plot(data[:,0], data[:,1],
             color=c.get_color()[0], linewidth=c.get_linewidth()[0])
plt.show()
here is the output:
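For the zero line specifically, a shorter route is the allsegs attribute of the object returned by contour, which holds the (x, y) vertices of every segment at each level. As far as I know, cs.collections is deprecated in newer Matplotlib releases, so this may also be the more durable option. The sketch below uses a made-up function that is zero on the unit circle, so the result is easy to verify; the Agg backend line is only there so it runs without a display and can be dropped in an interactive session.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")            # assumption: no display available
import matplotlib.pyplot as plt

# Made-up test function: zero exactly on the unit circle
y, x = np.ogrid[-1.5:1.5:200j, -1.5:1.5:200j]
f = x**2 + y**2 - 1.0

fig, ax = plt.subplots()
cs = ax.contour(x.ravel(), y.ravel(), f, levels=[0.0])

# allsegs[i] is a list of (N, 2) vertex arrays for level i
segments = cs.allsegs[0]
zero_xy = np.vstack(segments)    # all x,y pairs on the zero line

# Re-plot the extracted points as an ordinary line
for seg in segments:
    ax.plot(seg[:, 0], seg[:, 1], 'k--')
```

With gridded data like xi, yi, zi from the question, the same call would be ax.contour(xi, yi, zi, levels=[0.0]).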
Are there any algorithms that will return the equation of a straight line from a set of 3D data points? I can find plenty of sources which will give the equation of a line from 2D data sets, but none in 3D.
Thanks.
If you are trying to predict one value from the other two, then you should use lstsq with the a argument as your independent variables (plus a column of 1's to estimate an intercept) and b as your dependent variable.
If, on the other hand, you just want to get the best fitting line to the data, i.e. the line which, if you projected the data onto it, would minimize the squared distance between the real point and its projection, then what you want is the first principal component.
One way to define it is the line whose direction vector is the eigenvector of the covariance matrix corresponding to the largest eigenvalue, that passes through the mean of your data. That said, eig(cov(data)) is a really bad way to calculate it, since it does a lot of needless computation and copying and is potentially less accurate than using svd. See below:
import numpy as np
# Generate some data that lies along a line
x = np.mgrid[-2:5:120j]
y = np.mgrid[1:9:120j]
z = np.mgrid[-5:3:120j]
data = np.concatenate((x[:, np.newaxis],
y[:, np.newaxis],
z[:, np.newaxis]),
axis=1)
# Perturb with some Gaussian noise
data += np.random.normal(size=data.shape) * 0.4
# Calculate the mean of the points, i.e. the 'center' of the cloud
datamean = data.mean(axis=0)
# Do an SVD on the mean-centered data.
uu, dd, vv = np.linalg.svd(data - datamean)
# Now vv[0] contains the first principal component, i.e. the direction
# vector of the 'best fit' line in the least squares sense.
# Now generate some points along this best fit line, for plotting.
# I use -7, 7 since the spread of the data is roughly 14
# and we want it to have mean 0 (like the points we did
# the svd on). Also, it's a straight line, so we only need 2 points.
linepts = vv[0] * np.mgrid[-7:7:2j][:, np.newaxis]
# shift by the mean to get the line in the right place
linepts += datamean
# Verify that everything looks right.
import matplotlib.pyplot as plt
import mpl_toolkits.mplot3d as m3d
# (on recent Matplotlib, Axes3D(fig) no longer attaches itself to the figure)
ax = plt.figure().add_subplot(projection='3d')
ax.scatter3D(*data.T)
ax.plot3D(*linepts.T)
plt.show()
Here's what it looks like:
If your data is fairly well behaved, then it should be sufficient to find the least squares fit one component at a time. You can run one linear regression of z against x, and then another of z against y.
Following the documentation example:
import numpy as np
pts = np.add.accumulate(np.random.random((10,3)))
x,y,z = pts.T
# this will find the slope and z-intercept of a plane
# parallel to the y-axis that best fits the data
A_xz = np.vstack((x, np.ones(len(x)))).T
m_xz, c_xz = np.linalg.lstsq(A_xz, z)[0]
# again for a plane parallel to the x-axis
A_yz = np.vstack((y, np.ones(len(y)))).T
m_yz, c_yz = np.linalg.lstsq(A_yz, z)[0]
# the intersection of those two planes and
# the function for the line would be:
# z = m_yz * y + c_yz
# z = m_xz * x + c_xz
# or:
def lin(z):
    x = (z - c_xz)/m_xz
    y = (z - c_yz)/m_yz
    return x,y
#verifying:
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.pyplot as plt
fig = plt.figure()
# (on recent Matplotlib, Axes3D(fig) no longer attaches itself to the figure)
ax = fig.add_subplot(projection='3d')
zz = np.linspace(0,5)
xx,yy = lin(zz)
ax.scatter(x, y, z)
ax.plot(xx,yy,zz)
plt.savefig('test.png')
plt.show()
If you want to minimize the actual orthogonal distances from the line (orthogonal to the line) to the points in 3-space (which I'm not sure is even referred to as linear regression), then I would build a function that computes the RSS and use a scipy.optimize minimization function to solve it.
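A rough sketch of that idea, under my own assumptions: the line is parametrized by a point and a direction (six numbers, which is redundant but simple), the objective is the sum of squared perpendicular distances, and scipy.optimize.minimize is run with its default method. The function name orthogonal_rss and the synthetic data are made up for the illustration.

```python
import numpy as np
from scipy.optimize import minimize


def orthogonal_rss(params, pts):
    """Sum of squared perpendicular distances from pts to a 3D line.

    params[:3] is a point on the line, params[3:] its direction.
    """
    p, d = params[:3], params[3:]
    d = d / np.linalg.norm(d)
    diff = pts - p
    # Remove the component along the line; the rest is the perpendicular offset
    perp = diff - np.outer(diff @ d, d)
    return np.sum(perp ** 2)


# Synthetic noisy points around a known line
rng = np.random.default_rng(1)
t = np.linspace(-5, 5, 100)
true_dir = np.array([1.0, 2.0, -1.0]) / np.sqrt(6.0)
pts = (np.array([1.0, -2.0, 0.5]) + t[:, np.newaxis] * true_dir
       + rng.normal(scale=0.1, size=(100, 3)))

# Start from the centroid and the direction between the first and last points
start = np.concatenate((pts.mean(axis=0), pts[-1] - pts[0]))
res = minimize(orthogonal_rss, start, args=(pts,))

fit_point = res.x[:3]
fit_dir = res.x[3:] / np.linalg.norm(res.x[3:])
```

The fitted direction should agree, up to sign, with the first right-singular vector of the mean-centered data, which makes for an easy cross-check.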