for i=1:n
centersX(:,i)=linspace(min(xData)+dX/2,max(xData)-dX/2,nbins)';
centersY(:,i)=linspace(min(yData)+dY/2,max(phase)-dY/2,nbins)';
centers = {centersX(:,i),centersY(:,i)};
H(:,:,i) = hist3([xData yData],centers);
end
In each iteration, I construct centersX and centersY with linspace function. I then store them in a 2x1 cell array called centers. H is a nbins X nbins X n struct. In each iteration I fill a nbins X nbins slice of H with the data from hist3.
I'm looking for the Python equivalent. I'm having trouble with passing the arguments for numpy.histogram2d:
H[:,:,i] = numpy.histogram2d(xData,yData,centers)
I get the following error:
Traceback (most recent call last):
line 714, in histogramdd
N, D = sample.shape
AttributeError: 'list' object has no attribute 'shape'
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
line 36, in <module>
H[:,:,i] = numpy.histogram2d(xData, yData, centers)
line 714, in histogram2d
hist, edges = histogramdd([x, y], bins, range, normed, weights)
line 718, in histogramdd
N, D = sample.shape
ValueError: too many values to unpack (expected 2)
Since Python doesn't have cell arrays, I changed centers to be an array of arrays where centers[0] = centersX and centers[1] = centersY. What do I need to change such that that assuming the data are the same between matlab and python that the outputs will match?
EDIT:
I have also tried H[:,:,i] = numpy.histogram2d(xData,yData, bins=(centersX,centersY)) to cutout the combining step into centers but no luck.
Have you tried combing them with square brackets?
Maybe you can also use matplotlib.pyplot.hist2d.
H[:,:,i], *_ = numpy.histogram2d(xData,yData,bins=[centers[0], centers[1]])
H[:,:,i], *_ = matplotlib.pyplot.hist2d(xData,yData,bins=[centers[0], centers[1]])
In both, the values in centers are the bin edges, not the centers. You have to adjust the calculation. I think it is enough to remove the dX/2:
centersX(:,i)=linspace(min(xData),max(xData),nbins)';
centersY(:,i)=linspace(min(yData),max(phase),nbins)';
Related
I am trying to create N-Dimensional histogram from 2D array which has complex values. I want to count the number of occurrences in real and imaginary parts of the array given the bins and store the result in a 3D array. It only runs for the first iteration when I hard code i=0 and remove the for loop. I have never used histograms in python before and I just cannot understand the error. The code is given below.
xsoft is defined as 2d array of complex type and I somehow compute bnd_edges by finding max, min values from xsoft and create edges to be given as bins.
xsoft = np.empty((M, MAX,), dtype=complex) # e.g has dims 4*100
xsoft[:] = np.nan
edges = np.linspace(-bnd_edges, bnd_edges, numbin) #numbin=10
pSOFT = np.empty((len(edges)-1, M, len(edges)-1)) # len(edges)= 10
pSOFT[:] = np.nan
for i in range(M):
pSOFT[:, i, :], edges = np.histogramdd((xsoft[i, :].real, xsoft[i, :].imag), bins=(edges, edges))
The code results in the following error
Traceback (most recent call last):
File " ", line 194, in <module>
pSOFT[:, i, :], edges = np.histogramdd((xsoft[i, :].real, xsoft[i, :].imag), bins=(edges, edges))
File "<__array_function__ internals>", line 5, in histogramdd
File " " line 1066, in histogramdd
raise ValueError(
ValueError: `bins[0]` must be a scalar or 1d array
Process finished with exit code 1
You are getting this error because you are overriding the original definition of edges with the second return value of histogramdd.
Replace the last line in your code with this:
pSOFT[:, i, :], edges_i = np.histogramdd((xsoft[i, :].real, xsoft[i, :].imag), bins=(edges, edges))
I'm trying to use plt.text to plot temperature values at their associated lat/lon points on a plot.
After reviewing the plt.text documentation, it appears that the plotted value (third arg) has to be a number and that the number has to be a whole number, NOT a number with decimals.
Below is the code that I'm trying to work with and the associated traceback error that I'm receiving:
Script Code:
data = np.loadtxt('/.../.../.../tmax_day0', delimiter=',', skiprows=1)
grid_x, grid_y = np.mgrid[-85:64:dx, 34:49:dx]
temp = data[:,2]
#print temp
grid_z = griddata((data[:,1],data[:,0]), data[:,2], (grid_x,grid_y), method='linear')
x,y = m(data[:,1], data[:,0]) # flip lat/lon
grid_x,grid_y = m(grid_x,grid_y)
#m.plot(x,y, 'ko', markersize=2)
def str_to_float(str):
try:
number = float(str)
except ValueError:
number = 0.0
return number
fmt = str_to_float(temp)
#annotate point temperature on plot
plt.text(grid_x, grid_y, fmt, fontdict=None)
Traceback Error:
Traceback (most recent call last):
File "plotpoints.py", line 56, in <module>
fmt = str_to_float(temp)
File "plotpoints.py", line 51, in str_to_float
number = float(str)
TypeError: only length-1 arrays can be converted to Python scalars
Data sample from text file tmax_day0:
latitude,longitude,value
36.65408,-83.21783,90
41.00928,-74.73628,92.02
43.77714,-71.75598,90
44.41944,-72.01944,88.8
39.5803,-79.3394,79
38.3154,-76.5501,86
38.91444,-82.09833,94
40.64985,-75.44771,92.6
41.25389,-70.05972,81.2
39.45202,-74.56699,90.88
I was able to achieve plotting data values only by using the following code:
for i in range(len(temp)):
plt.text(x[i], y[i], temp[i], va="top", family="monospace")
Result:
You aren't using a "proper" array, and are instead using a numpy array. Numpy arrays don't play well with non-numpy functions.
Going from your comment, this has been edited.
You would first need to fix the string so it's a proper array.
fmt = fmt[0].split()
I think should work to create a new (normal) array of strings. And then this to map that to an array of floats:
list_of_floats = np.array(map(float, fmt))
I used fsolve to find the zeros of an example sinus function, and worked great. However, I wanted to do the same with a dataset. Two lists of floats, later converted to arrays with numpy.asarray(), containing the (x,y) values, namely 't' and 'ys'.
Although I found some related questions, I failed to implement the code provided in them, as I try to show here. Our arrays of interest are stored in a 2D list (data[i][j], where 'i' corresponds to a variable (e.g. data[0]==t==time==x values) and 'j' are the values of said variable along the x axis (e.g. data[1]==Force). Keep in mind that each data[i] is an array of floats.
Could you offer an example code that takes two inputs (the two mentioned arrays) and returns its intersecting points with a defined function (e.g. 'y=0').
I include some testing I made regarding the other related question. ( #HYRY 's answer)
I do not think it is relevant, but I'm using Spyder through Anaconda.
Thanks in advance!
"""
Following the answer provided by #HYRY in the 'related questions' (see link above).
At this point of the code, the variable 'data' has already been defined as stated before.
"""
from scipy.optimize import fsolve
def tfun(x):
return data[0][x]
def yfun(x):
return data[14][x]
def findIntersection(fun1, fun2, x0):
return [fsolve(lambda x:fun1(x)-fun2(x, y), x0) for y in range(1, 10)]
print findIntersection(tfun, yfun, 0)
Which returns the next error
File "E:/Data/Anaconda/[...]/00-Latest/fsolvestacktest001.py", line 36, in tfun
return data[0][x]
IndexError: arrays used as indices must be of integer (or boolean) type
The full output is as it follows:
Traceback (most recent call last):
File "<ipython-input-16-105803b235a9>", line 1, in <module>
runfile('E:/Data/Anaconda/[...]/00-Latest/fsolvestacktest001.py', wdir='E:/Data/Anaconda/[...]/00-Latest')
File "C:\Anaconda\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 580, in runfile
execfile(filename, namespace)
File "E:/Data/Anaconda/[...]/00-Latest/fsolvestacktest001.py", line 44, in <module>
print findIntersection(tfun, yfun, 0)
File "E:/Data/Anaconda/[...]/00-Latest/fsolvestacktest001.py", line 42, in findIntersection
return [fsolve(lambda x:fun1(x)-fun2(x, y), x0) for y in range(1, 10)]
File "C:\Anaconda\lib\site-packages\scipy\optimize\minpack.py", line 140, in fsolve
res = _root_hybr(func, x0, args, jac=fprime, **options)
File "C:\Anaconda\lib\site-packages\scipy\optimize\minpack.py", line 209, in _root_hybr
ml, mu, epsfcn, factor, diag)
File "E:/Data/Anaconda/[...]/00-Latest/fsolvestacktest001.py", line 42, in <lambda>
return [fsolve(lambda x:fun1(x)-fun2(x, y), x0) for y in range(1, 10)]
File "E:/Data/Anaconda/[...]/00-Latest/fsolvestacktest001.py", line 36, in tfun
return data[0][x]
IndexError: arrays used as indices must be of integer (or boolean) type
You can 'convert' a datasets (arrays) to continuous functions by means of interpolation. scipy.interpolate.interp1d is a factory that provides you with the resulting function, which you could then use with your root finding algorithm.
--edit-- an example for computing an intersection of sin and cos from 20 samples (I've used cubic spline interpolation, as piecewise linear gives warnings about the smoothness):
>>> import numpy, scipy.optimize, scipy.interpolate
>>> x = numpy.linspace(0,2*numpy.pi, 20)
>>> x
array([ 0. , 0.33069396, 0.66138793, 0.99208189, 1.32277585,
1.65346982, 1.98416378, 2.31485774, 2.64555171, 2.97624567,
3.30693964, 3.6376336 , 3.96832756, 4.29902153, 4.62971549,
4.96040945, 5.29110342, 5.62179738, 5.95249134, 6.28318531])
>>> y1sampled = numpy.sin(x)
>>> y2sampled = numpy.cos(x)
>>> y1int = scipy.interpolate.interp1d(x,y1sampled,kind='cubic')
>>> y2int = scipy.interpolate.interp1d(x,y2sampled,kind='cubic')
>>> scipy.optimize.fsolve(lambda x: y1int(x) - y2int(x), numpy.pi)
array([ 3.9269884])
>>> scipy.optimize.fsolve(lambda x: numpy.sin(x) - numpy.cos(x), numpy.pi)
array([ 3.92699082])
Note that interpolation will give you 'guesses' about what data should be between the sampling points. No way to tell how good these guesses are. (but for my example, you can see it's a pretty good estimation)
I have been using R for long time and I am recently learning Python.
I would like to create multiple box plots in one panel in Python.
My dataset is in a vector form and a label vector indicates which box plot each element of data corresponds. The example looks like this:
N = 50
data = np.random.lognormal(size=N, mean=1.5, sigma=1.75)
label = np.repeat([1,2,3,4,5],N/5)
From various websites (e.g., matplotlib: Group boxplots), Creating multiple boxplots requires a matrix object input whose column contains samples for one boxplot. So I created a list object based on data and label:
savelist = data[ label == 1]
for i in [2,3,4,5]:
savelist = [savelist, data[ label == i]]
However, the code below gives me an error:
boxplot(savelist)
Traceback (most recent call last):
File "<ipython-input-222-1a55d04981c4>", line 1, in <module>
boxplot(savelist)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.py", line 2636, in boxplot
meanprops=meanprops, manage_xticks=manage_xticks)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.py", line 3045, in boxplot labels=labels)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/matplotlib/cbook.py", line 1962, in boxplot_stats
stats['mean'] = np.mean(x)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 2727, in mean
out=out, keepdims=keepdims)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/numpy/core/_methods.py", line 66, in _mean
ret = umr_sum(arr, axis, dtype, out, keepdims)
ValueError: operands could not be broadcast together with shapes (2,) (10,)
Can anyone explain what is going on?
You're ending up with a nested list instead of a flat list. Try this instead:
savelist = [data[label == 1]]
for i in [2,3,4,5]:
savelist.append(data[label == i])
And it should work.
I'm getting a ZeroDivisionError from the following code:
#stacking the array into a complex array allows np.unique to choose
#truely unique points. We also keep a handle on the unique indices
#to allow us to index `self` in the same order.
unique_points,index = np.unique(xdata[mask]+1j*ydata[mask],
return_index=True)
#Now we break it into the data structure we need.
points = np.column_stack((unique_points.real,unique_points.imag))
xx1,xx2 = self.meta['rcm_xx1'],self.meta['rcm_xx2']
yy1 = self.meta['rcm_yy2']
gx = np.arange(xx1,xx2+dx,dx)
gy = np.arange(-yy1,yy1+dy,dy)
GX,GY = np.meshgrid(gx,gy)
xi = np.column_stack((GX.ravel(),GY.ravel()))
gdata = griddata(points,self[mask][index],xi,method='linear',
fill_value=np.nan)
Here, xdata,ydata and self are all 2D numpy.ndarrays (or subclasses thereof) with the same shape and dtype=np.float32. mask is a 2d ndarray with the same shape and dtype=bool. Here's a link for those wanting to peruse the scipy.interpolate.griddata documentation.
Originally, xdata and ydata are derived from a non-uniform cylindrical grid that has a 4 point stencil -- I thought that the error might be coming from the fact that the same point was defined multiple times, so I made the set of input points unique as suggested in this question. Unfortunately, that hasn't seemed to help. The full traceback is:
Traceback (most recent call last):
File "/xxxxxxx/rcm.py", line 428, in <module>
x[...,1].to_pz0()
File "/xxxxxxx/rcm.py", line 285, in to_pz0
fill_value=fill_value)
File "/usr/local/lib/python2.7/site-packages/scipy/interpolate/ndgriddata.py", line 183, in griddata
ip = LinearNDInterpolator(points, values, fill_value=fill_value)
File "interpnd.pyx", line 192, in scipy.interpolate.interpnd.LinearNDInterpolator.__init__ (scipy/interpolate/interpnd.c:2935)
File "qhull.pyx", line 996, in scipy.spatial.qhull.Delaunay.__init__ (scipy/spatial/qhull.c:6607)
File "qhull.pyx", line 183, in scipy.spatial.qhull._construct_delaunay (scipy/spatial/qhull.c:1919)
ZeroDivisionError: float division
For what it's worth, the code "works" (No exception) if I use the "nearest" method.