In ggplot for Python, using discrete X scale with geom_point()? - python

The following example returns an error. It appears that using a discrete (not continuous) scale for the x-axis in ggplot in Python is not supported?
import pandas as pd
import ggplot
df = pd.DataFrame.from_dict({'a':['a','b','c'],
'percentage':[.1,.2,.3]})
p = ggplot.ggplot(data=df,
aesthetics=ggplot.aes(x='a',
y='percentage'))\
+ ggplot.geom_point()
print(p)
As mentioned, this returns:
Traceback (most recent call last):
File "/Users/me/Library/Preferences/PyCharm2016.1/scratches/scratch_1.py", line 30, in <module>
print(p)
File "/Users/me/lib/python3.5/site-packages/ggplot/ggplot.py", line 116, in __repr__
self.make()
File "/Users/me/lib/python3.5/site-packages/ggplot/ggplot.py", line 627, in make
layer.plot(ax, facetgroup, self._aes, **kwargs)
File "/Users/me/lib/python3.5/site-packages/ggplot/geoms/geom_point.py", line 60, in plot
ax.scatter(x, y, **params)
File "/Users/me/lib/python3.5/site-packages/matplotlib/__init__.py", line 1819, in inner
return func(ax, *args, **kwargs)
File "/Users/me/lib/python3.5/site-packages/matplotlib/axes/_axes.py", line 3838, in scatter
x, y, s, c = cbook.delete_masked_points(x, y, s, c)
File "/Users/me/lib/python3.5/site-packages/matplotlib/cbook.py", line 1848, in delete_masked_points
raise ValueError("First argument must be a sequence")
ValueError: First argument must be a sequence
Any workarounds for using ggplot with scatters on a discrete scale?

One option is to generate a continuous series, and use the original variable as labels. But this seems like a painful workaround.
df = pd.DataFrame.from_dict( {'a':[0,1,2],
'a_name':['a','b','c'],
'percentage':[.1,.2,.3]})
p = ggplot.ggplot(data=df,
aesthetics=ggplot.aes(x='a',
y='percentage'))\
+ ggplot.geom_point()\
+ ggplot.scale_x_continuous(breaks=list(df['a']),
labels=list(df['a_name']))
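The numeric column in the workaround above was typed in by hand; it can also be derived from the data. The following is a minimal sketch, assuming pandas' `factorize` is an acceptable way to assign integer positions to the categories. The names `a_code`, `breaks`, and `labels` are illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame({"a": ["a", "b", "c"], "percentage": [0.1, 0.2, 0.3]})

# factorize assigns 0, 1, 2, ... to the categories in order of first appearance
codes, uniques = pd.factorize(df["a"])
df["a_code"] = codes

# Tick positions and the text to show at each position
breaks = list(range(len(uniques)))
labels = list(uniques)
```

With `x='a_code'` in the aesthetics, `breaks` and `labels` can then be passed to `scale_x_continuous(breaks=..., labels=...)` exactly as in the workaround above.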

I was getting the same error when trying to plot 2 columns of a dataframe. I was reading the data from a csv file and converting it into a dataframe.
readdata=csv.reader(open(filename),delimiter="\t")
df= pd.DataFrame(data, columns=header)
df.columns=["pulseVoltage","dutVoltage","dutCurrent","leakageCurrent"]
print (df.dtypes)
When I checked the data types, they were shown as object instead of the float I expected (I am a newbie, so this might be trivial knowledge that I was missing). Therefore, I did an explicit conversion of the columns to the float data type.
df["dutVoltage"]=df["dutVoltage"].astype("float")
df["dutCurrent"]=df["dutCurrent"].astype("float")
Now I can use ggplot to plot the data without any error.
print ggplot(df, aes('dutVoltage','dutCurrent'))+ \
geom_point()
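The object dtype is expected here: `csv.reader` yields every field as a string, so pandas stores the columns as generic Python objects. The following is a minimal sketch of the conversion, assuming the first row of the file holds the column names. The in-memory text `raw` is a made-up stand-in for the real file.

```python
import csv
import io

import pandas as pd

# Hypothetical tab-separated content standing in for the file on disk
raw = (
    "pulseVoltage\tdutVoltage\tdutCurrent\tleakageCurrent\n"
    "1.0\t0.5\t0.01\t0.001\n"
    "2.0\t1.0\t0.02\t0.002\n"
)

rows = list(csv.reader(io.StringIO(raw), delimiter="\t"))
header, data = rows[0], rows[1:]          # assumes the first row is the header
df = pd.DataFrame(data, columns=header)   # every cell is still a string here

# Convert all columns to numeric types in one step
df = df.apply(pd.to_numeric)
```

Reading the file with `pd.read_csv(filename, sep="\t")` instead of `csv.reader` would let pandas infer numeric types directly.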

Related

matplotlib fill_between raises errors of "ufunc 'isfinite' not supported for the input types..."?

I want to fill the gap between two lines in a plot using matplotlib, but it always shows an error message like below. I have checked there is a similar question posted in the forum, but their solutions do not solve this problem.
import numpy as np
import pandas as pd
import openpyxl, os
import matplotlib.pyplot as plt
#read and convert data from an excel file to two dimensional numpy.array
df = pd.read_excel("Water.xlsx").values
#x axis is year, from 1900 to 2017
year = np.arange(1900,2018,1)
#read data from df and convert them to one-dimensional numpy.array
W_low = df[:,[29]].ravel()
W_high = df[:,[30]].ravel()
#plot the two lines and fill in the gap
fig, ax1 = plt.subplots(1, 1, sharex=True)
#no errors are raised until this line
ax1.fill_between (year, W_low, W_high)
Error message
File "C:\Users\Christina\.spyder-py3\temp.py", line 24, in <module>
ax1.fill_between (year, W_low, W_high)
File "C:\Users\Christina\anaconda3\lib\site-packages\matplotlib\__init__.py", line 1599, in inner
return func(ax, *map(sanitize_sequence, args), **kwargs)
File "C:\Users\Christina\anaconda3\lib\site-packages\matplotlib\axes\_axes.py", line 5233, in fill_between
y1 = ma.masked_invalid(self.convert_yunits(y1))
File "C:\Users\Christina\anaconda3\lib\site-packages\numpy\ma\core.py", line 2377, in masked_invalid
condition = ~(np.isfinite(a))
TypeError: ufunc 'isfinite' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
I have solved the problem myself. It seems that the dataset I used previously was not clean enough. I copy-pasted the whole dataset into a new Excel sheet and ran the same script, and then it worked perfectly.
Hope my solution will help others who have encountered the same problem.
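This error usually means the arrays have dtype object, for example because some cells in the sheet contain text. If re-creating the sheet is not practical, the columns can be coerced to floats in code. The following is a minimal sketch, assuming unparseable cells may be treated as missing. The array `w_low_raw` is made-up data standing in for a column read from the Excel file.

```python
import numpy as np
import pandas as pd

# Hypothetical object-dtype column: numbers mixed with text, as a dirty sheet might give
w_low_raw = np.array([1.0, "2.5", "missing", 4.0], dtype=object)

# errors="coerce" turns anything that cannot be parsed as a number into NaN
w_low = pd.to_numeric(pd.Series(w_low_raw), errors="coerce").to_numpy(dtype=float)

# np.isfinite now works, because the array has a float dtype
finite_mask = np.isfinite(w_low)
```

After the same conversion is applied to the other column, `ax1.fill_between(year, W_low, W_high)` receives float arrays. Checking `finite_mask` first shows which rows of the sheet held non-numeric values.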

Cannot plot my function : return array(a, dtype, copy=False, order=order) TypeError: float() argument must be a string or a number

I'm trying to plot a function that gives the arctan of the angle of several scatterplots (it's a physics experiment):
Here is my code:
import numpy as np
import matplotlib.pyplot as plt
filename='rawPhaseDataf2f_17h_15m.dat'
datatype=np.dtype( [('Shotnumber',np.dtype('>f8')),('A1',np.dtype('>f8')), ('A2',np.dtype('>f8')), ('f2f',np.dtype('>f8')), ('intensity',np.dtype('>f8'))])
data=np.fromfile(filename,dtype=datatype)
#time=data['Shotnumber']/9900 # reprate is 9900 Hz -> time in seconds
A1=data['A1']
A2=data['A2']
#np.sort()
i=range(1,209773)
def x(i) :
return arctan((A1.item(i)/A2.item(i))*(i/209772))
def y(i) :
return i*2*pi/209772
plot(x,y)
plt.figure('Scatterplot')
plt.plot(A1,A2,',') #Scatterplot
plt.xlabel('A1')
plt.ylabel('A2')
plt.figure('2D Histogram')
plt.hist2d(A1,A2,100) # 2D Histogram
plt.xlabel('A1')
plt.ylabel('A2')
plt.show()
My error is:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/usr/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 540, in runfile
execfile(filename, namespace)
File "/home/nelly/Bureau/ Téléchargements/Kr4 Experiment/read_rawPhaseData.py", line 21, in <module>
plot(x,y)
File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 2987, in plot
ret = ax.plot(*args, **kwargs)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 4138, in plot
self.add_line(line)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 1497, in add_line
self._update_line_limits(line)
File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 1508, in _update_line_limits
path = line.get_path()
File "/usr/lib/pymodules/python2.7/matplotlib/lines.py", line 743, in get_path
self.recache()
File "/usr/lib/pymodules/python2.7/matplotlib/lines.py", line 420, in recache
x = np.asarray(xconv, np.float_)
File "/usr/lib/python2.7/dist-packages/numpy/core/numeric.py", line 460, in asarray
return array(a, dtype, copy=False, order=order)
TypeError: float() argument must be a string or a number
I know that the problem comes from plot(x,y), and I think the error is in how I defined x and y. A1 and A2 are arrays, N is the number of points, and k is the index into the arrays. I want to compute arctan(A1[k]/A2[k])*(k/N).
There are lots of problems with your code, and your understanding of python and array operations. I'm just going to handle the first part of the code (and the error you get), and hopefully you can continue to fix it from there.
This should fix the error you're getting and generate a plot:
# size = 209772
size = A1.size  # I'm assuming that the size of the array is 209772
# construct an array from 1/size up to 1.0 (float() avoids integer division in Python 2)
z = np.arange(1, size + 1) / float(size)
# Calculate the x and y arrays
x = np.arctan((A1 / A2) * z)
y = z * 2 * np.pi
# Plot x and y
plt.plot(x, y)
Discussion:
There are lots of issues with this chunk of code:
i=range(1,209773)
def x(i) :
return arctan((A1.item(i)/A2.item(i))*(i/209772))
def y(i) :
return i*2*pi/209772
plot(x, y)
You're defining two functions called x and y, and then you are passing those functions to the plotting method. The plotting method accepts numbers (in lists or arrays), not functions. That is the reason for the error that you are getting. So you instead need to construct a list/array of numbers and pass that to the function.
You're defining a variable i which is a list of numbers. But when you define the functions x and y, you are creating new variables named i which have nothing to do with the list you created earlier. This is because of how "scope" works in python.
The functions arctan and plot (and the constant pi) are not defined "globally"; they are only defined in the packages numpy and matplotlib. So you need to call them from those packages, i.e. np.arctan, np.pi, and plt.plot with the imports used in your script.
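The three points above can be combined in a self-contained sketch. It assumes the formula from the question, arctan(A1[k]/A2[k])*(k/N), is what should be computed. The helper name `phase_curve` and the random arrays are illustrative stand-ins for the real measurement data.

```python
import numpy as np


def phase_curve(a1, a2):
    """Return x and y arrays computed element-wise, with no Python-level loop."""
    a1 = np.asarray(a1, dtype=float)
    a2 = np.asarray(a2, dtype=float)
    n = a1.size
    k = np.arange(1, n + 1)                    # indices 1..n as an array
    x = np.arctan((a1 / a2) * (k / float(n)))  # namespaced call: np.arctan
    y = k * 2 * np.pi / n                      # namespaced constant: np.pi
    return x, y


# Hypothetical data standing in for A1 and A2 from the .dat file
rng = np.random.default_rng(0)
A1 = rng.normal(size=1000)
A2 = rng.normal(size=1000) + 5.0  # offset keeps the denominator away from zero

x, y = phase_curve(A1, A2)
```

The returned `x` and `y` are plain arrays of numbers, so they can be passed to `plt.plot(x, y)`, which is what the plotting call expects.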

ValueError: Wrong number of items passed 500, placement implies 1, Python and Pandas

I'm importing just two columns from an .xlsx file and I would like to calculate some statistics (mean, standard deviation, percent change) and then plot the result. The first part doesn't give me any problems, but the plotting does.
My code looks like this:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import matplotlib.mlab as mlab
import math
df = pd.read_excel('KDPrviIzbor.xlsx', sheetname='List1', index_col = 0)
ch = df.pct_change(periods=252)
ma = np.mean(ch)*100
std = np.std(ch)*100
x = np.linspace(-100,100,500)
plt.plot(x,mlab.normpdf(x,ma,std))
plt.show()
But when I run my code, I get this error:
Traceback (most recent call last):
File "C:/Users/David/PythonStuff/normal_distribution.py", line 21, in <module> plt.plot(x,mlab.normpdf(x,ma,std))
File "C:\Python27\lib\site-packages\matplotlib\mlab.py", line 1579, in normpdf return 1./(np.sqrt(2*np.pi)*sigma)*np.exp(-0.5 * (1./sigma*(x - mu))**2)
File "C:\Python27\lib\site-packages\pandas\core\ops.py", line 534, in wrapper dtype=dtype)
File "C:\Python27\lib\site-packages\pandas\core\series.py", line 220, in __init__ data = SingleBlockManager(data, index, fastpath=True)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 3383, in __init__ ndim=1, fastpath=True)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 2101, in make_block placement=placement)
File "C:\Python27\lib\site-packages\pandas\core\internals.py", line 77, in __init__ len(self.values), len(self.mgr_locs)))
ValueError: Wrong number of items passed 500, placement implies 1
I figured that the problem is in:
plt.plot(x,mlab.normpdf(x,ma,std))
but I cannot solve it. Any suggestions?
ma and std are pandas.Series objects in your example, because np.mean applied to a pandas.DataFrame returns a pandas.Series (one value per column).
However, mlab.normpdf(x,ma,std) expects float values or numpy arrays as inputs.
Since your DataFrame has a single data column, you can simply convert ma and std to floats, e.g. ma = float(ma) and std = float(std).
I would not suggest using int(ma), as you proposed in your comment, because that would truncate the decimals.
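An alternative to converting afterwards is to select the column first, so that the mean and standard deviation come out as scalars. The following is a minimal sketch with made-up price data in place of the Excel sheet. The column name `price` and the helper `normal_pdf` are illustrative. The density is written out with NumPy because `mlab.normpdf` has been removed from recent matplotlib releases.

```python
import numpy as np
import pandas as pd

# Hypothetical single-column price series standing in for the Excel data
rng = np.random.default_rng(0)
df = pd.DataFrame({"price": 100 + np.cumsum(rng.normal(size=300))})

# Selecting the column gives a Series, so mean() and std() return scalars
ch = df["price"].pct_change(periods=252)
mu = float(ch.mean()) * 100
sigma = float(ch.std()) * 100


def normal_pdf(x, mu, sigma):
    """Normal density, the same formula that appears in the traceback."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)


x = np.linspace(-100, 100, 500)
pdf = normal_pdf(x, mu, sigma)
```

`x` and `pdf` are both one-dimensional arrays of length 500, so `plt.plot(x, pdf)` no longer runs into the shape mismatch.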

Python: create multiple boxplots in one panel

I have been using R for a long time and have recently started learning Python.
I would like to create multiple box plots in one panel in Python.
My dataset is a single vector, and a label vector indicates which box plot each element of the data belongs to. An example looks like this:
N = 50
data = np.random.lognormal(size=N, mean=1.5, sigma=1.75)
label = np.repeat([1,2,3,4,5], N//5)  # integer division so the repeat count is an int
According to various websites (e.g., matplotlib: Group boxplots), creating multiple boxplots requires a matrix-like input in which each column contains the samples for one boxplot. So I created a list object based on data and label:
savelist = data[ label == 1]
for i in [2,3,4,5]:
savelist = [savelist, data[ label == i]]
However, the code below gives me an error:
boxplot(savelist)
Traceback (most recent call last):
File "<ipython-input-222-1a55d04981c4>", line 1, in <module>
boxplot(savelist)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/matplotlib/pyplot.py", line 2636, in boxplot
meanprops=meanprops, manage_xticks=manage_xticks)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/matplotlib/axes/_axes.py", line 3045, in boxplot labels=labels)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/matplotlib/cbook.py", line 1962, in boxplot_stats
stats['mean'] = np.mean(x)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/numpy/core/fromnumeric.py", line 2727, in mean
out=out, keepdims=keepdims)
File "/Users/yumik091186/anaconda/lib/python2.7/site-packages/numpy/core/_methods.py", line 66, in _mean
ret = umr_sum(arr, axis, dtype, out, keepdims)
ValueError: operands could not be broadcast together with shapes (2,) (10,)
Can anyone explain what is going on?
You're ending up with a nested list instead of a flat list. Try this instead:
savelist = [data[label == 1]]
for i in [2,3,4,5]:
savelist.append(data[label == i])
And it should work.
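The same flat list can be built in one step with a list comprehension. This is a minimal sketch using the variable names from the question. The name `groups` and the seeded random generator are illustrative choices.

```python
import numpy as np

N = 50
rng = np.random.default_rng(0)
data = rng.lognormal(mean=1.5, sigma=1.75, size=N)
label = np.repeat([1, 2, 3, 4, 5], N // 5)

# One array per label value, collected in a flat list (not nested)
groups = [data[label == k] for k in np.unique(label)]
```

Each element of `groups` is a one-dimensional array of 10 samples, so `plt.boxplot(groups)` draws one box per group.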

ValueError when trying to save ndarray (Numpy)

I am trying to translate a project I have in MATLAB to Python+Numpy because MATLAB keeps running out of memory. The file I have is rather long, so I have tried to make a minimal example that shows the same error.
Basically I'm making a 2D histogram of a dataset, and want to save it after some processing. The problem is that the numpy.savez function throws a "ValueError: setting an array element with a sequence" when I try to save the output of the histogram function. I can't find the problem when I look at the NumPy docs.
My version of Python is 2.6.6, Numpy version 1.4.1 on a Debian distro.
import numpy as np
import random
n_samples = 5
rows = 5
out_file = file('dens.bin','wb')
x_bins = np.arange(-2.005,2.005,0.01)
y_bins = np.arange(-0.5,n_samples+0.5)
listy = [random.gauss(0,1) for r in range(n_samples*rows)]
dens = np.histogram2d( listy, \
range(n_samples)*rows, \
[y_bins, x_bins])
print 'Write data'
np.savez(out_file, dens)
out_file.close()
Full output:
$ python error.py
Write data
Traceback (most recent call last):
File "error.py", line 19, in <module>
np.savez(out_file, dens)
File "/usr/lib/pymodules/python2.6/numpy/lib/io.py", line 439, in savez
format.write_array(fid, np.asanyarray(val))
File "/usr/lib/pymodules/python2.6/numpy/core/numeric.py", line 312, in asanyarray
return array(a, dtype, copy=False, order=order, subok=True)
ValueError: setting an array element with a sequence.
Note that np.histogram2d actually returns a tuple of three arrays: (hist, x_bins, y_bins). If you want to save all three of these, you have to unpack them as @Francesco said. Also note that np.savez appends .npz to a filename that does not already end in it, so the examples below use dens.npz so that the same name works for loading.
dens = np.histogram2d(listy,
                      list(range(n_samples))*rows,
                      [y_bins, x_bins])
np.savez('dens.npz', *dens)
Alternatively, if you only need the histogram itself, you could save just that.
np.savez('dens.npz', dens[0])
If you want to keep track of which of these is which, use keyword arguments (**kwds) instead of positional ones (*args):
denskw = dict(zip(['hist','y_bins','x_bins'], dens))
np.savez('dens.npz', **denskw)
Then, you can load it like
dens = np.load('dens.npz')
hist = dens['hist']  # etc.
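For reference, here is a self-contained round trip of the keyword-argument approach. It is a minimal sketch with made-up data. The names `idx_edges` and `val_edges`, the bin edges, and the temporary directory are illustrative choices, not taken from the original post.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
n_samples, rows = 5, 5
values = rng.normal(size=n_samples * rows)
sample_idx = np.tile(np.arange(n_samples), rows)

idx_edges = np.arange(-0.5, n_samples + 0.5)  # 6 edges, i.e. 5 bins
val_edges = np.linspace(-2.0, 2.0, 41)        # 41 edges, i.e. 40 bins

# histogram2d returns three arrays: the counts and the two sets of edges
hist, e0, e1 = np.histogram2d(sample_idx, values, bins=[idx_edges, val_edges])

# Save each array under its own name; the path already ends in .npz
path = os.path.join(tempfile.mkdtemp(), "dens.npz")
np.savez(path, hist=hist, idx_edges=e0, val_edges=e1)

# Load the archive and read the arrays back by name
with np.load(path) as loaded:
    names = sorted(loaded.files)
    hist_back = loaded["hist"]
```

Saving the three arrays separately avoids the original error, which came from asking NumPy to pack a tuple of differently shaped arrays into a single array.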
