Load "bmi.csv" into a DataFrame and create a scatter plot of the data using
relplot() with height on the x-axis and weight on the y-axis, coloring the
points by Gender and varying the size of the points by BMI index.
My code is:
import pandas as pd
import seaborn as sns
df = pd.read_csv('bmi.csv')
BMI = pd.DataFrame(df)
g = sns.relplot(x = 'Height', y = 'Weight', data=df);b
I get:
Traceback (most recent call last):
File "<pyshell#4>", line 1, in <module>
g = sns.relplot(x = 'Height', y = 'Weight', data=df);b
File "/Users/aleksikurunsaari/Library/Python/3.10/lib/python/site-packages/seaborn/relational.py", line 862, in relplot
p = plotter(
File "/Users/aleksikurunsaari/Library/Python/3.10/lib/python/site-packages/seaborn/relational.py", line 538, in __init__
super().__init__(data=data, variables=variables)
File "/Users/aleksikurunsaari/Library/Python/3.10/lib/python/site-packages/seaborn/_oldcore.py", line 640, in __init__
self.assign_variables(data, variables)
File "/Users/aleksikurunsaari/Library/Python/3.10/lib/python/site-packages/seaborn/_oldcore.py", line 701, in assign_variables
plot_data, variables = self._assign_variables_longform(
File "/Users/aleksikurunsaari/Library/Python/3.10/lib/python/site-packages/seaborn/_oldcore.py", line 938, in _assign_variables_longform
raise ValueError(err)
ValueError: Could not interpret value `Height` for parameter `x`
Besides the error: why are you constructing a DataFrame from a DataFrame and then not using it? I'm talking about BMI here:
df = pd.read_csv('bmi.csv')
BMI = pd.DataFrame(df)
As for the error, it occurs because Height is not one of the columns of df. I suggest checking the contents, shape, and columns of this DataFrame (for example with df.head(), df.shape, and df.columns) before plotting with seaborn. It may be a problem with the separator of your .csv file.
sns.relplot(x = 'Height', y = 'Weight', data=df)
Dataset: https://github.com/aniketsoni1/BMI-Data-Insight-using-SVM/blob/master/bmi.csv
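A minimal sketch of that check, assuming the file might be semicolon-separated (a guess; the real bmi.csv may well use commas) and that the BMI column is named Index (also an assumption). The hypothetical helper read_with_columns tries a few separators, keeps the first result that has Height and Weight columns, and then makes the requested plot with hue for Gender and size for the BMI index. The CSV text is a made-up sample, not the real dataset.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen so the sketch runs without a display
import pandas as pd
import seaborn as sns


def read_with_columns(text, required=("Height", "Weight"), seps=(",", ";", "\t")):
    """Hypothetical helper: return the first parse whose columns include `required`."""
    for sep in seps:
        df = pd.read_csv(io.StringIO(text), sep=sep)
        df.columns = df.columns.str.strip()  # tolerate spaces around header names
        if all(col in df.columns for col in required):
            return df
    raise ValueError(f"no separator in {seps} produced columns {required}")


# Hypothetical semicolon-separated sample standing in for bmi.csv
sample = (
    "Gender;Height;Weight;Index\n"
    "Male;174;96;4\n"
    "Female;185;110;4\n"
    "Female;149;61;3\n"
    "Male;189;104;3\n"
)

df = read_with_columns(sample)
print(df.columns.tolist())  # ['Gender', 'Height', 'Weight', 'Index']

# hue colors the points by Gender; size scales them by the (assumed) Index column
g = sns.relplot(x="Height", y="Weight", hue="Gender", size="Index", data=df)
```

With the wrong separator, read_csv does not fail; it silently returns a single column whose name is the whole header line, which is exactly the situation that produces "Could not interpret value `Height`".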
With low-cycle fatigue data, I'm trying to plot the hysteresis loop, but I get the following error:
[ -52.18297297 -45.58565338 16.9913185 ... -354.53630032 -295.50857248
-155.42088911]
[-0.01229182 -0.00891753 0.02256744 ... -0.33507242 -0.31283728
-0.24790212]
Traceback (most recent call last):
File "f:\I2M\LCF\Ep1_camp4_P4_TTH650 06-9-21 11 01 24\ep1_camp4_P4.py", line 16, in <module>
plt.plot(strain, Sigma, color = 'k')
File "C:\Users\DELL\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\pyplot.py", line 2840, in plot
return gca().plot(
File "C:\Users\DELL\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_axes.py", line 1743, in plot
lines = [*self._get_lines(*args, data=data, **kwargs)]
File "C:\Users\DELL\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_base.py", line 273, in __call__
yield from self._plot_args(this, kwargs)
File "C:\Users\DELL\AppData\Local\Programs\Python\Python39\lib\site-packages\matplotlib\axes\_base.py", line 379, in _plot_args
raise ValueError("x, y, and format string must not be None")
ValueError: x, y, and format string must not be None
And here is my code:
import matplotlib.pyplot as plt
import numpy as np
plt.style.use(['science','no-latex'])
x = np.loadtxt('F:\\I2M\\LCF\\Ep1_camp4_P4_TTH650 06-9-21 11 01 24\\data_1.csv',unpack = True,
skiprows = 2, usecols = 2, delimiter = ',')
y = np.loadtxt('F:\\I2M\\LCF\\Ep1_camp4_P4_TTH650 06-9-21 11 01 24\\data_1.csv',unpack = True,
skiprows = 2, usecols = 3, delimiter = ',')
stress = (x*1000)/28.27 #N/mm^2 = MPa
length = len(stress)
length = len(y)
plt.figure(figsize=(5, 5))
Sigma = print(stress[0:length:10]) #stress
strain = print(y[0:length:10])
plt.plot(strain, Sigma, color = 'k')
plt.show()
The data contains many rows, so I used slicing to access only some of the values (every 10th one).
Your problem is here
Sigma = print(stress[0:length:10]) #stress
strain = print(y[0:length:10])
What you plausibly want is to sample every 10th data point, but what you get is nothing, or, from Python's point of view, None. That is why your stack trace later tells you that x, y, and format string must not be None.
Why does this happen, and how do you solve the problem?
When you make an assignment, the value of the expression on the right is saved under the name on the left so that you can use it later. Here you save the value returned by print(y[0:length:10]) under the name strain. But print() is called for its side effect (writing characters to your terminal), and the value it returns is None, not what was shown on your terminal.
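If I have understood your intentions, you should omit the two lines above and pass the slices directly, keeping strain on the x-axis and stress on the y-axis as in your original call:
plt.plot(y[0:length:10], stress[0:length:10], color='k')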
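A small check of the print()/None point, using a stand-in array rather than the question's data: print() shows the slice on the terminal, but the assignment captures None, while assigning the slice itself keeps the values.

```python
import io
from contextlib import redirect_stdout

import numpy as np

y = np.arange(30) * 0.01  # stand-in for the strain column, not the real data

captured = io.StringIO()
with redirect_stdout(captured):        # capture what print() writes to the terminal
    returned = print(y[0:len(y):10])   # the slice is shown...
# ...but print() returns None, so `returned` is None, not the slice

strain = y[0:len(y):10]                # bind the slice itself instead
print(returned, len(strain))           # prints: None 3
```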
A side note: you have
length = len(stress)
length = len(y)
but both arrays are read from the same file and have the same number of rows, so one assignment is enough.
PS: you can also read both columns with a single call:
x, y = np.loadtxt('…\\data_1.csv', unpack=True, skiprows=2, usecols=[2, 3], delimiter=',')
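Putting those fixes together, here is a minimal sketch that runs without the original file. It writes a synthetic CSV with two header rows; the layout (force in kN in column 2, strain in column 3) is inferred from usecols in the question and is an assumption. It then reads both columns with one loadtxt call and plots every 10th point with strain on x and stress on y.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen backend; use plt.show() interactively instead
import matplotlib.pyplot as plt
import numpy as np

AREA_MM2 = 28.27  # specimen cross-section used in the question (mm^2)
STEP = 10         # keep every 10th point

# Synthetic stand-in for data_1.csv: two header rows, then time, cycle, force, strain
t = np.linspace(0, 2 * np.pi, 200)
synthetic = np.column_stack([t, np.ones_like(t), 10 * np.sin(t), 0.3 * np.sin(t - 0.4)])
buf = io.StringIO()
np.savetxt(buf, synthetic, delimiter=",",
           header="time,cycle,force,strain\ns,-,kN,-", comments="")
buf.seek(0)

# One loadtxt call reads both columns; unpack=True returns one array per column
force, strain = np.loadtxt(buf, skiprows=2, usecols=[2, 3], delimiter=",", unpack=True)
stress = force * 1000 / AREA_MM2  # N/mm^2 = MPa

# Bind the slices themselves (not print()'s return value) before plotting
strain_s = strain[::STEP]
stress_s = stress[::STEP]

fig, ax = plt.subplots(figsize=(5, 5))
ax.plot(strain_s, stress_s, color="k")
ax.set_xlabel("strain")
ax.set_ylabel("stress [MPa]")
```

The slice [::STEP] is the same as [0:length:STEP] when length is the array's full length, so the length variable is not needed at all.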
I am trying to plot an Excel dataset. One dataset plots fine, but with the second one I get an error message.
My code:
from matplotlib import pyplot as plt
from xlrd import open_workbook
x_time = list()
y_absorbance = list()
book = open_workbook("SP1.xls")
sheet = book.sheet_by_index(0) #data is on sheet 1
column1 = sheet.col_values(0)
column2 = sheet.col_values(1)
for x in column1:
try:
num = float(x)
time = num/1000000
x_time.append(time)
except:
continue
for y in column2:
try:
num = float(y)
absorbance = num/1000000
y_absorbance.append(absorbance)
except:
continue
plt.plot(x_time, y_absorbance)
plt.title("Final Analysis native R5 Main Pool", fontname="Times New Roman",fontweight="bold", size=20)
plt.xlabel("run time [min]", fontname="Times New Roman")
plt.ylabel("Absorbance [mAU]", fontname="Times New Roman")
plt.legend(("UV-VIS 214 nm",), loc="upper right")
plt.show()
Error message:
Traceback (most recent call last):
File "/Users/nico/PycharmProjects/Exercise/HPLC_plot.py", line 29, in <module>
plt.plot(x_time, y_absorbance)
File "/Users/nico/PycharmProjects/venv/lib/python3.9/site-packages/matplotlib/pyplot.py", line 2840, in plot
return gca().plot(
File "/Users/nico/PycharmProjects/venv/lib/python3.9/site-packages/matplotlib/axes/_axes.py", line 1743, in plot
lines = [*self._get_lines(*args, data=data, **kwargs)]
File "/Users/nico/PycharmProjects/venv/lib/python3.9/site-packages/matplotlib/axes/_base.py", line 273, in __call__
yield from self._plot_args(this, kwargs)
File "/Users/nico/PycharmProjects/venv/lib/python3.9/site-packages/matplotlib/axes/_base.py", line 399, in _plot_args
raise ValueError(f"x and y must have same first dimension, but "
ValueError: x and y must have same first dimension, but have shapes (13500,) and (13476,)
The data is an Excel sheet and looks like this:
https://i.stack.imgur.com/npJDy.png
The plot of the first dataset shows how the plot should look.
I do not know what I am doing wrong, since the first dataset works and the second dataset comes from the same HPLC software, just from another run.
Any suggestions are appreciated! :)
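One likely cause, offered as an assumption since the second sheet isn't fully visible: the two loops skip non-numeric cells (headers, blanks) independently, so if the absorbance column contains 24 more such cells than the time column, the lists end up 13500 and 13476 long. A sketch that converts each row's two cells together keeps x and y aligned. paired_floats is a hypothetical helper, and the column values are made-up stand-ins for what sheet.col_values() returns.

```python
def paired_floats(column1, column2, scale=1_000_000):
    """Convert two columns row by row, keeping only rows where both cells are numeric."""
    xs, ys = [], []
    for a, b in zip(column1, column2):  # walk the two columns in lockstep
        try:
            x_val = float(a) / scale
            y_val = float(b) / scale
        except (TypeError, ValueError):
            continue  # header or blank cell: drop the whole row, not just one cell
        xs.append(x_val)
        ys.append(y_val)
    return xs, ys


# Hypothetical column contents: two header rows and one blank absorbance cell
column1 = ["Time", "min", 1_000_000.0, 2_000_000.0, 3_000_000.0, 4_000_000.0]
column2 = ["Absorbance", "mAU", 5_000_000.0, "", 7_000_000.0, 8_000_000.0]

x_time, y_absorbance = paired_floats(column1, column2)
print(len(x_time), len(y_absorbance))  # prints: 3 3
```

The trade-off is that a row with a missing absorbance value also loses its time value; that is usually what a line plot needs, but it is worth checking what the skipped cells in the second sheet actually contain.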
I have been using lmfit for about a day now and needless to say I know very little about the library. I have been using several built-in models for curve fitting and all of them work flawlessly with the data except the Lognormal Model.
Here is my code:
from numpy import *
from lmfit.models import LognormalModel
import pandas as pd
import scipy.integrate as integrate
import matplotlib.pyplot as plt
data = pd.read_csv('./data.csv', delimiter = ",")
x = data.ix[:, 0]
y = data.ix[:, 1]
print (x)
print (y)
mod = LognormalModel()
pars = mod.guess(y, x=x)
out = mod.fit(y, pars , x=x)
print(out.best_values)
print(out.fit_report(min_correl=0.25))
out.plot()
plt.plot(x, y, 'bo')
plt.plot(x, out.init_fit, 'k--')
plt.plot(x, out.best_fit, 'r-')
plt.show()
and the error output is:
Traceback (most recent call last):
File "Cs_curve_fit.py", line 17, in <module>
pvout = pvmod.fit(y, amplitude= 1, center = 1, sigma =1 , x=x)
File "C:\Users\NAME\Anaconda3\lib\site-packages\lmfit\model.py", line 731, in fit
output.fit(data=data, weights=weights)
File "C:\Users\NAME\Anaconda3\lib\site-packages\lmfit\model.py", line 944, in fit
self.init_fit = self.model.eval(params=self.params, **self.userkws)
File "C:\Users\NAME\Anaconda3\lib\site-packages\lmfit\model.py", line 569, in eval
return self.func(**self.make_funcargs(params, kwargs))
File "C:\Users\NAME\Anaconda3\lib\site-packages\lmfit\lineshapes.py", line 162, in lognormal
x[where(x <= 1.e-19)] = 1.e-19
File "C:\Users\NAME\Anaconda3\lib\site-packages\pandas\core\series.py", line 773, in __setitem__
setitem(key, value)
File "C:\Users\NAME\Anaconda3\lib\site-packages\pandas\core\series.py", line 755, in setitem
raise ValueError("Can only tuple-index with a MultiIndex")
ValueError: Can only tuple-index with a MultiIndex
First, the error message you show cannot have come from the code you post. The error message says that line 17 of the file "Cs_curve_fit.py" reads
pvout = pvmod.fit(y, amplitude= 1, center = 1, sigma =1 , x=x)
but that is not anywhere in your code. Please post the actual code and the actual output.
Second, the problem appears to be happening because the data for x cannot be turned into a 1D NumPy array. Since I can't trust your code or output, I would suggest converting the data to 1D NumPy arrays yourself as a first test. Lmfit should be able to handle pandas Series, but it only does a simple coercion to 1D NumPy arrays.
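Following that suggestion, a minimal sketch of the conversion, with a hypothetical two-column DataFrame standing in for data.csv. It uses .iloc (DataFrame.ix has been removed from pandas) and Series.to_numpy() to get plain 1D float arrays, then applies the same in-place clamp that the traceback shows lmfit's lognormal function running; on an ndarray this is ordinary NumPy indexing.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for data.csv: x in the first column, y in the second
data = pd.DataFrame({"x": [0.0, 0.5, 1.0, 2.0, 4.0],
                     "y": [0.0, 0.9, 0.6, 0.2, 0.05]})

# Positional column access, then independent 1D float arrays
# (copy=True so later in-place edits don't touch the DataFrame)
x = data.iloc[:, 0].to_numpy(dtype=float, copy=True)
y = data.iloc[:, 1].to_numpy(dtype=float, copy=True)

# The clamp from lmfit's lognormal lineshape (as shown in the traceback)
x[np.where(x <= 1.e-19)] = 1.e-19
```

With x and y as plain arrays, the question's mod.guess(y, x=x) and mod.fit(y, pars, x=x) calls should be usable unchanged.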
I am working on k-means clustering. I wrote some code with the help of references available on the web, but when I run it, it raises an error:
Traceback (most recent call last):
File "clustering.py", line 16, in <module>
ds = df[np.where(labels==i)]
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1678, in __getitem__
return self._getitem_column(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/frame.py", line 1685, in _getitem_column
return self._get_item_cache(key)
File "/usr/lib/python2.7/dist-packages/pandas/core/generic.py", line 1050, in _get_item_cache
res = cache.get(item)
TypeError: unhashable type: 'numpy.ndarray'
Many previous threads deal with the same error, but none of their solutions fixes it in my program. How can I debug this error?
Code I used:
from sklearn import cluster
import pandas as pd
df = [
[0.57,-0.845,-0.8277,-0.1585,-1.616],
[0.47,-0.14,-0.5277,-0.158,-1.716],
[0.17,-0.845,-0.5277,-0.158,-1.616],
[0.27,-0.14,-0.8277,-0.158,-1.716]]
df = pd.DataFrame(df,columns= ["a","b","c","d", "e"])
# df = pd.read_csv("cleaned_remove_cor.csv")
k = 3
kmeans = cluster.KMeans(n_clusters=k)
kmeans.fit(df)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_
from matplotlib import pyplot
import numpy as np
for i in range(k):
# select only data observations with cluster label == i
ds = df[np.where(labels==i)]
# plot the data observations
pyplot.plot(ds[:,0],ds[:,1],'o')
# plot the centroids
lines = pyplot.plot(centroids[i,0],centroids[i,1],'kx')
# make the centroid x's bigger
pyplot.setp(lines,ms=15.0)
pyplot.setp(lines,mew=2.0)
pyplot.show()
The shape of my DataFrame is (8127x600)
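I tried this and it works for me: convert the pandas DataFrame to a NumPy array first, so that df[np.where(labels==i)] and ds[:,0] index rows and columns by position:
df = df[["a", "b", "c", "d", "e"]].to_numpy()
(Older pandas versions spelled this df.as_matrix(columns=["a","b","c","d","e"]); as_matrix has since been removed.)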
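For context: df[...] on a DataFrame looks up column labels, so the tuple of arrays returned by np.where is treated as a column key, and because it cannot be hashed the lookup raises the TypeError. A sketch of the full loop along the lines of the answer, using the question's 4-row sample: it converts once with to_numpy() and selects rows with a boolean mask. n_init and random_state are added only to make the run repeatable; they are not part of the original code.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; use plt.show() interactively instead
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import cluster

df = pd.DataFrame(
    [[0.57, -0.845, -0.8277, -0.1585, -1.616],
     [0.47, -0.14, -0.5277, -0.158, -1.716],
     [0.17, -0.845, -0.5277, -0.158, -1.616],
     [0.27, -0.14, -0.8277, -0.158, -1.716]],
    columns=["a", "b", "c", "d", "e"],
)

k = 3
kmeans = cluster.KMeans(n_clusters=k, n_init=10, random_state=0)
kmeans.fit(df)
labels = kmeans.labels_
centroids = kmeans.cluster_centers_

X = df.to_numpy()  # positional 2D array, so X[mask] selects rows
fig, ax = plt.subplots()
sizes = []
for i in range(k):
    ds = X[labels == i]          # boolean mask: rows whose cluster label is i
    sizes.append(len(ds))
    ax.plot(ds[:, 0], ds[:, 1], "o")
    lines = ax.plot(centroids[i, 0], centroids[i, 1], "kx")
    plt.setp(lines, ms=15.0, mew=2.0)  # make the centroid x's bigger
```

If you would rather stay in pandas, df[labels == i] also selects rows (a boolean array is treated as a row mask), and the columns are then taken with ds.iloc[:, 0] and ds.iloc[:, 1].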
For my assignment I'm supposed to plot the tracks of 20 hurricanes on a map using matplotlib. However, when I run my code, I get the error: AssertionError: Grouper and axis must be same length
Here's the code I have:
import numpy as np
from mpl_toolkits.basemap import Basemap
import matplotlib.pyplot as plt
from PIL import *
fig = plt.figure(figsize=(12,12))
ax = fig.add_axes([0.1,0.1,0.8,0.8])
m = Basemap(llcrnrlon=-100.,llcrnrlat=0.,urcrnrlon=-20.,urcrnrlat=57.,
projection='lcc',lat_1=20.,lat_2=40.,lon_0=-60.,
resolution ='l',area_thresh=1000.)
m.bluemarble()
m.drawcoastlines(linewidth=0.5)
m.drawcountries(linewidth=0.5)
m.drawstates(linewidth=0.5)
# Creates parallels and meridians
m.drawparallels(np.arange(10.,35.,5.),labels=[1,0,0,1])
m.drawmeridians(np.arange(-120.,-80.,5.),labels=[1,0,0,1])
m.drawmapboundary(fill_color='aqua')
# Opens data file
import pandas as pd
name = [ ]
df = pd.read_csv('louisianastormb.csv')
for name, group in df.groupby([name]):
latitude = group.lat.values
longitude = group.lon.values
x,y = m(longitude, latitude)
plt.plot(x,y,'y-',linewidth=2 )
plt.xlabel('Longitude')
plt.ylabel('Latitude')
plt.title('20 Hurricanes with Landfall in Louisiana')
plt.savefig('20hurpaths.jpg', dpi=100)
Here's the full error output:
Traceback (most recent call last):
File "/home/darealmzd/lstorms.py", line 31, in <module>
for name, group in df.groupby([name]):
File "/usr/local/lib/python2.7/dist-packages/pandas/core/generic.py", line 186, in groupby
squeeze=squeeze)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 533, in groupby
return klass(obj, by, **kwds)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 197, in __init__
level=level, sort=sort)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1325, in _get_grouper
ping = Grouping(group_axis, gpr, name=name, level=level, sort=sort)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1129, in __init__
self.grouper = _convert_grouper(index, grouper)
File "/usr/local/lib/python2.7/dist-packages/pandas/core/groupby.py", line 1350, in _convert_grouper
raise AssertionError('Grouper and axis must be same length')
AssertionError: Grouper and axis must be same length
ValueError: Grouper and axis must be same length
This can occur if you are using double brackets in the groupby argument, for example df.groupby([['HurricaneName']]) instead of df.groupby(['HurricaneName']).
(I posted this since it is the top result on Google.)
The problem is that you're grouping by (effectively) a list containing an empty list ([[]]), because you set name = [] earlier and then wrap it in another list.
If you want to group on a single column (called 'HurricaneName'), you should do something like:
for name, group in df.groupby('HurricaneName'):
However, if you want to group on multiple columns, then you need to pass a list:
for name, group in df.groupby(['HurricaneName', 'Year']):
If you want to put it in a variable like you have, you can do it like this:
col_name = 'State'
for name, group in df.groupby([col_name]):
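A minimal sketch of the corrected loop, assuming the storm-name column is called HurricaneName and the coordinates are in lat and lon (the real CSV headers are not shown in the question). The data is a hypothetical two-storm sample, and the sketch plots raw longitude/latitude on plain axes instead of projecting through the question's Basemap instance m.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend; use plt.show() interactively instead
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical track data; real column names in louisianastormb.csv are assumed
df = pd.DataFrame({
    "HurricaneName": ["STORM_A", "STORM_A", "STORM_A", "STORM_B", "STORM_B"],
    "lat": [23.1, 25.9, 29.3, 24.0, 29.8],
    "lon": [-75.1, -80.3, -89.6, -83.0, -93.7],
})

fig, ax = plt.subplots()
tracks = {}
for name, group in df.groupby("HurricaneName"):  # a column label, not [[]]
    latitude = group["lat"].to_numpy()
    longitude = group["lon"].to_numpy()
    # With Basemap you would project first: x, y = m(longitude, latitude)
    ax.plot(longitude, latitude, "y-", linewidth=2)
    tracks[name] = len(group)  # points per storm track
```

Passing the plain string gives name as a string; passing a one-element list such as [col_name] groups the same way, but recent pandas versions yield name as a one-element tuple.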
Try iloc to make the grouper the same length as the axis.
For example, if the axis has length 3:
sns.boxplot(x=df['pH-binned'].iloc[0:3], y=v_count, data=df)