How do I make a basic scatter plot of a column in a DataFrame against the index of that DataFrame? I'm using Python 2.7.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataframe['Col'].plot()
plt.show()
This shows a line chart of 'Col' plotted against the values in my DataFrame index (dates in this case).
But how do I plot a scatterplot rather than a line chart?
I tried
plt.scatter(dataframe['Col'])
plt.show()
But scatter() requires two arguments. So how do I pass the Series dataframe['Col'] and my DataFrame index into scatter()?
For this I tried
plt.scatter(dataframe.index.values, dataframe['Col'])
plt.show()
But the chart is blank.
If you just want to switch from lines to points (and don't really need matplotlib's scatter), you can simply set the style:
In [6]: df= pd.DataFrame({'Col': np.random.uniform(size=1000)})
In [7]: df['Col'].plot(style='.')
Out[7]: <matplotlib.axes.AxesSubplot at 0x4c3bb10>
See the docs of DataFrame.plot and the general plotting documentation.
Strange. That ought to work.
Running this
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
dataframe = pd.DataFrame({'Col': np.random.uniform(size=1000)})
plt.scatter(dataframe.index, dataframe['Col'])
gives me the expected scatter plot.
Maybe quit() and fire up a new session?
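Since the index in the question holds dates, here is a minimal sketch of the same call with a DatetimeIndex. The data, the date range and the variable names are made up for illustration, and it assumes a reasonably recent pandas and matplotlib, where dates can be passed to scatter() directly.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical data: 50 daily observations with a DatetimeIndex.
np.random.seed(0)
dates = pd.date_range("2015-01-01", periods=50, freq="D")
dataframe = pd.DataFrame({"Col": np.random.uniform(size=50)}, index=dates)

fig, ax = plt.subplots()
# The index supplies the x values, the column supplies the y values.
points = ax.scatter(dataframe.index, dataframe["Col"])
fig.autofmt_xdate()  # tilt the date labels so they do not overlap
```

For interactive use, drop the matplotlib.use("Agg") line and finish with plt.show(). If the plot still comes out blank on an old installation, dataframe['Col'].plot(style='.') from the other answer is a reasonable fallback.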
I would like to plot multiple histogram subplots to observe the distribution of each individual feature in a data frame.
I have tried the code below, but it fails with an error saying the DataFrame object has no attribute i. I know the code is wrong somewhere, but I have searched for a solution and could not find a way to write a loop for this.
for i in enumerate(feature):
plt.subplot(3,3, i[0]+1)
sns.histplot(df.i[i], kde=True)
Is this what you're looking for?
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('seaborn-v0_8')  # this style was called 'seaborn' before matplotlib 3.6
df = pd.read_csv('data.csv')
df.plot.hist(subplots=True, legend=False)
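Seaborn alternative (needs import seaborn as sns, and assumes df is in long format with a 'column' column naming the feature and a 'value' column holding the data):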
g = sns.FacetGrid(df, row='column')
g.map(plt.hist, 'value')
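The loop in the question fails because df.i looks for an attribute literally named i; a column is selected by name with df[name]. A minimal sketch of a working loop follows, using plain matplotlib histograms. The frame and its column names (f1 to f4) are invented for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical frame with four numeric feature columns.
np.random.seed(0)
df = pd.DataFrame(np.random.randn(200, 4), columns=["f1", "f2", "f3", "f4"])
features = list(df.columns)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))
# zip pairs each column name with one subplot; df[name] selects the column.
for ax, name in zip(axes.ravel(), features):
    ax.hist(df[name], bins=20)
    ax.set_title(name)
fig.tight_layout()
```

If seaborn is installed, replacing the ax.hist(...) line with sns.histplot(df[name], kde=True, ax=ax) should give the KDE overlay the question asked for.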
I have an Excel file with the following data:
My code so far:
import pandas as pd
import matplotlib.pyplot as plt
df=pd.read_excel('file.xlsx', header=[0,1], index_col=[0])
Firstly, am I reading in my file correctly to get a MultiIndex with Main (A, B, C) as the first level and Value (X, Y) as the second level?
Using pandas and matplotlib, how do I plot an individual scatter plot for each Main (A, B, C), with its X and Y columns as the scatter values (pictured below)? I can do it messily by calling each column in a separate plot call.
Is there a nicer way to do it with multi-indexing or groupby?
This should help:
df = df.set_index(['Main1', 'Main2']).value
df.unstack().plot(kind='line', stacked=True)
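For the scatter plots the question actually asks about, one option is to loop over the first level of the column MultiIndex. This is only a sketch: it assumes that read_excel with header=[0,1] produced two-level columns with Main (A, B, C) on top and Value (X, Y) below, and it uses random stand-in data because the spreadsheet is not available.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the Excel data: two-level columns (Main, Value).
np.random.seed(0)
columns = pd.MultiIndex.from_product([["A", "B", "C"], ["X", "Y"]],
                                     names=["Main", "Value"])
df = pd.DataFrame(np.random.rand(10, 6), columns=columns)

groups = df.columns.get_level_values(0).unique()
fig, axes = plt.subplots(1, len(groups), figsize=(12, 4))
for ax, main in zip(axes, groups):
    sub = df[main]  # selecting a top-level label leaves a frame with columns X and Y
    ax.scatter(sub["X"], sub["Y"])
    ax.set_title(main)
    ax.set_xlabel("X")
    ax.set_ylabel("Y")
fig.tight_layout()
```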
I'm using pandas.plotting.table to generate a matplotlib table for my data. The pandas documentation states that if the rowLabels kwarg is not specified, it uses the data index. However, I want to plot my data without the index at all. I couldn't find a way to override the setting in the pyplot table either.
Currently my output looks like this:
I found it easier to bypass pandas and fall back to calling matplotlib routines directly to do the plotting.
import matplotlib.pyplot as plt
import pandas as pd
df = <load your data into a pandas dataframe>
fig, ax = plt.subplots(1, 1)
ax.table(cellText=df.values, colLabels=df.keys(), loc='center')
plt.show()
See the docs for ax.table
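A self-contained version of the snippet above, with a small made-up frame in place of the placeholder, might look like the following. Hiding the axes with ax.axis('off') is an addition that the answer does not mention; it is included on the assumption that only the table should be visible.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data standing in for "load your data".
df = pd.DataFrame({"name": ["a", "b", "c"], "score": [1.5, 2.0, 3.25]})

fig, ax = plt.subplots(1, 1)
ax.axis("off")  # hide the empty axes frame and ticks behind the table
# No rowLabels are passed, so no index column is drawn.
tbl = ax.table(cellText=df.values, colLabels=df.columns, loc="center")
```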
I'm getting an error when trying to plot a bar graph using matplotlib.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
source_data= pd.read_csv("ehresp_2015.csv")
then I extract the two columns from the data set that I need
results_1 = source_data[["EUGROSHP","EUGENHTH"]]
get rid of the negative values
results_1_new = results_1[results_1>0]
plot the data
x=results_1_new['EUGROSHP']
y=results_1_new['EUGENHTH']
plt.bar([x],[y])
plt.show()
I'm getting the error TypeError: cannot convert the series to <type 'float'>
You have wrapped the Series to plot in lists: plt.bar([x],[y]). This asks matplotlib to plot exactly one bar at position x with height y. Since x and y are not numerical values but Series themselves, this is not possible and results in the error TypeError: cannot convert the series to <type 'float'>
The solution is quite simple: don't put the Series into lists, but pass them as Series:
plt.bar(x,y)
To show what a minimal verifiable example, which you may use when asking a question, could look like, here is the complete code:
import numpy as np; np.random.seed(42)
import matplotlib.pyplot as plt
import pandas as pd
fig, axes = plt.subplots()
data = pd.DataFrame({"a":np.arange(10)-2,"b":np.random.randn(10)})
results_1 = data[["a","b"]]
results_1_new = results_1[results_1>0]
x=results_1_new['a']
y=results_1_new['b']
plt.bar(x,y)
plt.show()
Use df.plot; it should be easier.
source_data.set_index("EUGROSHP")["EUGENHTH"].plot(kind='bar', legend=True)
plt.show()
Also, note that results_1[results_1>0] will give you a bunch of NaNs in your columns. Did you mean to filter on a single column instead?
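The difference between the element-wise mask and a row filter can be seen on a tiny made-up frame (the column names a and b and the values are illustrative only):

```python
import pandas as pd

data = pd.DataFrame({"a": [-1, 2, 3], "b": [0.5, -0.2, 1.5]})

# Element-wise mask: the shape is unchanged, non-positive cells become NaN.
masked = data[data > 0]

# Row filter on a single column: keeps whole rows where 'a' is positive.
one_col = data[data["a"] > 0]

# Row filter on all columns: keeps rows where every value is positive.
all_cols = data[(data > 0).all(axis=1)]

print(masked)
print(one_col)
print(all_cols)
```

Which of the two row filters is appropriate depends on what the negative values mean in the survey data, which the question does not say.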
I need to make a plot of the following data, with the year_week on x-axis, the test_duration on the y-axis, and each operator as a different series. There may be multiple data points for the same operator in one week. I need to show standard deviation bands around each series.
data = pd.DataFrame({'year_week':[1601,1602,1603,1604,1604,1604],
'operator':['jones','jack','john','jones','jones','jack'],
'test_duration':[10,12,43,7,23,9]})
prints as:
I have looked at seaborn, matplotlib, and pandas, but I cannot find a solution.
It could be that you are looking for seaborn pointplot.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
data = pd.DataFrame({'year_week':[1601,1602,1603,1604,1604,1604],
'operator':['jones','jack','john','jones','jones','jack'],
'test_duration':[10,12,43,7,23,9]})
sns.pointplot(x="year_week", y="test_duration", hue="operator", data=data)
plt.show()
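By default pointplot draws confidence intervals rather than standard deviations; depending on the seaborn version, errorbar="sd" (newer releases) or ci="sd" (older releases) may switch that. If shaded bands are wanted, a plain pandas and matplotlib sketch is to aggregate per operator and week and use fill_between. Treating a week with a single observation as having zero spread is an assumption made here, not something the question specifies.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

data = pd.DataFrame({'year_week': [1601, 1602, 1603, 1604, 1604, 1604],
                     'operator': ['jones', 'jack', 'john', 'jones', 'jones', 'jack'],
                     'test_duration': [10, 12, 43, 7, 23, 9]})

# Mean and sample standard deviation per operator and week.
stats = (data.groupby(['operator', 'year_week'])['test_duration']
             .agg(['mean', 'std'])
             .reset_index())
stats['std'] = stats['std'].fillna(0)  # single observations have no defined std

fig, ax = plt.subplots()
for operator, grp in stats.groupby('operator'):
    ax.plot(grp['year_week'], grp['mean'], marker='o', label=operator)
    # Shaded band of one standard deviation around the mean.
    ax.fill_between(grp['year_week'],
                    grp['mean'] - grp['std'],
                    grp['mean'] + grp['std'],
                    alpha=0.2)
ax.set_xlabel('year_week')
ax.set_ylabel('test_duration')
ax.legend()
```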