I have this df:
df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 5], 'C': ['name 1', 'name 2', 'name 3']})
A B C
0 1 2 name 1
1 2 3 name 2
2 3 5 name 3
What is the correct way to plot column A and use column C as xticks?
These do not work:
df['A'].plot(xticks='C')
df['A'].plot(xticks=df['C'])
This changes the xticks but not the labels:
df['A'].plot(xticks=[1,2,3])
Do I really need to convert the column to a sequence? As an update to the question, I get the following error message:
ValueError: could not convert string to float: name 3
In short, I have a column of strings and want to use it as xticks for my plot.
PS: It doesn't look like there's a direct pandas plot function. I found a solution here.
The link you provided is a good resource, but shows the whole thing being done in matplotlib.pyplot and uses .subplots() to get to the axes. While I've done this before, I keep searching for ways to just use the built-into-pandas .plot() function as much as possible. To me it can simplify the code and makes it easier to leverage DataFrame goodness.
There do seem to be a number of things that aren't easy to do entirely through the parameters of df.plot(), though. Luckily it returns a matplotlib Axes object, which opens up a much larger range of possibilities.
I copied your data above into a DataFrame:
df = pd.read_clipboard(quotechar="'")
It looks sort-of like:
A B C
0 1 2 'name 1'
1 2 3 'name 2'
2 3 5 'name 3'
But, of course, much better in non table-crippled html. (Maybe SO will fix this one day).
Then all I had to do was:
ax = df.A.plot(xticks=df.index, rot=90)
ax.set_xticklabels(df.C)
If you are using IPython/Jupyter and %matplotlib inline then both of those need to be in the same cell. I had forgotten that at first and spent quite a bit of time trying to figure what was going wrong.
You can do it all using the ax variable:
ax = df.A.plot()
ax.set_xticks(df.index)
ax.set_xticklabels(df.C, rotation=90)
but, as I mentioned, I haven't found a way to set the xticklabels through the df.plot() function parameters, which would make it possible to do this all in a single line.
The extra step to rotate the xtick labels may be extraneous in this example, but came in handy in the one I was working on when looking for this answer.
And, of course, you can plot both A and B columns together even easier:
ax = df.plot()
ax.set_xticks(df.index)
ax.set_xticklabels(df.C, rotation=90)
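For anyone who wants to try this outside a notebook, here is a minimal self-contained sketch of the same two-step approach. The Agg backend and the final label check are my own additions for illustration, not part of the answer above.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the script runs without a display
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3], 'B': [2, 3, 5],
                   'C': ['name 1', 'name 2', 'name 3']})

ax = df['A'].plot()                        # x positions are the integer index 0, 1, 2
ax.set_xticks(df.index)                    # one tick per row
ax.set_xticklabels(df['C'], rotation=90)   # show the strings from C instead of the numbers

ax.figure.canvas.draw()                    # force a render so the tick labels are populated
tick_labels = [t.get_text() for t in ax.get_xticklabels()]
print(tick_labels)
```

The important detail is that the ticks are pinned first; setting labels without fixing the tick positions can attach the strings to whatever ticks matplotlib happened to choose.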
As of matplotlib 3.5.0
Use ax.set_xticks with the new labels param to set ticks and labels simultaneously:
ax = df.plot(y='A')
ax.set_xticks(ticks=df.index, labels=df.C)
# ^^^^^^
Or, since df.plot returns an Axes object, we can chain it:
df.plot(y='A').set_xticks(df.index, df.C)
Note that plt.xticks always had a labels param, so this change just unifies the Axes and pyplot APIs.
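If the code has to run on both older and newer matplotlib versions, one possible sketch is to try the labels keyword and fall back to the two-call form. This assumes that older versions reject the unknown keyword with a TypeError; the sample positions and names are just the data from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the script runs without a display
import matplotlib.pyplot as plt

positions = [0, 1, 2]
names = ['name 1', 'name 2', 'name 3']

fig, ax = plt.subplots()
ax.plot(positions, [1, 2, 3])

try:
    # matplotlib >= 3.5: ticks and labels in one call
    ax.set_xticks(positions, labels=names)
except TypeError:
    # assumed behaviour of older versions: no labels keyword, so use two calls
    ax.set_xticks(positions)
    ax.set_xticklabels(names)

fig.canvas.draw()
shown = [t.get_text() for t in ax.get_xticklabels()]
print(shown)
```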
When you call plot(), you can pass a column name as x to use that column's values as tick labels. So for the question in the OP, the following works.
df.plot(y='A', x='C', legend=False);
or you can set column C as index and just call plot on column A:
df.set_index('C')['A'].plot();
You can also plot multiple column values in one plot by passing a list of column names:
df.plot(y=['A', 'B'], x='C');
Related
I need some guidance in working out how to plot a block of histograms from grouped data in a pandas dataframe. Here's an example to illustrate my question:
from pandas import DataFrame
import numpy as np
x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
df = DataFrame({'Letter':x, 'N':y})
grouped = df.groupby('Letter')
In my ignorance I tried this code command:
df.groupby('Letter').hist()
which failed with the error message "TypeError: cannot concatenate 'str' and 'float' objects"
Any help most appreciated.
I'm on a roll, just found an even simpler way to do it using the by keyword in the hist method:
df['N'].hist(by=df['Letter'])
That's a very handy little shortcut for quickly scanning your grouped data!
For future visitors, the product of this call is the following chart:
One solution is to use the matplotlib histogram directly on each group. Iterating over the groupby object yields (name, sub-DataFrame) pairs, so you can create a histogram for each one.
from pandas import DataFrame
import numpy as np
x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
df = DataFrame({'Letter':x, 'N':y})
grouped = df.groupby('Letter')
import matplotlib.pyplot as plt
for name, group in grouped:
    plt.figure()            # one figure per letter
    plt.hist(group['N'])
    plt.title(name)
plt.show()
Your function is failing because the groupby dataframe you end up with has a hierarchical index and two columns (Letter and N) so when you do .hist() it's trying to make a histogram of both columns hence the str error.
This is the default behavior of pandas plotting functions (one plot per column) so if you reshape your data frame so that each letter is a column you will get exactly what you want.
df.reset_index().pivot(index='index', columns='Letter', values='N').hist()
The reset_index() is just to shove the current index into a column called index. Then pivot will take your data frame, collect all of the values N for each Letter and make them a column. The resulting data frame has 1000 rows (one per original row, with NaN wherever a letter has no value in that row) and three columns (A, B, C). hist() will then produce one histogram per column, ignoring the NaN values, and you can format the plots as needed. Note that recent pandas versions require index, columns and values to be passed to pivot as keyword arguments.
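To see what the reshaped frame looks like before plotting it, a quick check along these lines can help. The seeded random generator is my own choice so the run is repeatable; the data layout follows the question.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
letters = ['A'] * 300 + ['B'] * 400 + ['C'] * 300
df = pd.DataFrame({'Letter': letters, 'N': rng.standard_normal(1000)})

# One column per letter; every original row keeps its own index label
wide = df.reset_index().pivot(index='index', columns='Letter', values='N')

print(wide.shape)    # rows x columns of the reshaped frame
print(wide.count())  # non-NaN values per letter
```

Each column holds only the values of its own letter, so the per-column histograms are effectively per-group histograms.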
With a recent version of pandas, you can do
df.N.hist(by=df.Letter)
Just like with the solutions above, the axes will be different for each subplot. I have not solved that one yet.
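One option that may address the differing axes is DataFrame.hist with its sharex and sharey parameters. This is a sketch on the question's data layout, with a seeded generator and the Agg backend added as my own assumptions; I have not checked it against every pandas version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the script runs without a display
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded only so the sketch is repeatable
letters = ['A'] * 300 + ['B'] * 400 + ['C'] * 300
df = pd.DataFrame({'Letter': letters, 'N': rng.standard_normal(1000)})

# One histogram of N per letter, with common x and y axes across the subplots
axes = df.hist(column='N', by='Letter', sharex=True, sharey=True)

# pandas hides any unused grid cell, so keep only the visible subplots
visible = [ax for ax in np.asarray(axes).ravel() if ax.get_visible()]
xlims = {ax.get_xlim() for ax in visible}
print(len(visible), xlims)
```

With shared axes the bin edges are still computed per group, so passing an explicit bins sequence is worth considering if the bars need to line up exactly.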
I write this answer because I was looking for a way to plot together the histograms of different groups. What follows is not very smart, but it works fine for me. I use Numpy to compute the histogram and Bokeh for plotting. I think it is self-explanatory, but feel free to ask for clarifications and I'll be happy to add details (and write it better).
from bokeh.layouts import gridplot
from bokeh.plotting import figure, show
import numpy as np

figures = {
    'Transit': figure(title='Transit', x_axis_label='speed [km/h]', y_axis_label='frequency'),
    'Driving': figure(title='Driving', x_axis_label='speed [km/h]', y_axis_label='frequency')
}
cols = {'Vienna': 'red', 'Turin': 'blue', 'Rome': 'Orange'}
# df_trips is my DataFrame with 'locality', 'means' and 'speed' columns
for (locality, means), group in df_trips.groupby(['locality', 'means']):
    fig = figures[means]
    h, b = np.histogram(group.speed.values)
    fig.vbar(x=b[1:], top=h, width=(b[1]-b[0]), legend_label=locality, fill_color=cols[locality], alpha=0.5)
show(gridplot([
    [figures['Transit']],
    [figures['Driving']],
]))
I find this even easier and faster.
data_df.groupby('Letter').count()['N'].hist(bins=100)
I am having trouble plotting the multi-index DataFrame below, since I can't use the m04hour.index values for some reason.
output from m04hour.head()
However, this plot command works fine:
m04hour['consumption (kWh)'].plot(figsize=(12,2))
But this one doesn't:
fig,ax = plt.subplots(figsize=(8,4))
ax.plot(m04hour.index, m04hour['consumption(kWh)'],c='red',lw=1,label='queens')
because passing m04hour.index raises the error:
ValueError: setting an array element with a sequence.
So the question is how to refer to the m04hour.index value for setting the plotting x-axis?
Your index in m04hour is not a pd.MultiIndex. It is an index of tuples.
First let's convert that index of tuples into a pd.MultiIndex.
m04hour.index = pd.MultiIndex.from_tuples(m04hour.index)
fig,ax = plt.subplots(figsize=(8,4))
ax.plot(m04hour.index.get_level_values(1), m04hour['consumption(kWh)'],c='red',lw=1,label='queens')
Output:
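Since the original frame isn't reproduced here, the conversion can be checked on a small stand-in. Everything below (the tuples, the values, the Series name) is hypothetical data of mine, not the asker's m04hour; tupleize_cols=False is used only to recreate a flat index of tuples.

```python
import pandas as pd

# Hypothetical stand-in for m04hour: a flat Index whose elements are tuples
tuples = [('queens', 0), ('queens', 1), ('queens', 2)]
flat_index = pd.Index(tuples, tupleize_cols=False)  # keep the tuples as plain objects
s = pd.Series([1.5, 2.0, 1.2], index=flat_index, name='consumption (kWh)')

was_multi = isinstance(s.index, pd.MultiIndex)   # False: just an object Index of tuples

s.index = pd.MultiIndex.from_tuples(s.index)     # now a real two-level index
hours = s.index.get_level_values(1)              # second level, usable as plain x values
print(was_multi, list(hours))
```

Once the index has real levels, get_level_values returns a one-dimensional Index, which is what ax.plot needs for the x-axis.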
I have a DataFrame where I want to plot a histogram of each column.
df_play = pd.DataFrame({'a':['cat','dog','cat'],'b':['apple','orange','orange']})
df_play['a'] = df_play['a'].astype('category')
df_play['b'] = df_play['b'].astype('category')
df_play
df_play.hist(layout = (12,10))
However, I'm getting ValueError: num must be 1 <= num <= 0, not 1
When I tried with integers instead of categories as the values, it worked fine, but I really want the unique string names on the x-axis.
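You can just apply value_counts across columns and plot (pd.Series.value_counts is used here because the top-level pd.value_counts is deprecated in recent pandas).
>>> df_play.apply(pd.Series.value_counts).T.stack().plot(kind='bar')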
If you want proper subplots or something more intricate, I'd suggest you just iterate with value_counts and create the subplots yourself.
Since there is no natural parameter for binning, perhaps what you want rather than histograms are bar plots of the value counts for each Series? If so, you can achieve that through
df_play['a'].value_counts().plot(kind='bar')
I realized a way to do this is to first create the fig and axs, then loop through the column names of the DataFrame whose value counts we want to plot.
fig, axs = plt.subplots(1, len(df_play.columns), figsize=(10, 6))
for i, x in enumerate(df_play.columns):
    df_play[x].value_counts().plot(kind='bar', ax=axs[i])
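One caveat with indexing axs by position is that plt.subplots returns a single Axes rather than an array when there is only one column. A possible variant, sketched here with the question's data, passes squeeze=False so that axs is always a 2-D array; the Agg backend and per-plot titles are my own additions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the script runs without a display
import matplotlib.pyplot as plt
import pandas as pd

df_play = pd.DataFrame({'a': ['cat', 'dog', 'cat'],
                        'b': ['apple', 'orange', 'orange']}).astype('category')

# squeeze=False keeps axs two-dimensional even for a single column
fig, axs = plt.subplots(1, len(df_play.columns), figsize=(10, 6), squeeze=False)
for ax, col in zip(axs.ravel(), df_play.columns):
    df_play[col].value_counts().plot(kind='bar', ax=ax, title=col)

bars_per_axis = [len(ax.patches) for ax in axs.ravel()]
print(bars_per_axis)
```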
I have a very standard dataset with 2 columns, 1 for dates and 1 for values. I've put them into two arrays:
dates
['1/1/2014', '1/2/2014', '1/3/2014', ...]
values
[1423, 4321, 1234, ...]
How can I create a simple line graph with "values" on the y-axis and "dates" on the x-axis?
What I've tried:
I can do a "Hello world" line plot with only "values" in just 1 line (awesome):
import numpy as np
import matplotlib.pyplot as plt
plt.plot(values)
Next, let's add "dates" as the x-axis. At this point I'm stuck. How can I transform my "dates" array, which is strings, into an array that is plottable?
Looking at examples, I believe we are supposed to cast the strings into Python Date objects. Let's import those libraries:
import datetime
import matplotlib.dates
Ok, so now I can transform a string into a date: time.strptime(dates[1], '%m/%d/%Y'). How can I transform the entire array? I could write a loop, but I assume there is a better way.
I'm not 100% sure I'm even on the right path to making something usable for "Dates" vs "values". If you know the code to make this graph (I'm assuming it's very basic once you know Python + the libraries), please let me know.
Ok, knew it was a 1 liner. Here is how to do it:
So you start with your value and dates (as strings) array:
dates = ['1/1/2014', '1/2/2014', '1/3/2014', ...]
values = [1423, 4321, 1234, ...]
Then turn your dates, which are strings, into date objects (this needs from datetime import datetime):
date_objects = [datetime.strptime(date, '%m/%d/%Y').date() for date in dates]
Then just plot.
plt.plot(date_objects, values)
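Put together as one runnable script, the steps above might look like the following. It uses only the three sample entries shown in the question (the elided rest is left out), and the Agg backend plus autofmt_xdate are my own additions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; an assumption so the script runs without a display
from datetime import datetime
import matplotlib.pyplot as plt

# Only the three entries visible in the question
dates = ['1/1/2014', '1/2/2014', '1/3/2014']
values = [1423, 4321, 1234]

# Parse month/day/year strings into datetime.date objects
date_objects = [datetime.strptime(d, '%m/%d/%Y').date() for d in dates]

fig, ax = plt.subplots()
ax.plot(date_objects, values)   # matplotlib understands date objects on the x-axis
fig.autofmt_xdate()             # tilt the date labels so they don't overlap
```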
Remember you can use xticks and use whatever the hell you want in place of the values. So, you can do a
xVals = range(len(values))
plt.plot(xVals, values)
plt.xticks(xVals, dates)
This is ok if your dates are evenly spaced like the ones you have. Otherwise, you can always compute a number that quantifies each date, such as the number of days since the first date (see the datetime module), and use that for xVals.
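The day-offset idea can be sketched as follows. The unevenly spaced sample dates are made up for illustration.

```python
from datetime import datetime

# Made-up, unevenly spaced sample dates
dates = ['1/1/2014', '1/2/2014', '1/10/2014']
parsed = [datetime.strptime(d, '%m/%d/%Y') for d in dates]

# Number of whole days elapsed since the first date
x_vals = [(d - parsed[0]).days for d in parsed]
print(x_vals)  # → [0, 1, 9]
```

These offsets can then be used as the x positions, with the original date strings passed as the tick labels.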
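So here's my code:
import time
from datetime import datetime
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
date = []
for time_val in value:
    date.append(datetime(*time.strptime(time_val, "%m/%d/%Y")[:3]))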
fig, ax = plt.subplots()
ax.plot_date(date, value_list)
date_format = mdates.DateFormatter("%m/%d/%Y")
ax.xaxis.set_major_formatter(date_format)
ax.autoscale_view()
title_name = 'plot based on time'
ax.set_title(title_name)
y_label = 'values'
ax.set_ylabel(y_label)
ax.grid(True)
fig.autofmt_xdate()
plt.show()