python plotting from multi-index dataframe - python

I am having trouble plotting the multi-index dataframe below,
since I can't use the m04hour.index values for some reason.
output from m04hour.head()
However, this plot command works fine:
m04hour['consumption (kWh)'].plot(figsize=(12,2))
But this one doesn't:
fig,ax = plt.subplots(figsize=(8,4))
ax.plot(m04hour.index, m04hour['consumption(kWh)'],c='red',lw=1,label='queens')
It fails because passing m04hour.index raises the error:
ValueError: setting an array element with a sequence.
So the question is: how do I refer to the m04hour.index values when setting the x-axis of the plot?

Your index in m04hour is not a pd.MultiIndex. It is a flat index of tuples.
First, let's convert that index of tuples into a pd.MultiIndex.
m04hour.index = pd.MultiIndex.from_tuples(m04hour.index)
fig,ax = plt.subplots(figsize=(8,4))
ax.plot(m04hour.index.get_level_values(1), m04hour['consumption(kWh)'],c='red',lw=1,label='queens')
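As a rough, self-contained sketch of the conversion above: the data below is a hypothetical stand-in for m04hour (the real frame is not shown), with a made-up meter id, made-up level names and four hourly timestamps. It only illustrates that a flat index of tuples can be turned into a MultiIndex and that one level can then be used as the x-axis.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-in for m04hour: a flat Index whose elements are tuples
stamps = pd.to_datetime(["2014-01-01 00:00", "2014-01-01 01:00",
                         "2014-01-01 02:00", "2014-01-01 03:00"])
tuples = [("m04", ts) for ts in stamps]
flat_index = pd.Index(tuples, tupleize_cols=False)  # stays a plain Index of tuples
m04hour = pd.DataFrame({"consumption (kWh)": [1.2, 0.8, 1.5, 1.1]}, index=flat_index)
was_multi = isinstance(m04hour.index, pd.MultiIndex)  # False at this point

# Convert the tuples into a real MultiIndex (level names are assumptions)
m04hour.index = pd.MultiIndex.from_tuples(m04hour.index, names=["meter", "time"])

# Use a single level as the x values
x = m04hour.index.get_level_values("time")
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, m04hour["consumption (kWh)"], c="red", lw=1, label="queens")
ax.legend()
```

get_level_values also accepts the level position, so get_level_values(1) is equivalent here.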

Related

Plotting values above a threshold in Python

Having issues with plotting values above a set threshold using a pandas dataframe.
I have a dataframe that has 21453 rows and 20 columns, and one of the columns is just 1 and 0 values. I'm trying to plot this column using the following code:
lst1 = []
for x in range(0, len(df)):
    if(df_smooth['Active'][x] == 1):
        lst1.append(df_smooth['Time'][x])
plt.plot(df_smooth['Time'], df_smooth['CH1'])
plt.plot(df_smooth['Time'], lst1)
But get the following errors:
x and y must have same first dimension, but have shapes (21453,) and (9,)
Any suggestions on how to fix this?
The error probably comes from the line plt.plot(df_smooth['Time'], lst1). lst1 is only a subset of df_smooth['Time'], while df_smooth['Time'] is the full series, so the two have different lengths.
One solution is to also build a filtered version of x, for example -
lst_X = []
lst_Y = []
for x in range(0, len(df)):
    if(df_smooth['Active'][x] == 1):
        lst_X.append(df_smooth['Time'][x])
        lst_Y.append(df_smooth['Time'][x])
Another option is to build a sub-dataframe -
sub_df = df_smooth[df_smooth['Active']==1]
plt.plot(sub_df['Time'], sub_df['Time'])
(assuming Time is the correct column for Y; otherwise just replace it with the correct column)
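A minimal sketch of the sub-dataframe idea, using made-up data in place of df_smooth (the real frame has 21453 rows and is not shown). It assumes CH1 is the column to mark where Active is 1, which the question does not state. Because x and y both come from the same filtered frame, they always have the same length.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for df_smooth
time = np.arange(10)
df_smooth = pd.DataFrame({
    "Time": time,
    "CH1": np.sin(time),
    "Active": [0, 1, 0, 0, 1, 1, 0, 0, 1, 0],
})

# A boolean mask keeps only the rows where Active is 1
active = df_smooth[df_smooth["Active"] == 1]

fig, ax = plt.subplots()
ax.plot(df_smooth["Time"], df_smooth["CH1"], label="CH1")
# x and y are taken from the same filtered frame, so their shapes match
ax.plot(active["Time"], active["CH1"], "ro", label="Active == 1")
ax.legend()
```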
It seems like you are trying to plot two data series of different lengths with plt.plot(). This causes the error because plt.plot() expects both series to have the same length.
You will need to ensure that both data series have the same length before trying to plot them. One way to do this is to create a new list that contains the same number of elements as the df_smooth['Time'] data series, and then fill it with the corresponding values from the lst1 data series.
# Create a new list with the same length as the 'Time' data series
lst2 = [0] * len(df_smooth['Time'])
# Loop through the 'lst1' data series and copy the values to the corresponding
# indices in the 'lst2' data series
for x in range(0, len(lst1)):
    lst2[x] = lst1[x]
# Plot the 'Time' and 'lst2' data series using the plt.plot() function
plt.plot(df_smooth['Time'], df_smooth['CH1'])
plt.plot(df_smooth['Time'], lst2)
I think this should work.

Pandas - plotting multiple histograms [duplicate]

I need some guidance in working out how to plot a block of histograms from grouped data in a pandas dataframe. Here's an example to illustrate my question:
from pandas import DataFrame
import numpy as np
x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
df = DataFrame({'Letter':x, 'N':y})
grouped = df.groupby('Letter')
In my ignorance I tried this command:
df.groupby('Letter').hist()
which failed with the error message "TypeError: cannot concatenate 'str' and 'float' objects"
Any help most appreciated.
I'm on a roll, just found an even simpler way to do it using the by keyword in the hist method:
df['N'].hist(by=df['Letter'])
That's a very handy little shortcut for quickly scanning your grouped data!
For future visitors, the product of this call is the following chart:
One solution is to use the matplotlib histogram directly on each grouped data frame. You can loop through the groups; each group is a dataframe, and you can create a histogram for each one.
from pandas import DataFrame
import numpy as np
import matplotlib.pyplot as plt
x = ['A']*300 + ['B']*400 + ['C']*300
y = np.random.randn(1000)
df = DataFrame({'Letter':x, 'N':y})
grouped = df.groupby('Letter')
# Iterating over a groupby yields (key, sub-dataframe) pairs
for letter, group in grouped:
    plt.figure()
    plt.hist(group.N)
plt.show()
Your function is failing because the groupby dataframe you end up with has a hierarchical index and two columns (Letter and N), so when you call .hist() it tries to make a histogram of both columns, hence the str error.
This is the default behavior of pandas plotting functions (one plot per column) so if you reshape your data frame so that each letter is a column you will get exactly what you want.
df.reset_index().pivot(index='index', columns='Letter', values='N').hist()
The reset_index() is just to shove the current index into a column called index. Then pivot will take your data frame, collect all of the values N for each Letter and make them a column. The resulting data frame has three columns (A, B, C), with missing values filled with NaN. hist() will then produce one histogram per column and you can format the plots as needed.
With a recent version of pandas, you can do
df.N.hist(by=df.Letter)
Just like with the solutions above, the axes will be different for each subplot. I have not solved that one yet.
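One possible way around the differing axes, sketched here with plain matplotlib rather than the by keyword: create the subplots yourself with sharex=True and sharey=True and draw each group with a common set of bin edges. The choice of 20 bins and the figure size are arbitrary assumptions, and the data is the same random example as in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Letter": ["A"] * 300 + ["B"] * 400 + ["C"] * 300,
    "N": rng.standard_normal(1000),
})
groups = df.groupby("Letter")

# Common bin edges so the bars are comparable across groups
bins = np.linspace(df["N"].min(), df["N"].max(), 21)

# One subplot per group, all sharing the same x and y axes
fig, axes = plt.subplots(1, groups.ngroups, sharex=True, sharey=True, figsize=(9, 3))
for ax, (letter, group) in zip(axes, groups):
    ax.hist(group["N"], bins=bins)
    ax.set_title(letter)
```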
I write this answer because I was looking for a way to plot together the histograms of different groups. What follows is not very smart, but it works fine for me. I use Numpy to compute the histogram and Bokeh for plotting. I think it is self-explanatory, but feel free to ask for clarifications and I'll be happy to add details (and write it better).
import numpy as np
import pandas as pd
from bokeh.plotting import figure, show
from bokeh.layouts import gridplot
# df_trips is my dataframe with 'locality', 'means' and 'speed' columns
figures = {
    'Transit': figure(title='Transit', x_axis_label='speed [km/h]', y_axis_label='frequency'),
    'Driving': figure(title='Driving', x_axis_label='speed [km/h]', y_axis_label='frequency')
}
cols = {'Vienna': 'red', 'Turin': 'blue', 'Rome': 'Orange'}
for gr in df_trips.groupby(['locality', 'means']):
    locality = gr[0][0]
    means = gr[0][1]
    fig = figures[means]
    h, b = np.histogram(pd.DataFrame(gr[1]).speed.values)
    fig.vbar(x=b[1:], top=h, width=(b[1]-b[0]), legend_label=locality, fill_color=cols[locality], alpha=0.5)
show(gridplot([
    [figures['Transit']],
    [figures['Driving']],
]))
I find this even easier and faster.
data_df.groupby('Letter').count()['N'].hist(bins=100)

How to plot the top 5 values in 'pandas.core.series.Series' in Bokeh?

The first column contains names and the second column contains values.
How can I plot it using Bokeh? Here is the code I use with matplotlib:
new_rf = pd.Series(rf.feature_importances_,index=x.columns).sort_values(ascending=False)
new_rf[:5]
such that x will take the variable name and y will take the value
p = figure()
p.vbar(
    x = new_rf[:5].index, #here will be the feature name
    top = new_rf[:5].values, #here will be feature_importance weight
)
This just gives me an empty plot
Issues with your code:
You create new_rf but use imp_feat_rf. I assume it was just a partial rename when copying the code over to SO.
vbar() requires a width argument. You should see this error in the Python output when you run the code.
Working with categorical data requires setting explicit categories for ranges. More details are available at https://docs.bokeh.org/en/latest/docs/user_guide/categorical.html. Note that the x_range argument accepts lists but does not accept pandas indices, so you have to convert the index to a list.
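Putting those points together, a minimal sketch could look like the following. The feature names and importance values are invented for illustration (the original rf model and x columns are not shown), and the width of 0.8 is an arbitrary choice.

```python
import pandas as pd
from bokeh.plotting import figure

# Hypothetical feature importances standing in for rf.feature_importances_
new_rf = pd.Series({
    "age": 0.30, "income": 0.25, "height": 0.05,
    "weight": 0.14, "score": 0.10, "rank": 0.16,
}).sort_values(ascending=False)

top5 = new_rf[:5]
names = list(top5.index)  # x_range needs a plain list, not a pandas Index

# Categorical x axis: the categories are passed explicitly as x_range
p = figure(x_range=names, title="Top 5 feature importances")
p.vbar(x=names, top=top5.to_numpy(), width=0.8)

# To display the figure: from bokeh.plotting import show; show(p)
```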

Pandas dataframe hist not plotting categorical variables

I have a dataframe where I want to plot the histograms of each column.
df_play = pd.DataFrame({'a':['cat','dog','cat'],'b':['apple','orange','orange']})
df_play['a'] = df_play['a'].astype('category')
df_play['b'] = df_play['b'].astype('category')
df_play
df_play.hist(layout = (12,10))
However, I'm getting ValueError: num must be 1 <= num <= 0, not 1
When I tried with integers instead of categories in the values, it worked fine, but I really want the names of the unique strings to be on the x-axis.
You can just apply pd.value_counts across columns and plot.
>>> df_play.apply(pd.value_counts).T.stack().plot(kind='bar')
If you want proper subplots or something more intricate, I'd suggest you just iterate with value_counts and create the subplots yourself.
Since there is no natural parameter for binning, perhaps what you want rather than histograms are bar plots of the value counts for each Series? If so, you can achieve that through
df_play['a'].value_counts().plot(kind='bar')
I realized a way to do this is to first create the fig and axs, then loop through the column names of the dataframe whose value counts we want to plot.
fig, axs = plt.subplots(1,len(df_play.columns),figsize=(10,6))
for i,x in enumerate(df_play.columns):
    df_play[x].value_counts().plot(kind='bar',ax=axs[i])

plotting a scatter plot for list/array in matplotlib

I have wasted so much time looking through the examples given on the internet, but I just can't figure out why I can't plot a simple graph of a list of datetime.date objects against a list of integers.
appleProd1 appleProd2 appleProd3 appleProd4 ..... 70 Products
apple.com 2010-09-12 2008-01-01 2009-03-02 .......................
I wanted to plot a scatter plot for the launch dates with x axis as dates and y axis as the products. And I do this
plt.subplot(1, 1, 1)
product_list = []
label_ticks = []
date_list = []
yval = 1
for prod, date in df.items(): #date is a datetime.date object
    date = pd.to_datetime(date)
    product = prod
    date_list.append(date)
    product_list.append(prod)
    label_ticks.append(yval)
    yval+=1
plt.plot_date(date_list, label_ticks)
The last line, plt.plot_date, gives me the error TypeError: float() argument must be a string or a number. I have also tried converting both lists to numpy arrays and using plt.scatter, with the same error. Both lists have the same length. I also referred to this page:
http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.scatter
Some dates are also NaN, so converting with date2num raises an error there.
I figured out what the problem was. There were indeed some string values in date_list, so plt.plot complained. plt.plot works just fine with a list of datetime.date values. There is no need to convert the datetime.date values to any other format.
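One way to catch such stray values before plotting, offered as a sketch and not as what the poster did: pass the dates through pd.to_datetime with errors="coerce", which turns anything unparseable into NaT, and then drop the missing entries. The product names and dates below are hypothetical, modeled loosely on the table in the question.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical launch dates: one is missing and one is a stray non-date string
launches = pd.Series({
    "appleProd1": "2010-09-12",
    "appleProd2": "2008-01-01",
    "appleProd3": None,
    "appleProd4": "unknown",
})

# Unparseable or missing entries become NaT, then get dropped
dates = pd.to_datetime(launches, errors="coerce")
clean = dates.dropna()

# Keep each product's original position on the y axis (1, 2, 3, ...)
positions = pd.Series(range(1, len(launches) + 1), index=launches.index)

fig, ax = plt.subplots()
ax.scatter(clean.to_numpy(), positions[clean.index].to_numpy())
ax.set_yticks(positions.to_numpy())
ax.set_yticklabels(launches.index)
```

Comparing dates with clean also shows which products had unusable values, which may be easier than hunting for them by eye.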
