I am trying to take a Pandas dataframe with 57 columns and plot it as bar charts, with 3 columns per figure. The variation in the data and the length of the columns make it hard to see the data in many of the plots. Plotting multiple columns per subplot isn't an option, so each plot has to be readable at the output size. Given the data, I have found that 3 subplots per figure looks best. Here's my script to plot the dataframe:
fig, ax = plt.subplots(nrows=len(df.columns), ncols=1, sharex=True, sharey=True)
yscale = np.ceil(df.abs().select_dtypes(include=[np.number]).values.max())
plt.yscale('symlog')
plt.ylim(-yscale, yscale)
t = list(df.columns.values)
n = 0
for i in df:
    df['positive'] = df[i] > 0
    df[i].plot.bar(ax=ax[n], rot=0, width=1.0, legend=False, position=0, color=df.positive.map({True: 'g', False: 'r'}))
    ax[n].set_title(t[n])
    ax[n].axhline(y=0, linewidth=1, color='k')
    ax[n].tick_params(which='major', axis='x', length=2)
    ax[n].tick_params(which='major', axis='y', length=6)
    n += 1
plt.tight_layout()
plt.show()
Would it be easiest just to split the dataframe into several smaller dataframes? My only concern with that is that there will be different numbers of columns per dataframe with different samples.
This seems to be what you want:
for i in range(19):
    df.iloc[:, i*3:i*3+3].plot.bar(subplots=True,
                                   legend=None)
which gives 19 figures, each with three subplots.
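If the number of columns is not a multiple of three, the last figure simply gets fewer subplots. Here is a minimal sketch of that chunking, assuming a purely numeric dataframe. The names `plot_in_chunks`, `per_fig` and the toy `demo` frame are illustrative and not part of the original answer:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def plot_in_chunks(df, per_fig=3):
    """Draw the columns of df as bar charts, per_fig subplots per figure."""
    figures = []
    for start in range(0, df.shape[1], per_fig):
        # iloc slicing past the last column is safe: the final chunk is just shorter
        chunk = df.iloc[:, start:start + per_fig]
        fig, axes = plt.subplots(nrows=chunk.shape[1], ncols=1,
                                 sharex=True, squeeze=False)
        # One column per subplot, drawn on the axes created above
        chunk.plot.bar(subplots=True, ax=axes.ravel(), legend=False, rot=0)
        fig.tight_layout()
        figures.append(fig)
    return figures


# Toy data (hypothetical): 8 columns, so the last figure holds only 2 subplots
rng = np.random.default_rng(0)
demo = pd.DataFrame(rng.normal(size=(10, 8)),
                    columns=[f"c{i}" for i in range(8)])
figs = plot_in_chunks(demo)
```

Call `plt.show()` afterwards, or `fig.savefig(...)` on each returned figure, to render the result.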
I'm trying to compare two pairs of line plots to visualise how the trend changes when the original values are aggregated (by mean or something else).
So basically I have an original dataframe like this (but longer, with about 500 rows):
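| names | measure_1 | measure_2 | group | measure_1_g | measure_2_g |
|---|---|---|---|---|---|
| name1 | 2 | 3 | The_first_group | 5 | 7 |
| name2 | 5 | 7 | The_first_group | 5 | 7 |
| name3 | 3 | 4 | The_first_group | 5 | 7 |
| name4 | 8 | 3 | The_second_group | 9 | 5 |
| name5 | 10 | 7 | The_second_group | 9 | 5 |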
I have tried multiple approaches with matplotlib like:
fig=plt.figure(figsize=(90,10))
ax=fig.add_subplot(111, label="1")
ax2=fig.add_subplot(111, label="2", frame_on=False)
ax.margins(x=0.02)
ax.plot( 'names', 'measure_1', data=df, marker='o', markerfacecolor='xkcd:orange', markersize=7, color='xkcd:orange', linewidth=3, label='measure 1')
ax.plot( 'names', 'measure_2', data=df, marker='o', markerfacecolor='xkcd:red', markersize=7, color='xkcd:red', linewidth=3, label='measure 2')
ax.set_xlabel("original names", color="xkcd:red")
ax.set_ylabel("y original", color="xkcd:orange")
ax.tick_params(axis='x', colors="xkcd:red", labelrotation=90)
ax.tick_params(axis='y', colors="xkcd:orange")
ax2.margins(x=0.02)
ax2.plot( 'group', 'measure_1_g', data=df, marker='^', markerfacecolor='xkcd:aqua', markersize=8, color='xkcd:aqua', linewidth=3, label='Grouped measure 1')
ax2.plot( 'group', 'measure_2_g', data=df, marker='^', markerfacecolor='xkcd:blue', markersize=8, color='xkcd:blue', linewidth=3, label='Grouped measure 2')
ax2.xaxis.tick_top()
ax2.yaxis.tick_right()
ax2.set_xlabel('Groups', color="xkcd:blue")
ax2.set_ylabel('y Groups', color="xkcd:aqua")
ax2.xaxis.set_label_position('top')
ax2.yaxis.set_label_position('right')
ax2.tick_params(axis='x', colors="xkcd:blue", labelrotation=90)
ax2.tick_params(axis='y', colors="xkcd:aqua")
handles,labels = [],[]
for ax in fig.axes:
    for h,l in zip(*ax.get_legend_handles_labels()):
        handles.append(h)
        labels.append(l)
plt.legend(handles,labels)
plt.savefig('draft.svg')
plt.show()
The plot is created, but obviously the names and the groups have different scales. For the first few items it is OK, but then the markers and lines for the group measurements are shifted, and I have to search the top and bottom x axes manually to find name1, name2, name3 and their respective group.
I've tried, with no success, to plot the data with a multi-index dataframe like the example shown here:
https://stackoverflow.com/a/66121322/14576642
The plot is created correctly, but I have some problems:
I haven't found a way to split the two x axes into a top one (the first level of the multi-index, the groups) and a bottom one (the second level, the names).
I haven't found a way to rotate the outer x axis labels by 90 degrees; the inner one is fine if I add rotation=90 to the code provided in the other answer.
The measures for the groups are drawn as repeated bars. If I create a multi-index with 4 levels (group, measure_1_g, measure_2_g, names), I don't know how to make the plot show only one value per group, superimposed on the content of that group (rather than 4 values against 4 values).
Biggest problem: I don't know how to adapt the code to line plots. If I change the code, I get many errors about the settings and the number of ticks and tick labels.
This is an example of my first draft, not the linked one:
The first cyan-blue and orange-red peaks correspond to the names and groups variables, which should be superimposed to compare the individual values against the aggregated ones.
Ideally the image should be longer, because the names x-axis should follow the spacing of the group x-axis.
Any idea?
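One possible approach, sketched here under assumptions rather than as a confirmed solution: since the grouped values are already repeated on every row, all four series can share one integer x position per row. The names then go on the bottom axis, and a `twiny()` axis with the same limits carries one group label at the centre of each group. The sketch assumes rows belonging to the same group are contiguous. The toy frame reuses the five rows from the table, and `centers` and `top` are illustrative names:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The five example rows from the question
df = pd.DataFrame({
    'names': ['name1', 'name2', 'name3', 'name4', 'name5'],
    'measure_1': [2, 5, 3, 8, 10],
    'measure_2': [3, 7, 4, 3, 7],
    'group': ['The_first_group'] * 3 + ['The_second_group'] * 2,
    'measure_1_g': [5, 5, 5, 9, 9],
    'measure_2_g': [7, 7, 7, 5, 5],
})

x = np.arange(len(df))  # one shared x position per row
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, df['measure_1'], marker='o', color='xkcd:orange', label='measure 1')
ax.plot(x, df['measure_2'], marker='o', color='xkcd:red', label='measure 2')
ax.plot(x, df['measure_1_g'], marker='^', color='xkcd:aqua', label='Grouped measure 1')
ax.plot(x, df['measure_2_g'], marker='^', color='xkcd:blue', label='Grouped measure 2')

# Bottom axis: one tick per name
ax.set_xticks(x)
ax.set_xticklabels(df['names'], rotation=90)

# Mean x position of each group, in order of first appearance
centers = pd.Series(x, index=df.index).groupby(df['group'], sort=False).mean()

# Top axis: same limits as the bottom one, one tick per group
top = ax.twiny()
top.set_xlim(ax.get_xlim())
top.set_xticks(centers.to_numpy())
top.set_xticklabels(list(centers.index), rotation=90)

ax.legend()
fig.tight_layout()
```

Because both axes use the same x limits, each group label sits directly above the names it aggregates, so no manual matching between the two axes is needed.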
I have dataframes for two users (userA, userB) as follows:
Each user's dataframe has shape (50, 158), where 50 is the number of rows (i.e., samples) and 158 is the number of columns (i.e., features).
I want to plot each row of userA as a separate blue line, such that the x-axis ranges from 0 to 158 (i.e., the feature index) and the y-axis is the value. So 50 blue lines represent the 50 rows of userA. Similarly, another 50 red lines represent the 50 rows of userB.
All 100 lines should be on the same figure.
This is the updated code:
def plot_features(userA, userB):
    ax = userA.T.plot(color='b', label='userA')
    userB.T.plot(color='r', label='userB', ax=ax)
    plt.xlabel('Index of the features', fontsize=18)
    plt.ylabel('Values', fontsize=18)
    plt.legend(loc='lower left')
    ax.set_title('Plotting features of UserA and UserB', fontsize=16)
    plt.show()
This is the output:
How can I fix the legend?
How about
ax = dfA.T.plot(color='b')
dfB.T.plot(color='r', ax=ax)
# legend handler
h, l = ax.get_legend_handles_labels()
ax.legend([h[0], h[50]], ['UserA', 'UserB'])
Output (my toy dataframes have 5 rows):
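The indices `h[0]` and `h[50]` depend on userA having exactly 50 rows. A variant that avoids counting handles is to build two proxy `Line2D` artists just for the legend. This is a hedged sketch with randomly generated toy frames (`dfA`, `dfB` with 5 rows each), not the asker's data:

```python
import matplotlib.pyplot as plt
from matplotlib.lines import Line2D
import numpy as np
import pandas as pd

# Toy stand-ins for the two users (assumed shapes: 5 samples x 158 features)
rng = np.random.default_rng(0)
dfA = pd.DataFrame(rng.normal(0, 1, size=(5, 158)))
dfB = pd.DataFrame(rng.normal(3, 1, size=(5, 158)))

# One line per row; suppress the automatic per-row legend entries
ax = dfA.T.plot(color='b', legend=False)
dfB.T.plot(color='r', ax=ax, legend=False)

# Proxy artists: drawn only in the legend, independent of the row count
proxies = [Line2D([0], [0], color='b'), Line2D([0], [0], color='r')]
legend = ax.legend(proxies, ['UserA', 'UserB'], loc='lower left')
```

The legend then stays correct even if the number of rows per user changes.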
I have a subplot whose x-axis labels are voltages; the corresponding column in the CSV data increases from 0 to 30 and then decreases from 30 to 0.
When I use this code, it gives me this plot:
ax2.plot(df_raw.index, df_raw.loc[:,"data_column"])
When I use the code below, I get the plot shown below:
ax2.plot(df_raw.loc[:,"voltage"], df_raw.loc[:,"data_column"])
What I really want is shown below:
Try to set the label manually:
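df = pd.DataFrame({'vol': list(range(101)) + list(range(99, 0, -1)),
                   'val': [0]*10 + [1]*180 + [0]*10})
fig, ax = plt.subplots()
ax.plot(df.index, df.val)
# reindex tolerates tick positions outside the index (they become NaN),
# whereas df.vol[...] raises a KeyError for them in recent pandas versions
ax.set_xticklabels(df.vol.reindex(ax.get_xticks())
                   .fillna(0).astype(int))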
plt.show()
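An alternative that keeps working when the view is zoomed or the ticks are recomputed is a tick formatter that looks up the voltage for each tick position. This is only a sketch on the same toy data. `index_to_voltage` is a hypothetical helper name, and positions outside the data get an empty label:

```python
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter
import pandas as pd

# Same toy data as above: voltage rises from 0 to 100, then falls back to 1
df = pd.DataFrame({'vol': list(range(101)) + list(range(99, 0, -1)),
                   'val': [0]*10 + [1]*180 + [0]*10})


def index_to_voltage(x, pos=None):
    """Return the voltage stored at row index x, or '' outside the data."""
    i = int(round(x))
    return str(df['vol'].iloc[i]) if 0 <= i < len(df) else ''


fig, ax = plt.subplots()
ax.plot(df.index, df['val'])  # plot against the monotonically increasing index
ax.xaxis.set_major_formatter(FuncFormatter(index_to_voltage))
```

The data is still plotted against the row index, so the line does not fold back on itself; only the labels are translated to voltages.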
I have two separate dataframes that I made into histograms and I want to know how I can overlay them so for each category in the x axis the bar is a different color for each dataframe. This is the code I have for the separate bar graphs.
df1.plot.bar(x='brand', y='desc')
df2.groupby(['brand']).count()['desc'].plot(kind='bar')
I tried this code:
previous = df1.plot.bar(x='brand', y='desc')
current= df2.groupby(['brand']).count()['desc'].plot(kind='bar')
bins = np.linspace(1, 4)
plt.hist(x, bins, alpha=0.9,normed=1, label='Previous')
plt.hist(y, bins, alpha=0.5, normed=0,label='Current')
plt.legend(loc='upper right')
plt.show()
This code does not overlay the graphs properly. The problem is that dataframe 2 doesn't have numeric values, so I need to use the count method. I appreciate the help!
You might have to use axes objects in matplotlib. In simple terms, you create a figure with some axes object associated with it, then you can call hist from it. Here's one way you can do it:
fig, ax = plt.subplots(1, 1)
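# density replaces the normed argument, which recent matplotlib releases no longer accept
ax.hist(x, bins, alpha=0.9, density=True, label='Previous')
ax.hist(y, bins, alpha=0.5, density=False, label='Current')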
ax.legend(loc='upper right')
plt.show()
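Make use of seaborn's histogram with several variables. In your case it would be:
import seaborn as sns
import matplotlib.pyplot as plt
# pass the data itself to seaborn, not the Axes returned by df1.plot.bar
previous = df1.set_index('brand')['desc']
current = df2.groupby(['brand']).count()['desc']
# histplot replaces distplot, which is deprecated in recent seaborn versions
sns.histplot(previous, color="skyblue", label="Previous")
sns.histplot(current, color="red", label="Current")
plt.legend()
plt.show()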
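Since the goal is one bar per brand and per dataframe, rather than a distribution of the counts, another option is to bring both sets of counts into a single dataframe and let pandas draw grouped bars. A minimal sketch, assuming `df1` already stores a numeric count per brand in `desc` and `df2` has one row per item; the toy values below are made up for illustration:

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data: df1 holds counts, df2 holds one text row per item
df1 = pd.DataFrame({'brand': ['A', 'B', 'C', 'D'], 'desc': [4, 2, 5, 1]})
df2 = pd.DataFrame({'brand': ['A', 'A', 'B', 'C', 'C', 'C'],
                    'desc': ['x', 'y', 'z', 'u', 'v', 'w']})

previous = df1.set_index('brand')['desc']        # counts as given
current = df2.groupby('brand')['desc'].count()   # count rows per brand

# Align on brand; brands missing from one frame get a count of 0
counts = pd.concat([previous, current], axis=1,
                   keys=['Previous', 'Current']).fillna(0)

# One differently coloured bar per column, side by side for each brand
ax = counts.plot.bar(rot=0)
ax.set_ylabel('count')
```

Aligning on the brand index first is what makes the two bars for the same brand land next to each other.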
I suppose this is fairly easy, but I tried for a while to get an answer without much success. I want to produce a stacked bar plot for two categories, but the information is in two separate data frames:
This is the code:
first_babies = live[live.birthord == 1] # first dataframe
others = live[live.birthord != 1] # second dataframe
fig = figure()
ax1 = fig.add_subplot(1,1,1)
first_babies.groupby(by=['prglength']).size().plot(
    kind='bar', ax=ax1, label='first babies') # first plot
others.groupby(by=['prglength']).size().plot(kind='bar', ax=ax1, color='r',
                                             label='others') # second plot
ax1.legend(loc='best')
ax1.set_xlabel('weeks')
ax1.set_ylabel('frequency')
ax1.set_title('Histogram')
But I want something like this or, as I said, a stacked bar plot, in order to better distinguish between categories:
I can't use stacked=True because it doesn't work with two separate plots, and I can't create a new dataframe because first_babies and others don't have the same number of elements.
Thanks
First create a new column to distinguish 'first_babies':
live['first_babies'] = live['birthord'].apply(lambda x: 'first_babies' if x == 1 else 'others')
You can unstack the groupby:
grouped = live.groupby(by=['prglength', 'first_babies']).size()
unstacked_count = grouped.unstack()
Now you can plot a stacked bar-plot directly:
unstacked_count.plot(kind='bar', stacked=True)
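The steps above can be sketched end to end on a small made-up `live` frame (the six rows below are invented for illustration). `np.where` is used here as an equivalent way to build the label column, and `fill_value=0` fills in the pregnancy lengths that occur in only one category:

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real `live` dataframe
live = pd.DataFrame({'birthord': [1, 2, 1, 3, 1, 2],
                     'prglength': [39, 39, 40, 40, 40, 38]})

# Label every row instead of splitting into two dataframes
live['first_babies'] = np.where(live['birthord'] == 1, 'first babies', 'others')

# Rows: pregnancy length; columns: category; values: number of births
counts = (live.groupby(['prglength', 'first_babies'])
              .size()
              .unstack(fill_value=0))

ax = counts.plot(kind='bar', stacked=True)
ax.set_xlabel('weeks')
ax.set_ylabel('frequency')
```

Because both categories live in one dataframe, they no longer need the same number of elements: the unstacked table has one row per pregnancy length regardless.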