Plot multiple lines from one data frame - python

I have all the data I want to plot in one pandas data frame, e.g.:
         date flower_color  flower_count
0  2017-08-01         blue             1
1  2017-08-01          red             2
2  2017-08-02         blue             5
3  2017-08-02          red             2
I need a few different lines on one plot: the x-value should be the date from the first column, the y-value should be flower_count, and there should be one line per flower_color given in the second column.
How can I do that without filtering the original df and saving the result as a new object first? My only idea was to create a data frame containing only the red flowers and then specify it like this:
figure.line(x="date", y="flower_count", source=red_flower_ds)
figure.line(x="date", y="flower_count", source=blue_flower_ds)

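You can try this:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
# one line per flower color, all drawn on the same axes
for name, group in df.groupby('flower_color'):
    group.plot('date', y='flower_count', ax=ax, label=name)
plt.show()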
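Another way to avoid building one filtered frame per color, sketched here under the assumption that each (date, flower_color) pair occurs only once, is to pivot the frame to wide format so that every color becomes its own column; pandas then draws one line per column. The sample data is copied from the question, and the variable name wide is only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "date": pd.to_datetime(["2017-08-01", "2017-08-01", "2017-08-02", "2017-08-02"]),
    "flower_color": ["blue", "red", "blue", "red"],
    "flower_count": [1, 2, 5, 2],
})

# Long -> wide: one row per date, one column per color
# (pivot raises an error if a (date, color) pair is duplicated)
wide = df.pivot(index="date", columns="flower_color", values="flower_count")

# DataFrame.plot draws one line per column against the index
ax = wide.plot(marker="o")
ax.set_ylabel("flower_count")
# Use plt.show() or ax.figure.savefig(...) to view the result
```

If a pair can occur more than once, pivot_table with an explicit aggregation (for example aggfunc="sum") would be the closer fit.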

If I understand correctly, you need a figure with two subplots: the x-axis of both subplots is the date, and the y-axis of each subplot is the flower count for one color?
In this case, you can use the subplot support in pandas plotting.
# z is the data frame from the question
fig, axes = plt.subplots(2)
z[z.flower_color == 'blue'].plot(x='date', y='flower_count', ax=axes[0]).set_ylabel('blue')
z[z.flower_color == 'red'].plot(x='date', y='flower_count', ax=axes[1]).set_ylabel('red')
plt.show()
The output will be like:
Hope it helps.

Related

Plotting multiple x-axis lineplot from a multi-index dataframe with subdivisions to match the scale and visualise trends comparison

I'm trying to compare two pairs of line plots to visualise how the trend changes when the original values are aggregated (by mean or something else).
So basically I have an original dataframe like this (but longer, with about 500 rows):
names  measure_1  measure_2  group             measure_1_g  measure_2_g
name1  2          3          The_first_group   5            7
name2  5          7          The_first_group   5            7
name3  3          4          The_first_group   5            7
name4  8          3          The_second_group  9            5
name5  10         7          The_second_group  9            5
I have tried multiple approaches with matplotlib like:
fig=plt.figure(figsize=(90,10))
ax=fig.add_subplot(111, label="1")
ax2=fig.add_subplot(111, label="2", frame_on=False)
ax.margins(x=0.02)
ax.plot( 'names', 'measure_1', data=df, marker='o', markerfacecolor='xkcd:orange', markersize=7, color='xkcd:orange', linewidth=3, label='measure 1')
ax.plot( 'names', 'measure_2', data=df, marker='o', markerfacecolor='xkcd:red', markersize=7, color='xkcd:red', linewidth=3, label='measure 2')
ax.set_xlabel("original names", color="xkcd:red")
ax.set_ylabel("y original", color="xkcd:orange")
ax.tick_params(axis='x', colors="xkcd:red", labelrotation=90)
ax.tick_params(axis='y', colors="xkcd:orange")
ax2.margins(x=0.02)
ax2.plot( 'group', 'measure_1_g', data=df, marker='^', markerfacecolor='xkcd:aqua', markersize=8, color='xkcd:aqua', linewidth=3, label='Grouped measure 1')
ax2.plot( 'group', 'measure_2_g', data=df, marker='^', markerfacecolor='xkcd:blue', markersize=8, color='xkcd:blue', linewidth=3, label='Grouped measure 2')
ax2.xaxis.tick_top()
ax2.yaxis.tick_right()
ax2.set_xlabel('Groups', color="xkcd:blue")
ax2.set_ylabel('y Groups', color="xkcd:aqua")
ax2.xaxis.set_label_position('top')
ax2.yaxis.set_label_position('right')
ax2.tick_params(axis='x', colors="xkcd:blue", labelrotation=90)
ax2.tick_params(axis='y', colors="xkcd:aqua")
handles,labels = [],[]
for ax in fig.axes:
    for h,l in zip(*ax.get_legend_handles_labels()):
        handles.append(h)
        labels.append(l)
plt.legend(handles,labels)
plt.savefig('draft.svg')
plt.show()
The plot is created, but obviously the names and the groups are on different scales. For the first few items it is OK, but after that the markers and lines for the group measurements are shifted, and I have to search the top and bottom x-axes manually to find name1, name2, name3 and their respective group.
I've tried, with no success, to plot the data with a multi-index dataframe like the example shown here:
https://stackoverflow.com/a/66121322/14576642
The plot is created correctly, but I have some problems:
I didn't find a way to split the two x-axes into a top one (the first level of the multi-index [the groups]) and a bottom one (the second level [names]).
I didn't find a way to rotate the outer x-axis labels by 90 degrees; the inner ones are OK if I add rotation=90 to the code provided in the other answer.
The measures for the group are repeated for every bar, and if I create a multi-index with 4 levels (group, measure_1_g, measure_2_g, names), I don't know how to set up the plot so that it shows only one value per group, superimposed on the contents of that group (and not 4 values against 4 values).
Biggest problem: I don't know how to adapt the code to line plots. If I change the code, I get many errors about the settings and the number of set_ticks and ticks_labels.
This is an example of my first draft, not the linked one:
The first cyan-blue and orange-red peaks correspond to the names and groups variables, which should be superimposed to compare the individual values with the aggregated ones.
Ideally the image should be longer, because the names x-axis should follow the spacing of the group x-axis.
Any idea?
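One possible direction, offered as a sketch rather than a tested solution for the full 500-row data: because measure_1_g and measure_2_g are already repeated on every row, all four series can be drawn against the same integer row positions on a single axes, which keeps names and groups aligned by construction. The group labels can then go on a secondary x-axis at the top, with one tick at the centre of each group. The frame below reproduces the five sample rows; pos and group_centers are illustrative names, and the sketch assumes the rows of each group are contiguous.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# The five sample rows from the question
df = pd.DataFrame({
    "names": ["name1", "name2", "name3", "name4", "name5"],
    "measure_1": [2, 5, 3, 8, 10],
    "measure_2": [3, 7, 4, 3, 7],
    "group": ["The_first_group"] * 3 + ["The_second_group"] * 2,
    "measure_1_g": [5, 5, 5, 9, 9],
    "measure_2_g": [7, 7, 7, 5, 5],
})

pos = np.arange(len(df))  # one shared integer position per row

fig, ax = plt.subplots(figsize=(8, 4))
# Individual values
ax.plot(pos, df["measure_1"], marker="o", color="xkcd:orange", label="measure 1")
ax.plot(pos, df["measure_2"], marker="o", color="xkcd:red", label="measure 2")
# Aggregated values, drawn as steps so each group appears as a flat level
ax.step(pos, df["measure_1_g"], where="mid", color="xkcd:aqua", label="Grouped measure 1")
ax.step(pos, df["measure_2_g"], where="mid", color="xkcd:blue", label="Grouped measure 2")

# Bottom axis: one tick per name
ax.set_xticks(pos)
ax.set_xticklabels(df["names"], rotation=90)
ax.set_xlabel("original names")

# Top axis: one tick per group, placed at the mean position of its rows
group_centers = pd.Series(pos, index=df["group"]).groupby(level=0, sort=False).mean()
top = ax.secondary_xaxis("top")
top.set_xticks(group_centers.to_numpy())
top.set_xticklabels(list(group_centers.index), rotation=90)
top.set_xlabel("Groups")

ax.legend()
fig.tight_layout()
```

Since both label sets refer to the same positions, widening the figure stretches names and groups together instead of shifting one against the other.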

Making bar plot of different clusters

I am currently learning K-means, so now I am writing a program in Python to determine different clusters of text that are similar to each other.
So now I got the results for two different clusters (using some fictional words but everything else is the same).
print(dfs) =
[    features     score
 0    America  0.577350
 1        new  0.288675
 2  president  0.288675
 3      Biden  0.288675,
     features     score
 0     Corona  0.593578
 1   COVID-19  0.296789
 2   research  0.296789
 3     health  0.158114]
And dfs is the following type
type(dfs) = list
And the following:
type(dfs[0]) = pandas.core.frame.DataFrame
But how can I easily create bar plots for each cluster in dfs where you see the score attached to each word?
Thanks in advance!
Iterate over the dfs list to access the individual dataframes, then use df.plot.bar with x='features' and y='score' as arguments to plot the bar chart for each dataframe. Use the axis returned by the plot function to attach the score to each bar: iterate over the patches of the bar plot axis and pass the x of each rectangle's anchor point and the height of the bar to the annotate function.
...
...
fig, axes = plt.subplots(1, len(dfs))
for num, df in enumerate(dfs):
    ax = df.plot.bar(x='features', y='score', ax=axes[num])
    for p in ax.patches:
        ax.annotate(f'{p.get_height():.4f}', xy=(p.get_x() * 1.01, p.get_height() * 1.01))
    axes[num].tick_params(axis='x', labelrotation=30)
    axes[num].set_title(f'Dataframe #{num}')
plt.show()
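On matplotlib 3.4 or newer, the manual annotate loop can probably be replaced by Axes.bar_label, which writes one label per bar of a bar container. A minimal sketch, assuming that version is available; the two frames reproduce the dfs list from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# The two clusters from the question
dfs = [
    pd.DataFrame({"features": ["America", "new", "president", "Biden"],
                  "score": [0.577350, 0.288675, 0.288675, 0.288675]}),
    pd.DataFrame({"features": ["Corona", "COVID-19", "research", "health"],
                  "score": [0.593578, 0.296789, 0.296789, 0.158114]}),
]

fig, axes = plt.subplots(1, len(dfs), figsize=(10, 4))
label_counts = []
for num, df in enumerate(dfs):
    ax = df.plot.bar(x="features", y="score", ax=axes[num], legend=False)
    # containers[0] holds the bars that pandas just drew on this axes
    texts = ax.bar_label(ax.containers[0], fmt="%.4f")
    label_counts.append(len(texts))
    ax.tick_params(axis="x", labelrotation=30)
    ax.set_title(f"Dataframe #{num}")
fig.tight_layout()
```

bar_label centres each label above its bar, so the 1.01 offsets from the loop above are not needed.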

plot dataframes based on row values in python

I have a dataset with one dataframe for each of two users (userA, userB).
Each user's dataframe has shape (50, 158), where 50 is the number of rows (i.e., samples) and 158 is the number of columns (i.e., features).
I want to plot each row of userA as a separate horizontal line in blue, such that the x-axis ranges from 0 to 158 (i.e., the index of the feature) and the y-axis is the value. So 50 horizontal blue lines represent the 50 rows of userA. Similarly, another 50 horizontal red lines represent the 50 rows of userB.
All 100 lines should be on the same figure.
This is the updated code:
def plot_features(userA, userB):
    ax = userA.T.plot(color='b', label='userA')
    userB.T.plot(color='r', label='userB', ax=ax)
    plt.xlabel('Index of the features', fontsize=18)
    plt.ylabel('Values', fontsize=18)
    plt.legend(loc='lower left')
    ax.set_title('Plotting features of UserA and UserB', fontsize=16)
    plt.show()
This is the output:
How can I fix the legend?
How about
ax = dfA.T.plot(color='b')
dfB.T.plot(color='r', ax=ax)
# legend handles: h[0] is the first userA line, h[50] the first userB line (userA has 50 rows)
h, l = ax.get_legend_handles_labels()
ax.legend([h[0], h[50]], ['UserA', 'UserB'])
Output (my toy dataframes have 5 rows):
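The hard-coded index 50 only fits frames with exactly 50 rows. A variant that should work for any row count, sketched here with random toy data (dfA and dfB are stand-ins for the real user frames), takes the first line drawn for each user from ax.get_lines() and uses len(dfA) as the offset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy stand-ins: 5 rows (samples) and 8 columns (features) per user
rng = np.random.default_rng(0)
dfA = pd.DataFrame(rng.normal(0.0, 1.0, size=(5, 8)))
dfB = pd.DataFrame(rng.normal(3.0, 1.0, size=(5, 8)))

# Transpose so that each original row becomes one line
ax = dfA.T.plot(color="b", legend=False)
dfB.T.plot(color="r", ax=ax, legend=False)

# Lines are stored in drawing order: first all of dfA, then all of dfB
lines = ax.get_lines()
legend = ax.legend([lines[0], lines[len(dfA)]], ["UserA", "UserB"], loc="lower left")
ax.set_xlabel("Index of the features")
ax.set_ylabel("Values")
```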

Plotting Pandas data as an array of bar chart does not honour sharex = True

I have a Pandas dataframe that contains a column containing 'year' data and a column containing 'count' data. There is also a column containing a 'category' variable. Not each category has data for each year. I would like to plot an array of bar charts, one above the other, using a common x axis (year). The code I've written almost works except the x axis is not common for all plots.
The code example is given below. Basically, the code creates an array of axes with sharex=True and then steps through each axis plotting the relevant data from the dataframe.
# Define dataframe
myDF = pd.DataFrame({'year':list(range(2000,2010))+list(range(2001,2008))+list(range(2005,2010)),
                     'category':['A']*10 + ['B']*7 + ['C']*5,
                     'count':[2,3,4,3,4,5,4,3,4,5,2,3,4,5,4,5,6,9,8,7,8,6]})
# Plot counts for individual categories in array of bar charts
fig, axarr = plt.subplots(3, figsize = (4,6), sharex = True)
for i in range(0,len(myDF['category'].unique())):
    myDF.loc[myDF['category'] == myDF['category'].unique()[i],['year','count']].plot(kind = 'bar',
                                                                                     ax = axarr[i],
                                                                                     x = 'year',
                                                                                     y = 'count',
                                                                                     legend = False,
                                                                                     title = 'Category {0} bar chart'.format(myDF['category'].unique()[i]))
fig.subplots_adjust(hspace=0.5)
plt.show()
A screenshot of the outcome is given below:
I was expecting the Category A bars to extend from 2000 to 2009, Category B bars to extend from 2001 to 2007 and Category C bars to extend from 2005 to 2009. However, it seems that only the first 5 bars of each category are plotted regardless of the value on the x axis. Presumably, the reason only 5 bars are plotted is because the last category only had data for 5 years. A bigger problem is that the data plotted for the other categories is not plotted against the correct year. I've searched for solutions and tried various modifications but nothing seems to work.
Any suggestions to resolve this issue would be very welcome.
Try the following approach:
d = myDF.groupby(['year', 'category'])['count'].sum().unstack()
fig, axarr = plt.subplots(3, figsize = (4,6), sharex=True)
for i, cat in enumerate(d.columns):
    d[cat].plot(kind='bar', ax=axarr[i], title='Category {cat} bar chart'.format(cat=cat))
fig.subplots_adjust(hspace=0.5)
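Why this helps, as far as I understand the pandas behaviour: bar plots are drawn at integer positions 0, 1, 2, ... and the index values are used only as tick labels, so sharex=True shares positions rather than years. Pivoting first gives every category the same year index, with NaN where a category has no data, so bar number k stands for the same year in every subplot. The sketch below rebuilds the frame from the question and shows the pivoted table that the loop plots from.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Frame from the question
myDF = pd.DataFrame({
    "year": list(range(2000, 2010)) + list(range(2001, 2008)) + list(range(2005, 2010)),
    "category": ["A"] * 10 + ["B"] * 7 + ["C"] * 5,
    "count": [2, 3, 4, 3, 4, 5, 4, 3, 4, 5, 2, 3, 4, 5, 4, 5, 6, 9, 8, 7, 8, 6],
})

# One row per year, one column per category; missing combinations become NaN
d = myDF.groupby(["year", "category"])["count"].sum().unstack()
print(d)

fig, axarr = plt.subplots(3, figsize=(4, 6), sharex=True)
for i, cat in enumerate(d.columns):
    # Every column has the same 10-year index, so bar positions line up
    d[cat].plot(kind="bar", ax=axarr[i], title=f"Category {cat} bar chart")
fig.subplots_adjust(hspace=0.5)
```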

Single legend at changing categories (!) in subplots from pandas df

Roughly speaking I need to create one legend for several subplots, which have a changing number of categories = legend entries.
But let me clarify this a bit deeper:
I have a figure with 20 subplots, one for each country within my spatial scope:
fig, ax = plt.subplots(nrows=4, ncols=5, sharex=True, sharey=False, figsize = (32,18))
Within a loop, I do some logic to group the data I need into a normal 2-dimensional pandas DataFrame stats and plot it to each of these 20 axes:
colors = stats.index.to_series().map(type_to_color()).tolist()
stats.T.plot.bar(ax=ax[i,j], stacked=True, legend=False, color=colors)
However, the stats DataFrame changes its size from one loop iteration to the next, since not every category applies to each of these countries (i.e. in one country there can be only two types, in another there are more than 10).
For this reason I pre-defined a specific color for each type.
So far, I am creating one legend for every subplot within the loop:
ax[i,j].legend(fontsize=9, loc='upper right')
This works, but it clutters the subplots unnecessarily. How can I plot one big legend above/below/beside these plots, given that I have already defined the corresponding colors?
The given approach here with fig.legend(handles, labels, ...) does not work, since the line handles are not available from the pandas plot.
Plotting the legend directly with
plt.legend(loc = 'lower center',bbox_to_anchor = (0,-0.3,1,1),
bbox_transform = plt.gcf().transFigure)
shows only the entries for the very last subplot, which is not sufficient.
Any help is greatly appreciated! Thank you so much!
Edit
For example the DataFrame stats could in one country look like this:
                  2015       2020       2025       2030       2035       2040
Hydro        29.229082  28.964424  28.528139  27.120194  25.932098  24.675778
Natural Gas   0.926800   0.926800   0.926800   0.926800   0.003600        NaN
Wind         25.799950  25.797550   0.776400   0.520800   0.234400        NaN
Whereas in another country it might look like this:
                  2015       2020       2025       2030       2035
Bioenergy     0.033690   0.033690   0.030000        NaN        NaN
Hard Coal     5.307300   0.065100   0.021000        NaN        NaN
Hydro        22.834454  23.930642  23.169014  21.639914  19.623791
Natural Gas   8.378116   8.674121   8.013598   6.755498   5.255450
Solar         5.100403   5.100403   5.100403   5.100403   5.093403
Wind          8.983560   8.974740   8.967240   8.378300   0.195800
Here's how to get the legend into alphabetical order without messing up the colors:
import collections
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
fig, ax = plt.subplots(nrows=4, ncols=5, sharex=True, sharey=False, figsize = (32,18))
labels_mpatches = collections.OrderedDict()
for a, b in enumerate(countries()):
    # do some data logic here (builds stats; i, j is assumed to be the subplot position)
    colors = stats.index.to_series().map(type_to_color()).tolist()
    stats.T.plot.bar(ax=ax[i,j],stacked=True,legend=False,color=colors)
    # Pass the legend information into the OrderedDict
    stats_handle, stats_labels = ax[i,j].get_legend_handles_labels()
    for u, v in enumerate(stats_labels):
        if v not in labels_mpatches:
            labels_mpatches[v] = mpatches.Patch(color=colors[u], label=v)
# After the loop, lay out the legend.
labels_mpatches = collections.OrderedDict(sorted(labels_mpatches.items()))
fig.legend(labels_mpatches.values(), labels_mpatches.keys())
# !!! Please Note: In previous versions this here worked, but does not anymore:
# fig.legend(handles=labels_mpatches.values(),labels=labels_mpatches.keys())
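For readers who want to try the idea without the surrounding project, here is a self-contained sketch of the same approach on two small made-up frames. The names TYPE_TO_COLOR and stats_by_country are hypothetical stand-ins for type_to_color() and the per-country logic, and the numbers are rounded excerpts of the tables in the question. Because the colors are predefined per type, the legend patches can be built straight from that mapping.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.patches as mpatches
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical fixed color per type (stands in for type_to_color())
TYPE_TO_COLOR = {"Hydro": "tab:blue", "Natural Gas": "tab:orange",
                 "Solar": "gold", "Wind": "tab:green"}

# Hypothetical per-country stats with different sets of types
stats_by_country = {
    "Country 1": pd.DataFrame({2015: [29.2, 0.9, 25.8], 2020: [29.0, 0.9, 25.8]},
                              index=["Hydro", "Natural Gas", "Wind"]),
    "Country 2": pd.DataFrame({2015: [22.8, 5.1], 2020: [23.9, 5.1]},
                              index=["Hydro", "Solar"]),
}

fig, axes = plt.subplots(1, len(stats_by_country), figsize=(8, 4))
patches = {}  # type name -> legend patch, filled while plotting
for ax, (country, stats) in zip(axes, stats_by_country.items()):
    colors = [TYPE_TO_COLOR[t] for t in stats.index]
    stats.T.plot.bar(ax=ax, stacked=True, legend=False, color=colors)
    ax.set_title(country)
    for t, c in zip(stats.index, colors):
        patches.setdefault(t, mpatches.Patch(color=c, label=t))

# One figure-level legend, sorted alphabetically by type name
labels = sorted(patches)
handles = [patches[name] for name in labels]
legend = fig.legend(handles, labels, loc="lower center", ncol=len(labels))
fig.subplots_adjust(bottom=0.25)  # leave room for the legend below the plots
```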
