Plot dataframes based on row values in Python - python

I have a dataset of dataframes for two users (userA, userB), as follows:
Each user's dataframe has shape (50, 158), where 50 is the number of rows (i.e., samples) and 158 is the number of columns (i.e., features).
I want to plot each row of userA as a separate blue line running across the figure, such that the x-axis ranges from 0 to 158 (i.e., the feature index) and the y-axis is the value. So 50 blue lines represent the 50 rows of userA. Similarly, another 50 red lines represent the 50 rows of userB.
All 100 lines should be on the same figure.
This is the updated code:
import matplotlib.pyplot as plt

def plot_features(userA, userB):
    # Transpose so that each row (sample) becomes one line over the feature index
    ax = userA.T.plot(color='b', label='userA')
    userB.T.plot(color='r', label='userB', ax=ax)
    plt.xlabel('Index of the features', fontsize=18)
    plt.ylabel('Values', fontsize=18)
    plt.legend(loc='lower left')
    ax.set_title('Plotting features of UserA and UserB', fontsize=16)
    plt.show()
This is the output:
How can I fix the legend?

How about
ax = dfA.T.plot(color='b')
dfB.T.plot(color='r', ax=ax)
# Legend handles: keep one entry per user (the first line of each dataframe)
h, l = ax.get_legend_handles_labels()
ax.legend([h[0], h[len(dfA)]], ['UserA', 'UserB'])
Output (my toy dataframes have 5 rows):
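For reference, here is a self-contained sketch of the same idea. The toy data (5 rows, 8 features, random values) and the names `rng`, `n_a` and `legend_labels` are assumptions made for illustration only; they are not from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Toy stand-ins for the two users' dataframes (assumed shapes and values)
rng = np.random.default_rng(0)
dfA = pd.DataFrame(rng.normal(0, 1, size=(5, 8)))
dfB = pd.DataFrame(rng.normal(3, 1, size=(5, 8)))

# Transpose so each row becomes one line over the feature index
ax = dfA.T.plot(color="b")
dfB.T.plot(color="r", ax=ax)

# One handle per plotted line: dfA's lines come first, then dfB's
handles, _ = ax.get_legend_handles_labels()
n_a = len(dfA)  # index of the first dfB line
legend = ax.legend([handles[0], handles[n_a]], ["UserA", "UserB"])
legend_labels = [t.get_text() for t in legend.get_texts()]

ax.set_xlabel("Index of the features")
ax.set_ylabel("Values")
```

Indexing with `len(dfA)` rather than a fixed number keeps the legend correct whether the dataframes have 5 rows or 50.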

Related

Matplotlib: Overlapping labels in pie chart

I have to make a pie chart for the following data:
However, because the larger numbers are in the hundreds while the smaller numbers are less than 1, the labels on the chart end up illegible due to overlapping. For example, this is the chart for Singapore:
I have tried decreasing the font size and increasing the figure size, but because the labels overlap so much, doing so doesn't really help at all. Here is the relevant code for my chart:
import matplotlib.pyplot as plt
plt.pie(consumption["Singapore"], labels = consumption.index)
fig = plt.gcf()
fig.set_size_inches(8,8)
ax = plt.gca()
handles, labels = ax.get_legend_handles_labels()
labels, handles = zip(*sorted(zip(labels, handles), key=lambda t: t[0], reverse=True))
plt.show()
Is there any way to solve this issue?
Overlapping labels cannot be completely eliminated programmatically. If you only need to handle this particular data set, first group the entries that share the same value, which reduces the number of labels, and draw the pie chart from the grouped data frame. Some labels will still overlap, so get the current label positions and move the overlapping labels by hand.
new_df = consumption.groupby('Singapore')['Entity'].apply(list).reset_index()
new_df['Entity'] = new_df['Entity'].apply(lambda x: ','.join(x))
new_df
    Singapore                       Entity
0    0.000000  Biofuels,Wind,Hydro,Nuclear
1    0.679398                        Other
2    0.728067                        Solar
3    5.463305                         Coal
4  125.983605                          Gas
5  815.027694                          Oil
import matplotlib.pyplot as plt
fig, ax = plt.subplots(figsize=(8,8))
wedges, texts = ax.pie(new_df["Singapore"], wedgeprops=dict(width=0.5), startangle=0, labels=new_df.Entity)
# print(wedges, texts)
texts[0].set_position((1.1,0.0))
texts[1].set_position((1.95,0.0))
texts[2].set_position((2.15,0.0))
plt.legend()
plt.show()
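A different technique, not the one used in the answer above, is to leave the labels off the wedges entirely and put them in a legend next to the chart, which avoids choosing label positions by hand. The following is only a sketch: it assumes `consumption` has an `Entity` column and a `Singapore` column (as the `groupby` call above implies), and rebuilds that frame from the values printed in the grouped output.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

# Assumed layout of the data, reconstructed from the grouped output shown above
consumption = pd.DataFrame({
    "Entity": ["Biofuels", "Wind", "Hydro", "Nuclear", "Other",
               "Solar", "Coal", "Gas", "Oil"],
    "Singapore": [0.0, 0.0, 0.0, 0.0, 0.679398,
                  0.728067, 5.463305, 125.983605, 815.027694],
})

# Merge entities that share the same value into one comma-separated label
new_df = consumption.groupby("Singapore")["Entity"].apply(",".join).reset_index()

# Percentage share of each wedge, used only for the legend text
share = 100 * new_df["Singapore"] / new_df["Singapore"].sum()
legend_text = [f"{name} ({pct:.1f}%)" for name, pct in zip(new_df["Entity"], share)]

fig, ax = plt.subplots(figsize=(8, 8))
wedges, texts = ax.pie(new_df["Singapore"], wedgeprops=dict(width=0.5), startangle=0)

# Labels go in a legend to the right of the pie instead of on the wedges
ax.legend(wedges, legend_text, loc="center left", bbox_to_anchor=(1.0, 0.5))
```

The trade-off is that the reader has to match colors to legend entries, so this works best with a small number of wedges, which the grouping step already ensures here.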

Plotting multiple x-axis lineplot from a multi-index dataframe with subdivisions to match the scale and visualise trends comparison

I'm trying to compare two pairs of line plots to visualise how the trend changes when the original values are aggregated (by mean or something else).
So basically I have an original dataframe like this (but longer, with about 500 rows):
names  measure_1  measure_2  group             measure_1_g  measure_2_g
name1  2          3          The_first_group   5            7
name2  5          7          The_first_group   5            7
name3  3          4          The_first_group   5            7
name4  8          3          The_second_group  9            5
name5  10         7          The_second_group  9            5
I have tried multiple approaches with matplotlib like:
fig=plt.figure(figsize=(90,10))
ax=fig.add_subplot(111, label="1")
ax2=fig.add_subplot(111, label="2", frame_on=False)
ax.margins(x=0.02)
ax.plot( 'names', 'measure_1', data=df, marker='o', markerfacecolor='xkcd:orange', markersize=7, color='xkcd:orange', linewidth=3, label='measure 1')
ax.plot( 'names', 'measure_2', data=df, marker='o', markerfacecolor='xkcd:red', markersize=7, color='xkcd:red', linewidth=3, label='measure 2')
ax.set_xlabel("original names", color="xkcd:red")
ax.set_ylabel("y original", color="xkcd:orange")
ax.tick_params(axis='x', colors="xkcd:red", labelrotation=90)
ax.tick_params(axis='y', colors="xkcd:orange")
ax2.margins(x=0.02)
ax2.plot( 'group', 'measure_1_g', data=df, marker='^', markerfacecolor='xkcd:aqua', markersize=8, color='xkcd:aqua', linewidth=3, label='Grouped measure 1')
ax2.plot( 'group', 'measure_2_g', data=df, marker='^', markerfacecolor='xkcd:blue', markersize=8, color='xkcd:blue', linewidth=3, label='Grouped measure 2')
ax2.xaxis.tick_top()
ax2.yaxis.tick_right()
ax2.set_xlabel('Groups', color="xkcd:blue")
ax2.set_ylabel('y Groups', color="xkcd:aqua")
ax2.xaxis.set_label_position('top')
ax2.yaxis.set_label_position('right')
ax2.tick_params(axis='x', colors="xkcd:blue", labelrotation=90)
ax2.tick_params(axis='y', colors="xkcd:aqua")
handles, labels = [], []
for ax in fig.axes:
    for h, l in zip(*ax.get_legend_handles_labels()):
        handles.append(h)
        labels.append(l)
plt.legend(handles, labels)
plt.savefig('draft.svg')
plt.show()
The plot is created, but obviously the names and the groups have different scales. For the first few items it is OK, but then the markers and lines for the group measurements are shifted, and I have to search manually along the top and bottom x-axes to find name1, name2, name3 and their respective group.
I've tried, with no success, to plot the data with a multi-index dataframe like the example shown here:
https://stackoverflow.com/a/66121322/14576642
That plot is created correctly, but I have some problems:
I didn't find a way to split the two x-axes into a top one (the first level of the multi-index, the groups) and a bottom one (the second level, the names).
I didn't find a way to rotate the outer x-axis labels by 90 degrees; the inner one is fine if I add rotation=90 to the code provided in the other answer.
The measures for the groups are repeated for each bar, and if I create a multi-index with 4 levels (group, measure_1_g, measure_2_g, names), I don't know how to set up the plot to show only one value per group, superimposed on the content of the group (and not 4 values against 4 values).
Biggest problem: I don't know how to adapt the code to line plots. If I modify the code I get many errors about the settings and the numbers of set_ticks and ticks_labels.
This is an example of my first draft, not the linked one:
The first cyan-blue and orange-red peaks correspond to the names and groups variables, which should be superimposed to compare the individual vs the aggregated values.
Ideally the image should be longer, because the names x-axis should follow the spacing of the group x-axis.
Any idea?
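One possible direction, sketched here under assumptions rather than offered as a tested solution for the 500-row data: since the grouped values are already repeated on every row, all four series can be drawn against the same integer row positions on a single axes, so the individual and aggregated lines line up by construction. A secondary x-axis on top can then carry one tick per group, placed at the centre of that group's rows. The names `x`, `pos`, `centers` and `top` are hypothetical, and the data is the five-row example from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "names": ["name1", "name2", "name3", "name4", "name5"],
    "measure_1": [2, 5, 3, 8, 10],
    "measure_2": [3, 7, 4, 3, 7],
    "group": ["The_first_group"] * 3 + ["The_second_group"] * 2,
    "measure_1_g": [5, 5, 5, 9, 9],
    "measure_2_g": [7, 7, 7, 5, 5],
})

x = list(range(len(df)))  # one shared integer position per row

fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(x, df["measure_1"], marker="o", color="xkcd:orange", label="measure 1")
ax.plot(x, df["measure_2"], marker="o", color="xkcd:red", label="measure 2")
ax.plot(x, df["measure_1_g"], marker="^", color="xkcd:aqua", label="Grouped measure 1")
ax.plot(x, df["measure_2_g"], marker="^", color="xkcd:blue", label="Grouped measure 2")

# Bottom axis: one tick per name
ax.set_xticks(x)
ax.set_xticklabels(df["names"], rotation=90)

# Top axis: one tick per group, at the mean row position of that group
pos = pd.Series(x, index=df["group"])
centers = pos.groupby(level=0, sort=False).mean()
top = ax.secondary_xaxis("top")
top.set_xticks(list(centers.values))
top.set_xticklabels(list(centers.index), rotation=90)

ax.legend()
```

This assumes the rows of each group are contiguous in the dataframe; if they are not, sorting by `group` first would be needed for the top ticks to make sense.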

How do I plot Pandas Dataframe columns on separate figures?

I am trying to take a Pandas dataframe which has 57 columns and plot them as bar charts with 3 columns per figure. The reason is that the variation in the data and the length of the columns make it hard to see the data in many of the plots. Plotting multiple columns per subplot isn't an option, so each plot has to be visible at the output size. Given the data, I have found that 3 subplots per figure looks the best. Here's my script to plot the dataframe:
import matplotlib.pyplot as plt
import numpy as np

fig, ax = plt.subplots(nrows=len(df.columns), ncols=1, sharex=True, sharey=True)
yscale = np.ceil(df.abs().select_dtypes(include=[np.number]).values.max())
plt.yscale('symlog')
plt.ylim(-yscale, yscale)
t = list(df.columns.values)
n = 0
for i in df:
    # Color each bar green if positive, red otherwise
    df['positive'] = df[i] > 0
    df[i].plot.bar(ax=ax[n], rot=0, width=1.0, legend=False, position=0, color=df.positive.map({True: 'g', False: 'r'}))
    ax[n].set_title(t[n])
    ax[n].axhline(y=0, linewidth=1, color='k')
    ax[n].tick_params(which='major', axis='x', length=2)
    ax[n].tick_params(which='major', axis='y', length=6)
    n += 1
plt.tight_layout()
plt.show()
Would it be easiest just to split the dataframe into several smaller dataframes? My only concern with that is that there will be different numbers of columns per dataframe with different samples.
This seems to be what you want
# 57 columns at 3 columns per figure gives 19 figures
for i in range(19):
    df.iloc[:, i*3:i*3+3].plot.bar(subplots=True, legend=None)
which gives 19 plots similar to this:
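The same slicing can be written without hard-coding 19, which also covers the concern about a column count that is not a multiple of 3. This is a sketch on an assumed toy dataframe with 7 columns; `per_figure` and `chunk_sizes` are illustrative names.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get interactive windows
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

plt.close("all")  # start from a clean slate so the figure count below is meaningful

# Toy data: 4 samples, 7 columns (deliberately not a multiple of 3)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.normal(size=(4, 7)), columns=[f"c{i}" for i in range(7)])

per_figure = 3
chunk_sizes = []
for start in range(0, df.shape[1], per_figure):
    # iloc slicing past the last column is safe; the final chunk is just shorter
    chunk = df.iloc[:, start:start + per_figure]
    chunk.plot.bar(subplots=True, legend=False)  # one new figure per chunk
    chunk_sizes.append(chunk.shape[1])

n_figures = len(plt.get_fignums())
```

With 7 columns this produces three figures holding 3, 3 and 1 subplots respectively, so there is no need to split the dataframe into separate objects beforehand.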

Python, connect two data points at the beginning and end of a series

I would really like to know how I can plot the mean of two points on a chart with Python. I have stock data with 200 data points, and I want to take the mean of the first 20 points and the mean of the last 20 points, and then plot a line connecting those two points. I do not want any of the data points between those two to be taken into account.
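My entire program is as follows:
import matplotlib.pyplot as plt
import pandas_datareader.data as web  # assumption: web refers to pandas_datareader.data

stock = web.get_data_yahoo('clh.ax', '10/01/2017', interval='d')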
stock['ema']=stock['Adj Close'].ewm(span=100,min_periods=0).mean()
stock['std']=stock['Adj Close'].rolling(window = 20,min_periods=0).std()
# bollinger bands
stock['close 20 day mean'] = stock['Close'].rolling(20,min_periods=0).mean()
# upper band
stock['upper'] = stock['close 20 day mean'] + 2 * (stock['Close'].rolling(20, min_periods=0).std())
# lower band
stock['lower'] = stock['close 20 day mean'] - 2 * (stock['Close'].rolling(20, min_periods=0).std())
# end bollinger bands
fig,axes = plt.subplots(nrows=3, ncols =1, figsize=(10,6))
axes[0].plot(stock['Close'], color='red')
axes[0].plot(stock['ema'], color='blue')
axes[0].plot(stock['close 20 day mean'], color='black')
axes[0].plot(stock['upper'], color='black')
axes[0].plot(stock['lower'], color='black')
axes[1].plot(stock['Volume'],color='purple')
axes[2].plot(stock['std'], color='black')
Not 100% sure I understood the question correctly, but the code below does the following:
a) Takes the mean of the first 20 points.
b) Takes the mean of the last 20 points.
c) Plots a line between those two values.
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([stock["Close"].iloc[:20].mean(), stock["Close"].iloc[-20:].mean()])
This plots:
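The one-liner above draws the two means at x positions 0 and 1. If the connecting line should instead sit on top of the price series, the two means can be plotted against the first and last dates of the index. The sketch below uses a synthetic `Close` column (simply 0 to 199) in place of the downloaded stock data, so the data and the names `first_mean` and `last_mean` are assumptions for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in for the stock data: 200 daily points
dates = pd.date_range("2017-10-01", periods=200, freq="D")
stock = pd.DataFrame({"Close": np.arange(200, dtype=float)}, index=dates)

n = 20
first_mean = stock["Close"].iloc[:n].mean()   # mean of the first 20 points
last_mean = stock["Close"].iloc[-n:].mean()   # mean of the last 20 points

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(stock.index, stock["Close"], color="red", label="Close")

# Straight line between the two means; the points in between are not used
ax.plot([stock.index[0], stock.index[-1]], [first_mean, last_mean],
        color="black", marker="o", label="first/last 20-point mean")
ax.legend()
```

Anchoring the endpoints at the first and last dates is one choice; the midpoints of the two 20-point windows would be an equally reasonable alternative.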

Plot multiple lines from one data frame

I have all the data I want to plot in one pandas data frame, e.g.:
         date  flower_color  flower_count
0  2017-08-01          blue             1
1  2017-08-01           red             2
2  2017-08-02          blue             5
3  2017-08-02           red             2
I need a few different lines on one plot: the x-value should be the date from the first column, the y-value should be flower_count, and there should be one line per flower_color given in the second column.
How can I do that without filtering the original df and saving it as a new object first? My only idea was to create a data frame for only red flowers and then specify it like:
figure.line(x="date", y="flower_count", source=red_flower_ds)
figure.line(x="date", y="flower_count", source=blue_flower_ds)
You can try this
fig, ax = plt.subplots()
for name, group in df.groupby('flower_color'):
    group.plot('date', y='flower_count', ax=ax, label=name)
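An alternative to looping over groups, shown here as a sketch rather than as part of the answer above, is to pivot the frame so that each color becomes its own column; a single `plot` call then draws one line per column. The dataframe is rebuilt from the four rows in the question, and `wide` is an illustrative name.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line to get an interactive window
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "date": pd.to_datetime(["2017-08-01", "2017-08-01", "2017-08-02", "2017-08-02"]),
    "flower_color": ["blue", "red", "blue", "red"],
    "flower_count": [1, 2, 5, 2],
})

# Long to wide: one row per date, one column per flower color
wide = df.pivot(index="date", columns="flower_color", values="flower_count")

# One line per column, all on the same axes
ax = wide.plot(marker="o")
ax.set_ylabel("flower_count")
n_lines = len(ax.get_lines())
```

Note that `pivot` requires each (date, flower_color) pair to appear at most once; with duplicates, `pivot_table` with an aggregation function would be needed instead.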
If I understand correctly, you need a figure with two subplots: the x-axis of both subplots is the date, and the y-axis of each is the flower count for one color?
In that case, you can use subplots with pandas plotting.
fig, axes = plt.subplots(2)
df[df.flower_color == 'blue'].plot(x='date', y='flower_count', ax=axes[0]).set_ylabel('blue')
df[df.flower_color == 'red'].plot(x='date', y='flower_count', ax=axes[1]).set_ylabel('red')
plt.show()
The output will be like:
Hope it helps.
