Colour by Category in scatterplot - python

My dataframe looks like this:
         date  index  count  weekday_num  max_temperature_C
0  2019-04-01      0   1379            0                 18
1  2019-04-02      1   1395            1                 21
2  2019-04-03      2   1155            2                 19
3  2019-04-04      3    342            3                 18
4  2019-04-05      4    216            4                 14
I would like to plot count vs max_temperature_C and colour the points by weekday_num.
I have tried the following:
#create the scatter plot of trips vs Temp
plt.scatter(comb2['count'], comb2['max_temperature_C'], c=comb2['weekday_num'])
# Label the axis
plt.xlabel('Daily Trip count')
plt.ylabel('Max Temp c')
plt.legend(['weekday_num'])
# Show it!
plt.show()
However, I am not sure how to get the legend to display all of the colours that correspond to each 'weekday_num'.
Thanks

You can use the automated legend creation like this:
fig, ax = plt.subplots()
scatter = ax.scatter(comb2['count'], comb2['max_temperature_C'], c=comb2['weekday_num'])
# produce a legend with the unique colors from the scatter
legend = ax.legend(*scatter.legend_elements(),
                   loc="upper right", title="Weekday num")
ax.add_artist(legend)
plt.show()
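For reference, here is a minimal self-contained sketch of the same idea. It assumes matplotlib 3.1 or later (where legend_elements is available) and rebuilds the five rows from the question as comb2; the Agg backend line is only there so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; drop this line for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# The five rows shown in the question
comb2 = pd.DataFrame({
    "count": [1379, 1395, 1155, 342, 216],
    "weekday_num": [0, 1, 2, 3, 4],
    "max_temperature_C": [18, 21, 19, 18, 14],
})

fig, ax = plt.subplots()
scatter = ax.scatter(comb2["count"], comb2["max_temperature_C"],
                     c=comb2["weekday_num"])
ax.set_xlabel("Daily Trip count")
ax.set_ylabel("Max Temp C")

# One legend handle per unique value passed to c=
handles, labels = scatter.legend_elements()
ax.legend(handles, labels, loc="upper right", title="Weekday num")
```

With only a handful of distinct weekday values, legend_elements should return one entry per value, so the legend lists every colour used.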

Related

Bar plot not appearing normally using df.plot.bar()

I have the following code. I am trying to loop through variables (dataframe columns) and create bar plots. I have attached below an example of a graph for the column newerdf['age'].
I believe this should produce 3 bars (one for each option - male (value = 1), female (value = 2), other(value = 3)).
However, the graph below does not seem to show this.
I would be so grateful for a helping hand as to where I am going wrong!
listedvariables = ['age','gender-quantised','hours_of_sleep','frequency_of_alarm_usage','nap_duration_mins','frequency_of_naps','takes_naps_yes/no','highest_education_level_acheived','hours_exercise_per_week_in_last_6_months','drink_alcohol_yes/no','drink_caffeine_yes/no','hours_exercise_per_week','hours_of_phone_use_per_week','video_game_phone/tablet_hours_per_week','video_game_all_devices_hours_per_week']
for i in range(0,len(listedvariables)):
    fig = newerdf[[listedvariables[i]]].plot.bar(figsize=(30,20))
    fig.tick_params(axis='x',labelsize=40)
    fig.tick_params(axis='y',labelsize=40)
    plt.tight_layout()
newerdf['age']
age
0 2
1 2
2 4
3 3
5 2
... ...
911 2
912 1
913 2
914 3
915 2
The data are not grouped into categories yet, so a value count is needed before calling the plotting method:
for var in listedvariables:
    ax = newerdf[var].value_counts().plot.bar(figsize=(30,20))
    ax.tick_params(axis='x', labelsize=40)
    ax.tick_params(axis='y', labelsize=40)
    plt.tight_layout()
    plt.show()
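To see why the value count matters, here is a small sketch on made-up data. The ten values below are a hypothetical stand-in for newerdf['age']; sort_index is an optional extra so the bars appear in category order.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample standing in for newerdf['age'] (coded categories)
newerdf = pd.DataFrame({"age": [2, 2, 4, 3, 2, 2, 1, 2, 3, 2]})

# Plotting the raw column would draw one bar per row (10 bars here).
# value_counts() collapses the rows into one count per category first.
counts = newerdf["age"].value_counts().sort_index()

fig, ax = plt.subplots()
counts.plot.bar(ax=ax)
ax.set_xlabel("age (coded)")
ax.set_ylabel("number of rows")
```

Here counts holds four entries (one per distinct code), so the chart has four bars instead of ten.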

combine different dataframes into one graph

I have two dataframes
the first one
price = pd.read_csv('top_50_tickers.csv')
timestamp GME MVIS TSLA AMC
0 2021-07-23 180.36 13.80 643.38 36.99
1 2021-07-22 178.85 14.18 649.26 37.24
2 2021-07-21 185.81 15.03 655.29 40.78
3 2021-07-20 191.18 14.41 660.50 43.09
4 2021-07-19 173.49 13.67 646.22 34.62
the second one
df1 = pd.read_csv('discussion_thread_data.csv')
tickers dt AMC GME MVIS TSLA
0 2021-03-19 21:00:00+06:00 11 13 0 11
1 2021-03-19 22:00:00+06:00 0 0 3 0
2 2021-03-19 23:00:00+06:00 0 5 0 3
3 2021-03-20 00:00:00+06:00 4 0 6 0
I want to overlay each ticker column (AMC, GME, ...) from the first dataframe on the same ticker column from the second dataframe.
That is, I want 4 separate graphs, one per ticker, each with the two series drawn on top of each other.
Here is what I have, but it only works for one ticker,
so I assume I need to loop through each column:
fig = plt.figure()
ax = fig.add_subplot()
ax2 = fig.add_subplot(frame_on=False)
ax.plot(price.timestamp, price.GME, color="C0")
ax.axes.xaxis.set_visible(False)
ax2.plot(df1.dt, df1.GME, color="C1")
ax2.yaxis.set_label_position("Ticker Occurence")
ax2.yaxis.tick_right()
ax.set_xlabel('Time Frame')
ax.set_ylabel('Price')
Appreciate any help
Put all lines in the same subplot, e.g.:
ax = fig.add_subplot()
ax.plot(price.timestamp, price.GME, color="C0")
ax.plot(df1.dt, df1.GME, color="C1")
etc.
I generally find it easier to use plt.subplots instead of plt.figure, e.g.:
# Get the columns to plot
plot_cols = df1.columns[1:]  # assuming the first column is not to be plotted, but the rest are
fig, axes = plt.subplots(len(plot_cols), 1)  # rows x columns
# Plot them
for i, col in enumerate(plot_cols):  # i = index, col = column name
    axes[i].plot(price.timestamp, price[col])
    axes[i].plot(df1.dt, df1[col])
    # ...
    # do other stuff with axes[i]
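Since prices and mention counts are on very different scales, one option is to give each subplot a second y-axis with twinx. This is only a sketch under assumptions: the prices are taken from the question's table, but the df1 counts and dates below are hypothetical and chosen to line up with the price dates.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import pandas as pd

dates = pd.to_datetime(["2021-07-19", "2021-07-20", "2021-07-21"])
price = pd.DataFrame({
    "timestamp": dates,
    "GME": [173.49, 191.18, 185.81],
    "AMC": [34.62, 43.09, 40.78],
})
# Hypothetical mention counts on the same dates
df1 = pd.DataFrame({"dt": dates, "GME": [13, 0, 5], "AMC": [11, 0, 4]})

tickers = ["GME", "AMC"]
# squeeze=False keeps axes two-dimensional even with a single ticker
fig, axes = plt.subplots(len(tickers), 1, sharex=True, squeeze=False)
twin_axes = []
for ax, ticker in zip(axes[:, 0], tickers):
    ax.plot(price["timestamp"], price[ticker], color="C0")
    ax.set_ylabel(f"{ticker} price")
    ax2 = ax.twinx()  # second y-axis sharing the same x-axis
    ax2.plot(df1["dt"], df1[ticker], color="C1")
    ax2.set_ylabel("Ticker occurrence")
    twin_axes.append(ax2)
axes[-1, 0].set_xlabel("Time frame")
fig.tight_layout()
```

With the real data, the two timestamp columns would need to be parsed as datetimes and made comparable (one is timezone-aware, the other is not) before they can share an x-axis.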

legends not print fully when multiple plots are plotted on same figure

I have the code as below to plot multiple plots on the same figure
fig, ax = plt.subplots(figsize=(25, 10))
def wl_ratioplot(wavelength1,wavelength2, dataframe, x1=0.1,x2=1.5,y1=-500,y2=25000):
    a=dataframe[['asphalt_index','layer_thickness',wavelength1,wavelength2]].copy()
    sns.scatterplot(x=a[wavelength1]/a[wavelength2],y=a['layer_thickness'],data=a)
    ax.set_xlim(x1,x2)
    ax.set_ylim(y1,y2)
    leg = "{} vs {}".format(wavelength1,wavelength2)
    print(leg) #this line is only to see the variable legend has the proper content
    ax.legend(leg)
wl_ratioplot(wave_lengths[2],wave_lengths[0],dataframe=train_df_wo_outliers,x1=-.1,x2=3)
wl_ratioplot(wave_lengths[0],wave_lengths[1],dataframe=train_df_wo_outliers,x1=-.1,x2=3)
wl_ratioplot(wave_lengths[3],wave_lengths[1],dataframe=train_df_wo_outliers,x1=-.1,x2=3)
wl_ratioplot(wave_lengths[3],wave_lengths[0],dataframe=train_df_wo_outliers,x1=-.1,x2=3)
wl_ratioplot(wave_lengths[2],wave_lengths[1],dataframe=train_df_wo_outliers,x1=-.1,x2=3)
I get the plot shown in the picture below, where the legend shows the first 5 letters as separate entries, even though the variable leg has the right content.
There was another similar question whose solution was to wrap the legend variable in square brackets. I tried this with the code below.
fig, ax = plt.subplots(figsize=(25, 10))
def wl_ratioplot(wavelength1,wavelength2, dataframe, x1=0.1,x2=1.5,y1=-500,y2=25000):
    a=dataframe[['asphalt_index','layer_thickness',wavelength1,wavelength2]].copy()
    sns.scatterplot(x=a[wavelength1]/a[wavelength2],y=a['layer_thickness'],data=a)
    ax.set_xlim(x1,x2)
    ax.set_ylim(y1,y2)
    leg = "{} vs {}".format(wavelength1,wavelength2)
    print(leg)#this line is only to see the variable legend has the proper content
    ax.legend([leg])
wl_ratioplot(wave_lengths[2],wave_lengths[0],dataframe=train_df_wo_outliers,x1=-.1,x2=3)
wl_ratioplot(wave_lengths[0],wave_lengths[1],dataframe=train_df_wo_outliers,x1=-.1,x2=3)
wl_ratioplot(wave_lengths[3],wave_lengths[1],dataframe=train_df_wo_outliers,x1=-.1,x2=3)
wl_ratioplot(wave_lengths[3],wave_lengths[0],dataframe=train_df_wo_outliers,x1=-.1,x2=3)
wl_ratioplot(wave_lengths[2],wave_lengths[1],dataframe=train_df_wo_outliers,x1=-.1,x2=3)
Now I get the full legend text, but only the first legend entry is shown, as in the picture below.
Can someone let me know how to get the full legend for all the plots? Thanks.
dummy data (the plot in pic will NOT match)
14nm 15nm 16nm 17nm 18nm 19nm layer_thickness
1 2 3 4 5 6 0
1 2 3 4 5 6 0
3 5 7 9 11 13 5700
1 2 3 4 5 6 0
3 5 7 9 11 13 8600
1 2 3 4 5 6 0
3 5 7 9 11 13 5000
1 2 3 4 5 6 0
45 55 65 75 85 95 100
1 2 3 4 5 6 0
8 15 22 29 36 43 16600
wave_lengths=['15nm','16nm','14nm','18nm']
Answer update
Based on the answer from Quang Hoang, I produced the output pictures using both the scatter plot from matplotlib and sns.scatterplot.
With plt it is pretty natural:
def wl_ratioplot(wavelength1,wavelength2, dataframe,
                 x1=0.1,x2=1.5,y1=-500,y2=25000,
                 ax=None):
    leg = "{} vs {}".format(wavelength1,wavelength2)
    # set the label here, and let plt deal with it
    # also, you don't need to copy the dataframe:
    ax.scatter(x=dataframe[wavelength1]/dataframe[wavelength2],
               y=dataframe['layer_thickness'],label=leg)
    ax.set_xlim(x1,x2)
    ax.set_ylim(y1,y2)
fig, ax = plt.subplots(figsize=(25, 10))
wl_ratioplot(wave_lengths[2],wave_lengths[0],dataframe=df,x1=-.1,x2=3, ax=ax)
wl_ratioplot(wave_lengths[0],wave_lengths[1],dataframe=df,x1=-.1,x2=3, ax=ax)
wl_ratioplot(wave_lengths[3],wave_lengths[1],dataframe=df,x1=-.1,x2=3, ax=ax)
wl_ratioplot(wave_lengths[3],wave_lengths[0],dataframe=df,x1=-.1,x2=3, ax=ax)
wl_ratioplot(wave_lengths[2],wave_lengths[1],dataframe=df,x1=-.1,x2=3, ax=ax)
ax.legend()
Every time you call the function wl_ratioplot, the legend is reset to the latest value. Use a list to store all the labels, then pass the whole list to the legend once.
ax.legend([leg]) # this resets the legend after each call
Create a list outside the function:
legends = []
Inside the function, append each label instead of drawing the legend:
legends.append(leg)
After all the function calls, draw the legend once:
ax.legend(legends)
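Here is a sketch of the collect-then-draw approach, using a few rows of the question's dummy data. It is a simplified, hypothetical variant of wl_ratioplot that returns the scatter handle together with its label, so the legend can be built explicitly from handles and labels in a single call.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import pandas as pd

# A few distinct rows from the question's dummy data
df = pd.DataFrame({
    "14nm": [1, 3, 45, 8],
    "15nm": [2, 5, 55, 15],
    "16nm": [3, 7, 65, 22],
    "layer_thickness": [0, 5700, 100, 16600],
})

def wl_ratioplot(wavelength1, wavelength2, dataframe, ax):
    """Scatter the ratio of two columns against layer_thickness.

    Returns the scatter handle and its label instead of touching the legend.
    """
    label = "{} vs {}".format(wavelength1, wavelength2)
    handle = ax.scatter(dataframe[wavelength1] / dataframe[wavelength2],
                        dataframe["layer_thickness"])
    return handle, label

fig, ax = plt.subplots()
pairs = [("16nm", "15nm"), ("15nm", "14nm")]
handles, labels = zip(*(wl_ratioplot(a, b, df, ax) for a, b in pairs))

# Draw the legend once, after every series has been plotted
legend = ax.legend(handles, labels)
```

Passing handles explicitly avoids relying on the order in which matplotlib pairs labels with artists.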

labelling bins in each subplots of an histogram chart

I have a dataframe,df with 29 rows by 24 columns dimension
Index 0.0 5.0 34.0 ... 22.0
2017-08-03 00:00:00 10 0 10 0
2017-08-04 00:00:00 20 60 1470 20
2017-08-05 00:00:00 0 58 0 24
2017-08-06 00:00:00 0 0 480 24
2017-09-07 00:00:00 0 0 0 25
: : : : :
: : : : :
2017-09-30 00:00:00
I intend to label the bins in each subplot, where each subplot is the histogram of one column. I have been able to draw the histogram in each subplot for each column using this code:
fig = plt.figure(figsize = (15,20))
ax = fig.gca()
#Initialize the figure
plt.style.use('seaborn-darkgrid')
df.hist(ax = ax)
However, the bin labels in each subplot are far apart, and the bins are not explicitly labelled by their ranges on the x-axis, which makes the chart difficult to interpret. I have looked at
"Aligning bins to xticks in plt.hist", but it doesn't explicitly cover labelling bins when subplots are involved. Any help would be great.
I have also tried this, but I get ValueError: too many values to unpack (expected 2):
x=[0,40,80,120,160,200,240,280,320]
fig = plt.figure(figsize = (15,20))
ax = fig.gca()
# Initialize the figure
plt.style.use('seaborn-darkgrid')
n,bins= plt.hist(df,bins= x)
#labels & axes
plt.locator_params(nbins=8, axis='x')
plt.ticklabel_format(style='sci', axis='x', scilimits=(0,0))
plt.title('Daily occurrence',fontsize=16)
plt.xlabel('Number of occurrence',fontsize=12)
plt.ylabel('Frequency',fontsize=12)
plt.xticks(x)
plt.xlim(0,320)
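On the ValueError: hist returns three values (the counts, the bin edges, and the bar patches), so unpacking into two names fails. Below is a minimal sketch of one possible way to get explicit bin edges on each subplot. The random dataframe is a hypothetical stand-in for the real 29 x 24 one, and the 2 x 2 layout is just for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the dataframe in the question (29 rows, 4 columns)
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 320, size=(29, 4)),
                  columns=[0.0, 5.0, 34.0, 22.0])

bin_edges = [0, 40, 80, 120, 160, 200, 240, 280, 320]

fig, axes = plt.subplots(2, 2, figsize=(15, 10))
counts = {}
for ax, col in zip(axes.ravel(), df.columns):
    # hist returns three values: counts, bin edges, and the bar patches
    n, bins, patches = ax.hist(df[col], bins=bin_edges)
    ax.set_xticks(bin_edges)  # put a tick on every bin edge
    ax.set_title(str(col))
    ax.set_xlabel("Number of occurrence")
    ax.set_ylabel("Frequency")
    counts[col] = n
fig.tight_layout()
```

Looping over the columns gives direct access to each subplot's axes, which df.hist does not expose as conveniently.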

Scatter plotting data from two different data frames in python

I have two different data frames in following format.
dfclean
Out[1]:
obj
0 682
1 101
2 33
dfmalicious
Out[2]:
obj
0 17
1 43
2 8
3 9
4 211
My use case is to plot a single scatter graph that distinctly shows the obj values from both dataframes. I am using Python for this purpose. I looked at a few examples where two columns of the same dataframe were used to plot the data, but I couldn't replicate them for my use case. Any help is greatly appreciated.
How to plot two DataFrame on same graph for comparison
To plot multiple column groups on a single axes, repeat the plot call and pass the target ax.
Option 1
In [2391]: ax = dfclean.reset_index().plot(kind='scatter', x='index', y='obj',
                                           color='Red', label='G1')
In [2392]: dfmalicious.reset_index().plot(kind='scatter', x='index', y='obj',
                                          color='Blue', label='G2', ax=ax)
Out[2392]: <matplotlib.axes._subplots.AxesSubplot at 0x2284e7b8>
Option 2
In [2399]: dff = dfmalicious.merge(dfclean, right_index=True, left_index=True,
                                   how='outer').reset_index()
In [2406]: dff
Out[2406]:
index obj_x obj_y
0 0 17 682.0
1 1 43 101.0
2 2 8 33.0
3 3 9 NaN
4 4 211 NaN
In [2400]: ax = dff.plot(kind='scatter', x='index', y='obj_x', color='Red', label='G1')
In [2401]: dff.plot(kind='scatter', x='index', y='obj_y', color='Blue', label='G2', ax=ax)
Out[2401]: <matplotlib.axes._subplots.AxesSubplot at 0x11dbe1d0>
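The same result can also be sketched as a plain script with matplotlib directly, skipping reset_index and the merge. The dataframes are rebuilt from the question; the colours and the labels "clean" and "malicious" are my own choices.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run
import matplotlib.pyplot as plt
import pandas as pd

dfclean = pd.DataFrame({"obj": [682, 101, 33]})
dfmalicious = pd.DataFrame({"obj": [17, 43, 8, 9, 211]})

fig, ax = plt.subplots()
# Each dataframe gets its own scatter call on the same axes
ax.scatter(dfclean.index, dfclean["obj"], color="red", label="clean")
ax.scatter(dfmalicious.index, dfmalicious["obj"], color="blue", label="malicious")
ax.set_xlabel("index")
ax.set_ylabel("obj")
ax.legend()
```

Because each frame is plotted against its own index, the different lengths (3 rows vs 5 rows) need no special handling.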
