combine different dataframes into one graph - python

I have two dataframes
the first one
price = pd.read_csv('top_50_tickers.csv')
timestamp GME MVIS TSLA AMC
0 2021-07-23 180.36 13.80 643.38 36.99
1 2021-07-22 178.85 14.18 649.26 37.24
2 2021-07-21 185.81 15.03 655.29 40.78
3 2021-07-20 191.18 14.41 660.50 43.09
4 2021-07-19 173.49 13.67 646.22 34.62
the second one
df1 = pd.read_csv('discussion_thread_data.csv')
tickers dt AMC GME MVIS TSLA
0 2021-03-19 21:00:00+06:00 11 13 0 11
1 2021-03-19 22:00:00+06:00 0 0 3 0
2 2021-03-19 23:00:00+06:00 0 5 0 3
3 2021-03-20 00:00:00+06:00 4 0 6 0
I want to plot the AMC, GME, ... columns from the first dataframe on top of the matching AMC, GME, ... columns from the second dataframe.
I want four separate graphs, one per ticker, each with the two lines overlaid.
Here is what I have, but it only works for one ticker.
So I assume I need to loop through each column:
fig = plt.figure()
ax = fig.add_subplot()
ax2 = fig.add_subplot(frame_on=False)
ax.plot(price.timestamp, price.GME, color="C0")
ax.axes.xaxis.set_visible(False)
ax2.plot(df1.dt, df1.GME, color="C1")
ax2.set_ylabel("Ticker Occurrence")
ax2.yaxis.set_label_position("right")  # set_label_position only accepts "left" or "right"
ax2.yaxis.tick_right()
ax.set_xlabel('Time Frame')
ax.set_ylabel('Price')
Appreciate any help

Put all lines in the same subplot, e.g.:
ax = fig.add_subplot()
ax.plot(price.timestamp, price.GME, color="C0")
ax.plot(df1.dt, df1.GME, color="C1")
etc.

I generally find it easier to use plt.subplots instead of fig.add_subplot, e.g.:
# Get the columns to plot
plot_cols = df1.columns[1:]  # assuming the first column is not to be plotted, but the rest are
fig, axes = plt.subplots(len(plot_cols), 1)  # rows x columns
# Plot them
for i, col in enumerate(plot_cols):  # i = index, col = column name
    axes[i].plot(price.timestamp, price[col])
    axes[i].plot(df1.dt, df1[col])
    # ...
    # do other stuff with axes[i]
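Since prices and mention counts live on very different scales, the loop above can be combined with a secondary y-axis per subplot. The following is only a sketch under assumptions: the two small frames are hypothetical stand-ins for the CSV files, the dates are made to overlap for illustration, and `Axes.twinx()` is used in place of the overlaid frameless axes from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical stand-ins for top_50_tickers.csv and discussion_thread_data.csv
price = pd.DataFrame({
    "timestamp": pd.to_datetime(["2021-07-19", "2021-07-20", "2021-07-21"]),
    "GME": [173.49, 191.18, 185.81],
    "AMC": [34.62, 43.09, 40.78],
})
df1 = pd.DataFrame({
    "dt": pd.to_datetime(["2021-07-19 21:00", "2021-07-20 22:00", "2021-07-21 23:00"]),
    "GME": [13, 0, 5],
    "AMC": [11, 0, 0],
})

# Only plot tickers that exist in both frames
tickers = [c for c in price.columns if c != "timestamp" and c in df1.columns]
fig, axes = plt.subplots(len(tickers), 1, sharex=True, squeeze=False)

twin_axes = []
for ax, ticker in zip(axes[:, 0], tickers):
    ax.plot(price["timestamp"], price[ticker], color="C0")
    ax.set_ylabel("Price")
    ax.set_title(ticker)
    ax2 = ax.twinx()  # second y-axis sharing the same x-axis
    ax2.plot(df1["dt"], df1[ticker], color="C1")
    ax2.set_ylabel("Ticker occurrence")
    twin_axes.append(ax2)

axes[-1, 0].set_xlabel("Time frame")
fig.tight_layout()
```

With the real data, the timestamps in `df1` carry a UTC offset, so they may need to be converted or made timezone-naive before sharing an x-axis with `price.timestamp`.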

Related

Colour by Category in scatterplot

My dataframe looks like this:
date index count weekday_num max_temperature_C
0 2019-04-01 0 1379 0 18
1 2019-04-02 1 1395 1 21
2 2019-04-03 2 1155 2 19
3 2019-04-04 3 342 3 18
4 2019-04-05 4 216 4 14
I would like to plot count vs max_temperature_C and colour by weekday_num
I have tried the below:
#create the scatter plot of trips vs Temp
plt.scatter(comb2['count'], comb2['max_temperature_C'], c=comb2['weekday_num'])
# Label the axis
plt.xlabel('Daily Trip count')
plt.ylabel('Max Temp c')
plt.legend(['weekday_num'])
# Show it!
plt.show()
However, I am not sure how to get the legend to display all of the colours that correspond to each 'weekday_num' value.
Thanks
You can use the automated legend creation like this:
fig, ax = plt.subplots()
scatter = ax.scatter(comb2['count'], comb2['max_temperature_C'], c=comb2['weekday_num'])
# produce a legend with the unique colors from the scatter
legend = ax.legend(*scatter.legend_elements(),
loc="upper right", title="Weekday num")
ax.add_artist(legend)
plt.show()
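If day names are preferred over the raw numbers, the handles from `legend_elements()` can be paired with custom labels. This is a sketch, assuming `weekday_num` follows the pandas convention (0 = Monday) and using the five sample rows from the question; the `day_names` list is introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows from the question
comb2 = pd.DataFrame({
    "count": [1379, 1395, 1155, 342, 216],
    "weekday_num": [0, 1, 2, 3, 4],
    "max_temperature_C": [18, 21, 19, 18, 14],
})
# Assumption: 0 = Monday, as in pandas' dayofweek
day_names = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]

fig, ax = plt.subplots()
scatter = ax.scatter(comb2["count"], comb2["max_temperature_C"], c=comb2["weekday_num"])

# With at most 9 distinct values, legend_elements() yields one handle per
# unique value, in sorted order
handles, _ = scatter.legend_elements()
present = sorted(comb2["weekday_num"].unique())
labels = [day_names[int(d)] for d in present]

legend = ax.legend(handles, labels, loc="upper right", title="Weekday")
ax.set_xlabel("Daily trip count")
ax.set_ylabel("Max temp (C)")
```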

Multi-year time series chart with shaded range in python

I have these charts that I've created in Excel from dataframes of a structure like such:
so that the chart can be created like this, stacking the 5-Year Range area on top of the Min range (no fill) so that the range area can be shaded. The min/max/range/avg columns all calculate off of 2016-2020.
I know that I can plot lines for multiple years on the same axis by using a date index and applying month labels, but is there a way to replicate the shading of this chart, more specifically if my dataframes are in a simple date index-value format, like so:
Quantity
1/1/2016 6
2/1/2016 4
3/1/2016 1
4/1/2016 10
5/1/2016 7
6/1/2016 10
7/1/2016 10
8/1/2016 2
9/1/2016 1
10/1/2016 2
11/1/2016 3
… …
1/1/2020 4
2/1/2020 8
3/1/2020 3
4/1/2020 5
5/1/2020 8
6/1/2020 6
7/1/2020 6
8/1/2020 7
9/1/2020 8
10/1/2020 5
11/1/2020 4
12/1/2020 3
1/1/2021 9
2/1/2021 7
3/1/2021 7
I haven't been able to find anything similar in the plot libraries.
Two step process
restructure DF so that years are columns, rows indexed by uniform date time
plot using matplotlib
import datetime as dt
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
# straight date as index, quantity as column
d = pd.date_range("1-Jan-2016", "1-Mar-2021", freq="MS")
df = pd.DataFrame({"Quantity":np.random.randint(1, 10, len(d))}, index=d)
# re-structure as multi-index, make year column
# add calculated columns
dfg = (df.set_index(pd.MultiIndex.from_arrays([df.index.map(lambda d: dt.date(dt.date.today().year, d.month, d.day)),
                                               df.index.year], names=["month","year"]))
       .unstack("year")
       .droplevel(0, axis=1)
       .assign(min=lambda dfa: dfa.loc[:,[c for c in dfa.columns if dfa[c].count()==12]].min(axis=1),
               max=lambda dfa: dfa.loc[:,[c for c in dfa.columns if dfa[c].count()==12]].max(axis=1),
               avg=lambda dfa: dfa.loc[:,[c for c in dfa.columns if dfa[c].count()==12]].mean(axis=1).round(1),
              )
      )
fig, ax = plt.subplots(1, figsize=[14,4])
# now plot all the parts
ax.fill_between(dfg.index, dfg["min"], dfg["max"], label="5y range", facecolor="oldlace")
ax.plot(dfg.index, dfg[2020], label="2020", c="r")
ax.plot(dfg.index, dfg[2021], label="2021", c="g")
ax.plot(dfg.index, dfg.avg, label="5 yr avg", c="y", ls=(0,(1,2)), lw=3)
# adjust axis
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
ax.legend(loc = 'best')
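The same two steps can also be sketched with a plain month-by-year pivot, computing the range only from years that have all 12 months. This is an alternative under assumptions, not the code above: it uses seeded random data, month numbers 1 to 12 on the x-axis instead of dates, and the names `wide` and `full` are introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up monthly data in the "date index, Quantity column" layout
d = pd.date_range("2016-01-01", "2021-03-01", freq="MS")
rng = np.random.default_rng(0)
df = pd.DataFrame({"Quantity": rng.integers(1, 10, len(d))}, index=d)

# Step 1: one row per month, one column per year
wide = (df.assign(year=df.index.year, month=df.index.month)
          .pivot(index="month", columns="year", values="Quantity"))

# Keep only complete years for the range and average
full = wide.loc[:, wide.count() == 12]
low, high, avg = full.min(axis=1), full.max(axis=1), full.mean(axis=1)

# Step 2: shaded range, average line, and the partial current year
fig, ax = plt.subplots(figsize=(14, 4))
ax.fill_between(wide.index, low, high, label="range of full years", facecolor="oldlace")
ax.plot(wide.index, avg, label="average of full years", ls="--")
ax.plot(wide.index, wide[2021], label="2021")
ax.set_xticks(range(1, 13))
ax.set_xlabel("Month")
ax.legend(loc="best")
```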

Display last N values in x axis range as label in matplotlib in Python

In my Python script, df.Value has a set of n values (200). I need the last 100 values as my x-axis labels, i.e. index values 100-200.
plt.figure(figsize=(100, 5), dpi=100)
plt.plot(df['Time'], df['sale'], label='sales')
plt.xlabel('Time ')
plt.ylabel('sales')
plt.title('sales')
plt.legend()
plt.show()
It shows the values 0-200 on the x axis, but I need only the last N values as x-axis labels.
Sample data (sales and time):
1 604.802656 13:00:00
2 604.400000 13:01:00
3 604.900024 13:02:00
4 604.099976 13:03:00
5 604.000000 13:04:00
6 604.250000 13:05:00
7 604.400024 13:06:00
8 604.150024 13:07:00
9 604.000000 13:08:00
plt.xticks(np.arange(100),df['Time'].values[100:200])
This will show 100 x-axis labels taken from the last 100 values.
Try this:
plt.xticks(np.arange(100, 200, step=1))
For your case, i.e. Time on the x-axis, see this post: https://stackoverflow.com/a/16428019/5202279
plt.figure(figsize=(100, 5), dpi=100)
plt.plot(df['Time'], df['sale'], label='sales')
plt.xlabel('Time ')
plt.xticks(np.arange(100), df.Time[100:200], rotation=45)
plt.ylabel('sales')
plt.title('sales')
plt.legend()
plt.show()
np.arange(100) gives the 100 x-axis positions at which to show labels,
df.Time[100:200] takes the last 100 string values from df.Time,
and rotation=45 rotates the labels by 45 degrees.
Thanks for your support.
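If the goal is to show only the last N points rather than relabel the existing ticks, another option is to slice the frame before plotting. This is a sketch under that assumption, with made-up data shaped like the sample (a string `Time` column and a numeric `sale` column).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Made-up data: 200 one-minute rows starting at 13:00:00
times = pd.date_range("2021-01-01 13:00", periods=200, freq="min").strftime("%H:%M:%S")
rng = np.random.default_rng(0)
df = pd.DataFrame({"Time": times, "sale": 604 + rng.random(200)})

N = 100
last = df.tail(N)  # keep only the last N rows

fig, ax = plt.subplots(figsize=(20, 5))
(line,) = ax.plot(last["Time"], last["sale"], label="sales")
ax.set_xticks(range(0, N, 10))  # show every 10th label to keep the axis readable
ax.tick_params(axis="x", labelrotation=45)
ax.set_xlabel("Time")
ax.set_ylabel("sales")
ax.set_title("sales")
ax.legend()
```

Because only the sliced rows are plotted, each tick label stays attached to its own data point.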

Stacked bar plots from list of dataframes with groupby command

I wish to create a (2x3) stacked barchart subplot from results using a groupby.size command, let me explain. I have a list of dataframes: list_df = [df_2011, df_2012, df_2013, df_2014, df_2015, df_2016]. A small example of these df's would be:
... Create Time Location Area Id Beat Priority ... Closed Time
2011-01-01 00:00:00 ST&SAN PABLO AV 1.0 06X 1.0 ... 2011-01-01 00:28:17
2011-01-01 00:01:11 ST&HANNAH ST 1.0 07X 1.0 ... 2011-01-01 01:12:56
.
.
.
(can only add a few columns as the layout messes up)
I'm using a groupby.size command to get the required count of events for these dataframes, see below:
list_df = [df_2011, df_2012, df_2013, df_2014, df_2015, df_2016]
for i in list_df:
    print(i.groupby(['Beat', 'Priority']).size())
    print(' ')
Producing:
Beat Priority
01X 1.0 394
2.0 1816
02X 1.0 644
2.0 1970
02Y 1.0 661
2.0 2309
03X 1.0 857
2.0 2962
.
.
.
I wish to identify the top 10 beats by TOTAL count, summed over priorities. For example, the totals above are:
Beat Priority Total for Beat
01X 1.0 394
2.0 1816 2210
02Y 1.0 661
2.0 2309 2970
03X 1.0 857
2.0 2962 3819
.
.
.
So far I have used plot over my groupby.size but it hasn't done the collective total as I described above. Check out below:
list_df = [df_2011, df_2012, df_2013, df_2014, df_2015, df_2016]
fig, axes = plt.subplots(2, 3)
for d, i in zip(list_df, range(6)):
    ax = axes.ravel()[i]
    d.groupby(['Beat', 'Priority']).size().nlargest(10).plot(ax=ax, kind='bar', figsize=(15, 7), stacked=True, legend=True)
    ax.set_title(f"Top 10 Beats for {i+ 2011}")
plt.tight_layout()
I wish to have the 2x3 subplot layout, but with stacked barcharts like this one I have done previously:
Thanks in advance. This has been harder than I thought it would be!
The data series need to be the columns, so you probably want
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# create fake input data
ncols = 300
list_df = [pd.DataFrame({'Beat': np.random.choice(['{:02d}X'.format(i) for i in range(15)], ncols),
'Priority': np.random.choice(['1', '2'], ncols),
'othercolumn1': range(ncols),
'othercol2': range(ncols),
'year': [yr] * ncols}) for yr in range(2011, 2017)]
In [22]: print(list_df[0].head(5))
Beat Priority othercolumn1 othercol2 year
0 06X 1 0 0 2011
1 05X 1 1 1 2011
2 04X 1 2 2 2011
3 01X 2 3 3 2011
4 00X 1 4 4 2011
fig, axes = plt.subplots(2, 3)
for i, d in enumerate(list_df):
    ax = axes.flatten()[i]
    dplot = d[['Beat', 'Priority']].pivot_table(index='Beat', columns='Priority', aggfunc=len)
    dplot = (dplot.assign(total=lambda x: x.sum(axis=1))
                  .sort_values('total', ascending=False)
                  .head(10)
                  .drop('total', axis=1))
    dplot.plot.bar(ax=ax, figsize=(15, 7), stacked=True, legend=True)
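The same reshaping can also be sketched by staying with the `groupby(...).size()` call from the question and unstacking `Priority` into columns, then ranking beats by their row totals. This is a variant under assumptions, not the answer's exact method: the input frames are seeded fake data, and `counts` and `top10_per_year` are names introduced here for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Fake input: one frame per year, 2011 through 2016
rng = np.random.default_rng(0)
nrows = 300
beats = [f"{i:02d}X" for i in range(15)]
list_df = [pd.DataFrame({"Beat": rng.choice(beats, nrows),
                         "Priority": rng.choice([1.0, 2.0], nrows)})
           for _ in range(6)]

fig, axes = plt.subplots(2, 3, figsize=(15, 7))
top10_per_year = []
for year, d, ax in zip(range(2011, 2017), list_df, axes.ravel()):
    # Rows = Beat, columns = Priority, values = number of events
    counts = d.groupby(["Beat", "Priority"]).size().unstack(fill_value=0)
    # Rank beats by their total across priorities and keep the 10 largest
    top10 = counts.loc[counts.sum(axis=1).nlargest(10).index]
    top10.plot.bar(ax=ax, stacked=True, legend=True)
    ax.set_title(f"Top 10 Beats for {year}")
    top10_per_year.append(top10)
fig.tight_layout()
```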

Scatter plotting data from two different data frames in python

I have two different data frames in following format.
dfclean
Out[1]:
obj
0 682
1 101
2 33
dfmalicious
Out[2]:
obj
0 17
1 43
2 8
3 9
4 211
My use-case is to plot a single scatter graph that distinctly shows the obj values from both dataframes. I am using Python for this purpose. I looked at a few examples where two columns of the same dataframe were used to plot the data, but couldn't replicate them for my use-case. Any help is greatly appreciated.
How to plot two DataFrame on same graph for comparison
To plot multiple column groups in a single axes, repeat plot method specifying target ax
Option 1]
In [2391]: ax = dfclean.reset_index().plot(kind='scatter', x='index', y='obj',
color='Red', label='G1')
In [2392]: dfmalicious.reset_index().plot(kind='scatter', x='index', y='obj',
color='Blue', label='G2', ax=ax)
Out[2392]: <matplotlib.axes._subplots.AxesSubplot at 0x2284e7b8>
Option 2]
In [2399]: dff = dfmalicious.merge(dfclean, right_index=True, left_index=True,
how='outer').reset_index()
In [2406]: dff
Out[2406]:
index obj_x obj_y
0 0 17 682.0
1 1 43 101.0
2 2 8 33.0
3 3 9 NaN
4 4 211 NaN
In [2400]: ax = dff.plot(kind='scatter', x='index', y='obj_x', color='Red', label='G1')
In [2401]: dff.plot(kind='scatter', x='index', y='obj_y', color='Blue', label='G2', ax=ax)
Out[2401]: <matplotlib.axes._subplots.AxesSubplot at 0x11dbe1d0>
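For comparison, the same plot can be sketched with matplotlib directly, calling `scatter` twice on one axes and using each frame's index as x. The labels "clean" and "malicious" are chosen here for illustration; the data are the sample values from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Sample values from the question
dfclean = pd.DataFrame({"obj": [682, 101, 33]})
dfmalicious = pd.DataFrame({"obj": [17, 43, 8, 9, 211]})

fig, ax = plt.subplots()
# One scatter call per dataframe, both drawn on the same axes
ax.scatter(dfclean.index, dfclean["obj"], color="red", label="clean")
ax.scatter(dfmalicious.index, dfmalicious["obj"], color="blue", label="malicious")
ax.set_xlabel("index")
ax.set_ylabel("obj")
ax.legend()
```

The frames do not need the same length, since each call supplies its own x values.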
