I cannot find a way to plot the grouped data from the following data frame:
Processed Card Transaction ID Transaction amount Error_Occured
Date
2019-01-01 Carte Rouge 217142203412 147924.21 0
2019-01-01 ChinaPay 149207925233 65301.63 1
2019-01-01 Masterkard 766507067450 487356.91 5
2019-01-01 VIZA 145484139636 97774.52 1
2019-01-02 Carte Rouge 510466748547 320951.10 3
I want to create a plot where: x-axis: Date, y-axis: Error_Occured, points/lines colored by Processed Card. I tried grouping the data frame first and plotting it using pandas plot:
df = df.groupby(['Date','Processed Card']).sum('Error_Occured')
df = df.reset_index()
df.set_index("Date",inplace=True)
df.plot(legend=True)
plt.show()
But I get the plot where Transaction ID is displayed and not the cards:
[Plot: Transaction ID drawn against Date instead of one series per card]
You can do something like this:
import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame()
df['Date'] = ['2019-01-01','2019-01-01','2019-01-01','2019-01-01','2019-01-02']
df['Card'] = ['Carte Rouge', 'ChinaPay', 'Masterkard', 'VIZA', 'Carte Rouge']
df['Error_Occured'] = [0,1,5,1,3]

fig, ax = plt.subplots()
# group by card only, so that each card becomes one line on the plot
for name, s in df.groupby('Card'):
    ax.plot(s['Date'], s['Error_Occured'], marker='o', label=name)
ax.legend()
plt.show()
This produces the following with the data provided:
Note that you only want to group by card, not date.
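If you would rather stay with the pandas plot call from the question, another option is to pivot the frame so that each card becomes its own column. This is only a sketch: it rebuilds the sample rows by hand, keeps the question's column names, and assumes summing Error_Occured per date and card is what you want. The Agg backend line is just there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question (only the columns needed for the plot)
df = pd.DataFrame({
    "Date": pd.to_datetime(["2019-01-01"] * 4 + ["2019-01-02"]),
    "Processed Card": ["Carte Rouge", "ChinaPay", "Masterkard", "VIZA", "Carte Rouge"],
    "Error_Occured": [0, 1, 5, 1, 3],
})

# One row per date, one column per card, errors summed within each cell
wide = df.pivot_table(index="Date", columns="Processed Card",
                      values="Error_Occured", aggfunc="sum")

# pandas draws one line per column and uses the column names in the legend
ax = wide.plot(marker="o")
ax.set_ylabel("Error_Occured")
```

Cards with no rows on a given date end up as NaN in the wide frame, so their lines simply have a gap there.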
I have tried unsuccessfully to create a bar plot of a time series dataset. I have tried converting the dates to Pandas Datetime objects, Timestamp Objects, primitive strings, floats, and ints. No matter what I do, I get the following error: TypeError: float() argument must be a string or a number, not 'Timestamp' Here are a few minimal examples that produce the error:
Here, the 'Date' object is of type <class 'pandas._libs.tslibs.timestamps.Timestamp'>, so I know why this doesn't work:
import pandas as pd
import matplotlib.pylab as plt
import matplotlib.dates as mdates
import seaborn as sns

def main():
    path = 'Data/AQ+RX Counts.csv'
    df = pd.read_csv(path, parse_dates=['Date'], index_col=['Date'])
    weekly_df = df.resample('W').mean().reset_index()
    weekly_df['count'] = df['count'].resample('W').sum().reset_index()
    sns.barplot(x = 'Date', y='count', data = weekly_df)
    plt.show()

main()
I then tried making the dates floats, intending to format them back to dates afterwards, but this still doesn't work:
dates = mdates.datestr2num(weekly_df.Date.astype(str))
weekly_df['n_dates'] = dates
sns.barplot(x = 'n_dates', y='count', data = weekly_df)
plt.show()
I also tried making them integers, to no avail:
dates = mdates.datestr2num(weekly_df.Date.astype(str))
dates = dates.astype(int)
dates = pd.Series(dates)
weekly_df['n_dates'] = dates
sns.barplot(x = 'n_dates', y='count', data = weekly_df)
plt.show()
I've tried many other variations, and all produce the same error. I've even compared it to other code and verified that all the types are identical, and the comparison code works fine. I am at a complete loss of where to go from here.
Dataframe:
Date,WSA,WSV,WDV,WSM,SGT,T2M,T10M,DELTA_T,PBAR,SRAD,RH,PM25,AQI,count
2015-01-01,1.0708333333333335,0.8750000000000001,132.95833333333334,3.4708333333333337,35.39166666666667,30.72916666666667,30.625,-0.11666666666666667,738.8249999999998,72.66666666666667,99.75416666666666,24.80833333333333,73.30793131580873,0.0
2015-01-02,1.1086956521739129,0.9391304347826086,148.47826086956522,3.734782608695653,32.46521739130434,34.39130434782609,34.27826086956521,-0.11739130434782602,738.3478260869565,61.39130434782609,100.01304347826084,23.500000000000004,64.15072523318715,4.0
2015-01-03,1.0173913043478258,0.7173913043478259,168.04347826086956,3.773913043478261,42.71739130434783,36.24782608695652,36.160869565217396,-0.09565217391304348,739.4434782608695,49.60869565217392,100.76956521739132,20.460869565217394,55.65271063058384,0.0
2015-01-04,1.0,0.6,159.95833333333334,3.85,49.15,38.8875,38.66666666666666,-0.225,741.5000000000001,31.54166666666667,101.47916666666669,13.012499999999998,46.835258118800965,0.0
2015-01-05,1.0333333333333334,0.4416666666666667,137.0,4.0,57.56666666666666,42.99583333333333,42.94583333333333,-0.04999999999999995,742.5333333333333,44.58333333333334,101.00416666666666,16.654166666666665,52.420271225456766,4.0
2015-01-06,0.7818181818181817,0.5590909090909091,114.72727272727272,3.654545454545455,42.86818181818182,40.7409090909091,41.09545454545454,0.36818181818181817,740.9045454545453,48.27272727272727,100.57727272727274,21.954545454545453,67.31833852518514,6.0
2015-01-07,0.9739130434782608,0.8304347826086954,110.82608695652172,3.956521739130436,30.817391304347833,40.36521739130435,40.59565217391304,0.22173913043478266,739.8652173913043,60.04347826086956,100.19565217391305,24.456521739130434,72.3472505968891,6.0
2015-01-08,0.9833333333333336,0.8250000000000001,156.5,4.208333333333333,32.67083333333333,41.520833333333336,41.36666666666667,-0.12916666666666668,736.35,69.58333333333333,99.95833333333331,22.274999999999995,65.77072473472253,10.0
2015-01-09,0.9583333333333331,0.7291666666666669,133.70833333333334,3.3791666666666664,39.645833333333336,42.279166666666654,42.15833333333333,-0.11666666666666665,735.2041666666665,60.41666666666666,100.04166666666669,19.370833333333334,59.08512936837911,10.0
2015-01-10,0.9666666666666668,0.7583333333333336,164.5,3.675,37.34583333333333,42.96250000000001,42.775,-0.2,734.2875,41.5,100.12083333333337,14.658333333333335,49.31465266245389,0.0
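The problem is the line in the function that assigns to weekly_df['count']. Calling .reset_index() on the resampled Series returns a DataFrame with both 'Date' and 'count' columns, and assigning that to a single column converts 'count' from floats to a datetime dtype. That is where the TypeError comes from.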
Using the posted sample data
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
path = 'data/test.csv'
df = pd.read_csv(path, parse_dates=['Date'], index_col=['Date'])
# display(df)
WSA WSV WDV WSM SGT T2M T10M DELTA_T PBAR SRAD RH PM25 AQI count
Date
2015-01-01 1.070833 0.875000 132.958333 3.470833 35.391667 30.729167 30.625000 -0.116667 738.825000 72.666667 99.754167 24.808333 73.307931 0.0
2015-01-02 1.108696 0.939130 148.478261 3.734783 32.465217 34.391304 34.278261 -0.117391 738.347826 61.391304 100.013043 23.500000 64.150725 4.0
2015-01-03 1.017391 0.717391 168.043478 3.773913 42.717391 36.247826 36.160870 -0.095652 739.443478 49.608696 100.769565 20.460870 55.652711 0.0
2015-01-04 1.000000 0.600000 159.958333 3.850000 49.150000 38.887500 38.666667 -0.225000 741.500000 31.541667 101.479167 13.012500 46.835258 0.0
2015-01-05 1.033333 0.441667 137.000000 4.000000 57.566667 42.995833 42.945833 -0.050000 742.533333 44.583333 101.004167 16.654167 52.420271 4.0
2015-01-06 0.781818 0.559091 114.727273 3.654545 42.868182 40.740909 41.095455 0.368182 740.904545 48.272727 100.577273 21.954545 67.318339 6.0
2015-01-07 0.973913 0.830435 110.826087 3.956522 30.817391 40.365217 40.595652 0.221739 739.865217 60.043478 100.195652 24.456522 72.347251 6.0
2015-01-08 0.983333 0.825000 156.500000 4.208333 32.670833 41.520833 41.366667 -0.129167 736.350000 69.583333 99.958333 22.275000 65.770725 10.0
2015-01-09 0.958333 0.729167 133.708333 3.379167 39.645833 42.279167 42.158333 -0.116667 735.204167 60.416667 100.041667 19.370833 59.085129 10.0
2015-01-10 0.966667 0.758333 164.500000 3.675000 37.345833 42.962500 42.775000 -0.200000 734.287500 41.500000 100.120833 14.658333 49.314653 0.0
# resample mean
dfr = df.resample('W').mean()
# add the resampled (weekly) sum of 'count' to dfr; note that this column is named 'mean' but holds the sum
dfr['mean'] = df['count'].resample('W').sum()
# reset index
dfr = dfr.reset_index()
# display(dfr)
Date WSA WSV WDV WSM SGT T2M T10M DELTA_T PBAR SRAD RH PM25 AQI count mean
0 2015-01-04 1.049230 0.782880 152.359601 3.707382 39.931069 35.063949 34.932699 -0.138678 739.529076 53.802083 100.503986 20.445426 59.986656 1.0 4.0
1 2015-01-11 0.949566 0.690615 136.210282 3.812261 40.152457 41.810743 41.822823 0.015681 738.190794 54.066590 100.316321 19.894900 61.042728 6.0 36.0
# plot dfr on the axes created here (do not overwrite fig with the return value)
fig, ax = plt.subplots(figsize=(16, 10))
sns.barplot(x='Date', y='count', data=dfr, ax=ax)
# configure the xaxis ticks from datetime to date
x_dates = dfr.Date.dt.strftime('%Y-%m-%d').sort_values().unique()
ax.set_xticklabels(labels=x_dates, rotation=90, ha='right')
plt.show()
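To see the difference between the two resample calls in isolation, here is a small sketch that uses only the 'count' column from the posted sample data. The names weekly_sum, as_frame and count_sum are made up for the illustration.

```python
import pandas as pd

# 'count' values for 2015-01-01 .. 2015-01-10, copied from the posted sample
dates = pd.date_range("2015-01-01", periods=10, freq="D")
df = pd.DataFrame(
    {"count": [0.0, 4.0, 0.0, 0.0, 4.0, 6.0, 6.0, 10.0, 10.0, 0.0]},
    index=dates,
)
df.index.name = "Date"

weekly_sum = df["count"].resample("W").sum()

# reset_index() on a Series gives back a two-column DataFrame (Date, count),
# which is not something you want to assign to a single column
as_frame = weekly_sum.reset_index()

# Assign the Series while both objects still share the weekly DatetimeIndex,
# then reset the index once at the end
weekly = df.resample("W").mean()
weekly["count_sum"] = weekly_sum
weekly = weekly.reset_index()
```

Here weekly keeps the weekly mean in 'count' and the weekly total in 'count_sum', and 'Date' stays the only datetime column.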
I have two data frames that look like:
Temp [Degrees_C] Cond [mS/cm]
yyyy-mm-ddThh:mm:ss.sss
2020-01-28 03:00:59 14.553947 19.301285
2020-01-28 08:00:59 14.501740 19.310037
2020-01-28 13:00:59 14.425415 18.531609
2020-01-28 18:00:59 14.414717 16.155998
...
And this:
CONDUCTIVITY Temp [C]
DATE TIME
2020-01-28 03:00:00 18.240 15.761111
2020-01-28 04:00:00 18.147 15.722222
2020-01-28 05:00:00 17.930 15.722222
2020-01-28 06:00:00 17.873 15.666667
...
I want to create one plot using these two data sets. They should share the same x-axis (the date-time) and have two different y-axes (one for temperature and one for conductivity).
However, since the sampling is different for both of them, I'm not sure how to do it.
Any suggestions?
Thank you!!
To get the dates to match between the two datasets, I would use pd.DateOffset. Alternatively, you could use dt.round, but that can cause problems if you aren't sure every timestamp will round to the value you expect.
To plot two y-axes, you are looking for ax.twinx().
import pandas as pd
import matplotlib.pyplot as plt
from io import StringIO
str1 = """
2020-01-28 03:00:59, 14.553947, 19.301285
2020-01-28 08:00:59, 14.501740, 19.310037
2020-01-28 13:00:59, 14.425415, 18.531609
2020-01-28 18:00:59, 14.414717, 16.155998
"""
str2="""
2020-01-28 03:00:00, 18.240, 15.761111
2020-01-28 04:00:00, 18.147, 15.722222
2020-01-28 05:00:00, 17.930, 15.722222
2020-01-28 06:00:00, 17.873, 15.666667
"""
# column order follows the question: the first frame is Temp, Cond; the second is CONDUCTIVITY, Temp
colnames1 = ['date','temp','cond']
colnames2 = ['date','cond','temp']
df1 = pd.read_csv(StringIO(str1), header=None, names = colnames1, parse_dates=['date'])
df2 = pd.read_csv(StringIO(str2), header=None, names = colnames2, parse_dates=['date'])
#Offset to even seconds
df1.date = df1.date - pd.DateOffset(seconds=59)
#plot temperature on the left y axis
ax = df1.plot(x='date',y='temp', label='df1', color='k', ls = '--')
#Create second y axis for conductivity
ax_tw = ax.twinx()
df1.plot(x='date',y='cond', ax = ax_tw, label='df1', color='k', ls='-')
df2.plot(x='date',y='temp', ax= ax, label='df2', color='red', ls='--')
df2.plot(x='date',y='cond', ax = ax_tw,label='df2', color='red', ls ='-')
ax.legend()
plt.show()
which returns:
This matplotlib documentation page shows a simple example of how to plot with two y-axes: https://matplotlib.org/gallery/api/two_scales.html
You wrote that you have data frames, which sounds like you are using pandas for data handling. You can simply cast your data to NumPy arrays and follow the example.
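On the different sampling: if the goal is only to draw both series, the timestamps do not have to line up at all, because each line is plotted against its own datetime values on the shared x-axis. A minimal sketch with the values from the question (temperature from the first logger, conductivity from the second; the variable names and labels are made up for the illustration):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Temperature from the first frame (sampled every 5 hours, at :59 seconds)
temp = pd.Series(
    [14.553947, 14.501740, 14.425415, 14.414717],
    index=pd.to_datetime(["2020-01-28 03:00:59", "2020-01-28 08:00:59",
                          "2020-01-28 13:00:59", "2020-01-28 18:00:59"]),
)
# Conductivity from the second frame (sampled hourly, on the hour)
cond = pd.Series(
    [18.240, 18.147, 17.930, 17.873],
    index=pd.to_datetime(["2020-01-28 03:00:00", "2020-01-28 04:00:00",
                          "2020-01-28 05:00:00", "2020-01-28 06:00:00"]),
)

fig, ax_temp = plt.subplots()
ax_cond = ax_temp.twinx()  # second y-axis that shares the same x-axis

ax_temp.plot(temp.index, temp.values, "k--", label="Temp (logger 1)")
ax_cond.plot(cond.index, cond.values, "r-", label="Conductivity (logger 2)")
ax_temp.set_ylabel("Temp [C]")
ax_cond.set_ylabel("Cond [mS/cm]")

# Collect the lines from both axes so that one legend covers them all
lines = list(ax_temp.get_lines()) + list(ax_cond.get_lines())
ax_temp.legend(lines, [line.get_label() for line in lines])
fig.autofmt_xdate()
```

Aligning the timestamps only becomes necessary if you want to combine the two series row by row, for example to compute a difference.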
I'm using matplotlib.pyplot (plt) to plot a graph of temperature against time. I want the xticks to be at 12am and 12pm only, but matplotlib automatically picks where the xticks go.
How can I pick exactly 12am and 12pm for each date as the data I'm using doesn't have those data points? Basically I'm wanting an xtick on a point between two data points / on the line between two data points.
Below is a snippet of my data and my function.
Temp UNIX Time Time
5.04 1490562000 2017-03-26 22:00:00
3.21 1490572800 2017-03-27 01:00:00
2.15 1490583600 2017-03-27 04:00:00
1.66 1490594400 2017-03-27 07:00:00
6.92 1490605200 2017-03-27 10:00:00
11.73 1490616000 2017-03-27 13:00:00
13.77 1490626800 2017-03-27 16:00:00
def ploting_graph(self):
    ax=plt.gca()
    xfmt = md.DateFormatter('%d/%m/%y %p')
    ax.xaxis.set_major_formatter(xfmt)
    plt.plot(self.df['Time'],self.df['Temp'], 'k--^')
    plt.xticks(rotation=10)
    plt.grid(axis='both',color='r')
    plt.ylabel("Temp (DegC)")
    plt.xlim(min(self.df['Time']),max(self.df['Time']))
    plt.ylim(min(self.df['Temp'])-1,max(self.df['Temp'])+1)
    plt.title("Forecast Temperature {}".format(self.location))
    plt.savefig('{}_day_forecast_{}.png'.format(self.num_of_days,self.location),dpi=300)
As hopefully you can see I'm looking for an xtick on the 2017-03-27 00:00:00 and 12:00:00.
Any help would be greatly appreciated and even a push in the right direction would be fantastic! Thank you very much in advance!
John.
You need a locator and a formatter. The locator determines that you only want ticks every 12 hours, while the formatter determines how the datetime should look.
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates
dates = ["2017-03-26 22:00:00","2017-03-27 01:00:00","2017-03-27 04:00:00","2017-03-27 07:00:00",
"2017-03-27 10:00:00","2017-03-27 13:00:00","2017-03-27 16:00:00"]
temps = [5.04, 3.21, 2.15,1.66, 6.92, 11.73, 13.77 ]
df = pd.DataFrame({"Time":dates, "Temp":temps})
df["Time"] = pd.to_datetime(df["Time"], format="%Y-%m-%d %H:%M:%S")
fig, ax = plt.subplots()
xfmt = matplotlib.dates.DateFormatter('%d/%m/%y 12%p')
ax.xaxis.set_major_formatter(xfmt)
locator = matplotlib.dates.HourLocator(byhour=[0,12])
ax.xaxis.set_major_locator(locator)
plt.plot(df['Time'],df['Temp'], 'k--^')
plt.xticks(rotation=10)
plt.grid(axis='both',color='r')
plt.ylabel("Temp (DegC)")
plt.xlim(min(df['Time']),max(df['Time']))
plt.ylim(min(df['Temp'])-1,max(df['Temp'])+1)
plt.title("Forecast Temperature")
plt.show()
Try playing with the xticks function on plot. Below is a quick example I tried.
import matplotlib.pyplot as plt
x = [2,3,4,5,6,14,16,18,24,22,20]
y = [1, 4, 9, 6,5,6,7,8,9,5,7]
labels = ['12Am', '12Pm']
plt.plot(x, y, 'ro')
# 12 and 24 are the locations on the x-axis for the labels
plt.xticks((12, 24), labels)
plt.margins(0.2)
plt.show()
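The same xticks idea also works on a real datetime axis: work out the midnight and noon timestamps that fall inside the data range and pass them as tick positions, even though none of them is an actual data point. A rough sketch with the data from the question (the ticks variable and the label format are my own choices):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

times = pd.to_datetime([
    "2017-03-26 22:00:00", "2017-03-27 01:00:00", "2017-03-27 04:00:00",
    "2017-03-27 07:00:00", "2017-03-27 10:00:00", "2017-03-27 13:00:00",
    "2017-03-27 16:00:00",
])
temps = [5.04, 3.21, 2.15, 1.66, 6.92, 11.73, 13.77]

# Every 00:00 and 12:00 around the data, then keep only those inside the range
ticks = pd.date_range(times.min().floor("12h"), times.max().ceil("12h"), freq="12h")
ticks = ticks[(ticks >= times.min()) & (ticks <= times.max())]

fig, ax = plt.subplots()
ax.plot(times, temps, "k--^")
# Tick positions are given in matplotlib's date numbers
ax.set_xticks(mdates.date2num(ticks))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d/%m/%y %H:%M"))
```

With a locator you get this for free as the axis limits change, so the explicit list is mostly useful for a static figure.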
I have the following data set:
In[55]: usdbrl
Out[56]:
Date Price Open High Low Change STD
0 2016-03-18 3.6128 3.6241 3.6731 3.6051 -0.31 0.069592
1 2016-03-17 3.6241 3.7410 3.7449 3.6020 -3.16 0.069041
2 2016-03-16 3.7422 3.7643 3.8533 3.7302 -0.62 0.068772
3 2016-03-15 3.7656 3.6610 3.7814 3.6528 2.83 0.071474
4 2016-03-14 3.6618 3.5813 3.6631 3.5755 2.23 0.070348
5 2016-03-11 3.5820 3.6204 3.6692 3.5716 -1.09 0.076458
6 2016-03-10 3.6215 3.6835 3.7102 3.6071 -1.72 0.062977
7 2016-03-09 3.6849 3.7543 3.7572 3.6790 -1.88 0.041329
8 2016-03-08 3.7556 3.7826 3.8037 3.7315 -0.72 0.013700
9 2016-03-07 3.7830 3.7573 3.7981 3.7338 0.63 0.000000
I want to plot Price against Date:
But I would like to color the line by a third variable (in my case Date or Change).
Could anybody help with this please?
Thanks.
I've written a simple function to map a given property into a color:
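import numpy as np
import matplotlib.cm as cm
import matplotlib.pyplot as plt

def plot_colourline(x, y, c):
    # normalise c to [0, 1] and map each value to a colour from the jet colormap
    col = cm.jet((c - np.min(c)) / (np.max(c) - np.min(c)))
    ax = plt.gca()
    # draw each segment separately so that it can have its own colour
    for i in np.arange(len(x) - 1):
        ax.plot([x[i], x[i+1]], [y[i], y[i+1]], c=col[i])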
This function normalizes the desired property and gets a color from the jet colormap. You may want to use a different one. It then gets the current axis and plots each segment of your data in its own colour. Because it uses a for loop, you should avoid it for very large data sets, but for normal purposes it is useful.
Consider the following example as a test:
import numpy as np
import matplotlib.pyplot as plt
n = 100
x = 1.*np.arange(n)
y = np.random.rand(n)
prop = x**2
fig = plt.figure(1, figsize=(5,5))
ax = fig.add_subplot(111)
plot_colourline(x,y,prop)
plt.show()
You could color the data points by a third variable, if that would help:
dates = [dt.date() for dt in pd.to_datetime(df.Date)]
plt.scatter(dates, df.Price, c=df.Change, s=100, lw=0)
plt.plot(dates, df.Price)
plt.colorbar()
plt.show()
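If the per-segment loop gets slow, matplotlib's LineCollection can draw all segments in one go and also gives you a colorbar. This is only a sketch: it uses Price and Change from the question, reordered from oldest to newest, and a plain integer x-axis instead of the dates to keep it short.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend for this sketch; drop it for interactive use
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import LineCollection

# Price and Change from the question, oldest (2016-03-07) to newest (2016-03-18)
price = np.array([3.7830, 3.7556, 3.6849, 3.6215, 3.5820,
                  3.6618, 3.7656, 3.7422, 3.6241, 3.6128])
change = np.array([0.63, -0.72, -1.88, -1.72, -1.09,
                   2.23, 2.83, -0.62, -3.16, -0.31])
x = np.arange(len(price), dtype=float)  # assumption: row position stands in for the date

# Build an array of shape (n - 1, 2, 2): one (start, end) point pair per segment
points = np.column_stack([x, price]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

lc = LineCollection(segments, cmap="jet")
lc.set_array(change[:-1])  # segment i takes the value at its starting point, like the loop version

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()  # collections do not always update the view limits on their own
fig.colorbar(lc, ax=ax, label="Change")
```

To put real dates on the x-axis you would convert them to numbers first, for example with matplotlib.dates.date2num, before building the segments.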