Plotting groupby() in python - python

I cannot find a way to plot the grouped data from the following data frame:
            Processed Card  Transaction ID  Transaction amount  Error_Occured
Date
2019-01-01     Carte Rouge    217142203412           147924.21              0
2019-01-01        ChinaPay    149207925233            65301.63              1
2019-01-01      Masterkard    766507067450           487356.91              5
2019-01-01            VIZA    145484139636            97774.52              1
2019-01-02     Carte Rouge    510466748547           320951.10              3
I want to create a plot where: x-axis: Date, y-axis: Error_Occured, points/lines colored by Processed Card. I tried grouping the data frame first and plotting it using pandas plot:
df = df.groupby(['Date','Processed Card']).sum('Error_Occured')
df = df.reset_index()
df.set_index("Date",inplace=True)
df.plot(legend=True)
plt.show()
But the plot I get shows Transaction ID and not the cards.

You can do something like this:
import matplotlib.pyplot as plt
import pandas as pd

# Rebuild the relevant columns of the sample data
df = pd.DataFrame()
df['Date'] = ['2019-01-01','2019-01-01','2019-01-01','2019-01-01','2019-01-02']
df['Card'] = ['Carte Rouge', 'ChinaPay', 'Masterkard', 'VIZA', 'Carte Rouge']
df['Error_Occured'] = [0,1,5,1,3]

fig, ax = plt.subplots()
# Group by card only, so that each card becomes one series in the plot
for name, s in df.groupby('Card'):
    ax.plot(s['Date'], s['Error_Occured'], marker='o', label=name)
ax.legend()
plt.show()
With the data provided, this draws one series per card. Note that you only want to group by card, not by date.
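If the per-date sums from the question are still wanted, one possible alternative is to select the Error_Occured column before summing and then move the cards into columns with unstack, so that pandas draws one line per card. This is only a sketch built on the sample rows from the question; the variable name errors is made up for illustration.

```python
import pandas as pd

# Sample rows copied from the question (only the columns needed here)
df = pd.DataFrame({
    "Date": ["2019-01-01"] * 4 + ["2019-01-02"],
    "Processed Card": ["Carte Rouge", "ChinaPay", "Masterkard", "VIZA", "Carte Rouge"],
    "Error_Occured": [0, 1, 5, 1, 3],
})
df["Date"] = pd.to_datetime(df["Date"])

# Sum the errors per (date, card), then turn the cards into columns
errors = (
    df.groupby(["Date", "Processed Card"])["Error_Occured"]
    .sum()
    .unstack("Processed Card")
)

# One line per column (card); markers keep isolated points visible
ax = errors.plot(marker="o")
ax.set_ylabel("Error_Occured")
```

Cards with no rows on a given date end up as NaN in errors, which pandas simply leaves out of the line.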

Related

Matplotlib Plot points on an existing line, only by knowing x values

I have 2 dataframes:
'stock' is a dataframe with columns Date and Price.
'events' is a dataframe with columns Date and Text.
My goal is to produce a graph of the stock prices and place dots on the line where the events occur. However, I do not know how to set the 'y' value for the events dataframe, as I want each dot to sit where the stock price line is on that date.
I am able to plot the first dataframe fine with:
plt.plot('Date', 'Price', data=stock)
And I try to plot the event dots with:
plt.scatter('created_at', ???, data=events)
However, it is the ??? that I don't know how to set.
Assuming Date and created_at are datetime:
import pandas as pd
stock = pd.DataFrame({'Date':['2021-01-01','2021-02-01','2021-03-01','2021-04-01','2021-05-01'],'Price':[1,5,3,4,10]})
events = pd.DataFrame({'created_at':['2021-02-01','2021-03-01'],'description':['a','b']})
stock.Date = pd.to_datetime(stock.Date)
events.created_at = pd.to_datetime(events.created_at)
Filter stock by events.created_at (or merge) and plot them onto the same ax:
stock_events = stock[stock.Date.isin(events.created_at)]
# or merge on the date columns
# stock_events = stock.merge(events, left_on='Date', right_on='created_at')
ax = stock.plot(x='Date', y='Price')
stock_events.plot.scatter(ax=ax, x='Date', y='Price', label='Event', c='r', s=50)
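The filter above only finds events whose date also appears in stock. If an event can fall between two stock dates, one possible approach is to interpolate the price linearly at the event date. The sketch below assumes that linear interpolation is acceptable and uses made-up sample values; to_days is a hypothetical helper, not a pandas function.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical sample data: the first event falls between two stock dates
stock = pd.DataFrame({
    "Date": pd.to_datetime(["2021-01-01", "2021-02-01", "2021-03-01"]),
    "Price": [1.0, 5.0, 3.0],
})
events = pd.DataFrame({
    "created_at": pd.to_datetime(["2021-01-16", "2021-02-01"]),
    "description": ["a", "b"],
})

def to_days(dates):
    """Convert datetimes to float days since 1970-01-01 so np.interp can use them."""
    return (dates - pd.Timestamp("1970-01-01")) / pd.Timedelta(days=1)

# Linearly interpolate the price at each event date (stock must be sorted by Date)
events["Price"] = np.interp(
    to_days(events["created_at"]), to_days(stock["Date"]), stock["Price"]
)

# Draw the price line and put the event dots on top of it
fig, ax = plt.subplots()
ax.plot(stock["Date"], stock["Price"], label="Price")
ax.scatter(events["created_at"], events["Price"], color="r", s=50, zorder=3, label="Event")
ax.legend()
```

Both layers are drawn with plain Matplotlib calls here so that the line and the dots share the same date axis.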

Plotting a pandas Series using dates and values too squished

I am trying to plot a simple pandas Series object. It looks something like this:
2018-01-01 10
2018-01-02 90
2018-01-03 79
...
2020-01-01 9
2020-01-02 72
2020-01-03 65
It includes only the first month of each year, so it contains nothing but the January days and their values.
When I try to plot it
# suppose the name of the series is dates_and_values
dates_and_values.plot()
It returns a plot (made using my current data) that is clearly laid out by year and then by month, so it looks squished and small because I don't have any months other than January. Is there a way to plot it by year and day, so that the days are easier to observe?
The x-axis is the index of the dataframe. Dates form a continuous series, so the x-axis is continuous and the months between the Januaries still take up space. If you change the index to strings, the axis is no longer continuous and those gaps disappear. I have generated some sample data that only has January to demonstrate:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Custom "business day" calendar in which every day outside January is a holiday
cf = pd.tseries.offsets.CustomBusinessDay(
    weekmask="Sun Mon Tue Wed Thu Fri Sat",
    holidays=[d for d in pd.date_range("01-jan-1990", periods=365*50, freq="D")
              if d.month != 1])
d = pd.date_range("01-jan-2015", periods=200, freq=cf)
df = pd.DataFrame({"Values": np.random.randint(20, 70, len(d))}, index=d)
fig, ax = plt.subplots(2, figsize=[14,6])
# Top: string index (year and day), so only the existing dates are drawn
df.set_index(df.index.strftime("%Y %d")).plot(ax=ax[0])
# Bottom: datetime index, continuous axis including the empty months
df.plot(ax=ax[1])
I suggest that you convert the series to a dataframe and then pivot it to get one column for each year. This lets you plot the data for each year with a separate line, either in the same plot using different colors or in subplots. Here is an example:
import numpy as np # v 1.19.2
import pandas as pd # v 1.2.3
# Create sample series
rng = np.random.default_rng(seed=123) # random number generator
dt = pd.date_range('2018-01-01', '2020-01-31', freq='D')
dt_jan = dt[dt.month == 1]
series = pd.Series(rng.integers(20, 90, size=dt_jan.size), index=dt_jan)
# Convert series to dataframe and pivot it
df_raw = series.to_frame()
df_pivot = df_raw.pivot_table(index=df_raw.index.day, columns=df_raw.index.year)
df = df_pivot.droplevel(axis=1, level=0)
df.head()
# Plot all years together in different colors
ax = df.plot(figsize=(10,4))
ax.set_xlim(1, 31)
ax.legend(frameon=False, bbox_to_anchor=(1, 0.65))
ax.set_xlabel('January', labelpad=10, size=12)
for spine in ['top', 'right']:
    ax.spines[spine].set_visible(False)
# Plot years separately
axs = df.plot(subplots=True, color='tab:blue', sharey=True,
              figsize=(10,8), legend=None)
for ax in axs:
    ax.set_xlim(1, 31)
    ax.grid(axis='x', alpha=0.3)
    handles, labels = ax.get_legend_handles_labels()
    ax.text(28.75, 80, *labels, size=14)
    # Axes.is_last_row() is gone from newer Matplotlib releases; ask the subplot spec instead
    if ax.get_subplotspec().is_last_row():
        ax.set_xlabel('January', labelpad=10, size=12)
ax.figure.subplots_adjust(hspace=0)

What is the best way to plot numerical Y axis, X axis Time series for a categorical variable in Python?

My data frame is in the format below:
Amount  Category  Transactiondatetime
  9445       A16     22-04-2015 19:42
  2000       A23     23-04-2015 16:29
  1398       A16     02-05-2015 15:17
  1995        A7     27-06-2015 13:51
  2000       A23     07-08-2015 17:31
Variable Description
Assume the Category variable holds product categories sold on a website.
There are around 15-20 categories.
Some products were sold 20 times in a year, some 50 times, and so on, each for different amounts.
The time series is spread across the year and the data has 6000000 rows.
Aim of my task
I am interested in seeing which category brings in the largest amount during which part of the year. This can be a little messy, as the data is huge and the categories will overlap on the time axis.
So what would be the best way to visualize this kind of data? It can be matplotlib, seaborn, bokeh or any other library.
I would appreciate an example with code.
Maybe just use a bar graph with amount on the y-axis and time on the x-axis?
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_csv('something.csv')
# The sample timestamps are day-first (e.g. 22-04-2015 19:42), so state the format explicitly
df['Transactiondatetime'] = pd.to_datetime(df['Transactiondatetime'], format='%d-%m-%Y %H:%M')
categories = df['Category'].unique()
fig, ax = plt.subplots()
bar_width = 2.0
# Draw one set of bars per category so that each gets its own colour and legend entry
for category in categories:
    cat_df = df[df['Category'] == category]
    times = cat_df['Transactiondatetime'].tolist()
    values = cat_df['Amount'].tolist()
    ax.bar(times, values, bar_width, label=category)
ax.legend()
plt.xlabel('Transaction Date')
plt.ylabel('Amount')
plt.gcf().autofmt_xdate()
plt.show()
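With millions of rows, drawing one bar per transaction may get crowded. A possible refinement is to aggregate first, for example total Amount per month and category, and plot only the aggregated table. The sketch below uses the five sample rows from the question; the monthly granularity and the names monthly and top_category are assumptions made for illustration.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame({
    "Amount": [9445, 2000, 1398, 1995, 2000],
    "Category": ["A16", "A23", "A16", "A7", "A23"],
    "Transactiondatetime": [
        "22-04-2015 19:42", "23-04-2015 16:29", "02-05-2015 15:17",
        "27-06-2015 13:51", "07-08-2015 17:31",
    ],
})
df["Transactiondatetime"] = pd.to_datetime(
    df["Transactiondatetime"], format="%d-%m-%Y %H:%M"
)

# Total amount per (month, category); categories become columns
monthly = (
    df.groupby([df["Transactiondatetime"].dt.to_period("M"), "Category"])["Amount"]
    .sum()
    .unstack("Category", fill_value=0)
)

# Category with the largest total in each month
top_category = monthly.idxmax(axis=1)

# Stacked bars: one bar per month, one colour per category
ax = monthly.plot.bar(stacked=True)
ax.set_xlabel("Month")
ax.set_ylabel("Amount")
```

The aggregated table has one row per month regardless of how many transactions there are, so the plot stays readable even for the full data set.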

Python plot axis from table data

I'm learning python and stuck trying to plot data from my table.
Here is piece of my code:
df1 = populationReport.loc[['Time','VIC','NSW','QLD']]
df1 = df1.set_index('Time')
print(df1)
plt.plot(df1)
plt.legend(df1.columns)
plt.ylabel ('Population')
plt.xlabel ('Timeline')
plt.show()
I need the x-axis to display the values from the 'Time' column.
But so far it just displays the row numbers of my table.
My draft plot has the desired shape, but the x-axis should show the data from the 'Time' column rather than the number of entries.
Here is what the table looks like:
               VIC        NSW        QLD
Time
1/12/05  5023203.0  6718023.0  3964175.0
1/3/06   5048207.0  6735528.0  3987653.0
1/6/06   5061266.0  6742690.0  4007992.0
1/9/06   5083593.0  6766133.0  4031580.0
1/12/06  5103965.0  6786160.0  4055845.0
I think you can use to_datetime and, if necessary, define the format or dayfirst parameter:
# Work on a copy so that assigning a column does not trigger SettingWithCopyWarning
df1 = populationReport[['Time','VIC','NSW','QLD']].copy()
df1['Time'] = pd.to_datetime(df1['Time'], format='%d/%m/%y')
# alternative
# df1['Time'] = pd.to_datetime(df1['Time'], dayfirst=True)
df1 = df1.set_index('Time')
print(df1)
                  VIC        NSW        QLD
Time
2005-12-01  5023203.0  6718023.0  3964175.0
2006-03-01  5048207.0  6735528.0  3987653.0
2006-06-01  5061266.0  6742690.0  4007992.0
2006-09-01  5083593.0  6766133.0  4031580.0
2006-12-01  5103965.0  6786160.0  4055845.0
and then it is possible to use DataFrame.plot:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
ax = df1.plot()
ticklabels = df1.index.strftime('%Y-%m-%d')
ax.xaxis.set_major_formatter(ticker.FixedFormatter(ticklabels))
plt.show()
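Note that FixedFormatter assigns its labels by tick position, not by value, so the labels only match the data if the ticks sit exactly on the data points. A possible alternative, sketched here with plain Matplotlib and the sample rows from the question, is to pin one tick to every date and let a DateFormatter produce the labels.

```python
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question
df1 = pd.DataFrame({
    "Time": ["1/12/05", "1/3/06", "1/6/06", "1/9/06", "1/12/06"],
    "VIC": [5023203.0, 5048207.0, 5061266.0, 5083593.0, 5103965.0],
    "NSW": [6718023.0, 6735528.0, 6742690.0, 6766133.0, 6786160.0],
    "QLD": [3964175.0, 3987653.0, 4007992.0, 4031580.0, 4055845.0],
})
df1["Time"] = pd.to_datetime(df1["Time"], format="%d/%m/%y")
df1 = df1.set_index("Time")

fig, ax = plt.subplots()
# One line per state, with the datetime index on the x-axis
for column in df1.columns:
    ax.plot(df1.index, df1[column], label=column)

# Put one tick on every data point and format each tick as a date
ax.set_xticks(mdates.date2num(df1.index.to_pydatetime()))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.set_xlabel("Timeline")
ax.set_ylabel("Population")
ax.legend()
fig.autofmt_xdate()  # rotate the date labels so they do not overlap
```

With many rows, one tick per data point becomes unreadable; in that case leaving the tick placement to Matplotlib and setting only the formatter is probably the better choice.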

In Pandas, generate DateTime index from Multi-Index with years and weeks

I have a DataFrame dfp with columns saledate (DateTime, dtype <M8[ns]) and price (dtype int64). If I plot them like
fig, ax = plt.subplots()
ax.plot_date(dfp['saledate'],dfp['price']/1000.0,'.')
ax.set_xlabel('Date of sale')
ax.set_ylabel('Price (1,000 euros)')
I get a scatter plot.
Since there are so many points that it is difficult to discern an average trend, I'd like to compute the average sale price per week, and plot that in the same plot. I've tried the following:
dfp_week = dfp.groupby([dfp['saledate'].dt.year, dfp['saledate'].dt.week]).mean()
If I plot the resulting 'price' column like this
plt.figure()
plt.plot(dfp_week['price'].values/1000.0)
plt.ylabel('Price (1,000 euros)')
I can more clearly discern an increasing trend.
The problem is that I no longer have a time axis to plot this Series in the same plot as the previous figure. The index starts like this:
                   longitude_4pp  postal_code_4pp     price  rooms  \
saledate saledate
2014     1              4.873140           1067.5  206250.0    2.5
         6              4.954779           1102.0  129000.0    3.0
         26             4.938828           1019.0  327500.0    3.0
         40             4.896904           1073.0  249000.0    2.0
         43             4.938828           1019.0  549000.0    5.0
How could I convert this Multi-Index with years and weeks back to a single DateTime index that I can plot my per-week-averaged data against?
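If you group using pd.Grouper with a frequency (pd.TimeGrouper in old pandas versions, which has since been removed), you'll keep datetimes in your index.
dfp.groupby(pd.Grouper(key='saledate', freq='W')).mean()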
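Create a new index:
# pd.datetime is no longer available in recent pandas; pd.Timestamp works instead
i = pd.DatetimeIndex([pd.Timestamp(year=int(year), month=1, day=1) + pd.Timedelta(days=7 * int(weeks)) for year, weeks in df.index])
Then set this new index on the DataFrame:
df.index = i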
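The offset arithmetic above counts weeks from 1 January. If the pairs in the index are ISO years and ISO week numbers (an assumption about the data), a possible alternative is to map each pair to the Monday of that ISO week with datetime.date.fromisocalendar, available from Python 3.8. The frame weekly below is a made-up stand-in for the grouped result, using three rows from the question.

```python
import datetime

import pandas as pd

# Stand-in for the grouped result: (year, week) MultiIndex with a price column
weekly = pd.DataFrame(
    {"price": [206250.0, 129000.0, 327500.0]},
    index=pd.MultiIndex.from_tuples(
        [(2014, 1), (2014, 6), (2014, 26)], names=["year", "week"]
    ),
)

# Map every (ISO year, ISO week) pair to the Monday of that week
weekly.index = pd.to_datetime(
    [datetime.date.fromisocalendar(int(year), int(week), 1)
     for year, week in weekly.index]
)
```

ISO week 1 of 2014 starts on Monday 2013-12-30, so dates near a year boundary can land in the neighbouring calendar year.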
For the sake of completeness, here are the details of how I implemented the solution suggested by piRSquared:
fig, ax = plt.subplots()
ax.plot_date(dfp['saledate'],dfp['price']/1000.0,'.')
ax.set_xlabel('Date of sale')
ax.set_ylabel('Price (1,000 euros)')
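# pd.Grouper stands in for pd.TimeGrouper, which recent pandas versions no longer provide
dfp_week = dfp.groupby(pd.Grouper(key='saledate', freq='W')).mean()
plt.plot_date(dfp_week.index, dfp_week['price']/1000.0)
This overlays the weekly averages on the scatter plot.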
