Add Pivot table columns and index as xticks and yticks - python

I have a pivot table created as described in "Color mapping of data on a date vs time plot", and I plot it with imshow(). I want to use the index and columns of the pivot table as yticks and xticks. The columns of my pivot table are dates and the index is the hour of the day.
data = pd.DataFrame()
data['Date']=Tgrad_GFAVD_3m2mRot.index.date
data['Time']=Tgrad_GFAVD_3m2mRot.index.strftime("%H")
data['Tgrad']=Tgrad_GFAVD_3m2mRot.values
C = data.pivot(index='Time', columns='Date', values='Tgrad')
print(C.head())
Date  2016-08-01  2016-08-02  2016-08-03  2016-08-04  2016-08-05  2016-08-06  \
Time
00     -0.841203   -0.541871   -0.042984   -0.867929   -0.790869   -0.940757
01     -0.629176   -0.520935   -0.194655   -0.866815   -0.794878   -0.910690
02     -0.623608   -0.268820   -0.255457   -0.859688   -0.824276   -0.913808
03     -0.615145   -0.008241   -0.463920   -0.909354   -0.811136   -0.878619
04     -0.726949   -0.169488   -0.529621   -0.897773   -0.833408   -0.825612
I plot the pivot table with
fig, ax = plt.subplots(figsize = (16,9))
pl = ax.imshow(C, aspect='auto', extent=[0, len(data["Date"]), 0, 23], origin="lower")
I tried a couple of things but nothing worked. At the moment my xticks range between 0 and 6552, which is the length of data["Date"] and is set by the extent argument in imshow().
I would like to have the xticks at every first of the month but not by index number but as a datetick in the format "2016-08-01" for example.
I am sure it was just a small thing that has been stopping me the last hour, but now I give up. Do you know how to set the xticks accordingly?

I found the solution myself after trying one more thing. I created another column in the "data" DataFrame with datenum entries instead of dates (mdates is matplotlib.dates):
data["datenum"]=mdates.date2num(data["Date"])
Then changed the plot line to:
pl = ax.imshow(C, aspect='auto',
               extent=[data["datenum"].iloc[0], data["datenum"].iloc[-1],
                       data["Time"].iloc[0], data["Time"].iloc[-1]],
               origin="lower")
So the change of the extent argument provided the datenum values to the plot instead of the index of the date column.
Then with this the following lines worked:
ax.set_yticks(data["Time"]) # sets yticks
ax.xaxis_date() # tells the xaxis that it should expect datetime values
ax.xaxis.set_major_formatter(mdates.DateFormatter("%m/%d") ) # formats the datetime values
fig.autofmt_xdate() # makes it look nice
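For reference, here is a minimal self-contained sketch of the same idea. It is not the original data: the hourly series, the "Hour" column name and the choice of tick spacing are assumptions made up for illustration. It passes date numbers to extent and then asks for a tick on the first of every month in the "2016-08-01" format.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Synthetic hourly series standing in for Tgrad_GFAVD_3m2mRot (assumption)
idx = pd.date_range("2016-08-01", "2016-10-31 23:00", freq="h")
series = pd.Series(np.sin(np.arange(idx.size) * 2 * np.pi / 24), index=idx)

data = pd.DataFrame({
    "Date": series.index.normalize(),  # one midnight timestamp per day
    "Hour": series.index.hour,         # integer hour of day, 0..23
    "Tgrad": series.values,
})
C = data.pivot(index="Hour", columns="Date", values="Tgrad")

# Convert the column dates to matplotlib date numbers for the extent
xnum = mdates.date2num(C.columns.to_pydatetime())
x0, x1 = xnum[0], xnum[-1] + 1  # +1 day so the last column gets its full width

fig, ax = plt.subplots(figsize=(16, 9))
im = ax.imshow(C.values, aspect="auto", origin="lower", extent=[x0, x1, 0, 24])

ax.xaxis_date()                                                 # x values are dates
ax.xaxis.set_major_locator(mdates.MonthLocator(bymonthday=1))   # first of each month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
ax.set_yticks(np.arange(0, 24, 3) + 0.5)                        # centers of the hourly rows
ax.set_yticklabels([f"{h:02d}" for h in range(0, 24, 3)])
fig.autofmt_xdate()
```

Because the extent is given in date numbers, any of the matplotlib.dates locators can be swapped in for MonthLocator.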
Best,
Vroni

Related

Ensuring first and last date ticks in x-axis - Matplotlib

Currently I am charting data from some historical point to a point in current time. For example, January 2019 to TODAY (February 2021). However, my matplotlib chart only shows dates from January 2019 to January 2021 on the x-axis (with the last February tick missing) even though the data is charted to today's date on the actual plot.
Is there any way to ensure that the first and last month is always reflected on the x-axis chart? In other words, I would like the x-axis to have the range displayed (inclusive).
Picture of x axis (missing February 2021)
The data charted here is from January 2019 to TODAY (February 12th).
Here is my code for the date format:
fig.autofmt_xdate()
date_format = mdates.DateFormatter("%b-%y")
ax.xaxis.set_major_formatter(date_format)
EDIT: The numbers after each month represent years.
I am not aware of any way to do this other than by creating the ticks from scratch.
In the following example, a list of all first-DatetimeIndex-timestamp-of-the-month is created from the DatetimeIndex of a pandas dataframe, starting from the month of the first date (25th of Jan.) up to the start of the last ongoing month. An appropriate number of ticks is automatically selected by the step variable and the last month is appended and then removed with np.unique when it is a duplicate. The labels are formatted from the tick timestamps.
This solution works for any frequency smaller than yearly:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
# Create sample dataset
start_date = '2019-01-25'
end_date = '2021-02-12'
rng = np.random.default_rng(seed=123) # random number generator
dti = pd.date_range(start_date, end_date, freq='D')
variable = 100 + rng.normal(size=dti.size).cumsum()
df = pd.DataFrame(dict(variable=variable), index=dti)
# Create matplotlib plot
fig, ax = plt.subplots(figsize=(10, 2))
ax.plot(df.index, df.variable)
# Create list of monthly ticks made of timestamp objects
monthly_ticks = [timestamp for idx, timestamp in enumerate(df.index)
                 if (timestamp.month != df.index[idx-1].month) | (idx == 0)]
# Select appropriate number of ticks and include last month
step = 1
while len(monthly_ticks[::step]) > 10:
    step += 1
ticks = np.unique(np.append(monthly_ticks[::step], monthly_ticks[-1]))
# Create tick labels from tick timestamps
labels = [timestamp.strftime('%b\n%Y') if timestamp.year != ticks[idx-1].year
          else timestamp.strftime('%b') for idx, timestamp in enumerate(ticks)]
plt.xticks(ticks, labels, rotation=0, ha='center');
As you can see, the first and last months are located at an irregular distance from the neighboring tick.
In case you are plotting a time series with a discontinuous date range (e.g. weekends and holidays not included) and you are not using the DatetimeIndex for the x-axis (like this for example: ax.plot(range(df.index.size), df.variable)) so as to avoid gaps with straight lines showing up on short time series and/or very wide plots, then replace the last line of code with this:
plt.xticks([df.index.get_loc(tick) for tick in ticks], labels, rotation=0, ha='center');
Matplotlib uses a limited number of ticks. It just happens that no tick falls on February 2021. There are two things you could try. First, try setting the axis limits explicitly so that they extend to (or slightly past) today:
ax.set_xlim(start_date, end_date)
What you could also try, is using even more ticks:
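ax.set_xticks(np.linspace(np.min(x), np.max(x), n_ticks))
Where n_ticks stands for the number of ticks and x for the numeric values on the x-axis (for dates, e.g. the result of matplotlib.dates.date2num). np.linspace includes both endpoints, so the first and last values always get a tick.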
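A minimal sketch of the evenly spaced ticks idea, under some assumptions: the data below is a placeholder, n_ticks = 8 is an arbitrary choice, and the dates are converted with matplotlib.dates.date2num so that np.linspace can operate on plain numbers. The resulting ticks are evenly spaced in time, so they do not fall on month starts, but the first and last dates are always labelled.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Placeholder daily data over the range from the question (assumption)
dti = pd.date_range("2019-01-25", "2021-02-12", freq="D")
values = np.linspace(100.0, 120.0, dti.size)

fig, ax = plt.subplots(figsize=(10, 2))
ax.plot(dti, values)

# Dates as matplotlib date numbers, then n_ticks evenly spaced positions
x = mdates.date2num(dti.to_pydatetime())
n_ticks = 8
ticks = np.linspace(x.min(), x.max(), n_ticks)  # includes both endpoints

ax.set_xticks(ticks)
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b-%y"))

# The labels the formatter will produce, shown here for inspection
labels = [mdates.num2date(t).strftime("%b-%y") for t in ticks]
```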

seaborn plot misplotting x axis dates from pandas

I am trying to plot a lineplot in seaborn with values for the y axis and dates for the x axis coming from a pandas dataframe.
When I create the lineplot without converting the date column to datetime object it puts the dates in the wrong order. When I do convert the date column to datetime format it gives strange x labels and only shows 5 of the dates.
df = pd.read_csv("dddd.csv")
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values(["date"])
ax = sns.lineplot(y=df["google"],x=df["date"],color="red",data=df)
plt.show()
I just want to plot the data with the dates as x labels, in the correct order. Here is some example data.
25-03-2019 -100
26-03-2019 -66.66666667
27-03-2019 -80
28-03-2019 -87.08333333
29-03-2019 -88.88888889
30-03-2019 -86.28526646
31-03-2019 -87.5
01-04-2019 -87.87878788
02-04-2019 -82.92682927
03-04-2019 -84.09090909
04-04-2019 -84.7826087
05-04-2019 -85.71428571
06-04-2019 -81.30677848
07-04-2019 -81.98051948
08-04-2019 -82.14285714
09-04-2019 -78.46153846
10-04-2019 -76.05633803
11-04-2019 -75
12-04-2019 -75
13-04-2019 -80
14-04-2019 -83.33333333
15-04-2019 -83.33333333
16-04-2019 -77.77777778
17-04-2019 -68
18-04-2019 -54.70085471
19-04-2019 -64.70588235
20-04-2019 -66.66666667
Despite this question most certainly being a duplicate, here is some explanation:
Without converting to datetime, your x-axis is unordered, because python doesn't understand the values are dates. Here it probably draws a tick for every point, because it doesn't know how to fit "16-04-2019" on a continuous scale. (Try plotting an array of strings on the x-axis as comparison).
If you do convert to datetime, seaborn and pandas are intelligent enough to notice that your dates form a continuous scale. Because the tick labels would overlap when drawing all of them, it leaves out some of them and draws ticks at regular intervals instead.
You can influence the tick locations with ax.set_xticks(locations), their style with ax.tick_params(axis='x', labelrotation=90), or their labels with ax.set_xticklabels(labels) (keywords as examples). The horizontal alignment of the labels can be set with plt.setp(ax.get_xticklabels(), ha='left').
You can also use
import matplotlib.dates as md
locs = md.YearLocator() # Locations for yearly ticks
fmt = md.DateFormatter('%B %Y') # Format for labels
to access locator and formatter classes, and ax.xaxis.set_major_locator(locs) / ax.xaxis.set_major_formatter(fmt) to apply them to your plot.
To simply draw a tick for every datapoint, in your case use:
ax.set_xticks(df["date"])
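A small sketch of the whole sequence, with a few assumptions: the inline CSV is a four-row excerpt of the example data with made-up header names "date" and "google", and plain matplotlib is used for the line instead of seaborn (sns.lineplot returns a matplotlib Axes, so the same tick calls apply to it). Since the dates are written day first, the format is given explicitly so that 01-04-2019 is read as 1 April rather than 4 January.

```python
import io
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as md

# Four rows from the example data; the header names are assumptions
raw = io.StringIO(
    "date,google\n"
    "30-03-2019,-86.28526646\n"
    "31-03-2019,-87.5\n"
    "01-04-2019,-87.87878788\n"
    "02-04-2019,-82.92682927\n"
)
df = pd.read_csv(raw)

# Day-first dates: state the format explicitly to avoid month/day swaps
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")
df = df.sort_values("date")

fig, ax = plt.subplots()
ax.plot(df["date"], df["google"], color="red")

# One tick per data point, labelled in the original day-first style
ax.set_xticks(md.date2num(df["date"]))
ax.xaxis.set_major_formatter(md.DateFormatter("%d-%m-%Y"))
ax.tick_params(axis="x", labelrotation=90)
```

With many more rows, one tick per point becomes unreadable, and a locator such as md.WeekdayLocator or md.DayLocator(interval=...) is probably the better choice.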

How to plot both Price and Volume in same Chart

I have a dataframe as mentioned below:
Date,Time,Price,Volume
31/01/2019,09:15:00,10691.50,600
31/01/2019,09:15:01,10709.90,13950
31/01/2019,09:15:02,10701.95,9600
31/01/2019,09:15:03,10704.10,3450
31/01/2019,09:15:04,10700.05,2625
31/01/2019,09:15:05,10700.05,2400
31/01/2019,09:15:06,10698.10,3000
31/01/2019,09:15:07,10699.90,5925
31/01/2019,09:15:08,10699.25,5775
31/01/2019,09:15:09,10700.45,5925
31/01/2019,09:15:10,10700.00,4650
31/01/2019,09:15:11,10699.40,8025
31/01/2019,09:15:12,10698.95,5025
31/01/2019,09:15:13,10698.45,1950
31/01/2019,09:15:14,10696.15,3900
31/01/2019,09:15:15,10697.15,2475
31/01/2019,09:15:16,10697.05,4275
31/01/2019,09:15:17,10696.25,3225
31/01/2019,09:15:18,10696.25,3300
The data frame contains approx. 8000 rows. I want to plot both price and volume in the same chart. (Volume range: 0 - 8,00,000)
Suppose you want to compare price and volume vs time, try this:
df = pd.read_csv('your_path_here')
df.plot('Time', ['Price', 'Volume'], secondary_y='Price')
edit: x-axis customization
Since you want x-axis customization, try this (this is just a basic example you can follow):
# Create a Datetime column while parsing the csv file
df = pd.read_csv('your_path_here', parse_dates= {'Datetime': ['Date', 'Time']})
Then you need to create two lists, one containing the positions on the x-axis and the other one the labels.
Say you want labels every 5 seconds (your request for 30-minute intervals is possible, but not with the data you provided):
positions = [p for p in df.Datetime if p.second in range(0, 60, 5)]
labels = [l.strftime('%H:%M:%S') for l in positions]
Then you plot passing the positions and labels lists to set_xticks and set_xticklabels
ax = df.plot('Datetime', ['Price', 'Volume'], secondary_y='Price')
ax.set_xticks(positions)
ax.set_xticklabels(labels)
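The same steps can also be written with plain matplotlib and a twin axis. This is only a sketch under assumptions: the inline CSV holds the first seven rows of the sample, the Datetime column is built with pd.to_datetime instead of parse_dates, and the axis names ax_vol and ax_price are made up here.

```python
import io
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# First rows of the sample data from the question
csv = io.StringIO(
    "Date,Time,Price,Volume\n"
    "31/01/2019,09:15:00,10691.50,600\n"
    "31/01/2019,09:15:01,10709.90,13950\n"
    "31/01/2019,09:15:02,10701.95,9600\n"
    "31/01/2019,09:15:03,10704.10,3450\n"
    "31/01/2019,09:15:04,10700.05,2625\n"
    "31/01/2019,09:15:05,10700.05,2400\n"
    "31/01/2019,09:15:06,10698.10,3000\n"
)
df = pd.read_csv(csv)

# Combine the two text columns into one datetime column (day-first format)
df["Datetime"] = pd.to_datetime(df["Date"] + " " + df["Time"],
                                format="%d/%m/%Y %H:%M:%S")

fig, ax_vol = plt.subplots()
ax_price = ax_vol.twinx()  # second y-axis sharing the same x-axis

ax_vol.plot(df["Datetime"], df["Volume"], color="gray")
ax_price.plot(df["Datetime"], df["Price"], color="tab:blue")
ax_vol.set_ylabel("Volume")
ax_price.set_ylabel("Price")

# Tick positions every 5 seconds, with matching labels
positions = df.loc[df["Datetime"].dt.second % 5 == 0, "Datetime"]
labels = positions.dt.strftime("%H:%M:%S").tolist()
ax_vol.set_xticks(mdates.date2num(positions))
ax_vol.set_xticklabels(labels)
```

Keeping price and volume on separate y-axes matters here because the two ranges differ by orders of magnitude.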

Formatting X axis labels Pandas time series plot

I am trying to plot a dataframe of multiple time series in pandas. Each series covers one year of daily points (length 365). The figure comes out fine, but I want to suppress the year tick showing on the x axis.
I want to suppress the 1950 label showing in the left corner of x axis. Can anybody suggest something on this? My code
dates = pandas.date_range('1950-01-01', '1950-12-31', freq='D')
data_to_plot12 = pandas.DataFrame(data=data_array, # values
                                  index=homo_regions) # 1st column as index
dataframe1 = pandas.DataFrame.transpose(data_to_plot12)
dataframe1.index = dates
ax = dataframe1.plot(lw=1.5, marker='.', markersize=2, title='PRECT time series PI Slb Ocn CNTRL 60 years')
ax.set(xlabel="Months", ylabel="PRECT (mm/day)")
fig_name = 'dataframe1.pdf'
plt.savefig(fig_name)
You should be able to specify the xaxis major formatter like so
import matplotlib.dates as mdates
...
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
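One caveat worth a sketch: for a regular DatetimeIndex, pandas normally installs its own date handling on the axis, and matplotlib.dates formatters can then give wrong labels. Passing x_compat=True to plot() makes pandas use matplotlib's date units instead. The random data and the column names r1..r3 below are placeholders, not the original data_array or homo_regions.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

dates = pd.date_range("1950-01-01", "1950-12-31", freq="D")

# Placeholder values standing in for the precipitation data (assumption)
rng = np.random.default_rng(0)
dataframe1 = pd.DataFrame(rng.random((dates.size, 3)),
                          index=dates, columns=["r1", "r2", "r3"])

# x_compat=True: let matplotlib, not pandas, handle the date axis
ax = dataframe1.plot(lw=1.5, x_compat=True)
ax.xaxis.set_major_locator(mdates.MonthLocator())         # one tick per month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))  # month name only, no year
ax.set(xlabel="Months", ylabel="PRECT (mm/day)")
```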

In Pandas, generate DateTime index from Multi-Index with years and weeks

I have a DataFrame dfp with columns saledate (datetime, dtype <M8[ns]) and price (dtype int64), such that if I plot them like
fig, ax = plt.subplots()
ax.plot_date(dfp['saledate'],dfp['price']/1000.0,'.')
ax.set_xlabel('Date of sale')
ax.set_ylabel('Price (1,000 euros)')
I get a scatter plot which looks like below.
Since there are so many points that it is difficult to discern an average trend, I'd like to compute the average sale price per week, and plot that in the same plot. I've tried the following:
dfp_week = dfp.groupby([dfp['saledate'].dt.year, dfp['saledate'].dt.week]).mean()
If I plot the resulting 'price' column like this
plt.figure()
plt.plot(dfp_week['price'].values/1000.0)
plt.ylabel('Price (1,000 euros)')
I can more clearly discern an increasing trend (see below).
The problem is that I no longer have a time axis to plot this DataSeries in the same plot as the previous figure. The time axis starts like this:
                   longitude_4pp  postal_code_4pp     price  rooms  \
saledate saledate
2014     1              4.873140           1067.5  206250.0    2.5
         6              4.954779           1102.0  129000.0    3.0
         26             4.938828           1019.0  327500.0    3.0
         40             4.896904           1073.0  249000.0    2.0
         43             4.938828           1019.0  549000.0    5.0
How could I convert this Multi-Index with years and weeks back to a single DateTime index that I can plot my per-week-averaged data against?
If you group using pd.TimeGrouper (replaced by pd.Grouper in later pandas versions) you'll keep datetimes in your index.
dfp.groupby(pd.TimeGrouper('W')).mean()
Alternatively, create a new index from the (year, week) pairs:
i = pd.Index(pd.datetime(year, 1, 1) + pd.Timedelta(7 * weeks, unit='d') for year, weeks in df.index)
Then set this new index on the DataFrame:
df.index = i
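Since pd.datetime is no longer available in recent pandas releases, here is a hedged sketch of the same conversion using only the standard library. It assumes the week numbers are ISO week numbers (which is what .dt.week returned) and maps each (year, week) pair to the Monday of that week. The three index entries are taken from the output in the question. Note that ISO week 1 can start in the previous calendar year.

```python
import datetime as dt
import pandas as pd

# A few (year, week) pairs as in the grouped index from the question
idx = pd.MultiIndex.from_tuples([(2014, 1), (2014, 6), (2014, 26)],
                                names=["year", "week"])

# Monday (ISO weekday 1) of each ISO week; requires Python 3.8+
mondays = [dt.date.fromisocalendar(year, week, 1) for year, week in idx]
new_index = pd.to_datetime(mondays)  # DatetimeIndex usable as a time axis
```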
For the sake of completeness, here are the details of how I implemented the solution suggested by piRSquared:
fig, ax = plt.subplots()
ax.plot_date(dfp['saledate'],dfp['price']/1000.0,'.')
ax.set_xlabel('Date of sale')
ax.set_ylabel('Price (1,000 euros)')
dfp_week = dfp.groupby(pd.TimeGrouper(key='saledate', freq='W')).mean()
plt.plot_date(dfp_week.index, dfp_week['price']/1000.0)
which yields the plot below.
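For newer pandas versions, where pd.TimeGrouper has been removed, the equivalent grouping can be sketched with pd.Grouper. The sales data below is randomly generated for illustration only, and ax.plot is used in place of plot_date.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Made-up sales data: 500 sales spread over 2014 (assumption)
rng = np.random.default_rng(0)
dfp = pd.DataFrame({
    "saledate": pd.to_datetime("2014-01-01")
                + pd.to_timedelta(rng.integers(0, 365, 500), unit="D"),
    "price": rng.integers(100_000, 600_000, 500),
})

# Weekly mean price; the result keeps a DatetimeIndex (week-ending Sundays)
dfp_week = dfp.groupby(pd.Grouper(key="saledate", freq="W"))["price"].mean()

fig, ax = plt.subplots()
ax.plot(dfp["saledate"], dfp["price"] / 1000.0, ".", label="sales")
ax.plot(dfp_week.index, dfp_week / 1000.0, "-", label="weekly mean")
ax.set_xlabel("Date of sale")
ax.set_ylabel("Price (1,000 euros)")
ax.legend()
```

dfp.resample("W", on="saledate")["price"].mean() should give the same weekly series.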
