I am trying to plot a DataFrame of multiple time series in pandas. Each series covers one year of daily data (365 points). The figure looks fine, but I want to suppress the year label shown on the x-axis.
Specifically, I want to remove the 1950 label shown in the left corner of the x-axis. Can anybody suggest how to do this? My code:
dates = pandas.date_range('1950-01-01', '1950-12-31', freq='D')
data_to_plot12 = pandas.DataFrame(data=data_array, # values
index=homo_regions) # 1st column as index
dataframe1 = pandas.DataFrame.transpose(data_to_plot12)
dataframe1.index = dates
ax = dataframe1.plot(lw=1.5, marker='.', markersize=2, title='PRECT time series PI Slb Ocn CNTRL 60 years')
ax.set(xlabel="Months", ylabel="PRECT (mm/day)")
fig_name = 'dataframe1.pdf'
plt.savefig(fig_name)
You should be able to specify the xaxis major formatter like so
import matplotlib.dates as mdates
...
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b'))
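As a fuller illustration of the formatter approach, here is a minimal sketch. The random data and the column names `region_a` to `region_c` are placeholders, not the asker's data. It also assumes that passing `x_compat=True` to the pandas plot call is acceptable, so that pandas uses ordinary matplotlib dates on the x-axis and the `mdates` locator and formatter behave as expected.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# Placeholder data: one year of daily values for three hypothetical regions
dates = pd.date_range("1950-01-01", "1950-12-31", freq="D")
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((len(dates), 3)), index=dates,
                  columns=["region_a", "region_b", "region_c"])

# x_compat=True turns off pandas' own time-series tick formatting
ax = df.plot(lw=1.5, x_compat=True)

# One major tick per month, labelled with the abbreviated month name only
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b"))
ax.set(xlabel="Months", ylabel="PRECT (mm/day)")

# Render once and collect the tick labels for inspection
ax.figure.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
```

Because the format string contains only `%b`, the year never appears in the tick labels.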
I'm trying to plot a graph of a time series which has monthly dates from 1959 to 2019, and when I try plotting this time series I get a clustered x-axis where the dates are not showing properly. How can I remove the months and show only the years on the x-axis, so that it is less clustered and the years display properly?
fig,ax = plt.subplots(2,1)
ax[0].hist(pca_function(sd_Data))
ax[0].set_ylabel ('Frequency')
ax[1].plot(pca_function(sd_Data))
ax[1].set_xlabel ('Years')
fig.suptitle('Histogram and Time series of Plot Factor')
plt.tight_layout()
# fig.savefig('factor1959.pdf')
pca_function(sd_Data)
comp_0
sasdate
1959-01 -0.418150
1959-02 1.341654
1959-03 1.684372
1959-04 1.981473
1959-05 1.242232
...
2019-08 -0.075270
2019-09 -0.402110
2019-10 -0.609002
2019-11 0.320586
2019-12 -0.303515
[732 rows x 1 columns]
From what I see, you do have years on your second subplot; they just overlap because too many of them are placed horizontally. Try increasing figsize and rotating the ticks:
# Builds an example dataframe.
df = pd.DataFrame(columns=['Years', 'Frequency'])
df['Years'] = pd.date_range(start='1/1/1959', end='1/1/2023', freq='M')
df['Frequency'] = np.random.normal(0, 1, size=(df.shape[0]))
fig, ax = plt.subplots(2,1, figsize=(20, 5))
ax[0].hist(df.Frequency)
ax[0].set_ylabel ('Frequency')
ax[1].plot(df.Years, df.Frequency)
ax[1].set_xlabel('Years')
for tick in ax[0].get_xticklabels():
    tick.set_rotation(45)
    tick.set_ha('right')
for tick in ax[1].get_xticklabels():
    tick.set_rotation(45)
    tick.set_ha('right')
fig.suptitle('Histogram and Time series of Plot Factor')
plt.tight_layout()
p.s. if the x-labels still overlap, try to increase your step size.
First off, you need to store the result of the call to pca_function into a variable. E.g. called result_pca_func. That way, the calculations (and possibly side effects or different randomization) are only done once.
Second, the dates should be converted to a datetime format. For example using pd.to_datetime(). That way, matplotlib can automatically put year ticks as appropriate.
Here is an example, starting from a dummy test dataframe:
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
df = pd.DataFrame({'Date': [f'{y}-{m:02d}' for y in range(1959, 2019) for m in range(1, 13)]})
df['Values'] = np.random.randn(len(df)).cumsum()
df = df.set_index('Date')
result_pca_func = df
result_pca_func.index = pd.to_datetime(result_pca_func.index)
fig, ax2 = plt.subplots(figsize=(10, 3))
ax2.plot(result_pca_func)
plt.tight_layout()
plt.show()
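If the automatic year ticks are still too dense, the spacing can be set explicitly. The sketch below is an illustration on dummy data, not the asker's `pca_function` output; the ten-year spacing is an arbitrary choice. It uses `mdates.YearLocator` and `mdates.DateFormatter` to label only every tenth year.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Dummy monthly series, 1959-01 to 2019-12, parsed to real datetimes
index = pd.to_datetime([f"{y}-{m:02d}" for y in range(1959, 2020) for m in range(1, 13)])
values = np.random.default_rng(0).standard_normal(len(index)).cumsum()

fig, ax = plt.subplots(figsize=(10, 3))
ax.plot(index, values)

# One major tick every 10 years, labelled with the four-digit year only
ax.xaxis.set_major_locator(mdates.YearLocator(base=10))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y"))
ax.set_xlabel("Years")
fig.tight_layout()

# Render once and collect the tick labels for inspection
fig.canvas.draw()
year_labels = [t.get_text() for t in ax.get_xticklabels()]
```

Changing `base` to 5 or 20 gives denser or sparser year ticks.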
I'm new to pandas and Matplotlib and still learning each day, so I appreciate your help. For some plots, the Y values keep appearing at the wrong X position. I'm not sure whether this is related to the DataFrame I'm producing; everything looks fine to me, but the points are plotted at an offset of X+1. Since I'm using dates for the X values, the values are plotted one month ahead every time.
The dataFrame dfExec1 comes from the main df:
dfRevenue = pd.read_csv('Revenue Report_DataHistory.csv')
dfExec1 = dfRevenue[dfRevenue['PLAN/EXEC'] == 'EXEC']
dfExec1.loc[:,'Year'] = pd.to_datetime(dfExec1['Year'], format='%m/%d/%Y', errors='coerce')
dfExec1 = dfExec1.groupby(pd.Grouper(key='Year', freq='M')).sum()
This is a picture of dfExec1 :
dfExec1 frame. All data are floats
Next, I selected only the columns I wanted and replaced zeros with NaN. I also created a new column for the dates to check whether the plot came out correctly.
dfServicos = dfExec1.iloc[:, [0,1,2,3,4,6,7,8,9,10,11]]
dfServicos[dfServicos==0] = np.nan
dfServicos['DATAS'] = dfServicos.index
#dfServicos
fig6, ax = plt.subplots(figsize=(25,7))
#for coluna in dfServicos.columns:
#ax.scatter(x=dfServicos['DATAS'], y=dfServicos.loc[:, coluna], s=100, label=[coluna])
ax.scatter(x=dfServicos.iloc[0,11], y=dfServicos.iloc[0, 0], s=100, label=['Fishing'])
ax.legend()
plt.show()
This is Exec1 after treatment:
DataFrame - needed to cover blue data but Floats and NaN
I only plotted one column as example, but all the plots are showing like this :
X position offset by 01 month
Thank you very much for your support !
I got the problem solved by setting the xticks first. After that, I needed to configure the date format.
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter
import matplotlib.ticker as ticker
dfServicos = dfExec1.iloc[:, [0,1,2,3,4,6,7,8,9,10,11]]
dfServicos[dfServicos==0] = np.nan
#dfServicos
fig6, ax = plt.subplots(figsize=(35,7))
ax.set_xticks(dfServicos.index)
for coluna in dfServicos.columns:
    ax.scatter(x=dfServicos.index, y=dfServicos.loc[:, coluna], s=100, label=[coluna])
#ax.bar(x=dfServicos.index, height=dfServicos.loc[:, 'Fishing'])
dateForm = DateFormatter('%m-%Y')
ax.xaxis.set_major_formatter(dateForm)
formatter = ticker.FormatStrFormatter('$%1.2f')
ax.yaxis.set_major_formatter(formatter)
ax.legend()
plt.show()
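One possible explanation for the apparent one-month shift, offered as an assumption rather than a confirmed diagnosis: grouping with a month-end frequency (which is what freq='M' means) labels each group with the last day of the month, so a point for January sits at 31 January, right next to the February tick. The sketch below uses a small made-up frame to compare month-end and month-start labels.

```python
import pandas as pd

# Made-up sample: two January rows and one February row
sales = pd.DataFrame({
    "Year": pd.to_datetime(["2021-01-05", "2021-01-20", "2021-02-10"]),
    "Fishing": [10.0, 5.0, 7.0],
})

# Month-end labels: each group is stamped with the last day of its month
by_month_end = sales.groupby(
    pd.Grouper(key="Year", freq=pd.offsets.MonthEnd())).sum()

# Month-start labels: each group is stamped with the first day of its month
by_month_start = sales.groupby(
    pd.Grouper(key="Year", freq=pd.offsets.MonthBegin())).sum()

end_labels = list(by_month_end.index.strftime("%Y-%m-%d"))
start_labels = list(by_month_start.index.strftime("%Y-%m-%d"))
```

With month-start labels the points line up with the tick for their own month, which may remove the need to set the ticks manually.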
I have a dataframe of daily temperature variation with time
time temp temp_mean
00:01:51.57 185.94 185.94
00:01:52.54 187.48 186.71
00:01:53.51 197.85 190.4233333
00:01:54.49 195.71 191.745
00:01:55.46 197.22 192.84
00:01:56.43 187.33 191.9216667
00:01:57.41 194.18 192.2442857
00:01:58.38 199.9 193.20125
00:01:59.35 184.23 192.2044444
00:02:00.33 201.34 193.118
00:02:01.30 200.12 193.7545455
00:02:02.27 199.13 194.2025
00:02:03.24 187.47 193.6846154
00:02:04.22 187.65 193.2535714
00:02:05.19 195.59 193.4093333
00:02:06.17 188.7 193.115
00:02:07.14 196.16 193.2941176
00:02:08.11 191.17 193.1761111
00:02:09.08 198.62 193.4626316
00:02:10.06 190.79 193.329
00:02:11.03 193.35 193.33
00:02:12.00 199.36 193.6040909
00:02:12.98 190.76 193.4804348
00:02:13.95 205.16 193.9670833
00:02:14.92 194.89 194.004
00:02:15.90 185.3 193.6692308
like this. (12000+ rows)
I want to plot time vs temp as a line plot, with hourly ticks on x-axis(1 hr interval).
But somehow I couldn't assign x ticks with proper frequency.
fig, ax = plt.subplots()
ax.plot(data['time'], data['temp'])
ax.plot(data['time'], data['temp_mean'],color='red')
xformatter = mdates.DateFormatter('%H:%M')
xlocator = mdates.HourLocator(interval = 1)
## Set xtick labels to appear every hour
ax.xaxis.set_major_locator(xlocator)
## Format xtick labels as HH:MM
ax.xaxis.set_major_formatter(xformatter)
fig.autofmt_xdate()
ax.tick_params(axis='x', rotation=45)
plt.show()
Here the x ticks are crowded and overlapping, but I need ticks from 0:00 to 23:00 at one-hour intervals.
What should I do ?
Convert the 'time' column to a datetime dtype with pd.to_datetime, and then extract the time component with the .dt accessor.
See python datetime format codes to specify the format=... string.
Plot with pandas.DataFrame.plot
Tested in python 3.8.12, pandas 1.3.3, matplotlib 3.4.3
import pandas as pd
# sample data
data = {'time': ['00:01:51.57', '00:01:52.54', '00:01:53.51', '00:01:54.49', '00:01:55.46', '00:01:56.43', '00:01:57.41', '00:01:58.38', '00:01:59.35', '00:02:00.33', '00:02:01.30', '00:02:02.27', '00:02:03.24', '00:02:04.22', '00:02:05.19', '00:02:06.17', '00:02:07.14', '00:02:08.11', '00:02:09.08', '00:02:10.06', '00:02:11.03', '00:02:12.00', '00:02:12.98', '00:02:13.95', '00:02:14.92', '00:02:15.90'],
'temp': [185.94, 187.48, 197.85, 195.71, 197.22, 187.33, 194.18, 199.9, 184.23, 201.34, 200.12, 199.13, 187.47, 187.65, 195.59, 188.7, 196.16, 191.17, 198.62, 190.79, 193.35, 199.36, 190.76, 205.16, 194.89, 185.3],
'temp_mean': [185.94, 186.71, 190.4233333, 191.745, 192.84, 191.9216667, 192.2442857, 193.20125, 192.2044444, 193.118, 193.7545455, 194.2025, 193.6846154, 193.2535714, 193.4093333, 193.115, 193.2941176, 193.1761111, 193.4626316, 193.329, 193.33, 193.6040909, 193.4804348, 193.9670833, 194.004, 193.6692308]}
df = pd.DataFrame(data)
# convert column to datetime and extract time component
df.time = pd.to_datetime(df.time, format='%H:%M:%S.%f').dt.time
# plot
ax = df.plot(x='time', color=['tab:blue', 'tab:red'])
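As an alternative sketch, assuming the crowding comes from the 'time' column being plotted as strings: keep the parsed column as full datetimes (without extracting .dt.time) and let matplotlib's HourLocator place the hourly ticks. The data below is synthetic, one reading every ten minutes over a day, and temp_mean is computed as a running mean, which appears to match the sample but is an assumption.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic day of readings in the same text format as the question
time_strings = [f"{h:02d}:{m:02d}:00.00" for h in range(24) for m in range(0, 60, 10)]
rng = np.random.default_rng(0)
df = pd.DataFrame({"time": time_strings,
                   "temp": 190 + rng.normal(0, 5, len(time_strings))})

# Parse to datetime64; the date part defaults to 1900-01-01 and is never shown
df["time"] = pd.to_datetime(df["time"], format="%H:%M:%S.%f")
df["temp_mean"] = df["temp"].expanding().mean()

fig, ax = plt.subplots(figsize=(12, 4))
ax.plot(df["time"], df["temp"])
ax.plot(df["time"], df["temp_mean"], color="red")

# One major tick per hour, labelled HH:MM
ax.xaxis.set_major_locator(mdates.HourLocator(interval=1))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()

# Render once and collect the tick labels for inspection
fig.canvas.draw()
hour_labels = [t.get_text() for t in ax.get_xticklabels()]
```

Because the x values are real datetimes, the locator only produces one tick per hour regardless of how many rows the frame has.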
Currently I am charting data from some historical point to a point in current time. For example, January 2019 to TODAY (February 2021). However, my matplotlib chart only shows dates from January 2019 to January 2021 on the x-axis (with the last February tick missing) even though the data is charted to today's date on the actual plot.
Is there any way to ensure that the first and last month is always reflected on the x-axis chart? In other words, I would like the x-axis to have the range displayed (inclusive).
Picture of x axis (missing February 2021)
The data charted here is from January 2019 to TODAY (February 12th).
Here is my code for the date format:
fig.autofmt_xdate()
date_format = mdates.DateFormatter("%b-%y")
ax.xaxis.set_major_formatter(date_format)
EDIT: The numbers after each month represent years.
I am not aware of any way to do this other than by creating the ticks from scratch.
In the following example, a list of all first-DatetimeIndex-timestamp-of-the-month is created from the DatetimeIndex of a pandas dataframe, starting from the month of the first date (25th of Jan.) up to the start of the last ongoing month. An appropriate number of ticks is automatically selected by the step variable and the last month is appended and then removed with np.unique when it is a duplicate. The labels are formatted from the tick timestamps.
This solution works for any frequency smaller than yearly:
import numpy as np # v 1.19.2
import pandas as pd # v 1.1.3
import matplotlib.pyplot as plt # v 3.3.2
# Create sample dataset
start_date = '2019-01-25'
end_date = '2021-02-12'
rng = np.random.default_rng(seed=123) # random number generator
dti = pd.date_range(start_date, end_date, freq='D')
variable = 100 + rng.normal(size=dti.size).cumsum()
df = pd.DataFrame(dict(variable=variable), index=dti)
# Create matplotlib plot
fig, ax = plt.subplots(figsize=(10, 2))
ax.plot(df.index, df.variable)
# Create list of monthly ticks made of timestamp objects
monthly_ticks = [timestamp for idx, timestamp in enumerate(df.index)
if (timestamp.month != df.index[idx-1].month) | (idx == 0)]
# Select appropriate number of ticks and include last month
step = 1
while len(monthly_ticks[::step]) > 10:
    step += 1
ticks = np.unique(np.append(monthly_ticks[::step], monthly_ticks[-1]))
# Create tick labels from tick timestamps
labels = [timestamp.strftime('%b\n%Y') if timestamp.year != ticks[idx-1].year
else timestamp.strftime('%b') for idx, timestamp in enumerate(ticks)]
plt.xticks(ticks, labels, rotation=0, ha='center');
As you can see, the first and last months are located at an irregular distance from the neighboring tick.
In case you are plotting a time series with a discontinuous date range (e.g. weekends and holidays not included) and you are not using the DatetimeIndex for the x-axis (like this for example: ax.plot(range(df.index.size), df.variable)) so as to avoid gaps with straight lines showing up on short time series and/or very wide plots, then replace the last line of code with this:
plt.xticks([df.index.get_loc(tick) for tick in ticks], labels, rotation=0, ha='center');
Matplotlib uses a limited number of ticks. It just happens that for February 2021 no tick is used. There are two things you could try. First try setting the axis limits to past today with:
ax.set_xlim(start_date, end_date)
What you could also try, is using even more ticks:
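ax.set_xticks(np.arange(np.min(x), np.max(x), step))
Where x holds the (numeric) values on the x-axis and step is the spacing between ticks. Note that the third argument of np.arange is a step, not a tick count, so a smaller step gives more ticks.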
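A minimal sketch of placing the ticks explicitly, assuming one tick at the start of every month is acceptable for the width of the figure. It builds the tick positions with pd.date_range and a month-start frequency, so the month of the first and the last data point are both included. The random data is a placeholder.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Placeholder daily data from 25 Jan 2019 to 12 Feb 2021
dti = pd.date_range("2019-01-25", "2021-02-12", freq="D")
values = 100 + np.random.default_rng(123).normal(size=dti.size).cumsum()

# First day of every month touched by the data, first and last month included
month_starts = pd.date_range(dti[0].replace(day=1), dti[-1], freq="MS")

fig, ax = plt.subplots(figsize=(12, 3))
ax.plot(dti, values)

# Extend the left limit so the tick for the first month is visible
ax.set_xlim(month_starts[0], dti[-1])
ax.set_xticks(month_starts)
ax.set_xticklabels(month_starts.strftime("%b-%y"), rotation=45, ha="right")
fig.tight_layout()
```

For longer ranges, slicing `month_starts` (for example every third entry) and appending the last entry keeps the final month labelled.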
I am trying to plot a lineplot in seaborn with values for the y axis and dates for the x axis coming from a pandas dataframe.
When I create the lineplot without converting the date column to datetime object it puts the dates in the wrong order. When I do convert the date column to datetime format it gives strange x labels and only shows 5 of the dates.
df = pd.read_csv("dddd.csv")
df["date"] = pd.to_datetime(df["date"])
df = df.sort_values(["date"])
ax = sns.lineplot(y=df["google"],x=df["date"],color="red",data=df)
plt.show()
I want to just plot the data with the x labels being the dates have it in order. Here is some example data.
25-03-2019 -100
26-03-2019 -66.66666667
27-03-2019 -80
28-03-2019 -87.08333333
29-03-2019 -88.88888889
30-03-2019 -86.28526646
31-03-2019 -87.5
01-04-2019 -87.87878788
02-04-2019 -82.92682927
03-04-2019 -84.09090909
04-04-2019 -84.7826087
05-04-2019 -85.71428571
06-04-2019 -81.30677848
07-04-2019 -81.98051948
08-04-2019 -82.14285714
09-04-2019 -78.46153846
10-04-2019 -76.05633803
11-04-2019 -75
12-04-2019 -75
13-04-2019 -80
14-04-2019 -83.33333333
15-04-2019 -83.33333333
16-04-2019 -77.77777778
17-04-2019 -68
18-04-2019 -54.70085471
19-04-2019 -64.70588235
20-04-2019 -66.66666667
Despite this question most certainly being a duplicate, here is some explanation:
Without converting to datetime, your x-axis is unordered, because python doesn't understand the values are dates. Here it probably draws a tick for every point, because it doesn't know how to fit "16-04-2019" on a continuous scale. (Try plotting an array of strings on the x-axis as comparison).
If you do convert to datetime, seaborn and pandas are intelligent enough, to notice your dates form a continuous scale. Because the ticklabels would overlap when drawing all of them, it leaves out some of them and draws ticks in regular intervals instead.
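You can influence the tick locations with ax.set_xticks(locations), their style with ax.tick_params(axis='x', rotation=90), and their labels with ax.set_xticklabels(labels) (arguments shown as examples). Horizontal alignment is not a tick_params option; set it on the labels instead, e.g. plt.setp(ax.get_xticklabels(), ha='left').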
You can also use
import matplotlib.dates as md
locs = md.YearLocator() # Locations for yearly ticks
fmt = md.DateFormatter('%B %Y') # Format for labels
to access the locator and formatter classes, and ax.xaxis.set_major_locator(locs) / ax.xaxis.set_major_formatter(fmt) to apply them to your plot.
To simply draw a tick for every datapoint, in your case use:
ax.set_xticks(df["date"])
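Putting these pieces together, here is a minimal sketch using a subset of the sample rows from the question. It uses plain matplotlib so that it runs without seaborn; sns.lineplot returns a matplotlib Axes, so the same locator and formatter calls should apply there. The explicit format="%d-%m-%Y" is an assumption based on the sample being day-first; without it, a string such as 01-04-2019 may be read as 4 January.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# A few rows taken from the question's sample data
df = pd.DataFrame({
    "date": ["25-03-2019", "26-03-2019", "27-03-2019",
             "01-04-2019", "02-04-2019", "03-04-2019"],
    "google": [-100.0, -66.66666667, -80.0,
               -87.87878788, -82.92682927, -84.09090909],
})

# Parse as day-month-year explicitly, then sort chronologically
df["date"] = pd.to_datetime(df["date"], format="%d-%m-%Y")
df = df.sort_values("date")

fig, ax = plt.subplots(figsize=(8, 3))
ax.plot(df["date"], df["google"], color="red")

# A tick every second day, labelled in the same day-first style as the data
ax.xaxis.set_major_locator(mdates.DayLocator(interval=2))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d-%m-%Y"))
ax.tick_params(axis="x", rotation=90)
fig.tight_layout()
```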