Change tick frequency for datetime axis [duplicate]


This question already has an answer here:
Change tick frequency on X (time, not number) frequency in matplotlib
(1 answer)
Closed 3 years ago.
I have the following dataframe:
Date Prod_01 Prod_02
19 2018-03-01 49870 0.0
20 2018-04-01 47397 0.0
21 2018-05-01 53752 0.0
22 2018-06-01 47111 0.0
23 2018-07-01 53581 0.0
24 2018-08-01 55692 0.0
25 2018-09-01 51886 0.0
26 2018-10-01 56963 0.0
27 2018-11-01 56732 0.0
28 2018-12-01 59196 0.0
29 2019-01-01 57221 5.0
30 2019-02-01 55495 472.0
31 2019-03-01 65394 753.0
32 2019-04-01 59030 1174.0
33 2019-05-01 64466 2793.0
34 2019-06-01 58471 4413.0
35 2019-07-01 64785 6110.0
36 2019-08-01 63774 8360.0
37 2019-09-01 64324 9558.0
38 2019-10-01 65733 11050.0
And I need to plot a time series of the 'Prod_01' column.
The 'Date' column is in the pandas datetime format.
So I used the following command:
plt.figure(figsize=(10,4))
plt.plot('Date', 'Prod_01', data=test, linewidth=2, color='steelblue')
plt.xticks(rotation=45, horizontalalignment='right');
Output:
However, I want to change the frequency of the xticks to one month, so I get one tick and one label for each month.
I have tried the following command:
plt.figure(figsize=(10,4))
plt.plot('Date', 'Prod_01', data=test, linewidth=2, color='steelblue')
plt.xticks(np.arange(1, len(test), 1), test['Date'] ,rotation=45, horizontalalignment='right');
But I get this:
How can I solve this problem?
Thanks in advance.

I'm not very familiar with pandas data frames. However, I can't see why this wouldn't work with any pyplot plot.
According to the top SO answer on a related post by ImportanceOfBeingErnest:
The spacing between ticklabels is exclusively determined by the space between ticks on the axes.
So, to change the distance between ticks and labels, you can do this:
Suppose you start with the following cluttered, base-10-centered graph:
It is produced by the following code, which also imports matplotlib.ticker:
import numpy as np
import matplotlib.pyplot as plt
# Import this, too
import matplotlib.ticker as ticker
# Arbitrary graph with x-axis = [-32..32]
x = np.linspace(-32, 32, 1024)
y = np.sinc(x)
# -------------------- Look Here --------------------
# Create the plot's axes
axs = plt.axes()
# Set distance between major ticks (which are labeled by default)
axs.xaxis.set_major_locator(ticker.MultipleLocator(5))
# Set distance between minor ticks (which are unlabeled by default)
axs.xaxis.set_minor_locator(ticker.MultipleLocator(1))
# -----------------------------------------------------
# Plot and show graph
plt.plot(x, y)
plt.show()
To change where the labels are placed, you can change the distance between the 'major ticks'. You can also change the smaller 'minor ticks' in between, which don't have a number attached. E.g., on a clock, the hour ticks are larger and numbered (major ticks), while the smaller, unlabeled ones between them mark the minutes (minor ticks).
By changing the --- Look Here --- part to:
# -------------------- Look Here --------------------
# Create the plot's axes
axs = plt.axes()
# Set distance between major ticks (which are labeled by default)
axs.xaxis.set_major_locator(ticker.MultipleLocator(8))
# Set distance between minor ticks (which are unlabeled by default)
axs.xaxis.set_minor_locator(ticker.MultipleLocator(4))
# -----------------------------------------------------
You can generate the cleaner and more elegant graph below:
Hope that helps!
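The answer above uses MultipleLocator, which steps in plain numeric units; on a datetime axis that unit is days, not calendar months. Since the question's x-axis holds datetimes, a date-aware locator from matplotlib.dates may fit better. Below is a minimal sketch, assuming a DataFrame named test with a datetime64 'Date' column as in the question (only the first six rows are typed in here). MonthLocator places one major tick on the first day of every month, and DateFormatter controls the label text.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# First rows of the question's data; 'Date' must be datetime64, not strings
test = pd.DataFrame({
    "Date": pd.to_datetime(["2018-03-01", "2018-04-01", "2018-05-01",
                            "2018-06-01", "2018-07-01", "2018-08-01"]),
    "Prod_01": [49870, 47397, 53752, 47111, 53581, 55692],
})

fig, ax = plt.subplots(figsize=(10, 4))
ax.plot(test["Date"], test["Prod_01"], linewidth=2, color="steelblue")

# One major tick (and label) on the first day of every month
ax.xaxis.set_major_locator(mdates.MonthLocator())
# Label format, e.g. "2018-03"
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m"))
# Rotate and right-align the date labels
fig.autofmt_xdate(rotation=45)
fig.tight_layout()
fig.canvas.draw()

labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

For a coarser axis, MonthLocator(interval=2) would put a tick on every second month instead.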

Related

how to visualize columns of a dataframe python as a plot?

I have a dataframe that looks like below:
DateTime ID Temperature
2019-03-01 18:36:01 3 21
2019-04-01 18:36:01 3 21
2019-18-01 08:30:01 2 18
2019-12-01 18:36:01 2 12
I would like to visualize this as a plot with the datetime on the x-axis and Temperature on the y-axis, with a hue for each ID. I tried the code below, but I need to see the Temperature distribution for every point more clearly. Is there any other visualization technique?
x= df['DateTime'].values
y= df['Temperature'].values
hue=df['ID'].values
plt.scatter(x, y, hue, color="red")

You can try:
df.set_index('DateTime').plot()
Output:
Or you can use:
df.set_index('DateTime').plot(style="x-", figsize=(15, 10))
Output:
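Since the question asks for a hue per ID, another option is a scatter plot with one color per ID. Note that in the question's plt.scatter(x, y, hue, ...) call, the third positional argument is the marker size s, not a color. This is a minimal sketch using the four rows from the question; because 2019-18-01 is not a valid year-month-day date, it assumes the dates are actually year-day-month and parses them with format='%Y-%d-%m %H:%M:%S' (drop that argument if your real data is year-month-day).

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# Rows from the question; ASSUMPTION: dates are year-day-month
df = pd.DataFrame({
    "DateTime": pd.to_datetime(["2019-03-01 18:36:01", "2019-04-01 18:36:01",
                                "2019-18-01 08:30:01", "2019-12-01 18:36:01"],
                               format="%Y-%d-%m %H:%M:%S"),
    "ID": [3, 3, 2, 2],
    "Temperature": [21, 21, 18, 12],
})

fig, ax = plt.subplots(figsize=(10, 4))
# One scatter call per ID, so each ID gets its own color and legend entry
for sensor_id, group in df.groupby("ID"):
    ax.scatter(group["DateTime"], group["Temperature"], label=f"ID {sensor_id}")
ax.set_xlabel("DateTime")
ax.set_ylabel("Temperature")
ax.legend(title="ID")
fig.autofmt_xdate()

legend_labels = [t.get_text() for t in ax.get_legend().get_texts()]
print(legend_labels)
```

Looping over groupby keeps the legend readable; passing c=df['ID'] to a single scatter call would also color by ID, but without a per-ID legend.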

Line Plot in Matplotlib, by frequency of date

So I have a dataframe in pandas like below:
date max min rain snow ice
0 2019-01-01 58 39 0.06 0.0 0.0
1 2019-01-01 58 39 0.06 0.0 0.0
2 2019-01-01 58 39 0.06 0.0 0.0
3 2019-01-01 58 39 0.06 0.0 0.0
4 2019-01-01 58 39 0.06 0.0 0.0
The goal is to create a line plot which shows, on the x axis, the max temperature, and on the y axis, the frequency of each date for that temperature.
So basically, each date in the list is a shop transaction, and I want to see the effect the temperature has on the number of transactions per day.
I've tried the following, which groups weather_frame by date, but I can't get my plot to show the temperature on the x-axis.
max_temp = weather_frame.groupby(weather_frame.date).size()
I've attached the file below. I had to delete some of it to stay within Pastebin's size limits, so the graph may appear corrupted. Data Link

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# Number of transactions (rows) per date
date_freq = weather_frame.groupby('date').size()
# Max temperature per date (the weather values repeat for every row of a date)
max_temp = weather_frame.groupby('date')['max'].mean()
sns.set()
plt.figure()
# Both Series are indexed by date, so the points line up day by day
sns.regplot(x=max_temp, y=date_freq)
plt.xlabel('Maximum Temperature')
plt.ylabel('Number of Transactions per Day')
plt.show()
It looks like there is a slight positive relationship between max temperature and number of transactions per day.

pandas categories displayed incorrectly in matplotlib

I am trying to represent categories in matplotlib, and for some reason some categories overlap on the x-axis and others are missing, even though their y-axis values are present. I marked this with red arrows in the picture at the bottom of the question.
The data is contained in sales.csv file that looks like this:
date,first name,last name,city,cost,rooms,bathrooms,type,status
2018-03-04 12:13:21,Linda,Evangelista,Balm Beach,333000,2,2,townhouse,sold
2018-02-01 07:20:20,Rita,Ford,Balm Beach,818000,2,2,detached,sold
2018-03-08 07:13:00,Ali,Hassan,Bowmanville,413000,2,2,bungalow,forsale
2018-05-08 21:00:00,Rashid,Forani,Bowmanville,467000,2,2,townhouse,sold
2018-02-07 16:43:00,Kumar,Yoshi,Bowmanville,613000,3,3,bungalow,sold
2018-01-05 13:43:00,Srini,Santinaram,Bowmanville,723000,2,2,bungalow,forsale
2018-01-03 14:19:00,Maria,Dugall,Brampton,900000,4,3,semidetached,forsale
2018-05-04 19:22:00,Zina,Evangel,Burlington,221000,1,1,townhouse,forsale
2018-05-01 19:44:00,Pierre,Merci,Gatineau,3199000,14,14,bungalow,forsale
2018-05-31 18:10:00,Istvan,Kerekes,Kingston,1110000,4,5,bungalow,sold
2018-03-25 08:22:00,Dumitru,Plamada,Kingston,1650000,5,5,bungalow,forsale
2018-01-01 11:54:00,John,Smith,Markham,1200000,3,3,bungalow,sold
2018-05-07 15:30:00,Arturo,Gonzales,Mississauga,187000,3,3,bungalow,forsale
2018-03-07 22:20:00,Lei,Zhang,North York,122000,1,1,townhouse,forsale
2018-05-04 20:04:00,William,King,Oaks,,3,3,bungalow,sold
2018-03-04 13:05:00,Jeffrey,Kong,Oakville,,2,2,townhouse,forsale
2018-01-04 17:23:00,Abdul,Karrem,Orillia,883000,3,4,townhouse,sold
2018-03-01 13:09:00,Jean,Paumier,Ottawa,1520000,4,4,townhouse,sold
2018-02-01 10:00:00,Ken,Beaufort,Ottawa,3440000,5,5,bungalow,forsale
2018-02-15 11:33:00,Gheorghe,Ionescu,Richmond Hill,1630000,4,3,bungalow,forsale
2018-01-05 10:32:00,Ion,Popescu,Scarborough,1420000,5,3,semidetached,sold
2018-02-07 11:44:00,Xu,Yang,Toronto,422000,2,2,townhouse,forsale
2018-05-29 00:33:00,Giovanni,Gianparello,Toronto,1917000,4,4,bungalow,forsale
2018-03-25 08:27:00,John,Saint-Claire,Toronto,3337000,5,4,bungalow,forsale
2018-01-06 14:06:00,Ann,Murdoch Pyrell,Toronto,1427000,5,4,bungalow,forsale
2018-02-15 13:12:00,Claire,Coldwell,Toronto,3777000,5,4,bungalow,forsale
2018-01-02 09:37:00,Kyle,MCDonald,Toronto,,2,2,townhouse,forsale
2018-02-01 21:22:00,Miriam,Berg,Toronto,,4,4,townhouse,forsale
The code to load the data and display the graph is below:
import pandas as pd
import matplotlib.pyplot as plt
# Load data
sales_brute = pd.read_csv('sales.csv', parse_dates=True, index_col='date')
# Fix the columns names by stripping the extra spaces
sales_brute = sales_brute.rename(columns=lambda x: x.strip())
# Fix the N/A values in the cost column by filling them with the mean cost
sales_brute['cost'] = sales_brute['cost'].fillna(sales_brute['cost'].mean())
# Draw a scatter plot of price by city, in red
plt.scatter(sales_brute['city'], sales_brute['cost'], color='red')
# Rotate the ticks by 70 degrees
plt.xticks(sales_brute['city'], rotation=70)
plt.tight_layout()
# Add grid
plt.grid()
plt.show()
and the result looks strange, like this:
Incorrect display of categories

Maybe we have different versions of matplotlib, but I can't use plt.scatter at all with sales_brute['city'] as the first argument:
ValueError: could not convert string to float: 'Toronto'
Instead I made up a new x-axis:
x = range(len(sales_brute))
plt.scatter(x=x, y=sales_brute['cost'], color='red')
plt.xticks(x, sales_brute['city'], rotation=70)
plt.show()
Which results in:
(some stretching required to see the full names)

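plt.scatter seems to be happy to take strings as the x-coordinate and arrange them in alphabetical order (newer matplotlib versions place categories in order of first appearance instead; with this CSV, which is already sorted by city, both orders coincide). plt.xticks, however, wants a list matching the number of ticks and in the same order.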
If you change:
plt.xticks(sales_brute['city'], rotation=70)
to
plt.xticks(sales_brute['city'].sort_values().unique(), rotation=70)
you'll get the effect you want.
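In versions where scatter accepts strings, it already builds a categorical axis with one tick per distinct value, so one option is to skip plt.xticks entirely and only rotate the existing labels. A minimal sketch, using six rows from sales.csv typed in directly (rows with a missing cost are left out):

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# A few rows taken from sales.csv above
sales = pd.DataFrame({
    "city": ["Balm Beach", "Balm Beach", "Bowmanville", "Brampton", "Toronto", "Toronto"],
    "cost": [333000, 818000, 413000, 900000, 422000, 1917000],
})

fig, ax = plt.subplots()
# Strings on x create a categorical axis: one tick per distinct city
ax.scatter(sales["city"], sales["cost"], color="red")
# Rotate the existing tick labels instead of re-setting the ticks
ax.tick_params(axis="x", labelrotation=70)
ax.grid()
fig.tight_layout()
fig.canvas.draw()

tick_labels = [t.get_text() for t in ax.get_xticklabels()]
print(tick_labels)
```

Because the ticks are never overridden, duplicate cities cannot produce extra or overlapping labels.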

Pandas: Histogram Plotting

I have a dataframe with dates (datetime) in Python. How can I plot a histogram of the occurrences with 30-minute bins using this dataframe?
starttime
1 2016-09-11 00:24:24
2 2016-08-28 00:24:24
3 2016-07-31 05:48:31
4 2016-09-11 00:23:14
5 2016-08-21 00:55:23
6 2016-08-21 01:17:31
.............
989872 2016-10-29 17:31:33
989877 2016-10-02 10:00:35
989878 2016-10-29 16:42:41
989888 2016-10-09 07:43:27
989889 2016-10-09 07:42:59
989890 2016-11-05 14:30:59
I have tried looking at examples from Plotting series histogram in Pandas and A per-hour histogram of datetime using Pandas, but they seem to use a bar plot, which is not what I need. I have attempted to create the histogram using temp.groupby([temp["starttime"].dt.hour, temp["starttime"].dt.minute]).count().plot(kind="hist"), which gives the result shown below.
If possible, I would like the x-axis to display the time (e.g. 07:30:00).

I think you need a bar plot; for an axis with times, the simplest approach is to convert the datetimes to strings with strftime:
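import matplotlib.pyplot as plt
# Count rows per 30-minute bin ('30min' is the same as the older '30T' alias)
counts = temp.resample('30min', on='starttime').size()
# Sum the bins with the same time of day across all dates, labeled as HH:MM
ax = counts.groupby(counts.index.strftime('%H:%M')).sum().plot(kind="bar")
# For nicer bars, only every second tick label is shown
spacing = 2
visible = ax.xaxis.get_ticklabels()[::spacing]
for label in ax.xaxis.get_ticklabels():
    if label not in visible:
        label.set_visible(False)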
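If a true histogram is preferred over a bar plot, one option is to convert each timestamp to minutes since midnight and pass 30-minute bin edges to hist, then format the tick labels as HH:MM by hand. This is a sketch under the assumption that only the time of day matters (the date part is ignored); the sample rows are copied from the question.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# A few 'starttime' values copied from the question
temp = pd.DataFrame({"starttime": pd.to_datetime([
    "2016-09-11 00:24:24", "2016-08-28 00:24:24", "2016-07-31 05:48:31",
    "2016-09-11 00:23:14", "2016-08-21 00:55:23", "2016-08-21 01:17:31",
    "2016-10-29 17:31:33", "2016-10-02 10:00:35", "2016-10-29 16:42:41",
    "2016-10-09 07:43:27", "2016-10-09 07:42:59", "2016-11-05 14:30:59",
])})

# Time of day in minutes since midnight (the date part is ignored)
st = temp["starttime"]
minutes = st.dt.hour * 60 + st.dt.minute + st.dt.second / 60

# 49 edges -> 48 bins of 30 minutes, covering 00:00 to 24:00
edges = np.arange(0, 24 * 60 + 30, 30)

fig, ax = plt.subplots(figsize=(12, 4))
counts, _, _ = ax.hist(minutes, bins=edges, edgecolor="black")

# Label every second edge (i.e. hourly) as HH:MM
tick_pos = edges[::2]
ax.set_xticks(tick_pos)
ax.set_xticklabels([f"{int(m) // 60:02d}:{int(m) % 60:02d}" for m in tick_pos], rotation=90)
ax.set_xlabel("Time of day")
ax.set_ylabel("Occurrences")
fig.tight_layout()
```

Passing explicit edges, rather than a bin count, keeps every bar aligned to :00 and :30 regardless of where the data starts.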

Plot datetime.date / time series in a pandas dataframe

I created a pandas dataframe from some value counts on particular calendar dates. Here is how I did it:
time_series = pd.DataFrame(df['Operation Date'].value_counts().reset_index())
time_series.columns = ['date', 'count']
Basically, it has two columns: "date", which holds datetime.date objects, and "count", which holds integer values. Now I'd like to plot a scatter or a KDE to represent how the value changes over the calendar days.
But when I try:
time_series.plot(kind='kde')
plt.show()
I get a plot where the x-axis is from -50 to 150 as if it is parsing the datetime.date objects as integers somehow. Also, it is yielding two identical plots rather than just one.
Any idea how I can plot them and see the calendar days along the x-axis?

Are you sure you have datetimes? I just tried this and it worked fine with the following df:
date count
7 2012-06-11 16:51:32 1.0
3 2012-09-28 08:05:14 12.0
19 2012-10-01 18:01:47 4.0
2 2012-10-03 15:18:23 29.0
6 2012-12-22 19:50:43 4.0
1 2013-02-19 19:54:03 28.0
9 2013-02-28 16:08:40 17.0
12 2013-03-12 08:42:55 6.0
4 2013-04-04 05:27:27 6.0
17 2013-04-18 09:40:37 29.0
11 2013-05-17 16:34:51 22.0
5 2013-07-07 14:32:59 16.0
14 2013-10-22 06:56:29 13.0
13 2014-01-16 23:08:46 20.0
15 2014-02-25 00:49:26 10.0
18 2014-03-19 15:58:38 25.0
0 2014-03-31 05:53:28 16.0
16 2014-04-01 09:59:32 27.0
8 2014-04-27 12:07:41 17.0
10 2014-09-20 04:42:39 21.0
df = df.sort_values('date', ascending=True)
plt.plot(df['date'], df['count'])
plt.xticks(rotation='vertical')
EDIT:
If you want a scatter plot, you can use:
plt.plot(df['date'], df['count'], '*')
plt.xticks(rotation='vertical')

If the column has datetime dtype (not object), you can call plot() directly on the dataframe. You don't need to sort by date either; that is done behind the scenes if the x-axis is datetime.
df['date'] = pd.to_datetime(df['date'])
df.plot(x='date', y='count', kind='scatter', rot='vertical');
You can also pass many arguments to make the plot nicer (add titles, change figsize and fontsize, rotate ticklabels, set subplots axis etc.) See the docs for full list of possible arguments.
df.plot(x='date', y='count', kind='line', rot=45, legend=None,
title='Count across time', xlabel='', fontsize=10, figsize=(12,4));
You can even use another column to color scatter plots. In the example below, the months are used to assign color. Tip: To get the full list of possible colormaps, pass any gibberish string to colormap and the error message will show you the full list.
df.plot(x='date', y='count', kind='scatter', rot=90, c=df['date'].dt.month, colormap='tab20', sharex=False);
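For the original question, the KDE attempt most likely estimated the distribution of the numeric count column, which would explain an x-axis showing count-like values instead of dates. A minimal sketch of the conversion step, starting from datetime.date objects built the same way as in the question; the 'Operation Date' values below are hypothetical stand-ins (dates borrowed from the table above, repeated to get counts).

```python
import datetime as dt
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.pyplot as plt
import pandas as pd

# HYPOTHETICAL 'Operation Date' values standing in for the question's data
ops = pd.DataFrame({"Operation Date": [
    dt.date(2012, 6, 11), dt.date(2012, 9, 28), dt.date(2012, 6, 11),
    dt.date(2013, 2, 19), dt.date(2012, 9, 28), dt.date(2012, 6, 11),
]})

# Same construction as in the question: one row per distinct date
time_series = ops["Operation Date"].value_counts().reset_index()
time_series.columns = ["date", "count"]

# datetime.date objects give an object column; convert it to datetime64
time_series["date"] = pd.to_datetime(time_series["date"])
time_series = time_series.sort_values("date")

ax = time_series.plot(x="date", y="count", style="o-", legend=False, rot=45)
ax.set_ylabel("count")
plt.tight_layout()
```

The explicit sort_values keeps the line in date order regardless of how value_counts ordered the rows.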
