Modifying number of ticks on Pandas hourly time axis - python

If I have the following example Python code using a Pandas dataframe:
import numpy as np
import pandas as pd
ts = pd.DataFrame(np.random.randn(1000), index=pd.date_range('1/1/2000 00:00:00', freq='h', periods=1000), columns=['Data'])
ts['Time'] = ts.index.map(lambda t: t.time())
ts = ts.groupby('Time').mean()
ts.plot(x_compat=True, figsize=(20,10))
The output plot is:
What is the most elegant way to get the x-axis ticks to space themselves automatically, hourly or bi-hourly? x_compat=True has no effect.

You can pass the xticks argument to ts.plot(). With the right interval you can place the ticks hourly or bi-hourly:
import numpy as np
max_sec = 90000
ts.plot(x_compat=True, figsize=(20,10), xticks=np.arange(0, max_sec, 3600))
ts.plot(x_compat=True, figsize=(20,10), xticks=np.arange(0, max_sec, 7200))
Here max_sec is the maximum value of the x-axis, in seconds: pandas plots datetime.time values as seconds since midnight.
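A different technique (not the one in the answer above) is to let a Matplotlib date locator place the ticks. The sketch below assumes it is acceptable to put every time of day on one arbitrary dummy date (2000-01-01 here), so that mdates.HourLocator and mdates.DateFormatter can be used. It plots with Matplotlib directly because pandas may use its own period-based x-axis for regularly spaced data, which date locators do not understand.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs anywhere
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

ts = pd.DataFrame(np.random.randn(1000),
                  index=pd.date_range("2000-01-01", freq="h", periods=1000),
                  columns=["Data"])

# Average by hour of day (0..23)
profile = ts.groupby(ts.index.hour).mean()
# Place each hour on an arbitrary dummy date so date locators can be used
profile.index = pd.Timestamp("2000-01-01") + pd.to_timedelta(profile.index, unit="h")

fig, ax = plt.subplots(figsize=(20, 10))
ax.plot(profile.index, profile["Data"])
# byhour fixes which hours get a tick (even hours), i.e. one tick every 2 hours
ax.xaxis.set_major_locator(mdates.HourLocator(byhour=range(0, 24, 2)))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))  # clock time only
```

For hourly ticks, use byhour=range(24) instead. Fixing byhour, rather than using interval=2, keeps the ticks on even hours regardless of where the view limits start.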

Related

Plot each single day on one plot by extracting time of DatetimeIndex without for loop

I have a dataframe containing random data over 7 days, and each data point is indexed by a DatetimeIndex. I want to plot the data of each day on a single plot. My current attempt is the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
n = 10000
idx = pd.date_range('2018-04-09', periods=n, freq='1min')
ts = pd.DataFrame({'A': np.random.randn(n)}, index=idx)
# t.date() must be called; t.date without parentheses is a bound method, not a date
dates = list(ts.index.map(lambda t: t.date()).unique())
for date in dates:
    ts['A'].loc[date.strftime('%Y-%m-%d')].plot()
The result is the following:
As you can see, when a DatetimeIndex is used each point keeps its date, which is why the days are plotted one after another instead of on top of each other.
Questions:
1- How can I fix the current code so that the x-axis starts at midnight and ends at the next midnight?
2- Is there a pandas way to group the days and plot them over a single day without using a for loop?
You can split the index into dates and times and unstack ts into a DataFrame with one column per day:
df = ts.set_index([ts.index.date, ts.index.time]).unstack(level=0)
df.columns = df.columns.get_level_values(1)
then plot all in one chart:
df.plot()
or in separate charts:
axs = df.plot(subplots=True, title=df.columns.tolist(), legend=False, figsize=(6,8))
# execute_constrained_layout() is deprecated/removed in newer Matplotlib; tight_layout() works across versions
axs[0].figure.tight_layout()
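To answer question 1 explicitly, one option is to use a numeric "hours since midnight" x value instead of datetime.time objects and then fix the axis to 0-24. This is a minimal sketch using DataFrame.pivot in place of the set_index/unstack step above; the column names day and hour are introduced here for illustration only.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

n = 10000
idx = pd.date_range("2018-04-09", periods=n, freq="1min")
ts = pd.DataFrame({"A": np.random.randn(n)}, index=idx)

# Hours since midnight as a float, so every day shares the same 0-24 axis
hours = (ts.index.hour + ts.index.minute / 60).to_numpy()
# One row per minute of the day, one column per calendar date
wide = (ts.assign(day=ts.index.date, hour=hours)
          .pivot(index="hour", columns="day", values="A"))

fig, ax = plt.subplots(figsize=(10, 5))
wide.plot(ax=ax, legend=False)
ax.set_xticks(range(0, 25, 3))
ax.set_xlim(0, 24)  # midnight to the next midnight
ax.set_xlabel("Hour of day")
```

Because 10000 minutes is not a whole number of days, the last column is only partly filled; the missing minutes appear as NaN and are simply not drawn.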

How to plot a very large data set (date,time (x axis) vs values recorded(y-axis) every 15 mins daily for a year) in Matplotlib

I am supposed to prepare an x vs y graph in Python. My data set consists of Date, Time and Temperature, recorded at 15-minute intervals over a whole year. Say I have one month of data and I try to plot it in Matplotlib. The graph is not clear because the x-axis (date-time) labels fill the entire axis, whereas Excel gives a much better plot than Matplotlib.
The code I use to open 30 individual daily CSV files and concatenate them into one DataFrame is as follows:
import pandas as pd
from openpyxl import load_workbook
import tkinter as tk
import datetime
from datetime import datetime
from datetime import time
from tkinter import filedialog
import matplotlib.pyplot as plt
root = tk.Tk()
root.withdraw()
root.call('wm', 'attributes', '.', '-topmost', True)
files = filedialog.askopenfilename(multiple=True)
%gui tk
var = root.tk.splitlist(files)
filePaths = []
for f in var:
    # squeeze=True was removed in pandas 2.0, so it is omitted here
    df = pd.read_csv(f, skiprows=8, index_col=None, header=0, parse_dates=True, encoding='ISO-8859-1', names=['Date', 'Time', 'Temperature', 'Humidty'])
    filePaths.append(df)
df = pd.concat(filePaths, axis=0, join='outer', ignore_index=False, sort=True, verify_integrity=False, levels=None)
df["Time period"] = df["Date"] + df["Time"]
plt.figure()
plt.subplots(figsize=(25,20))
plt.plot('Time period', 'Temperature', data=df, linewidth=2, color='g')
plt.title('Temperature distribution Graph')
plt.xlabel('Time')
plt.grid(True)
Example of data
The output graph looks like this:
As you can see, the x-axis of the output graph is crowded with the data point labels and is not readable. Also, matplotlib gives multiple graphs if I load and concatenate the .csv files for a group of days.
The same data set plotted in Excel/Libre gives a smooth graph with orderly arranged dates on the x-axis, and the line graph is also perfect.
I want to rewrite my code to plot a graph similar to the one plotted in Excel/Libre. Please help.
Try this approach:
Use date locators to format the x-axis with the date range you require.
Date locators can be used to define intervals in seconds, minutes, ...:
SecondLocator: Locate seconds
MinuteLocator: Locate minutes
HourLocator: Locate hours
DayLocator: Locate specified days of the month
MonthLocator: Locate months
YearLocator: Locate years
In the example, I use MinuteLocator with a 15-minute interval.
Import matplotlib.dates to work with dates in plots:
import matplotlib.dates as mdates
import pandas as pd
import matplotlib.pyplot as plt
Get your data
# Sample data
df = pd.DataFrame({
'Date': ['07/14/2020', '07/14/2020', '07/14/2020', '07/14/2020'],
'Time': ['12:15:00 AM', '12:30:00 AM', '12:45:00 AM', '01:00:00 AM'],
'Temperature': [22.5, 22.5, 22.5, 23.0]
})
Combine the Date and Time strings (with a space between them) and convert the result to datetime objects:
# Convert data to Date and Time
df["Time period"] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
Define the start and end of the x-axis range (named t_min and t_max so the built-in min and max are not shadowed):
t_min = df['Time period'].min()
t_max = df['Time period'].max()
Create your plot:
# Plot
# Create figure and plot space
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot()
Set the time interval using locators:
# Set Time Interval
ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=15))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %H:%M'))
Set your plot options and plot:
# Set labels
ax.set(xlabel="Time",
       ylabel="Temperature",
       title="Temperature distribution Graph", xlim=[t_min, t_max])
# Plot chart
ax.plot('Time period', 'Temperature', data=df, linewidth=2, color='g')
ax.grid(True)
fig.autofmt_xdate()
plt.show()
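For a month of 15-minute readings, a fixed 15-minute locator would again produce far too many ticks. A possible alternative, sketched below, is mdates.AutoDateLocator with mdates.ConciseDateFormatter (Matplotlib 3.1+), which choose the tick spacing from the visible range. The two in-memory "files" are hypothetical stand-ins for the real daily CSVs, and the explicit format string assumes the MM/DD/YYYY and 12-hour clock layout of the sample data. Note also that the question's code calls plt.figure() and then plt.subplots(), which creates two figures; that may explain an extra, empty figure.

```python
import io
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.dates as mdates
import matplotlib.pyplot as plt

# Hypothetical daily files, in the layout of the answer's sample data
day1 = io.StringIO("Date,Time,Temperature\n"
                   "07/14/2020,12:15:00 AM,22.5\n07/14/2020,12:30:00 AM,22.5\n")
day2 = io.StringIO("Date,Time,Temperature\n"
                   "07/15/2020,12:15:00 AM,23.0\n07/15/2020,12:30:00 AM,23.5\n")

df = pd.concat([pd.read_csv(f) for f in (day1, day2)], ignore_index=True)

# Join with a space and parse once; %I/%p handle the 12-hour clock
df["Time period"] = pd.to_datetime(df["Date"] + " " + df["Time"],
                                   format="%m/%d/%Y %I:%M:%S %p")
df = df.set_index("Time period").sort_index()

fig, ax = plt.subplots(figsize=(12, 5))  # a single figure
ax.plot(df.index, df["Temperature"], linewidth=2, color="g")
locator = mdates.AutoDateLocator()  # picks the tick interval from the data range
ax.xaxis.set_major_locator(locator)
ax.xaxis.set_major_formatter(mdates.ConciseDateFormatter(locator))
ax.set(xlabel="Time", ylabel="Temperature", title="Temperature distribution Graph")
ax.grid(True)
```

The key step is the same as in the answer: once the x values are real datetimes rather than strings, Matplotlib treats the axis as continuous time instead of one category label per row.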

Dataframe changing question with time series data with pandas

I have this dataframe:
The event-time column holds a specific time for each security, and the date-time column is sampled every 10 minutes, each with a price. The data run from 2 hours before the event to 4 hours after it, for each security, and I have thousands of securities. I want a plot whose x-axis runs from -12 to 24 (in 10-minute steps, i.e. from 2 hours before the event to 4 hours after it), with the price change on the y-axis. Is there a way in Python to align the date-times of every security to its event time?
If you're looking to simply plot the data, pandas should handle your datetimes for you, assuming they are stored as datetimes rather than strings.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
event_time = pd.to_datetime('2024-04-28T07:52:00')
date_time = pd.date_range(event_time, periods=24, freq=pd.to_timedelta(10, 'minute'))
df = pd.DataFrame({'date_time': date_time, 'change': np.random.normal(size=len(date_time))})
ax = df.plot(x='date_time', y='change')
plt.show()
However, if you want to drop the specific times from the x-axis and just count up from zero, you could use the row index as the x-axis:
df['_index'] = df.index
ax = df.plot(x='_index', y='change')
plt.show()
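To align thousands of securities on the -12 to 24 axis from the question, one approach is to convert each timestamp into a whole number of 10-minute steps relative to that security's event time, then pivot so each security becomes a column. This is a sketch under assumptions: the long-format column names (security, event_time, date_time, price) and the two securities are hypothetical, and "price change" is taken here to mean the price minus the price at the event (step 0).

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
frames = []
# Hypothetical long-format data: one event time per security, prices every 10 min
for sec, event in [("AAA", "2024-04-28 07:52"), ("BBB", "2024-04-29 13:10")]:
    event_time = pd.Timestamp(event)
    times = pd.date_range(event_time - pd.Timedelta(hours=2),
                          event_time + pd.Timedelta(hours=4), freq="10min")
    frames.append(pd.DataFrame({"security": sec, "event_time": event_time,
                                "date_time": times,
                                "price": 100 + rng.normal(size=len(times)).cumsum()}))
df = pd.concat(frames, ignore_index=True)

# Offset from the event in whole 10-minute steps: -12 .. 24
df["step"] = ((df["date_time"] - df["event_time"]) / pd.Timedelta(minutes=10)).round().astype(int)

# Assumed definition of "price change": price minus the price at the event
event_price = df.loc[df["step"] == 0].set_index("security")["price"]
df["change"] = df["price"] - df["security"].map(event_price)

# One row per step, one column per security, ready to overlay
wide = df.pivot(index="step", columns="security", values="change")
ax = wide.plot()
ax.set_xlabel("10-minute steps relative to event")
ax.set_ylabel("Price change")
```

With thousands of securities you would likely pass legend=False, or plot an aggregate such as wide.mean(axis=1), rather than one labelled line per security.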

How to Count and Plot Interval Time Series Data (Hourly) in Python?

I have accident time series data (YYYY-MM-DD HH:MM:SS). I can count all the data by weekday, year and hour, but I am trying to count only between 8 and 17 hours. I also want to plot the counts in the ranges 8-10, 10-12, 12-13, 13-15 and 15-17.
My code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel("C:/Users/gokhankazar/Desktop/Accident Times/Accident_Time-Series.xlsx")
df["Hour"] = df.Datetime.dt.hour
df.Hour.value_counts().sort_index().plot(marker='o', linestyle='-')
plt.xlabel('Hours', fontsize=10)
plt.ylabel("Number of Accident", fontsize= 10)
plt.show()
I got the figure below.
But how can I change the axis range of the plot?
I also have a weekday figure. On its x-axis I want to show Monday instead of "0", Sunday instead of "6", and likewise for all the other days.
My Datetime column looks like this (268,000 rows in total); I am just counting accident events from the timestamps:
18.05.2015 09:00:00
18.05.2015 15:00:00
18.05.2015 14:14:00
18.05.2015 09:00:00
...
You need to group your data into 2-hour bins and set proper ticks for your plot.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame({'date': ['18.05.2015 09:00:00','18.05.2015 15:00:00', '18.05.2015 14:14:00', '18.05.2015 11:00:00']})
# convert to datetime (the strings are day-first: DD.MM.YYYY)
df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y %H:%M:%S')
# group time every 2h
df = df.groupby(df.date.dt.floor('2h')).count()
# plot data
fig, ax1 = plt.subplots()
ax1.plot(df.index.hour, df.date, '.')
ax1.set_xticks(np.arange(8, 15, 2))
ax1.set_xticklabels(['8-10', '10-12', '12-14', '14-16'])
plt.show()
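The 2-hour floor above groups by full timestamp, so with many days of data the same hour appears once per day, and the bins are fixed 2-hour blocks. A sketch of an alternative using pd.cut on the hour of day handles the uneven ranges from the question (8-10, 10-12, 12-13, 13-15, 15-17), and Series.dt.day_name() gives weekday names instead of 0-6. The timestamps below are hypothetical, in the question's day-first format.

```python
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

# Hypothetical timestamps in the question's DD.MM.YYYY format
df = pd.DataFrame({"Datetime": ["18.05.2015 09:00:00", "18.05.2015 15:00:00",
                                "18.05.2015 14:14:00", "19.05.2015 09:30:00",
                                "20.05.2015 12:10:00", "24.05.2015 16:45:00"]})
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%d.%m.%Y %H:%M:%S")

# Uneven hour bins; right=False makes each bin [start, end), so 8-17 means 08:00-16:59
edges = [8, 10, 12, 13, 15, 17]
labels = ["8-10", "10-12", "12-13", "13-15", "15-17"]
bins = pd.cut(df["Datetime"].dt.hour, bins=edges, labels=labels, right=False)
hourly_counts = bins.value_counts().reindex(labels)  # keep the bins in time order

# Weekday names instead of 0..6, kept in calendar order (empty days count as 0)
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
weekday_counts = df["Datetime"].dt.day_name().value_counts().reindex(days, fill_value=0)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
hourly_counts.plot(kind="bar", ax=ax1, xlabel="Hours", ylabel="Number of Accidents")
weekday_counts.plot(kind="bar", ax=ax2, xlabel="Weekday", ylabel="Number of Accidents")
fig.tight_layout()
```

Hours outside 8-17 fall outside every bin and become NaN in bins, so value_counts() ignores them.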

Pandas plot ONLY overlap between multiple data frames

I found the following solution on S.O. for plotting multiple data frames:
ax = df1.plot()
df2.plot(ax=ax)
But what if I only want to plot where they overlap?
Say that df1's index consists of timestamps spanning 24 hours, and df2's index consists of timestamps spanning 12 hours within the 24 hours of df1 (but not exactly the same timestamps as df1).
If I only want to plot the 12 hours that both data frames cover, what's the easiest way to do this?
A general answer to a general question:
You have three options:
Filter both DataFrames prior to plotting, such that they contain the same time interval.
Use the xlim keyword from the pandas plotting function.
Plot both dataframes and set the axes limits later on (ax.set_xlim())
There are multiple ways to achieve this. The code snippet below shows two such ways as an example.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Make up time data for 24 hour period and 12 hour period
times1 = pd.date_range('1/30/2016 12:00:00', '1/31/2016 12:00:00', freq='h')
times2 = pd.date_range('1/30/2016 12:00:00', '1/31/2016 00:00:00', freq='h')
# Put time into DataFrame
df1 = pd.DataFrame(np.cumsum(np.random.randn(times1.size)), columns=['24 hrs'],
index=times1)
df2 = pd.DataFrame(np.cumsum(np.random.randn(times2.size)), columns=['12 hrs'],
index=times2)
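# Method 1: Filter the first dataframe to the second dataframe's time range
# (a slice also works when the timestamps do not match exactly, unlike df1.loc[times2])
fig1, ax1 = plt.subplots()
df1.loc[times2.min():times2.max()].plot(ax=ax1)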
df2.plot(ax=ax1)
# Method 2: Set the x limits of the axis
fig2, ax2 = plt.subplots()
df1.plot(ax=ax2)
df2.plot(ax=ax2)
ax2.set_xlim(times2.min(), times2.max())
To plot only the portion of df1 whose index lies within the index range of df2, you could do something like this:
ax = df1.loc[df2.index.min():df2.index.max()].plot()
There may be other ways to do it, but that's the one that occurs to me first.
Good luck!
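Both answers assume df2 lies entirely inside df1. If either frame may start later or end earlier than the other, the overlap is the latest start and the earliest end. The helper overlap_window below is a hypothetical name introduced for this sketch; it plots with Matplotlib directly so that timestamps offset by 30 minutes are drawn at their true positions.

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # headless backend for the sketch
import matplotlib.pyplot as plt

def overlap_window(*frames):
    """Return (start, end) of the time span covered by every frame's index."""
    start = max(f.index.min() for f in frames)
    end = min(f.index.max() for f in frames)
    if start > end:
        raise ValueError("the indexes do not overlap")
    return start, end

# Hypothetical data: df2's timestamps are offset from df1's and run past df1's end
times1 = pd.date_range("2016-01-30 12:00", "2016-01-31 12:00", freq="h")
times2 = pd.date_range("2016-01-31 06:30", "2016-01-31 18:30", freq="h")
df1 = pd.DataFrame({"24 hrs": np.cumsum(np.random.randn(times1.size))}, index=times1)
df2 = pd.DataFrame({"12 hrs": np.cumsum(np.random.randn(times2.size))}, index=times2)

start, end = overlap_window(df1, df2)
fig, ax = plt.subplots()
for frame in (df1, df2):
    # Label-based slicing on a sorted DatetimeIndex keeps rows in [start, end]
    sub = frame.loc[start:end]
    ax.plot(sub.index, sub.iloc[:, 0], label=sub.columns[0])
ax.legend()
```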
