Dataframe changing question with time series data with pandas - python

I have this dataframe:
The event-time column holds a fixed time for each security, and the date-time column has one row every 10 minutes with a price. The rows run from 2 hours before the event to 4 hours after it for each security, and I have thousands of securities. I want to create a plot whose x-axis runs from -12 to 24, i.e. from 2 hours before the event time to 4 hours after it in 10-minute steps, with the price change on the y-axis. Is there any way to synchronize the date-time values in Python across securities?

If you simply want to plot the data, pandas should handle the datetimes for you, assuming they are stored as datetimes rather than strings.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
event_time = pd.to_datetime('2024-04-28T07:52:00')
date_time = pd.date_range(event_time, periods=24, freq=pd.to_timedelta(10, 'minute'))
df = pd.DataFrame({'date_time': date_time, 'change': np.random.normal(size=len(date_time))})
ax = df.plot(x='date_time', y='change')
plt.show()
However, if you want to remove the specific times from the x-axis and just count up from zero, you could use the index as the x-axis:
df['_index'] = df.index
ax = df.plot(x='_index', y='change')
plt.show()
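If the goal is to line up many securities on a shared event-relative axis (-12 to 24), one option is to turn each timestamp into a count of 10-minute steps from that security's event time. The sketch below assumes a long-format frame with columns named `security`, `event_time`, `date_time` and `price`; those names and the sample values are made up for illustration, so adjust them to your data.

```python
import numpy as np
import pandas as pd

# Hypothetical long-format data: one row per (security, timestamp).
# Column names and values are assumptions for illustration only.
rng = np.random.default_rng(0)
events = {"AAA": "2024-04-28 07:50:00", "BBB": "2024-04-29 13:20:00"}
frames = []
for security, event in events.items():
    event_time = pd.Timestamp(event)
    # 2 hours before to 4 hours after the event, every 10 minutes (37 rows)
    date_time = pd.date_range(event_time - pd.Timedelta(hours=2),
                              event_time + pd.Timedelta(hours=4),
                              freq="10min")
    frames.append(pd.DataFrame({
        "security": security,
        "event_time": event_time,
        "date_time": date_time,
        "price": 100 + rng.normal(size=len(date_time)).cumsum(),
    }))
df = pd.concat(frames, ignore_index=True)

# Offset from the event in 10-minute steps: -12 ... 0 ... 24
df["step"] = ((df["date_time"] - df["event_time"]) // pd.Timedelta(minutes=10)).astype(int)

# Price change relative to the price at the event time (step 0)
base = df.loc[df["step"] == 0].set_index("security")["price"]
df["change"] = df["price"] - df["security"].map(base)

# One column per security, aligned on the shared event-time axis
aligned = df.pivot(index="step", columns="security", values="change")
```

With matplotlib installed, `aligned.plot()` should then draw every security against the same -12 to 24 axis, and `aligned.mean(axis=1).plot()` would give the average path across securities.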

Related

Plot each single day on one plot by extracting time of DatetimeIndex without for loop

I have a dataframe containing random data over 7 days, with each data point indexed by a DatetimeIndex. I want to plot each day's data on a single plot. My current attempt is the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
n = 10000
i = pd.date_range('2018-04-09', periods=n, freq='1min')
ts = pd.DataFrame({'A': np.random.randn(n)}, index=i)
# t.date() must be called; t.date alone is a bound method, not a date
dates = list(ts.index.map(lambda t: t.date()).unique())
for date in dates:
    ts['A'].loc[date.strftime('%Y-%m-%d')].plot()
The result is the following:
As you can see, when a DatetimeIndex is used the date of each point is kept, which is why the days are plotted back to back rather than on top of each other.
Questions:
1- How can I fix the current code so that the x-axis starts at midnight and ends at the next midnight?
2- Is there a pandas way to group the days and plot them all on a single day's axis without using a for loop?
You can split the index into dates and times and unstack the ts into a dataframe:
df = ts.set_index([ts.index.date, ts.index.time]).unstack(level=0)
df.columns = df.columns.get_level_values(1)
then plot all in one chart:
df.plot()
or in separate charts:
axs = df.plot(subplots=True, title=df.columns.tolist(), legend=False, figsize=(6,8))
axs[0].figure.tight_layout()  # keep titles and tick labels from overlapping
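To see what the reshaping produces, it may help to run the same two lines on a frame small enough to inspect by eye. This is only a sketch on made-up data (two days sampled every 6 hours); the column name `A` follows the question.

```python
import numpy as np
import pandas as pd

# Two days of 6-hourly data, small enough to inspect by eye
idx = pd.date_range("2018-04-09", periods=8, freq="6h")
ts = pd.DataFrame({"A": np.arange(8.0)}, index=idx)

# Rows become times of day, columns become calendar dates
df = ts.set_index([ts.index.date, ts.index.time]).unstack(level=0)
df.columns = df.columns.get_level_values(1)
print(df)
```

Each column holds one day's values against a shared time-of-day index, which is why a single `df.plot()` call overlays the days.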

Matplotlib not reading time axis correctly

I want to visualise the daily data using Matplotlib. The data is temperature against time and has this format:
Time Temperature
1 8:23:04 18.5
2 8:23:04 19.0
3 9:12:57 19.0
4 9:12:57 20.0
... ... ...
But when I plot the graph, the Time values on the x-axis are distorted, like this:
Realising Matplotlib may not be interpreting time data correctly, I converted the time format using pd.to_datetime:
df['Time'] = pd.to_datetime(df['Time'], format="%H:%M:%S")
df.plot('Time', 'Temperature', figsize=(20, 10))
df.describe()
but this again returned:
How can I make the time on the x-axis look normal? Thanks
As @Michael O. was saying, you need to take care of the datetime.
Your times are missing the day, month and year. Here is a possible solution that adds the missing parts with default values; you may want to change them.
The code is simple, and the comments explain each step.
import pandas as pd
from datetime import datetime
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
vals=[["8:23:04", 18.5],
["8:23:04", 19.0],
["9:12:57", 19.0],
["9:12:57", 20.0]]
apd=pd.DataFrame(vals, columns=["Time", "Temp"])
# a simple function to convert a string to datetime
def conv_time(cell):
    dt = datetime.strptime(cell, "%d/%m/%Y %H:%M:%S")
    return dt
# the dataframe misses the day, month and year, we need to add some
apd["Time"]=["{}/{}/{} {}".format(1,1,2020, cell) for cell in apd["Time"]]
# we use the function to convert the column to a datetime
apd["Time"]=[conv_time(cell) for cell in apd["Time"]]
## plotting the results taking care of the axis
fig, ax = plt.subplots()
ax.xaxis.set_major_formatter(mdates.DateFormatter("%H"))
ax.set_xlim([pd.to_datetime('2020-01-1 6:00:00'), pd.to_datetime('2020-01-1 12:00:00')])
ax.scatter(apd["Time"], apd["Temp"])
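The same conversion can probably be done without a helper function by letting pandas parse the whole column at once. A minimal sketch, assuming the same sample values as above; the date `2020-01-01` is an arbitrary placeholder, just as in the answer.

```python
import pandas as pd

apd = pd.DataFrame({"Time": ["8:23:04", "8:23:04", "9:12:57", "9:12:57"],
                    "Temp": [18.5, 19.0, 19.0, 20.0]})

# Prefix a placeholder date (an arbitrary choice) and parse the whole column at once
apd["Time"] = pd.to_datetime("2020-01-01 " + apd["Time"], format="%Y-%m-%d %H:%M:%S")
```

The plotting part of the answer (the `DateFormatter` and the x-limits) should then work unchanged on this column.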

How to Count and Plot Interval Time Series Data (Hourly) in Python?

I have accident time series data (YYYY-MM-DD HH:MM:SS). I can count the data by weekday, year and hour, but I am trying to count only between hours 8 and 17. I also want to plot the counts grouped into the ranges 8-10, 10-12, 12-13, 13-15 and 15-17.
My code:
df = pd.read_excel("C:/Users/gokhankazar/Desktop/Accident Times/Accident_Time-Series.xlsx")
df["Hour"] = df.Datetime.dt.hour
df.Hour.value_counts().sort_index().plot(marker='o', linestyle='-')
plt.xlabel('Hours', fontsize=10)
plt.ylabel("Number of Accident", fontsize= 10)
plt.show()
And I got the figure below:
My figure that I got
But how can I change the axis range in the figure?
I also have a weekday figure like this:
WeekDay figure
I want the x-axis to show Monday instead of "0" and Sunday instead of "6", and likewise for the other days.
My Datetime column looks like this (268000 rows in total); I am just counting accident events from the time series data:
18.05.2015 09:00:00
18.05.2015 15:00:00
18.05.2015 14:14:00
18.05.2015 09:00:00
.
.
.
You need to group your data into 2-hour bins and set proper ticks for your plot.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame({'date': ['18.05.2015 09:00:00','18.05.2015 15:00:00', '18.05.2015 14:14:00', '18.05.2015 11:00:00']})
# convert to datetime (the sample dates are day-first: DD.MM.YYYY)
df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y %H:%M:%S')
# group time every 2h
df = df.groupby(df.date.dt.floor('2h')).count()
# plot data
fig, ax1 = plt.subplots()
ax1.set_xticks(np.arange(8, 15, 2))
ax1.set_xticklabels(['8-10', '10-12', '12-14', '14-16'])
plt.plot(df.index.hour, df.date, '.')
plt.show()
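The question also asks for unequal ranges (8-10, 10-12, 12-13, 13-15, 15-17) and weekday names, which fixed 2-hour flooring does not cover. One way this could be done is with `pd.cut` and `dt.day_name()`. The sketch below assumes the column is called `Datetime` as in the question; the rows are made up, and treating each range as left-closed (10:00 goes into 10-12) is my assumption.

```python
import pandas as pd

df = pd.DataFrame({"Datetime": ["18.05.2015 09:00:00", "18.05.2015 15:00:00",
                                "18.05.2015 14:14:00", "18.05.2015 09:00:00",
                                "19.05.2015 12:30:00", "23.05.2015 07:10:00"]})
df["Datetime"] = pd.to_datetime(df["Datetime"], format="%d.%m.%Y %H:%M:%S")

# Fractional hour of day so that e.g. 12:30 lands in the 12-13 bin
hour = df["Datetime"].dt.hour + df["Datetime"].dt.minute / 60

# Unequal, left-closed bins [8, 10), [10, 12), ...; times outside 8-17 become NaN
edges = [8, 10, 12, 13, 15, 17]
labels = ["8-10", "10-12", "12-13", "13-15", "15-17"]
df["interval"] = pd.cut(hour, bins=edges, labels=labels, right=False)
interval_counts = df["interval"].value_counts(sort=False)

# Weekday names instead of 0-6, kept in calendar order
days = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
weekday_counts = df["Datetime"].dt.day_name().value_counts().reindex(days, fill_value=0)
```

Calling `interval_counts.plot(kind="bar")` or `weekday_counts.plot(kind="bar")` should then label the x-axis with the range and day names directly.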

Pandas plot ONLY overlap between multiple data frames

I found the following solution on S.O. for plotting multiple data frames:
ax = df1.plot()
df2.plot(ax=ax)
But what if I only want to plot where they overlap?
Say that df1's index is timestamps spanning 24 hours, and df2's index is timestamps spanning 12 hours within those 24 hours (but not exactly the same timestamps as df1).
If I only want to plot the 12 hours that both data frames cover, what's the easiest way to do this?
A general answer to a general question:
You have three options:
Filter both DataFrames prior to plotting, such that they contain the same time interval.
Use the xlim keyword from the pandas plotting function.
Plot both dataframes and set the axes limits later on (ax.set_xlim())
There are multiple ways to achieve this. The code snippet below shows two such ways as an example.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Make up time data for 24 hour period and 12 hour period
times1 = pd.date_range('1/30/2016 12:00:00', '1/31/2016 12:00:00', freq='h')
times2 = pd.date_range('1/30/2016 12:00:00', '1/31/2016 00:00:00', freq='h')
# Put time into DataFrame
df1 = pd.DataFrame(np.cumsum(np.random.randn(times1.size)), columns=['24 hrs'],
index=times1)
df2 = pd.DataFrame(np.cumsum(np.random.randn(times2.size)), columns=['12 hrs'],
index=times2)
# Method 1: Filter first dataframe according to second dataframe's time index
fig1, ax1 = plt.subplots()
df1.loc[times2].plot(ax=ax1)
df2.plot(ax=ax1)
# Method 2: Set the x limits of the axis
fig2, ax2 = plt.subplots()
df1.plot(ax=ax2)
df2.plot(ax=ax2)
ax2.set_xlim(times2.min(), times2.max())
To plot only the portion of df1 whose index lies within the index range of df2, you could do something like this:
ax = df1.loc[df2.index.min():df2.index.max()].plot()
There may be other ways to do it, but that's the one that occurs to me first.
Good luck!
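If neither frame fully contains the other, or the timestamps do not coincide, the overlap window can be computed from both indexes before slicing. A minimal sketch on made-up data; the frame and column names follow the answers above, and the specific times are assumptions.

```python
import numpy as np
import pandas as pd

# Hypothetical data: neither index contains the other, and timestamps do not coincide
times1 = pd.date_range("2016-01-30 12:00", "2016-01-31 12:00", freq="h")
times2 = pd.date_range("2016-01-31 06:07", "2016-01-31 18:07", freq="15min")
df1 = pd.DataFrame({"24 hrs": np.arange(times1.size)}, index=times1)
df2 = pd.DataFrame({"12 hrs": np.arange(times2.size)}, index=times2)

# The overlap runs from the later start to the earlier end
start = max(df1.index.min(), df2.index.min())
end = min(df1.index.max(), df2.index.max())

# Label-based slicing on a sorted DatetimeIndex includes both endpoints
df1_overlap = df1.loc[start:end]
df2_overlap = df2.loc[start:end]
```

Plotting is then the same pattern as in the question: `ax = df1_overlap.plot()` followed by `df2_overlap.plot(ax=ax)`.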

How do I create a step chart in Pandas with time series data from two separate data sources

I have two time-series datasets that I want to make a step-chart of.
The time series data is between Monday 2015-04-20 and Friday 2015-04-24.
The first dataset contains 26337 rows with values ranging from 0-1.
The second dataset contains 80 rows with values between 0-4.
First dataset represents motion sensor values in a room, with around 2-3 minutes between each measurement. 1 indicates the room is occupied, 0 indicates that it is empty. The second contains data from a survey where users could fill in how many people were in the same room, at the time they were answering the survey.
Now I want to compare this data, to find out how well the sensor performs. Obviously there is a lot of data that is "missing" in the second set. Is there a way to fill in the "blanks" in a step chart?
Each row has the following format:
Header
Timestamp (%Y-%m-%d %H:%M:%S),value
Example:
Time,Occupancy
24-04-2015 21:40:33,1
24-04-2015 21:43:11,0
.....
So far I have managed to import the first dataset and make a plot of it. Unfortunately the x-axis is not showing dates, but a lot of numbers:
import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime
data = open('PIRDATA.csv')
ts = pd.Series.from_csv(data, sep=',')
plot(ts);
Result:
How should I go on from here?
Try using pandas to read the data, with the first column (the timestamps) as the index and its values parsed as dates.
data = pd.read_csv('PIRDATA.csv', index_col=0, parse_dates=True)
To achieve your step chart objective, try:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.dates import DateFormatter
from matplotlib.dates import HourLocator
small_dataset = pd.read_csv('SURVEY_RESULTS_WEEK1.csv', header=0, index_col=0, parse_dates=True)
big_dataset = pd.read_csv('PIRDATA_RAW_CONVERTED_DATETIME.csv', header=0, index_col=0, parse_dates=True)
small_dataset.rename(columns={'Occupancy': 'Survey'}, inplace=True)
big_dataset.rename(columns={'Occupancy': 'PIR'}, inplace=True)
big = big_dataset.plot()
big.xaxis.set_major_formatter(DateFormatter('%y-%m-%d H: %H'))
big.xaxis.set_major_locator(HourLocator(np.arange(0, 25, 6)))
big.set_ylabel('Occupancy')
small_dataset.plot(ax=big, drawstyle='steps')
fig = plt.gcf()
fig.suptitle('PIR and Survey Occupancy Comparison')
plt.show()
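For the "blanks" in the survey data, one possible approach is to carry each survey answer forward onto the sensor timestamps, so the two series can be compared row by row. This assumes occupancy stays as reported until the next answer, which is a strong assumption for sparse survey data. The series below are small made-up stand-ins for the two files.

```python
import pandas as pd

# Hypothetical stand-ins for the two files; values are made up for illustration
pir = pd.Series(
    [0, 1, 1, 0, 1],
    index=pd.to_datetime(["2015-04-24 21:40:33", "2015-04-24 21:43:11",
                          "2015-04-24 21:45:50", "2015-04-24 21:48:02",
                          "2015-04-24 21:50:40"]),
    name="PIR",
)
survey = pd.Series(
    [2, 0],
    index=pd.to_datetime(["2015-04-24 21:42:00", "2015-04-24 21:49:00"]),
    name="Survey",
)

# Carry each survey answer forward to the sensor timestamps that follow it;
# sensor readings before the first answer stay NaN
survey_filled = survey.reindex(pir.index, method="ffill")

# Compare occupied/empty only where a survey value is available
both = pd.concat([pir, survey_filled], axis=1).dropna()
agreement = ((both["PIR"] > 0) == (both["Survey"] > 0)).mean()
```

Here `agreement` is the fraction of sensor readings that match the survey on occupied versus empty, and `survey_filled.plot(drawstyle="steps-post")` would draw the filled series as steps.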
