I have accident time series data (YYYY-MM-DD HH:MM:SS). I can count all the data by weekday, year, or hour, but I am trying to count only the accidents between 8 and 17 hours. I also want to plot the counts grouped into the ranges 8-10, 10-12, 12-13, 13-15, 15-17.
My code:
import pandas as pd
import matplotlib.pyplot as plt
df = pd.read_excel("C:/Users/gokhankazar/Desktop/Accident Times/Accident_Time-Series.xlsx")
df["Hour"] = df.Datetime.dt.hour
df.Hour.value_counts().sort_index().plot(marker='o', linestyle='-')
plt.xlabel('Hours', fontsize=10)
plt.ylabel("Number of Accident", fontsize= 10)
plt.show()
This gives the plot below:
[Figure: the plot I got]
How can I change the axis range in this plot?
I also have a weekday plot:
[Figure: weekday plot]
On the x-axis I want to show Monday instead of "0", Sunday instead of "6", and likewise for all the other days.
My Datetime column looks like this (268,000 rows in total); I am just counting accident events from the time series data:
18.05.2015 09:00:00
18.05.2015 15:00:00
18.05.2015 14:14:00
18.05.2015 09:00:00
.
.
.
You need to group your data into 2-hour bins and set proper ticks for your plot.
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
df = pd.DataFrame({'date': ['18.05.2015 09:00:00','18.05.2015 15:00:00', '18.05.2015 14:14:00', '18.05.2015 11:00:00']})
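# convert to datetime (the sample is day.month.year, so pass dayfirst=True)
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
# group time every 2h ('2h' avoids the deprecation warning that '2H' raises in recent pandas)
df = df.groupby(df.date.dt.floor('2h')).count()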
# plot data
fig, ax1 = plt.subplots()
ax1.set_xticks(np.arange(8, 15, 2))
ax1.set_xticklabels(['8-10', '10-12', '12-14', '14-16'])
plt.plot(df.index.hour, df.date, '.')
plt.show()
Output: [figure: resulting plot]
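The question asks for unequal ranges (8-10, 10-12, 12-13, 13-15, 15-17) and for weekday names on the axis, which fixed 2-hour flooring does not cover. The sketch below is one possible way to do both, using `pd.cut` for the custom bins and `dt.day_name()` for the weekday labels. The sample rows and the column name `Datetime` are assumptions modelled on the question, not the asker's real data.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical sample in the question's day.month.year format
df = pd.DataFrame({"Datetime": pd.to_datetime(
    ["18.05.2015 09:00:00", "18.05.2015 15:00:00",
     "18.05.2015 14:14:00", "18.05.2015 12:30:00",
     "19.05.2015 07:45:00"], format="%d.%m.%Y %H:%M:%S")})

# Custom, unequal bin edges; right=False makes each bin [start, end)
edges = [8, 10, 12, 13, 15, 17]
labels = ["8-10", "10-12", "12-13", "13-15", "15-17"]
hour = df["Datetime"].dt.hour + df["Datetime"].dt.minute / 60
df["Bin"] = pd.cut(hour, bins=edges, labels=labels, right=False)

# Rows outside 8-17 get NaN and are dropped; empty bins are kept as 0
bin_counts = df.groupby("Bin", observed=False).size()

# Weekday names instead of 0..6, kept in calendar order
day_order = ["Monday", "Tuesday", "Wednesday", "Thursday",
             "Friday", "Saturday", "Sunday"]
day_counts = (df["Datetime"].dt.day_name()
              .value_counts()
              .reindex(day_order, fill_value=0))

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
bin_counts.plot(kind="bar", ax=ax1)
ax1.set_xlabel("Hours")
ax1.set_ylabel("Number of Accidents")
day_counts.plot(kind="bar", ax=ax2)
ax2.set_xlabel("Weekday")
fig.tight_layout()
# call plt.show() to display the figure interactively
```

Because the bin labels and weekday names become the index of each count series, the bar plots pick them up as tick labels without any manual `set_xticklabels` call.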
I have a dataframe of random data spanning 7 days, indexed by a DatetimeIndex. I want to plot each day's data on a single plot. My current attempt is the following:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
n = 10000
i = pd.date_range('2018-04-09', periods=n, freq='1min')
ts = pd.DataFrame({'A': [np.random.randn() for i in range(n)]}, index=i)
# t.date must be called; without parentheses the map returns bound methods
dates = list(ts.index.map(lambda t: t.date()).unique())
for date in dates:
    ts['A'].loc[date.strftime('%Y-%m-%d')].plot()
The result is the following:
As you can see, because a DatetimeIndex is used, each series keeps its own date, so the days are drawn one after another instead of on top of each other.
Questions:
1- How can I fix the current code so that the x-axis starts at midnight and ends at the next midnight?
2- Is there a pandas way to group the days and plot them on a single day's axis without using a for loop?
You can split the index into dates and times and unstack the ts into a dataframe:
df = ts.set_index([ts.index.date, ts.index.time]).unstack(level=0)
df.columns = df.columns.get_level_values(1)
then plot all in one chart:
df.plot()
or in separate charts:
axs = df.plot(subplots=True, title=df.columns.tolist(), legend=False, figsize=(6,8))
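# execute_constrained_layout() is deprecated in newer Matplotlib (3.6+); tight_layout() is used here as a substitute
axs[0].figure.tight_layout()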
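A variation on the same idea, in case a numeric x-axis running from 0 to 24 hours is easier to label: the sketch below pivots on a fractional hour-of-day column instead of `datetime.time` objects. The three-day sample and the column names `day` and `hour_of_day` are assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Three full days of one-minute samples (assumed data)
idx = pd.date_range("2018-04-09", periods=3 * 1440, freq="1min")
ts = pd.DataFrame({"A": np.random.randn(len(idx))}, index=idx)

# Fractional hour since midnight, e.g. 13.5 for 13:30
hour_of_day = idx.hour.to_numpy() + idx.minute.to_numpy() / 60

# One column per calendar day, one row per time of day
wide = (ts.assign(day=idx.date, hour_of_day=hour_of_day)
          .pivot(index="hour_of_day", columns="day", values="A"))

ax = wide.plot(figsize=(8, 4))
ax.set_xlim(0, 24)               # midnight to next midnight
ax.set_xticks(range(0, 25, 2))   # a tick every two hours
ax.set_xlabel("Hour of day")
# call plt.show() to display the figure interactively
```

With a plain numeric axis, the tick positions can be set directly in hours, at the cost of losing the HH:MM tick formatting that the time-based index gives.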
I am supposed to prepare an x vs y graph in Python. My data set consists of date-time and temperature values recorded at 15-minute intervals over a whole year. Let's say I have one month of data and I try to plot it in Matplotlib. The resulting graph is not clear because the x-axis (date-time) is completely filled with labels, whereas Excel gives a good plot compared to Matplotlib.
The code I use to open 30 individual daily csv data recorded files and concatenating it to form one data frame is as follows
import pandas as pd
from openpyxl import load_workbook
import tkinter as tk
import datetime
from datetime import datetime
from datetime import time
from tkinter import filedialog
import matplotlib.pyplot as plt
root = tk.Tk()
root.withdraw()
root.call('wm', 'attributes', '.', '-topmost', True)
files = filedialog.askopenfilename(multiple=True)
%gui tk
var = root.tk.splitlist(files)
filePaths = []
for f in var:
    # squeeze=True dropped: it was removed in pandas 2.0 and has no effect on multi-column data
    df = pd.read_csv(f, skiprows=8, index_col=None, header=0, parse_dates=True, encoding='ISO-8859-1', names=['Date', 'Time', 'Temperature', 'Humidty'])
    filePaths.append(df)
df = pd.concat(filePaths, axis=0, join='outer', ignore_index=False, sort=True, verify_integrity=False, levels=None)
df["Time period"] = df["Date"] + df["Time"]
plt.figure()
plt.subplots(figsize=(25,20))
plt.plot('Time period', 'Temperature', data=df, linewidth=2, color='g')
plt.title('Temperature distribution Graph')
plt.xlabel('Time')
plt.grid(True)
[Figure: example of data]
The output graph looks like this:
As you can see, the x-axis of the output graph is cluttered with data point labels and is not readable. Also, Matplotlib gives multiple graphs if I load and concatenate .csv files for a group of days.
The same data set plotted in Excel/Libre gives a smooth graph with orderly dates on the x-axis, and the line graph is also perfect.
I want to rewrite my code to plot a graph similar to the one plotted in Excel/Libre. Please help.
Try this approach:
Use date locators to format the x-axis with the date range you require.
Date locators can be used to define intervals in seconds, minutes, ...:
SecondLocator: Locate seconds
MinuteLocator: Locate minutes
HourLocator: Locate hours
DayLocator: Locate specified days of the month
MonthLocator: Locate months
YearLocator: Locate years
In the example, I use the MinuteLocator with a 15-minute interval.
Import matplotlib.dates to work with dates in plots:
import matplotlib.dates as mdates
import pandas as pd
import matplotlib.pyplot as plt
Get your data
# Sample data
# Data
df = pd.DataFrame({
'Date': ['07/14/2020', '07/14/2020', '07/14/2020', '07/14/2020'],
'Time': ['12:15:00 AM', '12:30:00 AM', '12:45:00 AM', '01:00:00 AM'],
'Temperature': [22.5, 22.5, 22.5, 23.0]
})
Convert Time period from String to Date object:
# Convert data to Date and Time
df["Time period"] = pd.to_datetime(df['Date'] + ' ' + df['Time'])
Define min and max interval:
# Use new names so the built-in min() and max() are not shadowed
t_min = df['Time period'].min()
t_max = df['Time period'].max()
Create your plot:
# Plot
# Create figure and plot space
fig = plt.figure(figsize=(10, 10))
ax = fig.add_subplot()
Set time interval using locators:
# Set Time Interval
ax.xaxis.set_major_locator(mdates.MinuteLocator(interval=15))
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d %H:%M'))
Set your plot options and plot:
# Set labels
ax.set(xlabel="Time",
       ylabel="Temperature",
       title="Temperature distribution Graph", xlim=[t_min, t_max])
# Plot chart
ax.plot('Time period', 'Temperature', data=df, linewidth=2, color='g')
ax.grid(True)
fig.autofmt_xdate()
plt.show()
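The four-row sample above suits a 15-minute locator, but the question describes a month of 15-minute readings, where a tick every 15 minutes would crowd the axis again. A rough sketch of the same locator approach at that scale, using `DayLocator` instead, might look like this. The synthetic temperature series is an assumption standing in for the real CSV data, and the 3-day tick interval is just one reasonable choice.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# Synthetic stand-in for 30 days of 15-minute readings (assumed data)
idx = pd.date_range("2020-07-01", periods=30 * 96, freq="15min")
hours = idx.hour.to_numpy() + idx.minute.to_numpy() / 60
temperature = 22 + 3 * np.sin(2 * np.pi * hours / 24)
df = pd.DataFrame({"Time period": idx, "Temperature": temperature})

fig, ax = plt.subplots(figsize=(12, 5))
ax.plot(df["Time period"], df["Temperature"], linewidth=1, color="g")

# One labelled tick every 3 days keeps the axis readable
ax.xaxis.set_major_locator(mdates.DayLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%d %b"))

ax.set(xlabel="Date", ylabel="Temperature",
       title="Temperature distribution Graph")
ax.grid(True)
fig.autofmt_xdate()
# call plt.show() to display the figure interactively
```

The key point is that the x values are real datetimes rather than concatenated strings, so Matplotlib can place ticks by calendar unit instead of labelling every data point.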
I have this dataframe:
The event-time column holds the time of an event, and the date-time column holds a price every 10 minutes, running from 2 hours before the event to 4 hours after it, for each security. I have thousands of securities. I want a plot whose x-axis runs from -12 to 24, that is, from 2 hours before the event to 4 hours after it in 10-minute steps, with the price change on the y-axis. Is there a way to synchronize date-time across securities like this in Python?
If you're looking to simply plot the data, pandas should handle your datetimes for you, assuming they are stored as datetimes rather than strings.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
event_time = pd.to_datetime('2024-04-28T07:52:00')
date_time = pd.date_range(event_time, periods=24, freq=pd.to_timedelta(10, 'minute'))
df = pd.DataFrame({'date_time': date_time, 'change': np.random.normal(size=len(date_time))})
ax = df.plot(x='date_time', y='change')
plt.show()
However, if you want to drop the actual times from the x-axis and just count up from zero, you can use the index as the x-axis:
df['_index'] = df.index
ax = df.plot(x='_index', y='change')
plt.show()
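To get the -12 to 24 axis described in the question, one option is to convert each row's offset from its own event time into a count of 10-minute steps, then pivot so every security shares that step axis. The sketch below assumes columns named `security`, `event_time`, `date_time` and `price`; the helper `make_window`, the tickers and the random prices are hypothetical and exist only to build sample data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)

def make_window(security, event_time):
    """Hypothetical helper: prices every 10 min from -2h to +4h around an event."""
    times = pd.date_range(event_time - pd.Timedelta(hours=2),
                          event_time + pd.Timedelta(hours=4), freq="10min")
    price = 100 + rng.normal(size=len(times)).cumsum()
    return pd.DataFrame({"security": security, "event_time": event_time,
                         "date_time": times, "price": price})

df = pd.concat([make_window("AAA", pd.Timestamp("2024-04-28 07:50")),
                make_window("BBB", pd.Timestamp("2024-05-02 13:20"))],
               ignore_index=True)

# Offset from each row's own event, measured in 10-minute steps
offset = df["date_time"] - df["event_time"]
df["step"] = (offset / pd.Timedelta(minutes=10)).round().astype(int)

# One column per security, aligned on the shared step axis
aligned = df.pivot(index="step", columns="security", values="price")

# Price change relative to the price at the event (step 0)
change = aligned - aligned.loc[0]

ax = change.plot()
ax.axvline(0, color="k", linewidth=0.8)  # mark the event time
ax.set_xlabel("10-minute steps from event")
ax.set_ylabel("Price change")
# call plt.show() to display the figure interactively
```

Since the step is computed per row from that row's own event time, securities with events on different dates end up on the same axis.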
I want to import a dataset from an Excel file into a pandas dataframe and then plot it. The dataset includes a Date column, which should be converted to pd.datetime, and Stopwatch columns, which should be converted to the format HH:MM:SS, HH:MM or H:MM depending on the data (the hour can be more than 24, and the format should not include a date). Here are some rows from the data:
Date        Stopwatch1  Stopwatch2  Stopwatch3  Timesum
01.08.2019  00:10:05    19:05       0:45        25:01:00
02.08.2019  00:08:00    23:50       0:30        30:30:00
03.08.2019  00:05:00    00:10       0:40        124:00:00
Then I want to plot a Stopwatch column on the y-axis with labels in time format (HH:MM) and the Date column on the x-axis. It would be nice if I could specify, for example, that if time < 06:00 then time = time + 24:00. What I mean is that 00:10 should count as greater than 23:50 in the Stopwatch2 column, and the chart should reflect that.
I tried:
df = pd.read_excel(path, dtype={'Stopwatch1':str, 'Stopwatch2':str, 'Stopwatch3':str})
tmd = pd.to_timedelta(df["Stopwatch1"])
tmd.plot()
The plot works, but the labels on the y-axis are numbers. I want to change them to time format (HH:MM).
Not sure if I understand the output correctly, but would this be something you are looking for?
I added the Stopwatch1 as a datetime so matplotlib dates module would understand and then help format it correctly:
df['Stopwatch1'] = pd.to_datetime(df["Stopwatch1"])
from matplotlib import dates as mdates
fig, ax = plt.subplots()
ax.plot(df['Stopwatch1'])
ax.yaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
In case you also want the dates as the x-axis, you can set it as the index, so the distance between observations will be based on the date distance.
df['Date'] = pd.to_datetime(df["Date"])
df = df.set_index('Date')
fig, ax = plt.subplots()
ax.plot(df.index, df['Stopwatch1'])
ax.set_xticks(df.index)
ax.yaxis.set_major_formatter(mdates.DateFormatter("%H:%M"))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%Y-%m-%d"))
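Converting to datetime handles values under 24 hours, but the question also mentions durations above 24 hours and shifting early times by 24:00. A possible alternative, sketched here under those assumptions, keeps the values as timedeltas, plots them as minutes, and formats the ticks with a `FuncFormatter`. The formatter name `fmt_hhmm` is made up for this example, and the three rows are copied from the Stopwatch2 column in the question.

```python
import pandas as pd
import matplotlib.pyplot as plt
from matplotlib.ticker import FuncFormatter

df = pd.DataFrame({"Date": ["01.08.2019", "02.08.2019", "03.08.2019"],
                   "Stopwatch2": ["19:05", "23:50", "00:10"]})
df["Date"] = pd.to_datetime(df["Date"], format="%d.%m.%Y")

# Append seconds so the "HH:MM" strings parse unambiguously as hh:mm:ss
td = pd.to_timedelta(df["Stopwatch2"] + ":00")

# Rule from the question: times before 06:00 belong to the next day
td = td.where(td >= pd.Timedelta(hours=6), td + pd.Timedelta(hours=24))

# Plot plain minutes; only the tick labels are rendered as HH:MM
minutes = td.dt.total_seconds() / 60

def fmt_hhmm(x, pos=None):
    """Format a number of minutes as HH:MM; hours may exceed 24."""
    h, m = divmod(int(round(x)), 60)
    return f"{h:02d}:{m:02d}"

fig, ax = plt.subplots()
ax.plot(df["Date"], minutes, marker="o")
ax.set_xticks(df["Date"])
ax.yaxis.set_major_formatter(FuncFormatter(fmt_hhmm))
fig.autofmt_xdate()
# call plt.show() to display the figure interactively
```

With this approach 00:10 is plotted as 24:10, above 23:50, and a value such as 124:00:00 in the Timesum column would be labelled without wrapping around at midnight.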
If I have the following example Python code using a Pandas dataframe:
import numpy as np
import pandas as pd
ts = pd.DataFrame(np.random.randn(1000), index=pd.date_range('1/1/2000 00:00:00', freq='h', periods=1000), columns=['Data'])
ts['Time'] = ts.index.map(lambda t: t.time())
ts = ts.groupby('Time').mean()
ts.plot(x_compat=True, figsize=(20,10))
The output plot is:
What is the most elegant way to get the x-axis ticks to space themselves automatically at hourly or bi-hourly intervals? x_compat=True has no effect.
You can pass the xticks argument to ts.plot(). With the right interval you can get hourly or bi-hourly ticks, like:
import numpy as np
max_sec = 90000
ts.plot(x_compat=True, figsize=(20,10), xticks=np.arange(0, max_sec, 3600))
ts.plot(x_compat=True, figsize=(20,10), xticks=np.arange(0, max_sec, 7200))
Here max_sec is the maximum value of the x-axis, in seconds.
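Since the sample data is hourly, another option is to group by the integer hour rather than by `datetime.time` objects, which makes the tick positions plain hour numbers instead of seconds. This is a sketch of that variation, not the method used in the answer above; the variable name `hourly` is an assumption.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

ts = pd.DataFrame({"Data": np.random.randn(1000)},
                  index=pd.date_range("2000-01-01", periods=1000, freq="h"))

# Mean per hour of day; the index is the integers 0..23
hourly = ts.groupby(ts.index.hour)["Data"].mean()

# xticks are given directly in hours: every second hour here
ax = hourly.plot(figsize=(10, 5), xticks=list(range(0, 24, 2)))
ax.set_xlabel("Hour of day")
# call plt.show() to display the figure interactively
```

Swapping `range(0, 24, 2)` for `range(24)` gives hourly ticks. For data sampled more finely than once an hour, grouping by hour would average away the within-hour detail, so the time-based grouping in the question would still be needed.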