adding new values to pandas df and increment timestamp - python

I have a time series stored in a pandas DataFrame. I am trying to add a new value at the bottom and give it a timestamp one step after the last one. The timestamps are the DataFrame index.
For example, I can add the new value to the bottom of the DataFrame like this:
testday.loc[len(testday.index)] = testday_predict[0]
print(testday)
This seems to work, but the new row's index is the integer 96 instead of the next timestamp:
kW
Date
2022-07-29 00:00:00 39.052800
2022-07-29 00:15:00 38.361600
2022-07-29 00:30:00 38.361600
2022-07-29 00:45:00 38.534400
2022-07-29 01:00:00 38.880000
... ...
2022-07-29 23:00:00 36.806400
2022-07-29 23:15:00 36.806400
2022-07-29 23:30:00 36.633600
2022-07-29 23:45:00 36.806400
96 44.482361 <---- my predicted value added at the bottom good except for the time stamp value of 96
The value 96 is just the length of df.index before the new row was added. Hopefully this makes sense.
If I try:
from datetime import timedelta
last_index_stamp = testday.last_valid_index()
print(last_index_stamp)
This returns:
Timestamp('2022-07-29 23:45:00')
And then I can add 15 minutes to this Timestamp (my data is 15 minute data) like this:
new_timestamp = last_index_stamp + timedelta(minutes=15)
print(new_timestamp)
Which returns what I am looking for instead of the value 96:
Timestamp('2022-07-30 00:00:00')
But how do I replace the value 96 with new_timestamp? If I try:
testday.index[-1:] = new_timestamp
This will error out:
TypeError: Index does not support mutable operations
Any tips greatly appreciated...

This should do the trick:
testday.loc[new_timestamp,:] = testday_predict[0]
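As a minimal sketch of the whole flow, assuming a DataFrame with a single kW column and a 15-minute DatetimeIndex as in the question: the sample values and the name predicted_kw are placeholders standing in for the real data and for testday_predict[0].

```python
import pandas as pd

# Hypothetical stand-in for the question's data: four 15-minute readings.
index = pd.date_range("2022-07-29 23:00", periods=4, freq="15min", name="Date")
testday = pd.DataFrame({"kW": [36.8064, 36.8064, 36.6336, 36.8064]}, index=index)

predicted_kw = 44.482361  # placeholder for testday_predict[0]

# Step the last timestamp forward by one sampling interval.
new_timestamp = testday.index[-1] + pd.Timedelta(minutes=15)

# Assigning to a label that does not exist yet appends a new row.
testday.loc[new_timestamp, "kW"] = predicted_kw

print(testday.tail(2))
```

Writing the row directly under the new timestamp avoids creating the integer label 96 in the first place, so nothing in the index has to be replaced afterwards.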

Related

Time Series Data Reformat

I am working on some code that will rearrange a time series. Currently I have a standard time series with three columns, with the headers [Date, Time, Value]. I want to reformat the dataframe so that it is indexed by date, with the times as column headers (i.e. 0:00, 1:00, ... , 23:00). The dataframe will be filled in with the values.
Here is the dataframe I currently have.
Essentially, I'd like to move the index to a single day and show the hours across the columns.
Thanks,
Use pivot:
df = df.pivot(index='Date', columns='Time', values='Total')
Output (first 10 columns and with random values for Total):
>>> df.pivot(index='Date', columns='Time', values='Total').iloc[0:10]
time 00:00:00 01:00:00 02:00:00 03:00:00 04:00:00 05:00:00 06:00:00 07:00:00 08:00:00 09:00:00
date
2019-01-01 0.732494 0.087657 0.930405 0.958965 0.531928 0.891228 0.664634 0.432684 0.009653 0.604878
2019-01-02 0.471386 0.575126 0.509707 0.715290 0.337983 0.618632 0.413530 0.849033 0.725556 0.186876
You could try this.
Split the time part to get only the hour. Add hr to it.
df = pd.DataFrame([['2019-01-01', '00:00:00',-127.57],['2019-01-01', '01:00:00',-137.57],['2019-01-02', '00:00:00',-147.57],], columns=['Date', 'Time', 'Totals'])
df['hours'] = df['Time'].apply(lambda x: 'hr'+ str(int(x.split(':')[0])))
print(pd.pivot_table(df, values ='Totals', index=['Date'], columns = 'hours'))
Output
hours hr0 hr1
Date
2019-01-01 -127.57 -137.57
2019-01-02 -147.57 NaN

Converting columns with hours to datetime type pandas

I am trying to convert the "Time" column of my pandas dataframe, which has the form "HH:MM:SS", from object to datetime64 because I want to filter by hour.
I tried new['Time'] = pd.to_datetime(new['Time'], format='%H:%M:%S').dt.time, which has no effect at all (it is still an object).
I also tried new['Time'] = pd.to_datetime(new['Time'],infer_datetime_format=True)
which gets the error message: TypeError: <class 'datetime.time'> is not convertible to datetime
I want to be able to sort my data frame by hour.
How do I convert the object to the hour?
Can I then filter by hour (for example, everything after 8am), or do I have to enter the exact value with minutes and seconds to filter for it?
Thank you
If you want your df['Time'] to be of type datetime64 just use
df['Time'] = pd.to_datetime(df['Time'], format='%H:%M:%S')
print(df['Time'])
This will result in the following column
0 1900-01-01 00:00:00
1 1900-01-01 00:01:00
2 1900-01-01 00:02:00
3 1900-01-01 00:03:00
4 1900-01-01 00:04:00
...
1435 1900-01-01 23:55:00
1436 1900-01-01 23:56:00
1437 1900-01-01 23:57:00
1438 1900-01-01 23:58:00
1439 1900-01-01 23:59:00
Name: Time, Length: 1440, dtype: datetime64[ns]
If you just want to extract the hour from the timestamp, extend pd.to_datetime(...) with .dt.hour
If you want to group your values on an hourly basis you can also use (after converting the df['Time'] to datetime):
new_df = df.groupby(pd.Grouper(key='Time', freq='H'))['Value'].agg({pd.Series.to_list})
This will return all values grouped by hour.
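For the "everything after 8am" part of the question, a small sketch along these lines may help. It assumes the Time column holds "HH:MM:SS" strings; the sample rows and the names parsed and after_eight are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data; the column name "Time" follows the question.
df = pd.DataFrame({"Time": ["07:15:00", "08:00:00", "10:39:23", "23:59:59"]})

# Parse the strings; pandas fills in the default date 1900-01-01.
parsed = pd.to_datetime(df["Time"], format="%H:%M:%S")

# Keep rows at 08:00:00 or later by comparing only the hour component.
after_eight = df[parsed.dt.hour >= 8]

print(after_eight)
```

Comparing on .dt.hour means the minutes and seconds never have to be spelled out in the filter.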
IIUC, you already have a time structure from datetime module:
Suppose this dataframe:
from datetime import time
df = pd.DataFrame({'Time': [time(10, 39, 23), time(8, 47, 59), time(9, 21, 12)]})
print(df)
# Output:
Time
0 10:39:23
1 08:47:59
2 09:21:12
Few operations:
# Check if you have really `time` instance
>>> df['Time'].iloc[0]
datetime.time(10, 39, 23)
# Sort values by time
>>> df.sort_values('Time')
Time
1 08:47:59
2 09:21:12
0 10:39:23
# Extract rows between 08:00 and 09:00
>>> df[df['Time'].between(time(8), time(9))]
Time
1 08:47:59

Dealing with time objects in pandas python

I am working with a Pandas Series that contains (Date/Time) Strings of the form:
"2020-04-01 09:29:21"-"2020-04-01 09:53:17"-"2020-04-13 09:55:55"-.....).
The format is : "yyyy-mm-dd H:M:s".
I am only interested in the hour and minute components and I am looking for a way to divide the data into 30 minute buckets and count the values in each bucket.
An example of my end result:
Range count
9:00-9:30 7
9:30-10:00 25
10:00-10:30 35.......
You need to resample first and then group by the time. Let us create a series and set its index to a DatetimeIndex, otherwise resample won't work:
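import numpy as np
import pandas as pd

# random data
np.random.seed(0)
serie = pd.Series(
    np.random.choice(pd.date_range(
        '2020-01-01', freq='7min22s', periods=10000), 1000)
)
# use the timestamps themselves as the index so resample can work on them
serie.index = serie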
Do a resample and then do a groupby:
res = serie.resample('30min').count()
results = res.groupby(res.index.time).sum()
# Change the index to match the format
results.index = results.index.astype(str) + ' - ' + \
    np.roll(results.index.astype(str), -1)
results.head()
# 00:00:00 - 00:30:00 19
# 00:30:00 - 01:00:00 25
# 01:00:00 - 01:30:00 19
# 01:30:00 - 02:00:00 28
# 02:00:00 - 02:30:00 22
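A possible alternative, sketched here under the assumption that only the time of day matters, is to floor each timestamp to its 30-minute bucket and count the buckets directly. The first three timestamps come from the question; the fourth is made up, and the names stamps, bucket_start and counts are illustrative.

```python
import pandas as pd

stamps = pd.Series(pd.to_datetime([
    "2020-04-01 09:29:21",
    "2020-04-01 09:53:17",
    "2020-04-13 09:55:55",
    "2020-04-14 10:05:00",  # made-up extra value
]))

# Floor each timestamp to the start of its 30-minute bucket,
# then keep only the time of day so different dates share a bucket.
bucket_start = stamps.dt.floor("30min").dt.time

# Count how many timestamps fall into each bucket, ordered by time of day.
counts = bucket_start.value_counts().sort_index()

print(counts)
```

Unlike the resample approach, this only lists buckets that contain at least one value.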

Is there a function to increment timestamp in Python

We're working with Python on an Ubuntu 18.04 server and are storing real-time data from temperature sensors in a MySQL database. The database is installed on our server.
What we want to do is increment a timestamp so that every 20 minutes we retrieve the latest temperature value from the sensor out of the MySQL database. We only want the intervals to start at :00, :20, and :40.
Example of the incrementing
2019-07-26 00:00:00
2019-07-26 00:20:00
2019-07-26 00:40:00
2019-07-26 01:00:00
...
2019-07-26 23:40:00
2019-07-27 00:00:00
...
2019-07-30 23:40:00
2019-07-31 00:00:00
This is the basic idea of what we want to achieve, but we know this is a very bad way of coding it. We want more dynamic code. We imagine there is a function for this, or some other way we haven't thought of. This is what the basic idea looks like:
for x in range(0, 24, 1):
    for y in range(0, 60, 20):
        a = pd.read_sql('SELECT temperature1m FROM Weather_Station WHERE timestamp > "2019-07-26 %d:%d:00" AND timestamp < "2019-07-26 %d:%d:00" ORDER BY timestamp DESC LIMIT 1' % (x, y, x, y+20), conn).astype(float).values
On our database we can retrieve first and last timestamp on our sensor.
lastLpnTime = pd.read_sql('SELECT MAX(timestamp) FROM Raw_Data WHERE topic = "lpn1"', conn).astype(str).values
firstLpnTime = pd.read_sql('SELECT MIN(timestamp) FROM Raw_Data WHERE topic = "lpn1"', conn).astype(str).values
Therefore we imagine that we can say:
From firstLpnTime to lastLpnTime, in 20 minute intervals starting at :00, :20, or :40, retrieve data from the MySQL database.
But how do we do this?
If you load the data into a pandas dataframe, you can sample it at the desired time periods using DataFrame.resample.
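A rough sketch of the resample idea, assuming the readings are already in a dataframe indexed by timestamp: the values and the names readings and latest are invented for illustration, and in the question the data would come from pd.read_sql.

```python
import pandas as pd

# Hypothetical readings; in the question they would come from pd.read_sql.
readings = pd.DataFrame(
    {"temperature1m": [20.1, 20.4, 20.9, 21.3, 21.0]},
    index=pd.to_datetime([
        "2019-07-26 00:03:00",
        "2019-07-26 00:17:00",
        "2019-07-26 00:25:00",
        "2019-07-26 00:39:00",
        "2019-07-26 00:41:00",
    ]),
)

# Bins start at :00, :20 and :40; last() keeps the latest reading in each bin.
latest = readings["temperature1m"].resample("20min").last()

print(latest)
```

This fetches everything in one query and does the 20-minute bucketing in pandas, instead of running one SQL query per interval.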
If you want to increment your timestamp, you can do something like:
from datetime import datetime, timedelta

your_start_date = '2019-07-26 00:00:00'
date = datetime.strptime(your_start_date, '%Y-%m-%d %H:%M:%S')
increment = timedelta(minutes=20)  # step between timestamps
for i in range(10):
    print(date.strftime('%Y-%m-%d %H:%M:%S'))
    date += increment
output:
# 2019-07-26 00:00:00
# 2019-07-26 00:20:00
# 2019-07-26 00:40:00
# 2019-07-26 01:00:00
# 2019-07-26 01:20:00
# 2019-07-26 01:40:00
# 2019-07-26 02:00:00
# 2019-07-26 02:20:00
# 2019-07-26 02:40:00
# 2019-07-26 03:00:00
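To cover the "from firstLpnTime to lastLpnTime" part, the same idea can be wrapped in a generator. This is only a sketch: interval_starts is a hypothetical helper, and the first and last values are made-up stand-ins for the MIN and MAX timestamps read from the database.

```python
from datetime import datetime, timedelta


def interval_starts(first, last, step=timedelta(minutes=20)):
    """Yield step-aligned timestamps from the bin containing `first` up to `last`."""
    # Align the start down to the previous multiple of `step` since midnight.
    midnight = first.replace(hour=0, minute=0, second=0, microsecond=0)
    current = midnight + ((first - midnight) // step) * step
    while current <= last:
        yield current
        current += step


# Made-up stand-ins for firstLpnTime and lastLpnTime.
first = datetime(2019, 7, 26, 23, 27, 5)
last = datetime(2019, 7, 27, 0, 25, 0)

starts = list(interval_starts(first, last))
for start in starts:
    print(start.strftime('%Y-%m-%d %H:%M:%S'))
```

Each yielded value and the one after it could then serve as the lower and upper bounds of one query, and the day rollover at midnight is handled by the datetime arithmetic.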

Python Pandas sizeof times

I am working in a dataframe in Pandas that looks like this.
Identifier datetime
0 AL011851 00:00:00
1 AL011851 06:00:00
2 Al011851 12:00:00
This is my code so far:
import pandas as pd
hurricane_df = pd.read_csv("hurdat2.csv",parse_dates=['datetime'])
hurricane_df['datetime'] = pd.to_timedelta(hurricane_df['datetime'].dt.strftime('%H:%M:%S'))
hurricane_df
grouped = hurricane_df.groupby('datetime').size()
grouped
What I did was convert the datetime column to a timedelta to get the hours. I want to get the size of the datetime column but I want just hours like 1:00, 2:00, 3:00, etc. but I get minute intervals as well like 1:15 and 2:45.
Any way to just display the hour?
Thank you.
You can use pandas.Timestamp.round with Series.dt shortcut:
df['datetime'] = df['datetime'].dt.round('h')
So
... datetime
01:15:00
02:45:00
becomes
... datetime
01:00:00
03:00:00
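Note that round goes to the nearest hour, so 02:45 ends up in the 03:00 group. If the goal is instead to count each value under the hour it started in, dt.floor may be closer to what is wanted. A small sketch, assuming a timedelta column like the one built in the question (the sample values and variable names are illustrative):

```python
import pandas as pd

# Hypothetical times of day stored as timedeltas, as in the question.
times = pd.Series(pd.to_timedelta(["01:15:00", "02:45:00", "06:00:00"]))

rounded = times.dt.round("h")  # nearest hour: 02:45 becomes 03:00
floored = times.dt.floor("h")  # start of the hour: 02:45 becomes 02:00

# Count rows per hour after flooring.
per_hour = floored.groupby(floored).size()

print(per_hour)
```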
import pandas as pd
from datetime import timedelta

df = pd.DataFrame({'Identifier':['AL011851','AL011851','AL011851'],'datetime': ["2018-12-08 16:35:23","2018-12-08 14:20:45", "2018-12-08 11:45:00"]})
df['datetime'] = pd.to_datetime(df['datetime'])
df
Identifier datetime
0 AL011851 2018-12-08 16:35:23
1 AL011851 2018-12-08 14:20:45
2 AL011851 2018-12-08 11:45:00
# Rounds to nearest hour
def roundHour(t):
    return (t.replace(second=0, microsecond=0, minute=0, hour=t.hour)
            + timedelta(hours=t.minute//30))
df.datetime=df.datetime.map(lambda t: roundHour(t)) # Step 1: Round to nearest hour
df.datetime=df.datetime.map(lambda t: t.strftime('%H:%M')) # Step 2: Remove seconds
df
Identifier datetime
0 AL011851 17:00
1 AL011851 14:00
2 AL011851 12:00
