I have solar data in the format:
I want to convert the time index to hour angle using the pvlib package. So far my code reads in the input data from a .csv file using pandas and extracts the time data. I need to convert this data (in 30-minute intervals) into hour angles, but I keep getting the error:
TypeError: index is not a valid DatetimeIndex or PeriodIndex
Here is my code so far:
# Import modules
import pandas as pd
import pvlib
# Read in data from .csv file for time and DHI
headers = ["Time","DHI"]
data_file = pd.read_csv("path to csv file",names=headers)
time_data = data_file["Time"]
# Find equation of time for hour angle calc
equation_of_time = pvlib.solarposition.equation_of_time_spencer71(1)
# Find hour angle
hour_angle = pvlib.solarposition.hour_angle(time_data, -89.401230, equation_of_time)
As the error message states, the issue is that your index is not a DatetimeIndex. To calculate the hour angle, the function needs to know the specific time, hence the need for a DatetimeIndex. Right now you are simply passing in a Series of integers, which doesn't have any meaning to the function.
Let's first create a small example:
import pandas as pd
import pvlib
df = pd.DataFrame(data={'time': [0,570,720], 'DHI': [0,50,100]})
df.head()
time DHI
0 0 0
1 570 50
2 720 100
# Create a DateTimeIndex:
start_date = pd.Timestamp(2020,7,28).tz_localize('Europe/Copenhagen')
df.index = start_date + pd.to_timedelta(df['time'], unit='min')
Now the DataFrame looks like this:
time DHI
time
2020-07-28 00:00:00+02:00 0 0
2020-07-28 09:30:00+02:00 570 50
2020-07-28 12:00:00+02:00 720 100
Now we can pass the index to the hour angle function as it represents unique time periods:
equation_of_time = pvlib.solarposition.equation_of_time_spencer71(df.index.dayofyear)
# Find hour angle
hour_angle = pvlib.solarposition.hour_angle(df.index, -89.401230,
equation_of_time)
Note how the start date was localized to a specific time zone. This is necessary unless your data is in UTC; otherwise the index does not represent unique points in time.
The answer by @Adam_Jensen is good, but not the simplest. If you look at the source code of the hour_angle function, you will see that about two-thirds of it is devoted to turning those timestamps back into plain numbers. The rest is simple enough that you do not need pvlib:
# time_data (minutes after local midnight) and equation_of_time are defined in the question
LONGITUDE = -89.401230
LOCAL_MERIDIAN = -90  # standard meridian of the local time zone (UTC-6)
hour_angle = (LONGITUDE - LOCAL_MERIDIAN) + (time_data - 720 + equation_of_time) / 4
It's always good to understand what's going on under the hood!
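To make the arithmetic concrete, here is a small self-contained sketch of the same formula wrapped in a hypothetical helper, hour_angle_from_minutes. It assumes the clock times are minutes after local midnight in a fixed standard-time UTC offset, so the local meridian is 15 degrees per hour of offset (UTC-6 gives -90, matching LOCAL_MERIDIAN above). The equation of time is passed in, in minutes, just as in the formula.

```python
import numpy as np


def hour_angle_from_minutes(minutes_of_day, longitude, utc_offset_hours, equation_of_time):
    """Hour angle in degrees from clock minutes after local midnight.

    Assumption: the clock runs on a fixed (standard-time) UTC offset, so the
    local meridian is 15 degrees per hour of offset.
    """
    local_meridian = 15.0 * utc_offset_hours
    minutes = np.asarray(minutes_of_day, dtype=float)
    # 4 minutes of clock time correspond to 1 degree of Earth rotation
    return (longitude - local_meridian) + (minutes - 720.0 + equation_of_time) / 4.0


# 11:00, 12:00 and 13:00 at the local meridian, ignoring the equation of time
print(hour_angle_from_minutes([660, 720, 780], -90.0, -6, 0.0))
```

At solar noon on the local meridian the hour angle is zero, and it changes by 15 degrees per hour, which is why the 720 (noon in minutes) and the division by 4 appear.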
I have rain data and sensor data that are collected at 15-minute intervals. I only want to keep sensor data recorded once 72 hours have passed since the last raindrop fell. If rain is observed within that window, the counter resets until 72 hours of dry time have been observed.
I converted the data to timestamps but can't figure out the logic for the above. Links to example data as well as example tables are below.
Timestamp         Precipitation(mm)
2021-04-01 00:15  6
2021-04-01 00:30  0

Timestamp         Sensor Depth (mm)
2021-04-01 00:15  12
2021-04-01 00:30  4
example rain data
example sensor data
import pandas as pd
import matplotlib.pyplot as plt
import os
from datetime import datetime, date, time
file = pd.read_csv('example_sensor.csv')
rain = pd.read_csv('example_rain.csv')
east1_df = pd.DataFrame(file)
east1_df['Timestamp'] = pd.to_datetime(east1_df['Timestamp'], format='%Y-%m-%d %H:%M')
east1_df.index = east1_df['Timestamp']
rain['Timestamp'] = pd.to_datetime(rain['Timestamp'], format='%Y-%m-%d %H:%M')
rain.index = rain['Timestamp']
I am not aware of built-in pandas functionality that achieves this directly.
However, there is a way to do this with numpy. You would just need to extract the data from the dataframe.
By convolving a boxcar function with an indicator of the dry samples, one can find the samples that are covered by a dry period of a certain length.
Here's a minimal example on how to achieve this using numpy:
import numpy as np
from datetime import datetime, timedelta

def datetime_range(start, end, delta):
    result = []
    current = start
    while current < end:
        result.append(current)
        current += delta
    return result

def create_boxcar(dry_hours, delta_minutes):
    # unnormalized boxcar: one weight per sample in the dry window
    n_dry = dry_hours * 60 // delta_minutes
    return np.ones(n_dry, dtype=int)

def create_data(delta_minutes):
    stamps = np.array(datetime_range(datetime(2022, 2, 23), datetime(2022, 2, 28), timedelta(minutes=delta_minutes)))
    rainfall = np.random.randn(len(stamps)) - 1  # shifted normal distribution
    rainfall[rainfall < 0] = 0  # coerce negative values to zero
    sensor = np.arange(len(stamps))  # just a ramp
    return stamps, rainfall, sensor

delta_minutes = 15
stamps, rainfall, sensor = create_data(delta_minutes)
# get dry regions
no_rainfall = (rainfall == 0).astype(int)
# create boxcar filter with desired length
dry_hours_before_read = 3
box_filter = create_boxcar(dry_hours_before_read, delta_minutes)
# get regions with desired dry period:
# the first len(no_rainfall) values of the 'full' convolution are trailing sums over
# the current sample and the preceding ones, so a sum equal to the filter length means
# every sample in the window was dry (integer arithmetic, so == is exact)
readout_region = np.convolve(no_rainfall, box_filter, 'full')[:len(no_rainfall)] == len(box_filter)
# get timestamps and values during dry period
timestamp_dry_enough = stamps[readout_region]
sensor_dry_enough = sensor[readout_region]
After that manipulation, you could feed that information back to the dataframe for further pandas-based filtering:
east1_df[f'no rain for {dry_hours_before_read} hours'] = readout_region
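Since the question's data is already in pandas, the same trailing-window check can also be sketched with a rolling maximum instead of a convolution. This is a minimal sketch assuming the rain series is sampled at a fixed step (15 minutes by default) with no gaps; dry_mask is a hypothetical helper name. A sample is flagged only when no rain was recorded in that sample or in any of the preceding ones covering dry_hours.

```python
import pandas as pd


def dry_mask(rain_mm: pd.Series, dry_hours: float, step_minutes: int = 15) -> pd.Series:
    """True where no rain fell in the trailing window of dry_hours.

    Assumes rain_mm is sampled every step_minutes with no missing rows.
    """
    n = int(dry_hours * 60 // step_minutes)  # samples per dry window
    # largest rainfall in the trailing window; NaN until n samples exist
    recent_max = rain_mm.rolling(window=n, min_periods=n).max()
    # NaN compares as False, so the first n-1 samples are never "dry enough"
    return recent_max.eq(0)
```

For the question's data this might be called as dry_mask(rain['Precipitation(mm)'], 72) (the column name is taken from the example table), and the result aligned with the sensor table via east1_df[mask.reindex(east1_df.index, fill_value=False)].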
I have a 40-year time series in the format stn;yyyymmddhh;rainfall, where yyyy = year, mm = month, dd = day, hh = hour. The series is at an hourly resolution. I extracted the maximum values for each year with the following groupby method:
import pandas as pd
df = pd.read_csv('data.txt', delimiter = ";")
df['yyyy'] = df['yyyymmddhh'].astype(str).str[:4]
df.groupby(['yyyy'])['rainfall'].max().reset_index()
Now I am trying to extract the maximum values for a 3-hour duration each year. I tried the sliding maxima approach below, but it is not working; k is the duration I am interested in. In simple terms, I need the maximum precipitation sum for multiple durations in every year (e.g. 3 h, 6 h, etc.).
class AMS:
    def sliding_max(self, k, data):
        tp = data.values
        period = 24*365
        agg_values = []
        start_j = 1
        end_j = k*int(np.floor(period/k))
        for j in range(start_j, end_j + 1):
            start_i = j - 1
            end_i = j + k + 1
            agg_values.append(np.nansum(tp[start_i:end_i]))
        self.sliding_max = max(agg_values)
        return self.sliding_max
Any suggestions or improvements to my code, or is there a way I can implement it with groupby? I am a bit new to the Python environment, so please excuse me if the question isn't put properly.
Stn;yyyymmddhh;rainfall
xyz;1981010100;0.0
xyz;1981010101;0.0
xyz;1981010102;0.0
xyz;1981010103;0.0
xyz;1981010104;0.0
xyz;1981010105;0.0
xyz;1981010106;0.0
xyz;1981010107;0.0
xyz;1981010108;0.0
xyz;1981010109;0.4
xyz;1981010110;0.6
xyz;1981010111;0.1
xyz;1981010112;0.1
xyz;1981010113;0.0
xyz;1981010114;0.1
xyz;1981010115;0.6
xyz;1981010116;0.0
xyz;1981010117;0.0
xyz;1981010118;0.2
xyz;1981010119;0.0
xyz;1981010120;0.0
xyz;1981010121;0.0
xyz;1981010122;0.0
xyz;1981010123;0.0
xyz;1981010200;0.0
You first have to convert your column containing the datetimes to a Series of type datetime. You can do that parsing by providing the format of your datetimes.
df["yyyymmddhh"] = pd.to_datetime(df["yyyymmddhh"].astype(str), format="%Y%m%d%H")
After having the correct data type you have to set that column as your index and can now use pandas functionality for time series data (resampling in your case).
First you resample the data to 3 hour windows and sum the values. From that you resample to yearly data and take the maximum value of all the 3 hour windows for each year.
df.set_index("yyyymmddhh").resample("3H").sum().resample("Y").max()
# Output
yyyymmddhh rainfall
1981-12-31 1.1
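Note that resample("3H") sums fixed, non-overlapping blocks (00:00 to 02:59, 03:00 to 05:59, and so on), while the sliding approach in the question considers every 3-hour window. A sketch of the sliding version for several durations at once, assuming a gap-free hourly series indexed by datetime so that a window of k rows spans k hours (annual_max_rolling_sums is a hypothetical helper name). Windows that straddle New Year are counted in the year in which they end.

```python
import pandas as pd


def annual_max_rolling_sums(rain: pd.Series, durations_hours) -> pd.DataFrame:
    """Annual maxima of sliding k-hour rainfall sums, one column per k.

    Assumes `rain` has an hourly DatetimeIndex with no missing hours.
    """
    out = {}
    for k in durations_hours:
        # trailing k-hour sum at every hour (sliding, not fixed blocks)
        rolled = rain.rolling(window=k, min_periods=k).sum()
        # largest k-hour sum within each calendar year
        out[f"{k}h"] = rolled.groupby(rolled.index.year).max()
    return pd.DataFrame(out)
```

With the parsed data from above this could be called as annual_max_rolling_sums(df.set_index("yyyymmddhh")["rainfall"], [3, 6]).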
One of the columns in my dataset looks like the following:
I'm wondering what the best practice is to incorporate this information into the classifier models.
Please help me with it. The project is done in python with the Jupyter Notebook.
You can convert the string to seconds and use that as one of the features:
df = pd.DataFrame(['25m 34s', '1m', '22s'], columns=['game_length'])
df['game_length_seconds'] = pd.to_timedelta(df['game_length']).apply(lambda x: x.seconds)
>>> df
game_length game_length_seconds
0 25m 34s 1534
1 1m 60
2 22s 22
You can write a function and map it over that column:
from datetime import datetime

def duration_from_epoc(time: str):
    # converts the string to a datetime; for '25m 28s' this gives 1900-01-01 00:25:28
    date_time = datetime.strptime(time, "%Mm %Ss")
    # calculate the duration from the epoch / start datetime; here you can CHANGE your starting time
    date_time_from = datetime(1970, 1, 1)
    # returns the delta in total seconds
    return (date_time - date_time_from).total_seconds()
Then you can call:
df["game_length"].map(duration_from_epoc)
For example, '25m 28s' (str) is mapped to -2208987272.0 (float).
Overall, the solution calculates the number of seconds from a reference datetime to your datetime (parsed from the string form of the duration). Note that "25m 28s" is converted to the datetime object 1900-01-01 00:25:28.
Finally, instead of saving the duration, consider saving the start and end time of each game so that you can always calculate the duration on the fly.
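If the goal is the plain duration, one small variation is to subtract 1900-01-01 (the date strptime fills in) instead of 1970-01-01, so the result is the game length itself rather than a large negative offset. This sketch assumes every string has the "XXm YYs" shape with fewer than 60 minutes; duration_seconds is a hypothetical name.

```python
from datetime import datetime


def duration_seconds(text: str) -> float:
    # strptime fills the missing date fields with 1900-01-01, so subtracting
    # that base leaves only the minutes and seconds parsed from the string
    parsed = datetime.strptime(text, "%Mm %Ss")
    return (parsed - datetime(1900, 1, 1)).total_seconds()


print(duration_seconds("25m 28s"))  # 1528.0
```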
I just started moving from Matlab to Python 2.7 and I have some trouble reading my .mat-files. Time information is stored in Matlab's datenum format. For those who are not familiar with it:
A serial date number represents a calendar date as the number of days that has passed since a fixed base date. In MATLAB, serial date number 1 is January 1, 0000.
MATLAB also uses serial time to represent fractions of days beginning at midnight; for example, 6 p.m. equals 0.75 serial days. So the string '31-Oct-2003, 6:00 PM' in MATLAB is date number 731885.75.
(taken from the Matlab documentation)
I would like to convert this to Pythons time format and I found this tutorial. In short, the author states that
If you parse this using python's datetime.fromordinal(731965.04835648148) then the result might look reasonable [...]
(before any further conversions), which doesn't work for me, since datetime.fromordinal expects an integer:
>>> datetime.fromordinal(731965.04835648148)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: integer argument expected, got float
While I could just round them down for daily data, I actually need to import minutely time series. Does anyone have a solution for this problem? I would like to avoid reformatting my .mat files since there's a lot of them and my colleagues need to work with them as well.
If it helps, someone else asked for the other way round. Sadly, I'm too new to Python to really understand what is happening there.
/edit (2012-11-01): This has been fixed in the tutorial posted above.
The solution you linked to has a small issue. The corrected version is:
from datetime import datetime, timedelta
python_datetime = datetime.fromordinal(int(matlab_datenum)) + timedelta(days=matlab_datenum%1) - timedelta(days = 366)
a longer explanation can be found here
Using pandas, you can convert a whole array of datenum values with fractional parts:
import numpy as np
import pandas as pd
datenums = np.array([737125, 737124.8, 737124.6, 737124.4, 737124.2, 737124])
timestamps = pd.to_datetime(datenums-719529, unit='D')
The value 719529 is the datenum value of the Unix epoch start (1970-01-01), which is the default origin for pd.to_datetime().
I used the following Matlab code to set this up:
datenum('1970-01-01') % gives 719529
datenums = datenum('06-Mar-2018') - linspace(0,1,6) % test data
datestr(datenums) % human readable format
Just in case it's useful to others, here is a full example of loading time series data from a Matlab mat file, converting a vector of Matlab datenums to a list of datetime objects using carlosdc's answer (defined as a function), and then plotting as time series with Pandas:
from scipy.io import loadmat
import pandas as pd
import datetime as dt
import urllib
# In Matlab, I created this sample 20-day time series:
# t = datenum(2013,8,15,17,11,31) + [0:0.1:20];
# x = sin(t)
# y = cos(t)
# plot(t,x)
# datetick
# save sine.mat
urllib.urlretrieve('http://geoport.whoi.edu/data/sine.mat','sine.mat');
# If you don't use squeeze_me=True, then Pandas doesn't like
# the arrays in the dictionary, because they look like arrays
# of 1-element arrays. squeeze_me=True fixes that.
mat_dict = loadmat('sine.mat',squeeze_me=True)
# make a new dictionary with just dependent variables we want
# (we handle the time variable separately, below)
my_dict = { k: mat_dict[k] for k in ['x','y']}
def matlab2datetime(matlab_datenum):
    day = dt.datetime.fromordinal(int(matlab_datenum))
    dayfrac = dt.timedelta(days=matlab_datenum%1) - dt.timedelta(days = 366)
    return day + dayfrac
# convert Matlab variable "t" into list of python datetime objects
my_dict['date_time'] = [matlab2datetime(tval) for tval in mat_dict['t']]
# build a DataFrame indexed by time
df = pd.DataFrame(my_dict)
df = df.set_index('date_time')
# print df
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 201 entries, 2013-08-15 17:11:30.999997 to 2013-09-04 17:11:30.999997
Data columns (total 2 columns):
x 201 non-null values
y 201 non-null values
dtypes: float64(2)
# plot with Pandas
df.plot()
Here's a way to convert these using numpy.datetime64, rather than datetime.
import numpy as np
origin = np.datetime64('0000-01-01', 'D') - np.timedelta64(1, 'D')
date = serdate * np.timedelta64(1, 'D') + origin
This works whether serdate is a single integer or an integer array.
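The day-resolution version works with whole days only, which drops the time of day that minutely data needs. A sketch that keeps the fraction, assuming second resolution is enough, scales the datenums to seconds before adding them to a second-resolution origin (datenum_to_datetime64 is a hypothetical name):

```python
import numpy as np


def datenum_to_datetime64(datenums):
    """Convert Matlab datenums (possibly fractional) to datetime64[s]."""
    # Matlab day 1 is 0000-01-01, so day 0 is the day before it
    origin = np.datetime64('0000-01-01T00:00:00', 's') - np.timedelta64(1, 'D')
    # fractional days -> whole seconds, rounded to the nearest second
    seconds = np.round(np.asarray(datenums, dtype=float) * 86400).astype(np.int64)
    return origin + seconds * np.timedelta64(1, 's')


print(datenum_to_datetime64([731885.75]))  # the '31-Oct-2003, 6:00 PM' example
```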
Just building on and adding to previous comments. The key is in the day counting as carried out by the method toordinal and constructor fromordinal in the class datetime and related subclasses. For example, from the Python Library Reference for 2.7, one reads that fromordinal
Return the date corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1. ValueError is raised unless 1 <= ordinal <= date.max.toordinal().
However, year 0 (1 BC) still has to be counted, and it was a leap year, so there are 366 days that need to be taken into account. (It was a leap year just like 2016, which lies exactly 504 four-year cycles later.)
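The 366-day offset can be checked against the values quoted earlier (731885 for 31-Oct-2003 from the Matlab documentation, and 719529 for the Unix epoch). A minimal sketch, date part only, with hypothetical helper names:

```python
from datetime import date

# Year 0 of the proleptic Gregorian calendar is a leap year, so Matlab's
# day count runs 366 days ahead of Python's ordinal.
MATLAB_OFFSET_DAYS = 366


def date_to_datenum(d):
    return d.toordinal() + MATLAB_OFFSET_DAYS


def datenum_to_date(n):
    return date.fromordinal(int(n) - MATLAB_OFFSET_DAYS)


print(date_to_datenum(date(2003, 10, 31)))  # 731885
print(date_to_datenum(date(1970, 1, 1)))    # 719529
```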
These are two functions that I have been using for similar purposes:
import datetime
def datetime_pytom(d,t):
    '''
    Input
        d   Date as an instance of type datetime.date
        t   Time as an instance of type datetime.time
    Output
        The fractional day count since 0-Jan-0000 (proleptic ISO calendar)
        This is the 'datenum' datatype in matlab
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence an increase of 366 days, for year 0 AD was a leap year
    '''
    dd = d.toordinal() + 366
    tt = datetime.timedelta(hours=t.hour,minutes=t.minute,
                            seconds=t.second)
    tt = datetime.timedelta.total_seconds(tt) / 86400
    return dd + tt
def datetime_mtopy(datenum):
    '''
    Input
        The fractional day count according to datenum datatype in matlab
    Output
        The date and time as an instance of type datetime in python
    Notes on day counting
        matlab: day one is 1 Jan 0000
        python: day one is 1 Jan 0001
        hence a reduction of 366 days, for year 0 AD was a leap year
    '''
    ii = datetime.datetime.fromordinal(int(datenum) - 366)
    ff = datetime.timedelta(days=datenum%1)
    return ii + ff
Hope this helps and happy to be corrected.
I am working with a csv file of hourly temperature data for a specific location for a year. From there I made a dataframe in pandas with the following lists: DOY (repeated 24 times for every day), Time (in minutes; 0, 60, 120, etc.), and temperature.
These data are serving as input data for a model that I have in Python (Jupyter notebooks, running on iOS) that predicts animal body temperatures when solving for a bunch of biophysical heat flux equations. For that model, I have a function that has the arguments of min and max temperature for every day. In the csv file, every day has 24 rows of data since they're giving hourly temperatures. I need to be able to iterate through this csv file and select the minimum and maximum temperature value for the day before current day [i-1], the current day [i], and the following day [i+1] in another function that I already have. Does anyone have suggestions for how to set up those functions? I'm still rather new to Python (< 1 year experience) so any help would be really appreciated! :)
Edit to clarify:
import math
import itertools
%pylab inline
import matplotlib as plt
import pandas as pd
import numpy as np
%cd "/Users/lauren/Desktop"
Ta_input = pd.DataFrame(pd.read_csv("28Jan_Mesa_Ta.csv"))
Ta_input.columns = ['', 'doy', 'time', 'Ta']
Ta_input.to_numpy()
doy =list(Ta_input['doy'])
time=list(Ta_input['time'])
Ta=list(Ta_input['Ta'])
micro_df = pd.DataFrame(list(zip(doy,time, Ta)),
columns=['doy','time', 'Ta'])
print(micro_df)
##### Below is the readout showing what the df looks like ###
Populating the interactive namespace from numpy and matplotlib
/Users/lauren/Desktop
doy time Ta
0 1 0 4.434094
1 1 60 4.383863
2 1 120 4.115001
3 1 180 3.831146
4 1 240 3.537708
... ... ... ...
8755 365 1140 6.478684
8756 365 1200 5.744720
8757 365 1260 5.212801
8758 365 1320 4.568695
8759 365 1380 4.398663
[8760 rows x 3 columns]
/usr/local/Caskroom/miniconda/base/lib/python3.7/site-packages/IPython/core/magics/pylab.py:160: UserWarning: pylab import has clobbered these variables: ['time', 'polyint', 'plt', 'insert']
`%matplotlib` prevents importing * from pylab and numpy
"\n`%matplotlib` prevents importing * from pylab and numpy"
I have these functions
def anytime_temp(t,max_t_yesterday,min_t_today,max_t_today,min_t_tomorrow):
    #t = time
    #i = today
    #i-1 = yesterday
    #i+1 = tomorrow
    #Tn = daily min
    #Tx = daily max
    if 0.<=t<=5.:
        return max_t_yesterday*Gamma_t(t) + min_t_today*(1-Gamma_t(t))
    elif 5.<t<=14.:
        return max_t_today*Gamma_t(t)+ min_t_today*(1-Gamma_t(t))
    else:
        return max_t_today*Gamma_t(t)+ min_t_tomorrow*(1-Gamma_t(t))

# Rabs = amount of radiation absorbed by an organism
def Rabs(s,alpha_s,h,lat,J,time,long,d,tau,alt,rg,alpha_l,max_t_yesterday,min_t_today,max_t_today,min_t_tomorrow,eg,Tave,amp,z,D):
    if math.cos(zenith(latitude,julian,time,longitude)) > 0.:
        return s*alpha_s*(Fh(h,lat,J,time,long,d)*(hS(J,lat,time,long,tau,alt))+0.5*Sd(J,lat,time,long,tau,alt)+0.5*Sr(rg,J,lat,time,long,tau,alt)) + 0.5*alpha_l*(Sla(anytime_temp(time,max_t_yesterday,min_t_today,max_t_today,min_t_tomorrow)) + Slg(eg,Tzt(Tave,amp,z,D,time)))
    else:
        return 0.5*alpha_l*(Sla(anytime_temp(time,max_t_yesterday,min_t_today,max_t_today,min_t_tomorrow)) + Slg(eg,Tzt(Tave,amp,z,D,time)))
...which both take maximum temperature yesterday, minimum and maximum temperature today, and minimum temperature tomorrow as inputs. My code is set up to run with me just setting those values to be single numbers (e.g., min_t_today = 25.). But now that I have a list of hourly temperatures for the entire year, I am trying to figure out the best way to either modify these functions, or define new functions that I could call in these functions to allow me to pull the minimum and maximum temperature values for each specific DOY (day of year, which is another list in my df).
In other words, my csv file has hourly temperatures for every DOY, so 24 temps per day. I need to iterate through to calculate and call on the min and max temperatures for a given day in these functions. Any tips would be helpful! Thanks!
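One way to approach this, sketched under the assumption that the dataframe has the doy and Ta columns shown above with one block of hourly rows per day, is to compute the daily extremes once with groupby and then shift them by one day to get the yesterday and tomorrow values (daily_extremes is a hypothetical helper name):

```python
import pandas as pd


def daily_extremes(df: pd.DataFrame) -> pd.DataFrame:
    """Per-day min/max of 'Ta' plus yesterday's max and tomorrow's min.

    Assumes df has 'doy' and 'Ta' columns; the result is indexed by doy.
    """
    daily = df.groupby('doy')['Ta'].agg(['min', 'max'])
    daily.columns = ['min_t_today', 'max_t_today']
    # shift by one day; the first day has no yesterday and the last day
    # has no tomorrow, so those entries are NaN
    daily['max_t_yesterday'] = daily['max_t_today'].shift(1)
    daily['min_t_tomorrow'] = daily['min_t_today'].shift(-1)
    return daily
```

The values for a given day can then be looked up by DOY, e.g. d = daily.loc[doy], and passed as d.max_t_yesterday, d.min_t_today, d.max_t_today, d.min_t_tomorrow into anytime_temp or Rabs. The branches in anytime_temp (0 to 5, 5 to 14) appear to expect t in hours, whereas the time column is in minutes, so time / 60 may be needed. How to fill the NaN for day 1 and day 365 (for example by wrapping around the year) is a modelling choice left open here.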