I have the following table:
Time
2016-09-10T23:20:00.000000000
2016-08-10T23:20:00.000000000
2016-09-10T23:20:00.000000000
2017-09-10T23:20:00.000000000
2016-09-10T23:20:00.000000000
I wish to use isocalendar to get the work weeks. Any ideas you can share with me? Desired output:
Time WW
2016-01-01T23:20:00.000000000 201601
2016-01-01T23:20:00.000000000 201601
2016-01-01T23:20:00.000000000 201601
2017-01-01T23:20:00.000000000 201701
2018-01-01T23:20:00.000000000 201801
You can use:
#convert column to datetime
df['Time'] = pd.to_datetime(df['Time'])
#simpler solution with strftime
df['WW'] = df['Time'].dt.strftime('%G-%V')
#solution with isocalendar
df['WW1'] = df['Time'].apply(lambda x: str(x.isocalendar()[0]) + '-' +
str(x.isocalendar()[1]).zfill(2))
print (df)
Time WW WW1
0 2017-01-01 00:00:00 2016-52 2016-52 <- changed datetime
1 2016-08-10 23:20:00 2016-32 2016-32
2 2016-09-10 23:20:00 2016-36 2016-36
3 2017-09-10 23:20:00 2017-36 2017-36
4 2016-09-10 23:20:00 2016-36 2016-36
Thank you @Fierr for correcting '%Y-%V' to '%G-%V'.
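In newer pandas (1.1+), `Series.dt.isocalendar()` is vectorized and returns a DataFrame with `year`, `week`, and `day` columns, so the per-row `apply` can be avoided entirely — a sketch:

```python
import pandas as pd

df = pd.DataFrame({'Time': pd.to_datetime(
    ['2016-09-10T23:20:00', '2016-08-10T23:20:00', '2017-09-10T23:20:00'])})

# dt.isocalendar() returns one DataFrame with the ISO year/week/day of every row
iso = df['Time'].dt.isocalendar()
df['WW'] = iso['year'].astype(str) + '-' + iso['week'].astype(str).str.zfill(2)
print(df['WW'].tolist())  # ['2016-36', '2016-32', '2017-36']
```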
Any ideas on how I can manipulate my current date/time data so that it is suitable for conversion to a datetime dtype?
For example:
df1['Date/Time'] = pd.to_datetime(df1['Date/Time'])
The current format for the data is mm/dd 00:00:00
An example of the column in the dataframe can be seen below.
Date/Time Dry_Temp[C] Wet_Temp[C] Solar_Diffuse_Rate[[W/m2]] \
0 01/01 00:10:00 8.45 8.237306 0.0
1 01/01 00:20:00 7.30 6.968360 0.0
2 01/01 00:30:00 6.15 5.710239 0.0
3 01/01 00:40:00 5.00 4.462898 0.0
4 01/01 00:50:00 3.85 3.226244 0.0
For the condition where the hour is denoted as 24, you have two choices: first, you can simply reset the hour to 00; second, you can reset the hour to 00 and also add 1 to the date.
In either case the first step is detecting the condition, which can be done with a simple find: t.find(' 24:').
Having detected the condition, the first case is a simple matter of resetting the hour to 00 and proceeding with formatting the field. In the second case, however, adding 1 to the day is a little more complicated, because you can roll over to the next month.
Here is the approach I would use:
Given a df of form:
Date Time
0 01/01 00:00:00
1 01/01 00:24:00
2 01/01 24:00:00
3 01/31 24:00:00
The First Case
def parseDate(tx):
    # First case: reset an hour of 24 to 00, leaving the date unchanged
    ti = tx.find(' 24:')
    if ti >= 0:
        tx = tx[:ti] + ' 00:' + tx[ti + 4:]
    return pd.to_datetime(tx, format='%m/%d %H:%M:%S')

df['Date Time'] = df['Date Time'].apply(lambda x: parseDate(x))
Produces the following:
Date Time
0 1900-01-01 00:00:00
1 1900-01-01 00:24:00
2 1900-01-01 00:00:00
3 1900-01-31 00:00:00
For the second case, I employed the dateutil relativedelta library and slightly modified my parseDate function as shown below:
import dateutil as du
def parseDate2(tx):
    # Second case: reset the hour to 00, then roll forward a full day
    ti = tx.find(' 24:')
    if ti >= 0:
        tk = pd.to_datetime(tx[:ti] + ' 00:' + tx[ti + 4:],
                            format='%m/%d %H:%M:%S')
        return tk + du.relativedelta.relativedelta(hours=+24)
    return pd.to_datetime(tx, format='%m/%d %H:%M:%S')

df['Date Time'] = df['Date Time'].apply(lambda x: parseDate2(x))
Yields:
Date Time
0 1900-01-01 00:00:00
1 1900-01-01 00:24:00
2 1900-01-02 00:00:00
3 1900-02-01 00:00:00
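If the column is large, the same second-case logic can be vectorized with string methods and a timedelta correction instead of a per-row apply — a sketch, assuming the same 'mm/dd HH:MM:SS' strings:

```python
import pandas as pd

s = pd.Series(['01/01 00:00:00', '01/01 24:00:00', '01/31 24:00:00'])

# Remember which rows had the out-of-range hour, replace 24 with 00,
# parse, then push the flagged rows forward one day.
mask = s.str.contains(' 24:')
ts = pd.to_datetime(s.str.replace(' 24:', ' 00:', regex=False),
                    format='%m/%d %H:%M:%S')
ts = ts + pd.to_timedelta(mask.astype(int), unit='D')
print(ts.tolist())
```

Month (and year) rollover is handled automatically by the timedelta addition.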
To access the values of the datetime (namely the time), you can use:
# These are now in a usable format
seconds = df1['Date/Time'].dt.second
minutes = df1['Date/Time'].dt.minute
hours = df1['Date/Time'].dt.hour
And if need be, you can create an independent time series with:
df1['Date/Time'].dt.time
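A minimal sketch of those accessors (note they are singular: `hour`, `minute`, `second`):

```python
import pandas as pd

s = pd.to_datetime(pd.Series(['1900-01-01 03:45:30']))

# Each accessor returns an integer Series aligned with the original
print(s.dt.hour.iloc[0], s.dt.minute.iloc[0], s.dt.second.iloc[0])  # 3 45 30
```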
The issue I am facing is very simple yet weird, and it has troubled me to no end.
I have a dataframe as follows :
df['datetime'] = df['datetime'].dt.tz_convert('US/Pacific')
#converting datetime from datetime64[ns, UTC] to datetime64[ns,US/Pacific]
df.head()
vehicle_id trip_id datetime
6760612 1000500 4f874888ce404720a203e36f1cf5b716 2017-01-01 10:00:00-08:00
6760613 1000500 4f874888ce404720a203e36f1cf5b716 2017-01-01 10:00:01-08:00
6760614 1000500 4f874888ce404720a203e36f1cf5b716 2017-01-01 10:00:02-08:00
6760615 1000500 4f874888ce404720a203e36f1cf5b716 2017-01-01 10:00:03-08:00
6760616 1000500 4f874888ce404720a203e36f1cf5b716 2017-01-01 10:00:04-08:00
df.info ()
vehicle_id int64
trip_id object
datetime datetime64[ns, US/Pacific]
I am trying to find out the datetime difference as follows (in two different ways):
df['datetime_diff'] = df['datetime'].diff()
df['time_diff'] = (df['datetime'] - df['datetime'].shift(1)).astype('timedelta64[s]')
For a particular trip_id, I have the results as follows :
df[df['trip_id'] == '4f874888ce404720a203e36f1cf5b716'][['datetime','datetime_diff','time_diff']].head()
datetime datetime_diff time_diff
6760612 2017-01-01 10:00:00-08:00 NaT NaN
6760613 2017-01-01 10:00:01-08:00 00:00:01 1.0
6760614 2017-01-01 10:00:02-08:00 00:00:01 1.0
6760615 2017-01-01 10:00:03-08:00 00:00:01 1.0
6760616 2017-01-01 10:00:04-08:00 00:00:01 1.0
But for some other trip_ids, like the ones below, you can observe that I am getting a datetime difference of zero (for both columns) when it is actually not zero. There is a time difference in seconds.
df[df['trip_id'] == '01b8a24510cd4e4684d67b96369286e0'][['datetime','datetime_diff','time_diff']].head(4)
datetime datetime_diff time_diff
3236107 2017-01-28 03:00:00-08:00 0 days 0.0
3236108 2017-01-28 03:00:01-08:00 0 days 0.0
3236109 2017-01-28 03:00:02-08:00 0 days 0.0
3236110 2017-01-28 03:00:03-08:00 0 days 0.0
df[df['trip_id'] == '01c2a70c25e5428bb33811ca5eb19270'][['datetime','datetime_diff','time_diff']].head(4)
datetime datetime_diff time_diff
8915474 2017-01-21 10:00:00-08:00 0 days 0.0
8915475 2017-01-21 10:00:01-08:00 0 days 0.0
8915476 2017-01-21 10:00:02-08:00 0 days 0.0
8915477 2017-01-21 10:00:03-08:00 0 days 0.0
Any leads as to what the actual issue is ? I will be very grateful.
If I just execute your code without the type conversion, everything looks fine:
df.timestamp - df.timestamp.shift(1)
On the example lines
rows=['2017-01-21 10:00:00-08:00',
'2017-01-21 10:00:01-08:00',
'2017-01-21 10:00:02-08:00',
'2017-01-21 10:00:03-08:00',
'2017-01-21 10:00:03-08:00'] # the above lines are from your example. I just invented this last line to have one equal entry
df= pd.DataFrame(rows, columns=['timestamp'])
df['timestamp']= df['timestamp'].astype('datetime64')
df.timestamp - df.timestamp.shift(1)
The last line returns
Out[40]:
0 NaT
1 00:00:01
2 00:00:01
3 00:00:01
4 00:00:00
Name: timestamp, dtype: timedelta64[ns]
That looks unsuspicious so far. Note that you already have a timedelta64 series.
If I now add your conversion, I get:
(df.timestamp - df.timestamp.shift(1)).astype('timedelta64[s]')
Out[42]:
0 NaN
1 1.0
2 1.0
3 1.0
4 0.0
Name: timestamp, dtype: float64
You see that the result is a series of floats. This is probably because there is a NaN in the series. One other thing is the addition of [s]: this doesn't seem to work, but if you use [ns] it does. If you want to get rid of the nanoseconds somehow, I guess you need to do that separately.
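If the goal is the difference in seconds, `Series.dt.total_seconds()` sidesteps the astype problem entirely — a sketch:

```python
import pandas as pd

ts = pd.to_datetime(pd.Series(['2017-01-21 10:00:00',
                               '2017-01-21 10:00:01',
                               '2017-01-21 10:00:03']))

# diff() yields timedelta64[ns]; total_seconds() converts that to float
# seconds, with the leading NaT becoming NaN.
print(ts.diff().dt.total_seconds().tolist())  # [nan, 1.0, 2.0]
```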
I am trying to shift my datetime index so that the last row, 2018-04-09, is moved by one day, shifting only that row. I tried a few ways and got different errors, such as the one below:
df.index[-1] = df.index[-1] + pd.offsets.Day(1)
TypeError: Index does not support mutable operations
Can you kindly advise a suitable way please?
My df looks like this:
FinalPosition
dt
2018-04-03 1.32
2018-04-04 NaN
2018-04-05 NaN
2018-04-06 NaN
2018-04-09 NaN
Use rename if values of DatetimeIndex are unique:
df = df.rename({df.index[-1]: df.index[-1] + pd.offsets.Day(1)})
print (df)
FinalPosition
dt
2018-04-03 1.32
2018-04-04 NaN
2018-04-05 NaN
2018-04-06 NaN
2018-04-10 NaN
If the values are possibly not unique, DatetimeIndex.insert works for me:
df.index = df.index[:-1].insert(len(df), df.index[-1] + pd.offsets.Day(1))
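A runnable sketch of that index rebuild on a small example frame (the insert position here is the length of the truncated index, so the shifted label is appended at the end):

```python
import pandas as pd

idx = pd.to_datetime(['2018-04-03', '2018-04-04', '2018-04-09'])
df = pd.DataFrame({'FinalPosition': [1.32, None, None]}, index=idx)

# Drop the last label, then append it shifted forward by one day
shifted = df.index[-1] + pd.offsets.Day(1)
df.index = df.index[:-1].insert(len(df.index) - 1, shifted)
print(df.index[-1])  # 2018-04-10 00:00:00
```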
Use .iloc
Ex:
import pandas as pd
df = pd.DataFrame({"datetime": ["2018-04-09"]})
df["datetime"] = pd.to_datetime(df["datetime"])
print(df["datetime"].iloc[-1:] - pd.offsets.Day(1))
Output:
0 2018-04-08
Name: datetime, dtype: datetime64[ns]
I have an array that looks like this:
array([(b'03:35:05.397191'),
(b'03:35:06.184700'),
(b'03:35:08.642503'), ...,
(b'05:47:15.285806'),
(b'05:47:20.189460'),
(b'05:47:30.598514')],
dtype=[('Date', 'S15')])
I want to convert it into a dataframe, using to_datetime. I could do that by simply doing this:
df = pd.DataFrame( array )
df['Date'] = pd.to_datetime( df['Date'].str.decode("utf-8") )
>>> df.Date
0 2018-03-07 03:35:05.397191
1 2018-03-07 03:35:06.184700
2 2018-03-07 03:35:08.642503
3 2018-03-07 03:35:09.155030
4 2018-03-07 03:35:09.300029
5 2018-03-07 03:35:09.303031
The problem is that it automatically sets the date as today. Is it possible to set the date as a different day, for example, 2015-01-25?
Instead of using pd.to_datetime, use pd.to_timedelta and add a date.
pd.to_timedelta(df.Date.str.decode("utf-8")) + pd.to_datetime('2017-03-15')
0 2017-03-15 03:35:05.397191
1 2017-03-15 03:35:06.184700
2 2017-03-15 03:35:08.642503
3 2017-03-15 05:47:15.285806
4 2017-03-15 05:47:20.189460
5 2017-03-15 05:47:30.598514
Name: Date, dtype: datetime64[ns]
Try this:
df['Date'] = pd.to_datetime( df['Date'].str.decode("utf-8") ).apply(lambda x: x.replace(year=2015, month=1, day=25))
Incorporating @Wen's solution for correctness :)
you could create a string with complete date-time and parse, like:
df = pd.DataFrame( array )
df['Date'] = pd.to_datetime( '20150125 ' + df['Date'].str.decode("utf-8") )
Ummm, seems like it works :-)
pd.to_datetime(df['Date'].str.decode("utf-8"))-(pd.to_datetime('today')-pd.to_datetime('2015-01-25'))
Out[376]:
0 2015-01-25 03:35:05.397191
1 2015-01-25 03:35:06.184700
2 2015-01-25 03:35:08.642503
3 2015-01-25 05:47:15.285806
4 2015-01-25 05:47:20.189460
5 2015-01-25 05:47:30.598514
Name: Date, dtype: datetime64[ns]
I have a csv file that I am trying to import into pandas.
There are two columns of interest, date and hour, and they are the first two columns.
E.g.
date,hour,...
10-1-2013,0,
10-1-2013,0,
10-1-2013,0,
10-1-2013,1,
10-1-2013,1,
How do I import using pandas so that hour and date are combined, or is that best done after the initial import?
df = DataFrame.from_csv('bingads.csv', sep=',')
If I do the initial import, how do I combine the two into a date and then delete the hour?
Thanks
Define your own date_parser:
In [291]: from dateutil.parser import parse
In [292]: import datetime as dt
In [293]: def date_parser(x):
.....: date, hour = x.split(' ')
.....: return parse(date) + dt.timedelta(0, 3600*int(hour))
In [298]: pd.read_csv('test.csv', parse_dates=[[0,1]], date_parser=date_parser)
Out[298]:
date_hour a b c
0 2013-10-01 00:00:00 1 1 1
1 2013-10-01 00:00:00 2 2 2
2 2013-10-01 00:00:00 3 3 3
3 2013-10-01 01:00:00 4 4 4
4 2013-10-01 01:00:00 5 5 5
Apply read_csv instead of read_clipboard to handle your actual data:
>>> df = pd.read_clipboard(sep=',')
>>> df['date'] = pd.to_datetime(df.date) + pd.to_timedelta(df.hour, unit='h')
>>> del df['hour']
>>> df
date ...
0 2013-10-01 00:00:00 NaN
1 2013-10-01 00:00:00 NaN
2 2013-10-01 00:00:00 NaN
3 2013-10-01 01:00:00 NaN
4 2013-10-01 01:00:00 NaN
[5 rows x 2 columns]
Take a look at the parse_dates argument which pandas.read_csv accepts.
You can do something like:
df = pandas.read_csv('some.csv', parse_dates=True)
# in which case pandas will parse all columns where it finds dates
df = pandas.read_csv('some.csv', parse_dates=[i,j,k])
# in which case pandas will parse the i, j and kth columns for dates
Since you are only using the two columns from the csv file and combining them into one, I would squeeze it into a series of datetime objects like so:
import pandas as pd
from StringIO import StringIO
import datetime as dt
txt='''\
date,hour,A,B
10-1-2013,0,1,6
10-1-2013,0,2,7
10-1-2013,0,3,8
10-1-2013,1,4,9
10-1-2013,1,5,10'''
def date_parser(date, hour):
dates=[]
for ed, eh in zip(date, hour):
month, day, year=list(map(int, ed.split('-')))
hour=int(eh)
dates.append(dt.datetime(year, month, day, hour))
return dates
p=pd.read_csv(StringIO(txt), usecols=[0,1],
parse_dates=[[0,1]], date_parser=date_parser, squeeze=True)
print p
Prints:
0 2013-10-01 00:00:00
1 2013-10-01 00:00:00
2 2013-10-01 00:00:00
3 2013-10-01 01:00:00
4 2013-10-01 01:00:00
Name: date_hour, dtype: datetime64[ns]