How do I make a column with iso weeks? - python

I have a data frame (df) with a column of dates (DATUM).
I then try to make a new column with the ISO week of each date.
But as a beginner in Python, I have run into a problem.
When I try to use:
df['iso_week_num'] = df["DATUM"].isocalendar()[1]
I get the following error message:
AttributeError: 'Series' object has no attribute 'isocalendar'
What am I doing wrong?

Note that isocalendar() is a method of individual Timestamp (datetime) objects, not of a Series (check the documentation), which is why the error is raised. That said, this should work for your case:
import pandas as pd
# Generating some data (you may use random dates too):
dates = ['2021-01-03', '2021-01-04', '2021-01-05', '2021-01-06', '2021-01-07']
dataframe = pd.DataFrame(data=dates, columns=['date'])
dataframe['date'] = pd.to_datetime(dataframe['date'])
# Applying isocalendar() to each Timestamp in the series:
# index [0] is the ISO year, index [1] is the ISO week number
dataframe['year'] = dataframe['date'].apply(lambda x: x.isocalendar()[0])
dataframe['week'] = dataframe['date'].apply(lambda x: x.isocalendar()[1])

You should use the pandas.Series.dt accessor to reach the datetimelike properties of the Series values (Series.dt.isocalendar() is available from pandas 1.1 onwards):
df['week'] = df['date'].dt.isocalendar().week
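A minimal sketch of the .dt.isocalendar() route applied to the question's own DATUM column, using a few hypothetical sample dates. It keeps the ISO year as well, because dates in early January can fall in week 53 of the previous ISO year:

```python
import pandas as pd

# Hypothetical sample data using the question's column name DATUM
df = pd.DataFrame({"DATUM": ["2021-01-03", "2021-01-04", "2021-01-07"]})
df["DATUM"] = pd.to_datetime(df["DATUM"])

# .dt.isocalendar() returns a DataFrame with 'year', 'week' and 'day' columns
iso = df["DATUM"].dt.isocalendar()
df["iso_year"] = iso["year"]
df["iso_week_num"] = iso["week"]

print(df)
```

Here 2021-01-03 is a Sunday, so it belongs to ISO week 53 of 2020, while 2021-01-04 starts ISO week 1 of 2021.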

Related

Remove date from datetime index in pandas

I want to remove the date part from a datetime column in pandas, and the following code works just fine:
df= pd.read_csv('data.csv')
df['Value']= df.Value.astype(float)
df['Time'] = pd.to_datetime(df['Time']).dt.time
df.set_index('Time',inplace=True)
But after that, when I try to select rows by time using .loc, I get the following error:
df_to_plot = df.loc['09:43:00':'13:54:00']
TypeError: '<' not supported between instances of 'datetime.time' and 'str'
But the same code works fine without .dt.time as follows:
df= pd.read_csv('data.csv')
df['Value']= df.Value.astype(float)
df['Time'] = pd.to_datetime(df['Time'])
df.set_index('Time',inplace=True)
df_to_plot = df.loc['2022-07-28 09:43':'2022-07-28 13:54']
How can I remove date and still select rows based on time?
Thank you.
The TypeError arises because df['Time'] = pd.to_datetime(df['Time']).dt.time turns every value in df['Time'] into a datetime.time object, whereas in your loc statement '09:43:00':'13:54:00' are strings, and Python cannot compare the two.
Try this:
df['Time'] = pd.to_datetime(df['Time']).dt.time.astype(str)
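Alternatively, keep the full datetime while loading and strip the date afterwards with df.index = df.index.time. The index then holds datetime.time objects, so slice it with time objects rather than strings, e.g. df.loc[datetime.time(9, 43):datetime.time(13, 54)] (this requires the index to be sorted).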
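A sketch of two options, with hypothetical sample data standing in for data.csv. DataFrame.between_time() filters a DatetimeIndex by time of day without removing the date at all (a swap-in technique not used in the answers above); the second option strips the date and compares the time-only index against datetime.time objects instead of strings:

```python
import datetime as dt

import pandas as pd

# Hypothetical sample standing in for data.csv
df = pd.DataFrame(
    {"Value": [1.0, 2.0, 3.0, 4.0]},
    index=pd.to_datetime(["2022-07-28 09:00", "2022-07-28 10:15",
                          "2022-07-28 13:54", "2022-07-28 15:30"]),
)
df.index.name = "Time"

# Option 1: keep the DatetimeIndex and filter on the time of day only
by_time = df.between_time("09:43", "13:54")

# Option 2: drop the date, then compare against datetime.time objects
time_only = df.copy()
time_only.index = time_only.index.time
mask = (time_only.index >= dt.time(9, 43)) & (time_only.index <= dt.time(13, 54))
by_mask = time_only[mask]

print(by_time)
print(by_mask)
```

Option 1 is usually the simpler choice, since the date stays available for plotting while the selection ignores it.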

How do I extract a DateTimeIndex for use in a new column?

I've extracted the dates from the filenames of a set of Excel files into a list of DatetimeIndex objects. I now need to write the date extracted for each file to a new date column in the dataframe I've created from that Excel sheet. My code works in that it writes the new 'Date' column to each dataframe, but I'm unable to convert the objects out of their generator object/DatetimeIndex form and into a %Y-%m-%d format.
Link to code creating the list of DateTimeIndexes from the filenames:
How do I turn datefinder output into a list?
Code to write each list entry to a new 'Date' column in each dataframe created from the spreadsheets:
for i in range(0, len(df)):
    df[i]['Date'] = (event_dates_dto[i] for frames in df)
The involved objects:
type(event_dates_dto)
<class 'list'>
type(event_dates_dto[0])
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
event_dates_dto
[DatetimeIndex(['2019-03-29'], dtype='datetime64[ns]', freq=None), DatetimeIndex(['2019-04-13'], dtype='datetime64[ns]', freq=None), DatetimeIndex(['2019-05-11'], dtype='datetime64[ns]', freq=None)]
The dates were extracted using datefinder: http://www.blog.pythonlibrary.org/2016/02/04/python-the-datefinder-package/
I've tried using methods here that seemed like they could make sense but none of them are the right ticket: Keep only date part when using pandas.to_datetime
Again, the simple for loop is working correctly, but I'm unsure how to coerce the generator object into the correct format so that it not only writes to the new 'Date' column but is also in a useful '%Y-%m-%d' format that makes sense within the dataframe. Any help is greatly appreciated.
Force evaluation of the datefinder generator with a one-line loop such as dates = [_ for _ in matches] (or simply list(matches))
Copy the index into a column with df['date'] = df.index (or use .reset_index() if you don't need to keep the index)
Convert the column to datetime using pd.to_datetime()
Use the .dt.date accessor of the datetime column to get Y-m-d dates
Here's a sample
import datefinder
import pandas as pd
data = '''Your appointment is on July 14th, 2016 15:24. Your bill is due 05/05/2016 16:00'''
matches = datefinder.find_dates(data)
# force evaluation with 1 line loop
dates = [_ for _ in matches] # 'dates = list(matches)' also works
df = pd.DataFrame({'dt_index':dates,'value':['appointment','bill']}).set_index('dt_index')
df['date'] = df.index
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.date
df
which gives
value date
dt_index
2016-07-14 15:24:00 appointment 2016-07-14
2016-05-05 16:00:00 bill 2016-05-05
Edit: Edited to account for forced evaluation
A minor fix got it working, I was just trying to carry out too much at once and overthinking it.
# Create an empty list and append each date
event_dates_transfer = []
# Use .strftime('%Y-%m-%d') on event_dates_dto here if you want a string instead of a Timestamp
for i in range(0, len(event_dates_dto)):
    event_dates_transfer.append(event_dates_dto[i][0])
# Create a 'Date' column for each dataframe from the filename it was created from, and set it as the index
for i in range(0, len(df)):
    new_date = event_dates_transfer[i]
    df[i]['Date'] = new_date
    df[i].set_index('Date', inplace=True)
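The same loop can be sketched more compactly with zip(), which pairs each dataframe with its DatetimeIndex and avoids storing a generator in the column. The names frames and the sample data below are hypothetical stand-ins for the question's df list and event_dates_dto:

```python
import pandas as pd

# Hypothetical stand-ins: one single-element DatetimeIndex per file,
# and one DataFrame per Excel sheet
event_dates_dto = [pd.DatetimeIndex(["2019-03-29"]), pd.DatetimeIndex(["2019-04-13"])]
frames = [pd.DataFrame({"value": [1, 2]}), pd.DataFrame({"value": [3]})]

for frame, dti in zip(frames, event_dates_dto):
    # dti[0] is a single pandas Timestamp; a scalar assignment fills every row
    frame["Date"] = dti[0]
    # For a plain 'YYYY-MM-DD' string column instead, use:
    # frame["Date"] = dti[0].strftime("%Y-%m-%d")

print(frames[0])
```

Taking element [0] is what turns each one-element DatetimeIndex into a usable scalar date.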

AttributeError: 'Series' object has no attribute 'days'

I have a column 'delta' in a dataframe with dtype timedelta64[ns], calculated by subtracting one date from another. I am trying to return the number of days as a float by using this code:
from datetime import datetime
from datetime import date
df['days'] = float(df['delta'].days)
but I receive this error:
AttributeError: 'Series' object has no attribute 'days'
Any ideas why?
A DataFrame column is a Series, and for a Series you need the .dt accessor to get the days (in newer pandas versions). See the pandas documentation on the .dt accessor.
So, you need to change:
df['days'] = float(df['delta'].days)
To
df['days'] = df['delta'].dt.days.astype(float)
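When subtracting the dates, you can also convert the result to whole days directly:
df = pd.DataFrame([pd.Timestamp('20010101'), pd.Timestamp('20040605')])
(df.loc[0] - df.loc[1]).astype('timedelta64[D]')
So basically use .astype('timedelta64[D]') on the subtracted column. Note that pandas 2.x only allows casting timedelta64 to the s, ms, us or ns units, so on newer versions use .dt.days instead.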

Pandas Dataframe asFreq changing datatype of index

I'm having an issue when using asfreq to resample a dataframe. My dataframe, df, has an index of datetime.date objects. After using df.asfreq('d', 'pad'), the index has been changed to type pandas.tslib.Timestamp. I've tried the following to change it back, but I'm having no luck...
df = df.set_index(df.index.to_datetime())
df.index = df.index.to_datetime()
df.index = pd.to_datetime(df.index)
Any thoughts?
Thanks!
use pd.to_datetime
df.index = pd.to_datetime(df.index)
This is the canonical approach to creating datetime indices. If you want every element of the index to be a datetime.datetime object, you can do the following:
df.index = pd.Index([i.to_pydatetime() for i in df.index], name=df.index.name, dtype=object)
I just don't know why you'd want to.
Why is this a problem? If you really need datetime.date values, you can try df.index = df.index.map(lambda x: x.date()), since pandas.Timestamp subclasses datetime.datetime.
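A minimal sketch with a hypothetical frame indexed by datetime.date objects: convert the index to a DatetimeIndex before calling asfreq(), and map back to datetime.date afterwards only if plain dates are truly needed. DatetimeIndex.date returns the date part of each Timestamp as datetime.date objects:

```python
import datetime as dt

import pandas as pd

# Hypothetical frame indexed by datetime.date, with a gap on 2021-01-03
df = pd.DataFrame(
    {"x": [1, 2, 4]},
    index=[dt.date(2021, 1, 1), dt.date(2021, 1, 2), dt.date(2021, 1, 4)],
)

# asfreq works on a DatetimeIndex, so convert first; the result holds Timestamps
daily = df.set_index(pd.to_datetime(df.index)).asfreq("D", method="pad")

# Only if plain datetime.date labels are really needed, convert back
daily.index = daily.index.date

print(daily)
```

Keeping the Timestamps is usually preferable, since most pandas time-series methods expect a DatetimeIndex.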

'Index' object has no attribute 'tz_localize'

I'm trying to convert all instances of 'GMT' time in a time/date column ('Created_At') in a csv file so that it is all formatted in 'EST'.
Please see below:
import pandas as pd
from pandas.tseries.resample import TimeGrouper
from pandas.tseries.offsets import DateOffset
from pandas.tseries.index import DatetimeIndex
cambridge = pd.read_csv(r'\Users\cgp\Desktop\Tweets.csv')
cambridge['Created_At'] = pd.to_datetime(pd.Series(cambridge['Created_At']))
cambridge.set_index('Created_At', drop=False, inplace=True)
cambridge.index = cambridge.index.tz_localize('GMT').tz_convert('EST')
cambridge.index = cambridge.index - DateOffset(hours = 12)
The error I'm getting is:
cambridge.index = cambridge.index.tz_localize('GMT').tz_convert('EST')
AttributeError: 'Index' object has no attribute 'tz_localize'
I've tried various things but am stumped as to why the Index object won't recognize the tz_localize attribute. Thank you so much for your help!
Replace
cambridge.set_index('Created_At', drop=False, inplace=True)
with
cambridge.set_index(pd.DatetimeIndex(cambridge['Created_At']), drop=False, inplace=True)
Hmm. Like the other current tz_localize problem, this works fine for me. Does this work for you? I have simplified some of the calls a bit from your example:
import pandas as pd
from numpy.random import randn
df2 = pd.DataFrame(randn(3, 3), columns=['A', 'B', 'C'])
# randn(3,3) returns nine random numbers in a 3x3 array.
# the columns argument to DataFrame names the 3 columns.
# no datetimes here! (look at df2 to check)
df2['A'] = pd.to_datetime(df2['A'])
# convert the random numbers to datetimes -- look at df2 again
# if A had values to_datetime couldn't handle, we'd clean up A first
df2.set_index('A',drop=False, inplace=True)
# and use that column as an index for the whole df2;
df2.index = df2.index.tz_localize('GMT').tz_convert('US/Eastern')
# make it timezone-conscious in GMT and convert that to Eastern
df2.index.tzinfo
<DstTzInfo 'US/Eastern' LMT-1 day, 19:04:00 STD>
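tz_localize only exists on a DatetimeIndex, so one plausible cause (not confirmed in the question) is that 'Created_At' was not fully parsed into datetime64, leaving set_index() with a plain Index. The sketch below uses hypothetical sample strings; errors="coerce" turns unparseable values into NaT, and dropping those rows (an assumption about what you want) guarantees a DatetimeIndex. Note that 'US/Eastern' follows daylight saving time, whereas 'EST' is a fixed UTC-5 offset:

```python
import pandas as pd

# Hypothetical sample of the 'Created_At' strings, including one malformed value
raw = pd.DataFrame(
    {"Created_At": ["2016-03-01 14:00:00", "2016-03-01 18:30:00", "not a date"]}
)

# Unparseable values become NaT, so the column is guaranteed to be datetime64
raw["Created_At"] = pd.to_datetime(raw["Created_At"], errors="coerce")
raw = raw.dropna(subset=["Created_At"]).set_index("Created_At", drop=False)

# set_index() on a datetime64 column yields a DatetimeIndex
assert isinstance(raw.index, pd.DatetimeIndex)

# Treat the naive times as GMT, then convert them to US Eastern time
raw.index = raw.index.tz_localize("GMT").tz_convert("US/Eastern")

print(raw.index)
```

Checking cambridge.index.dtype right after set_index() is a quick way to see whether this is what happened in the original file.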
