I have a data frame indexed with a date (Python datetime object). How could I find the frequency as the number of months of data in the data frame?
I tried the attribute data_frame.index.freq, but it returns None. I also tried the asfreq function using data_frame.asfreq('M', how={'start','end'}), but it does not return the expected results. Please advise how I can get the expected results.
You want to convert your index of datetimes to a DatetimeIndex. The easiest way is to use to_datetime:
df.index = pd.to_datetime(df.index)
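Now you can do timeseries/frame operations, like resample or grouping with pd.Grouper (the replacement for the old TimeGrouper).
Note that df.index.freq is only populated when the index was built with a fixed frequency (for example by pd.date_range or asfreq); straight after to_datetime it is None. If your data has a consistent frequency, pd.infer_freq(df.index) will return it; if it doesn't (e.g. if some days are missing), that also returns None.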
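For the original question (how many months of data the frame holds), here is a minimal sketch, assuming a hypothetical month-end frame whose index starts out as strings. It converts the index, shows that .freq stays None, lets pd.infer_freq guess the spacing (the exact alias it returns, 'M' or 'ME', depends on the pandas version), and counts the distinct calendar months via to_period:

```python
import pandas as pd

# Hypothetical month-end data whose index arrives as plain strings
df = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0, 4.0]},
    index=["2020-01-31", "2020-02-29", "2020-03-31", "2020-04-30"],
)

# Convert the index to a DatetimeIndex
df.index = pd.to_datetime(df.index)

# .freq is None: the index was parsed, not generated with a fixed frequency
print(df.index.freq)

# infer_freq guesses the frequency from the spacing of the timestamps
print(pd.infer_freq(df.index))

# Number of distinct calendar months that actually contain data
n_months = df.index.to_period("M").nunique()
print(n_months)
```

This counts months that have rows; if months can be missing and you want the span from first to last instead, you would compare the first and last periods.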
You probably want to use pandas Timestamps (a DatetimeIndex) for your index instead of plain datetime objects in order to use 'freq'. See the example below:
import pandas as pd
dates = pd.date_range('2012-1-1','2012-2-1')
df = pd.DataFrame(index=dates)
print (df.index.freq)
yields,
<Day>
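You can convert an existing index like so, although this on its own still leaves freq as None; it is df.asfreq('D') (or whichever frequency applies) that sets it:
df.index = [pd.Timestamp(d) for d in df.index]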
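A minimal sketch of getting a frequency onto an index that was parsed rather than generated, using a hypothetical three-day frame. asfreq conforms the frame to a regular grid and records the frequency on the index:

```python
import pandas as pd

# Consecutive days, but parsed from strings, so no frequency is recorded
idx = pd.to_datetime(["2012-01-01", "2012-01-02", "2012-01-03"])
df = pd.DataFrame({"x": [1, 2, 3]}, index=idx)
print(df.index.freq)

# asfreq reindexes to a daily grid and stores the frequency on the index
daily = df.asfreq("D")
print(daily.index.freq)
```

If some days were missing, asfreq would insert rows for them filled with NaN, so check the row count afterwards if that matters.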
Related
I have a datetime variable date_var = datetime(2020, 9, 11, 0, 0, 0) and I am trying to populate a dataframe column for each row with this value. So I did something like df['Time'] = date_var. First, this shows the 'Time' field's datatype as datetime64[ns] and not datetime, and it populates the Time field with the value 2020-09-11 instead of 2020-09-11 00:00:00. Am I doing something incorrect?
Thanks
You've done nothing wrong. Printing the date without the time part is just a Pandas display convention when all the times are midnight. You can use df['Time'].dt.strftime('%Y-%m-%d %H:%M:%S') if you want the column printed with the time part as well.
Storing datetimes as the Pandas type (datetime64[ns]) is better than storing them as the Python type, because it is more efficient to manipulate (e.g. to add offsets to all of them).
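A minimal sketch of the behaviour described in the question, with a hypothetical three-row frame: the scalar is broadcast to every row, the column is stored as a datetime64 dtype, and strftime only changes how the values are rendered (it returns strings, so use it for display rather than for storage):

```python
import datetime

import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30]})  # hypothetical frame
date_var = datetime.datetime(2020, 9, 11, 0, 0, 0)

# Assigning a scalar broadcasts it to every row; pandas stores it as datetime64
df["Time"] = date_var
print(df["Time"].dtype)

# Render with an explicit time part (produces strings, for display only)
shown = df["Time"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(shown.tolist())
```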
Try this code
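import datetime
import pandas as pd
df = pd.DataFrame({'value': [1, 2, 3]})  # example frame (hypothetical)
date_var = datetime.datetime(2020, 9, 11, 0, 0, 0)
df['Time'] = date_var.strftime('%Y-%m-%d %H:%M:%S')  # same string in every row
df['Time'] = pd.to_datetime(df['Time'])  # parse back into datetime64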
I've extracted the dates from filenames in a set of Excel files into a list of DatetimeIndex objects. I now need to write the extracted date from each to a new date column for the dataframes I've created from each Excel sheet. My code works in that it writes the new 'Date' column to each dataframe, but I'm unable to convert the objects out of their generator object DatetimeIndex format and into a %Y-%m-%d format.
Link to code creating the list of DatetimeIndexes from the filenames:
How do I turn datefinder output into a list?
Code to write each list entry to a new 'Date' column in each dataframe created from the spreadsheets:
for i in range(0, len(df)):
    df[i]['Date'] = (event_dates_dto[i] for frames in df)
The involved objects:
type(event_dates_dto)
<class 'list'>
type(event_dates_dto[0])
<class 'pandas.core.indexes.datetimes.DatetimeIndex'>
event_dates_dto
[DatetimeIndex(['2019-03-29'], dtype='datetime64[ns]', freq=None), DatetimeIndex(['2019-04-13'], dtype='datetime64[ns]', freq=None), DatetimeIndex(['2019-05-11'], dtype='datetime64[ns]', freq=None)]
The dates were extracted using datefinder: http://www.blog.pythonlibrary.org/2016/02/04/python-the-datefinder-package/
I've tried using methods here that seemed like they could make sense but none of them are the right ticket: Keep only date part when using pandas.to_datetime
Again, the simple for loop is working correctly, but I'm unsure how to coerce the generator object into the correct format so that it not only writes to the new 'Date' column but is also in a useful '%Y-%m-%d' format that makes sense within the dataframe. Any help is greatly appreciated.
force evaluation with a one line loop like dates = [_ for _ in matches]
copy the index into a column via df.index (or use .reset_index() if you don't need to keep it as the index)
convert the column to datetime using pd.to_datetime()
use the .dt.date accessor of the datetime column to get plain Y-m-d dates
Here's a sample
import datefinder
import pandas as pd
data = '''Your appointment is on July 14th, 2016 15:24. Your bill is due 05/05/2016 16:00'''
matches = datefinder.find_dates(data)
# force evaluation with 1 line loop
dates = [_ for _ in matches] # 'dates = list(matches)' also works
df = pd.DataFrame({'dt_index':dates,'value':['appointment','bill']}).set_index('dt_index')
df['date'] = df.index
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.date
df
which gives
                           value        date
dt_index
2016-07-14 15:24:00  appointment  2016-07-14
2016-05-05 16:00:00         bill  2016-05-05
Edit: Edited to account for forced evaluation
A minor fix got it working, I was just trying to carry out too much at once and overthinking it.
#create empty list and append each date
event_dates_transfer = []
#use .strftime('%Y-%m-%d') method on event_dates_dto here if you wish to return a string instead of a datetimeindex
for i in range(0,len(event_dates_dto)):
    event_dates_transfer.append(event_dates_dto[i][0])
#Create a 'Date' column for each dataframe correlating to the filename it was created from and set it as the index
for i in range(0, len(df)):
    new_date = event_dates_transfer[i]
    df[i]['Date'] = new_date
    df[i].set_index('Date', inplace=True)
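A more compact sketch of the same loop, assuming (hypothetically) that frames is the list of per-sheet DataFrames (called df in the question) and event_dates_dto holds one single-element DatetimeIndex per file. zip pairs each frame with its date, so no index bookkeeping is needed:

```python
import pandas as pd

# Hypothetical stand-ins: one single-element DatetimeIndex per file...
event_dates_dto = [
    pd.DatetimeIndex(["2019-03-29"]),
    pd.DatetimeIndex(["2019-04-13"]),
]
# ...and one DataFrame per spreadsheet
frames = [
    pd.DataFrame({"amount": [1, 2]}),
    pd.DataFrame({"amount": [3]}),
]

for frame, dates in zip(frames, event_dates_dto):
    # dates[0] is a single Timestamp; assigning it broadcasts to every row
    frame["Date"] = dates[0]
    # Use dates[0].date() instead if you want plain datetime.date values
```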
Beginner python (and therefore pandas) user. I am trying to import some data into a pandas dataframe. One of the columns is the date, but in the format "YYYYMM". I have attempted to do what most forum responses suggest:
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m')
This doesn't work though (ValueError: unconverted data remains: 3). The column actually includes an additional value for each year, with MM=13. The source used this row as an average of the past year. I am guessing to_datetime is having an issue with that.
Could anyone offer a quick solution, either to strip out all of the annual averages (those with the last two digits "13"), or to have to_datetime ignore them?
pass errors='coerce' and then dropna the NaT rows:
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m', errors='coerce')
df_cons = df_cons.dropna(subset=['YYYYMM'])
The duff month values will get converted to NaT values
In[36]:
pd.to_datetime('201613', format='%Y%m', errors='coerce')
Out[36]: NaT
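The coerce-then-drop approach end to end, as a minimal sketch on hypothetical data (assuming the YYYYMM column was read as strings). The rows are dropped from the frame itself, because assigning a shorter .dropna() result back to a column would be realigned on the index and the NaT rows would stay:

```python
import pandas as pd

# Hypothetical sample: monthly rows plus one "month 13" annual-average row
df_cons = pd.DataFrame({
    "YYYYMM": ["201611", "201612", "201613", "201701"],
    "value": [1.0, 2.0, 1.5, 3.0],
})

# Invalid months such as 13 become NaT instead of raising
df_cons["YYYYMM"] = pd.to_datetime(df_cons["YYYYMM"], format="%Y%m", errors="coerce")

# Remove the rows that failed to parse (the annual averages)
df_cons = df_cons.dropna(subset=["YYYYMM"])
```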
Alternatively you could filter them out before the conversion
df_cons['YYYYMM'] = pd.to_datetime(df_cons.loc[df_cons['YYYYMM'].str[-2:] != '13','YYYYMM'], format='%Y%m', errors='coerce')
although the rows you filtered out come back as NaT when the result is aligned on the index, so you would still have to drop them; just passing errors='coerce' and dropping the NaT rows is the simpler solution
Clean up the dataframe first.
df_cons = df_cons[~df_cons['YYYYMM'].astype(str).str.endswith('13')].copy()
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m')
May I suggest turning the column into a period index if YYYYMM column is unique in your dataset.
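First convert YYYYMM to datetime (the month-13 rows have to be removed beforehand, since to_period needs a DatetimeIndex), make it the index, then convert it to a monthly period.
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m')
df_cons = df_cons.set_index('YYYYMM').to_period('M')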
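A minimal sketch of the period-index route on hypothetical data, assuming YYYYMM holds strings and is unique once the annual averages are gone. assign returns a new frame, which avoids modifying the filtered slice in place:

```python
import pandas as pd

# Hypothetical sample including one annual-average ("13") row
df_cons = pd.DataFrame({
    "YYYYMM": ["201611", "201612", "201613", "201701"],
    "value": [1.0, 2.0, 1.5, 3.0],
})

# Drop the annual-average rows first
df_cons = df_cons[~df_cons["YYYYMM"].str.endswith("13")]

# Parse to datetime, make it the index, then convert to monthly periods
df_cons = df_cons.assign(YYYYMM=pd.to_datetime(df_cons["YYYYMM"], format="%Y%m"))
df_cons = df_cons.set_index("YYYYMM").to_period("M")
```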
I've got an imported csv file which has multiple columns with dates in the format "5 Jan 2001 10:20". (Note not zero-padded day)
If I check df.dtypes, the columns show as object rather than string or datetime. I need to be able to subtract two column values to work out the difference, so I'm trying to get them into a state where I can do that.
At the moment if I try the test subtraction at the end I get the error unsupported operand type(s) for -: 'str' and 'str'.
I've tried multiple methods but have run into a problem every way I've tried.
Any help would be appreciated. If I need to give any more information then I will.
As suggested by @MaxU, you can use the pd.to_datetime() function to convert the values of the given column to the appropriate dtype, like this:
df['datetime'] = pd.to_datetime(df.datetime)
You would have to do this on each column you need transformed to the right dtype.
Alternatively, you can use parse_dates argument of pd.read_csv() method, like this:
df = pd.read_csv(path, parse_dates=[1,2,3])
where columns 1,2,3 are expected to contain data that can be interpreted as dates.
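Because all the columns share the "5 Jan 2001 10:20" layout, passing an explicit format is unambiguous and avoids per-value guessing. A minimal sketch with a hypothetical frame; start and end are made-up column names. The %d directive accepts a day without zero padding, and %b matches English abbreviated month names:

```python
import pandas as pd

# Hypothetical frame with two text columns in the "5 Jan 2001 10:20" style
df = pd.DataFrame({
    "start": ["5 Jan 2001 10:20", "12 Feb 2001 08:00"],
    "end": ["7 Jan 2001 12:20", "12 Feb 2001 09:30"],
})

fmt = "%d %b %Y %H:%M"
for col in ["start", "end"]:
    df[col] = pd.to_datetime(df[col], format=fmt)

# Subtracting two datetime64 columns gives a timedelta64 column
df["duration"] = df["end"] - df["start"]
print(df)
```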
I hope this helps.
convert a column to datetime using this approach
df["Date"] = pd.to_datetime(df["Date"])
If the column has empty or unparseable values, set errors='coerce' so they become NaT instead of raising an error:
df["Date"] = pd.to_datetime(df["Date"], errors='coerce')
After which you should be able to subtract two dates.
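A small sketch with made-up values showing that coerced NaT entries simply propagate through the subtraction, so the rows that did parse still give a usable difference:

```python
import pandas as pd

# Hypothetical data with an empty string and an unparseable value
df = pd.DataFrame({
    "start": ["2014-01-24", "", "2014-01-23"],
    "end": ["2014-01-27", "2014-01-25", "not a date"],
})

# Bad or empty values become NaT instead of raising
for col in ["start", "end"]:
    df[col] = pd.to_datetime(df[col], errors="coerce")

# NaT on either side yields NaT in the result
df["gap"] = df["end"] - df["start"]
print(df)
```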
example:
import pandas as pd
df = pd.DataFrame({
    'to': [pd.Timestamp('2014-01-24 13:03:12.050000'), pd.Timestamp('2014-01-27 11:57:18.240000'), pd.Timestamp('2014-01-23 10:07:47.660000')],
    'fr': [pd.Timestamp('2014-01-26 23:41:21.870000'), pd.Timestamp('2014-01-27 15:38:22.540000'), pd.Timestamp('2014-01-23 18:50:41.420000')],
})
# dividing by a one-hour Timedelta gives the difference in (fractional) hours
(df['fr'] - df['to']) / pd.Timedelta(hours=1)
consult this answer for more details:
Calculate Pandas DataFrame Time Difference Between Two Columns in Hours and Minutes
If you want to load the column directly as datetime objects while reading from csv, consider this example:
Pandas read csv dateint columns to datetime
I found that the problem was due to missing values within the column. Passing errors='coerce' (spelled coerce=True in old pandas versions) solves the problem: df["Date"] = pd.to_datetime(df["Date"], errors='coerce')
I've seen several articles about using datetime and dateutil to convert into datetime objects.
However, I can't seem to figure out how to convert a column into a datetime object so I can pivot out that column and perform operations against it.
I have a dataframe as such:
Col1  Col2
a     1/1/2013
a     1/12/2013
b     1/5/2013
b     4/3/2013
...etc
What I want is :
Something like pivott = pivot_table(df, rows='Col1', values='Col2'), and then I want to get the range of dates for each value in Col1.
I am not sure how to correctly approach this. Even after using
df['Col2']= pd.to_datetime(df['Col2'])
I couldn't do operations against the dates since they are strings...
Any advice?
Use datetime.strptime
import pandas as pd
from datetime import datetime
df = pd.read_csv('somedata.csv')
convertdatetime = lambda d: datetime.strptime(d,'%d/%m/%Y')
converted = df['DATE_TIME_IN_STRING'].apply(convertdatetime)
converted[:10] # you should be getting dtype: datetime64[ns]
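For the range of dates per Col1 value, a vectorised pd.to_datetime plus a groupby is usually simpler than a pivot table. A minimal sketch using the sample rows from the question; it assumes the dates are month-first (the answer above assumes day-first, so swap the format if that matches your data):

```python
import pandas as pd

# Sample rows from the question; month-first dates are an assumption
df = pd.DataFrame({
    "Col1": ["a", "a", "b", "b"],
    "Col2": ["1/1/2013", "1/12/2013", "1/5/2013", "4/3/2013"],
})

# Vectorised parse; use "%d/%m/%Y" instead if the data is day-first
df["Col2"] = pd.to_datetime(df["Col2"], format="%m/%d/%Y")

# Earliest and latest date per group, plus the span between them
ranges = df.groupby("Col1")["Col2"].agg(["min", "max"])
ranges["span"] = ranges["max"] - ranges["min"]
print(ranges)
```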