I have a dataframe column in the format of 20180531.
I need to split this properly i.e. I can get 2018/05/31.
This is a dataframe column that I have and I need to deal with it in a datetime format.
Currently this column is identified as int64 type
I'm not sure how efficient it'll be but if you convert it to a string, and the use pd.to_datetime with a .format=..., eg:
df['actual_datetime'] = pd.to_datetime(df['your_column'].astype(str), format='%Y%m%d')
As Emma points out - the astype(str) is redundant here and just:
df['actual_datetime'] = pd.to_datetime(df['your_column'], format='%Y%m%d')
will work fine.
Assuming the integer dates would always be fixed width at 8 digits, you may try:
df['dt'] = df['dt_int'].astype(str).str.replace(r'(\d{4})(\d{2})(\d{2})', r'\1-\2-\3')
Related
20160116
Suppose this is the data with datatype integer in a column and now I want to convert it like 2016/01/16 or 2016-01-16 and datatype as date. My column name is system and dataframe is df. How can I do that?
I tried using many date format function but It was not good enough to achieve the answer.
convert using to_datetime, provide the format
then convert to the format of your desire
pd.to_datetime(df['dte'], format='%Y%m%d').dt.strftime('%Y/%m/%d')
0 2016/01/06
Name: dte, dtype: object
Using str.replace we can try:
df["date"] = df["system"].astype(str).str.replace(r'(\d{4})(\d{2})(\d{2})', r'\1/\2/\3', regex=True)
Good Afternoon,
I have a huge dataset where time informations are stored as a float64 (or integer) in one column of the dataframe in format 'ddmmyyyy' (ex. 20 January 2020 would be the float 20012020.0). I need to convert it into a datetime like 'dd-mm-yyyy'. I saw the function to_datetime, but i can't really manage to obtain what i want. Does someone know how to do it?
Massimo
You could try converting to string and after that, to date format, you want like this:
# The first step is to change the type of the column,
# in order to get rid of the .0 we will change first to int and then to
# string
df["date"] = df["date"].astype(int)
df["date"] = df["date"].astype(str)
for i, row in df.iterrows():
# In case that the date format is similar to 20012020
x = str(df["date"].iloc[i])
if len(x) == 8:
df.at[i,'date'] = "{}-{}-{}".format(x[:2], x[2:4], x[4:])
# In case that the format is similar to 1012020
else:
df.at[i,'date'] = "0{}-{}-{}".format(x[0], x[1:3], x[3:])
Edit:
As you said this solution only works if the month always comes in 2
digits.
Added missing variable in the loop
Added change column types before entering the loop.
Let me know if this helps!
I need to print a string on the first multi-index in a date format.
Essentially, I need to delete all data on the first date. But finding out the cause of this error is also very important to me. Thank you very much in advance!
As commented dt.date returns datetime.date object, which is different from Pandas' datetime object. Use dt.floor('D') or dt.normalized() instead. For example, this would work:
df['Date'] = df.session_started.dt.normalize()
df['Time'] = df.session_started.dt.hour
df_hour = df.groupby(['Date','Time']).checkbooking.count()
df_hour.loc['2019-01-13']
I have been stumped for the past few hours trying to solve the following.
In a large data set I have from an automated system, there is a DATE_TIME value, which for rows at midnight has values that dont have a the full hour like: 12-MAY-2017 0:16:20
When I try convert this to a date (so that its usable for conversions) as follows:
df['DATE_TIME'].astype('datetime64[ns]')
I get the following error:
Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
I tried writing some REGEX to pull out each piece but couldnt get anything working given the hour could be either 1 or two characters respectively. It also doesn't seem like an ideal solution to write regex for each peice.
Any ideas on this?
Try to use pandas.to_datetime() method:
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'], errors='coerce')
Parameter errors='coerce' will take care of those strings that can't be converted to datatime dtype
I think you need pandas.to_datetime only:
df = pd.DataFrame({'DATE_TIME':['12-MAY-2017 0:16:20','12-MAY-2017 0:16:20']})
print (df)
DATE_TIME
0 12-MAY-2017 0:16:20
1 12-MAY-2017 0:16:20
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'])
print (df)
DATE_TIME
0 2017-05-12 00:16:20
1 2017-05-12 00:16:20
Convert in numpy by astype seems problematic, because need strings in ISO 8601 date or datetime format:
df['DATE_TIME'].astype('datetime64[ns]')
ValueError: Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
EDIT:
If datetimes are broken (some strings or ints) then use MaxU answer.
Beginner python (and therefore pandas) user. I am trying to import some data into a pandas dataframe. One of the columns is the date, but in the format "YYYYMM". I have attempted to do what most forum responses suggest:
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m')
This doesn't work though (ValueError: unconverted data remains: 3). The column actually includes an additional value for each year, with MM=13. The source used this row as an average of the past year. I am guessing to_datetime is having an issue with that.
Could anyone offer a quick solution, either to strip out all of the annual averages (those with the last two digits "13"), or to have to_datetime ignore them?
pass errors='coerce' and then dropna the NaT rows:
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'], format='%Y%m', errors='coerce').dropna()
The duff month values will get converted to NaT values
In[36]:
pd.to_datetime('201613', format='%Y%m', errors='coerce')
Out[36]: NaT
Alternatively you could filter them out before the conversion
df_cons['YYYYMM'] = pd.to_datetime(df_cons.loc[df_cons['YYYYMM'].str[-2:] != '13','YYYYMM'], format='%Y%m', errors='coerce')
although this could lead to alignment issues as the returned Series needs to be the same length so just passing errors='coerce' is a simpler solution
Clean up the dataframe first.
df_cons = df_cons[~df_cons['YYYYMM'].str.endswith('13')]
df_cons['YYYYMM'] = pd.to_datetime(df_cons['YYYYMM'])
May I suggest turning the column into a period index if YYYYMM column is unique in your dataset.
First turn YYYYMM into index, then convert it to monthly period.
df_cons = df_cons.reset_index().set_index('YYYYMM').to_period('M')