Convert 8-digit integers to a standard date format in Pandas - python

For a date column which is like 20190101, 20190102, how could I change it to 2019/01/01, 2019/01/02 or 2019-01-01, 2019-01-02? Thanks for your help.
I have tried with df['date'] = df['date'].dt.strftime('%Y%m%d'), but it doesn't work.

Use pd.to_datetime with an explicit format:
df['date']=pd.to_datetime(df['date'],format='%Y%m%d')
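To make the answer concrete, here is a minimal sketch on made-up sample data (the two values are taken from the question; the extra column names date_slash and date_dash are only illustrative). The attempt in the question fails because the .dt accessor is only available on datetime-like columns, so the integers have to be parsed first:

```python
import pandas as pd

# Hypothetical sample data matching the question's 8-digit integers.
df = pd.DataFrame({"date": [20190101, 20190102]})

# Parse the integers as dates; the column now has a datetime64 dtype.
df["date"] = pd.to_datetime(df["date"], format="%Y%m%d")

# Optional: render the dates as text in either requested layout.
# These two columns hold plain strings, not datetimes.
df["date_slash"] = df["date"].dt.strftime("%Y/%m/%d")
df["date_dash"] = df["date"].dt.strftime("%Y-%m-%d")

print(df)
```

If you only need the values for display, the strftime step is enough; for sorting, filtering, or date arithmetic it is usually better to keep the parsed column.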

Related

Split the given integer value as date

20160116
Suppose a column holds this value as an integer, and I want to convert it to 2016/01/16 or 2016-01-16 with a date datatype. The column is named system and the dataframe is df. How can I do that?
I tried several date-formatting functions, but none of them gave the result I wanted.
Convert with to_datetime, providing the format, then convert to the format you want:
pd.to_datetime(df['dte'], format='%Y%m%d').dt.strftime('%Y/%m/%d')
0 2016/01/16
Name: dte, dtype: object
Using str.replace we can try:
df["date"] = df["system"].astype(str).str.replace(r'(\d{4})(\d{2})(\d{2})', r'\1/\2/\3', regex=True)
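The two answers above give different kinds of results, which the following sketch tries to show side by side. It assumes a column named system as in the question; the sample values and the variable names as_dates, as_text and next_day are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data; "system" is the column name from the question.
df = pd.DataFrame({"system": [20160116, 20171231]})

# Approach 1: parse into real datetimes.
as_dates = pd.to_datetime(df["system"], format="%Y%m%d")

# Approach 2: insert separators with a regex; the result stays text.
as_text = df["system"].astype(str).str.replace(
    r"(\d{4})(\d{2})(\d{2})", r"\1/\2/\3", regex=True
)

# Date arithmetic is only possible on the parsed column.
next_day = as_dates + pd.Timedelta(days=1)

print(as_dates.dtype, as_text.dtype)
print(next_day)
```

The regex route is fine when the goal is only a display string, but it does not validate anything: a value such as 20161345 would be reformatted without complaint.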

Split date column into YYYY.MM.DD

I have a dataframe column in the format of 20180531.
I need to split this properly i.e. I can get 2018/05/31.
This is a dataframe column that I have and I need to deal with it in a datetime format.
Currently this column is identified as int64 type
I'm not sure how efficient it'll be, but you can convert it to a string and then use pd.to_datetime with format=..., e.g.:
df['actual_datetime'] = pd.to_datetime(df['your_column'].astype(str), format='%Y%m%d')
As Emma points out - the astype(str) is redundant here and just:
df['actual_datetime'] = pd.to_datetime(df['your_column'], format='%Y%m%d')
will work fine.
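A quick way to check that claim yourself is to parse the column both ways and compare. This is only a sketch on made-up values; from_int, from_str and same are illustrative names:

```python
import pandas as pd

# Hypothetical int64 column like the one in the question.
df = pd.DataFrame({"your_column": [20180531, 20180601]})

# Parse directly from integers and via an intermediate string conversion.
from_int = pd.to_datetime(df["your_column"], format="%Y%m%d")
from_str = pd.to_datetime(df["your_column"].astype(str), format="%Y%m%d")

# True when both routes produce the same timestamps.
same = bool((from_int == from_str).all())
print(same)
```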
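Assuming the integer dates are always a fixed width of 8 digits, you may try the following (regex=True is required on pandas 2.0 and later, where str.replace treats the pattern as a literal string by default):
df['dt'] = df['dt_int'].astype(str).str.replace(r'(\d{4})(\d{2})(\d{2})', r'\1-\2-\3', regex=True)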

Formatting dates in a Python dataframe as m/d/Y

I tried doing this based on what I found online
df['CutoffDate'] = pd.to_datetime(pd.Series(df['CutoffDate']), format="%m/%d/%Y")
df.head()
And I get this
What I want is 6/30/2019
You need strftime (note that %m and %d are zero-padded, so June 30 is rendered as 06/30/2019):
pd.to_datetime(df['CutoffDate']).dt.strftime("%m/%d/%Y")
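Since the question asks for 6/30/2019 without a leading zero, here is one possible sketch that builds the string from the date parts instead of relying on platform-specific strftime flags. The input values are an assumption (the original data is not shown), and padded and unpadded are illustrative names:

```python
import pandas as pd

# Hypothetical input; the question does not show the original values.
df = pd.DataFrame({"CutoffDate": ["2019-06-30", "2019-12-01"]})

dates = pd.to_datetime(df["CutoffDate"])

# strftime pads month and day with zeros.
padded = dates.dt.strftime("%m/%d/%Y")

# Building the string from the numeric parts avoids the padding.
unpadded = (
    dates.dt.month.astype(str)
    + "/"
    + dates.dt.day.astype(str)
    + "/"
    + dates.dt.year.astype(str)
)

print(padded.tolist())
print(unpadded.tolist())
```

Either way the result is a column of strings, so keep the parsed dates around if you still need to sort or filter by date.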

Convert OffsetDateTimeUdt to date or string

I am working on Pyspark.
I have a column end_date that I want to work with, but I can't because of its type: OffsetDateTimeUdt.
Is it possible to convert this type to a date or a string?
Example value: 2021-08-15T00:00:00.000Z
If you have any idea please let me know :)
Thanks in advance
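Convert it to a timestamp, ignoring the milliseconds, as below (the pattern treats .000Z as a literal, so it only matches values that end in exactly that):
from pyspark.sql.functions import from_unixtime, unix_timestamp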
x = spark.createDataFrame([("2021-08-15T00:00:00.000Z",)], ['date_str'])
x.select(x.date_str, from_unixtime(unix_timestamp(x.date_str,"yyyy-MM-dd'T'HH:mm:ss'.000Z'"))).show(truncate=False)
output:
+------------------------+------------------------------------------------------------------------------------------+
|date_str |from_unixtime(unix_timestamp(date_str, yyyy-MM-dd'T'HH:mm:ss'.000Z'), yyyy-MM-dd HH:mm:ss)|
+------------------------+------------------------------------------------------------------------------------------+
|2021-08-15T00:00:00.000Z|2021-08-15 00:00:00 |
+------------------------+------------------------------------------------------------------------------------------+

Passing chopped down datetimes

I have been stumped for the past few hours trying to solve the following.
In a large data set from an automated system, there is a DATE_TIME value which, for rows just after midnight, has an hour that is not zero-padded, like: 12-MAY-2017 0:16:20
When I try to convert this to a date (so that it's usable for conversions) as follows:
df['DATE_TIME'].astype('datetime64[ns]')
I get the following error:
Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
I tried writing some regex to pull out each piece but couldn't get anything working, given that the hour could be either one or two characters. It also doesn't seem ideal to write a regex for each piece.
Any ideas on this?
Try the pandas.to_datetime() method:
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'], errors='coerce')
The errors='coerce' parameter turns any strings that can't be converted to a datetime into NaT instead of raising an error.
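As a rough illustration of what errors='coerce' does, the sketch below parses a few made-up rows, one of them deliberately broken. The explicit format string is my own addition, not part of the answer above; it assumes English month abbreviations and keeps the parsing deterministic:

```python
import pandas as pd

# Hypothetical rows: a single-digit hour, a two-digit hour, and a broken value.
df = pd.DataFrame(
    {"DATE_TIME": ["12-MAY-2017 0:16:20", "13-MAY-2017 14:05:00", "not a date"]}
)

# %H accepts one or two digits, so "0:16:20" parses without extra regex work.
# errors="coerce" replaces unparseable values with NaT instead of raising.
parsed = pd.to_datetime(
    df["DATE_TIME"], format="%d-%b-%Y %H:%M:%S", errors="coerce"
)

print(parsed)
print(parsed.isna().sum(), "value(s) could not be parsed")
```

Counting the NaT values afterwards is a cheap way to see how many rows were silently dropped by the coercion.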
I think you need pandas.to_datetime only:
df = pd.DataFrame({'DATE_TIME':['12-MAY-2017 0:16:20','12-MAY-2017 0:16:20']})
print (df)
DATE_TIME
0 12-MAY-2017 0:16:20
1 12-MAY-2017 0:16:20
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'])
print (df)
DATE_TIME
0 2017-05-12 00:16:20
1 2017-05-12 00:16:20
Converting in numpy with astype seems problematic, because it needs strings in ISO 8601 date or datetime format:
df['DATE_TIME'].astype('datetime64[ns]')
ValueError: Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
EDIT:
If some datetimes are broken (stray strings or ints), then use MaxU's answer.
