pandas to_datetime convert datetime string to 0 - python

I have a column in a df which contains datetime strings,
inv_date
24/01/2008
15/06/2007 14:55:22
08/06/2007 18:26:12
15/08/2007 14:53:25
15/02/2008
07/03/2007
13/08/2007
I used pd.to_datetime with format %d%m%Y for converting the strings into datetime values;
pd.to_datetime(df.inv_date, errors='coerce', format='%d%m%Y')
I got
inv_date
24/01/2008
0
0
0
15/02/2008
07/03/2007
13/08/2007
The format is inferred from inv_date as the most common datetime format. How can I keep 15/06/2007 14:55:22, 08/06/2007 18:26:12 and 15/08/2007 14:53:25 from being converted to 0s, and get 15/06/2007, 08/06/2007 and 15/08/2007 instead?

Use the regular pd.to_datetime call then use .dt.date:
>>> pd.to_datetime(df.inv_date).dt.date
0 2008-01-24
1 2007-06-15
2 2007-08-06
3 2007-08-15
4 2008-02-15
5 2007-07-03
6 2007-08-13
Name: inv_date, dtype: object
>>>
Or, as @ChrisA mentioned, you can slice off the time part first. The default pandas output format is already fine, so I skipped the format argument:
>>> pd.to_datetime(df.inv_date.str[:10], errors='coerce')
0 2008-01-24
1 2007-06-15
2 2007-08-06
3 2007-08-15
4 2008-02-15
5 2007-07-03
6 2007-08-13
Name: inv_date, dtype: datetime64[ns]
>>>
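Note that in the outputs above, ambiguous strings such as 08/06/2007 were read month-first (2007-08-06), while the data in the question is day-first. A minimal sketch that avoids this, assuming every value starts with a 10-character DD/MM/YYYY date as in the question's sample, slices off the time part and passes an explicit format:

```python
import pandas as pd

# Sample values copied from the question; inv_date is the asker's column name.
df = pd.DataFrame({"inv_date": [
    "24/01/2008", "15/06/2007 14:55:22", "08/06/2007 18:26:12", "07/03/2007",
]})

# Keep only the first 10 characters (DD/MM/YYYY), then parse day-first explicitly.
# Anything that still does not match becomes NaT instead of raising.
dates = pd.to_datetime(df["inv_date"].str[:10], format="%d/%m/%Y", errors="coerce")

# Render back as DD/MM/YYYY strings if that is the desired display form.
as_text = dates.dt.strftime("%d/%m/%Y")
print(as_text)
```

With the explicit format, 08/06/2007 is read as 8 June 2007 rather than 6 August 2007.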

You can also try this:
df = pd.read_csv('myfile.csv', parse_dates=['inv_date'], dayfirst=True)
df['inv_date'].dt.strftime('%d/%m/%Y')
0 24/01/2008
1 15/06/2007
2 08/06/2007
3 15/08/2007
4 15/02/2008
5 07/03/2007
6 13/08/2007
Hope this will help too.

Related

Pandas Converting date string (only month and year) to datetime

I am trying to convert a date string to datetime. In the original dataframe the data type is a string and the dataset has shape = (28000000, 26). Importantly, the format of the date is MMYYYY only. Here's a data sample:
DATE
Out[3] 0 081972
1 051967
2 101964
3 041975
4 071976
I tried:
df['DATE'].apply(pd.to_datetime(format='%m%Y'))
and
pd.to_datetime(df['DATE'],format='%m%Y')
I got Runtime Error both times
Then
df['DATE'].apply(pd.to_datetime)
it worked for the other not shown columns(with DDMMYYYY format), but generated future dates with df['DATE'] because it reads the dates as MMDDYY instead of MMYYYY.
DATE
0 1972-08-19
1 2067-05-19
2 2064-10-19
3 1975-04-19
4 1976-07-19
Expect output:
DATE
0 1972-08
1 1967-05
2 1964-10
3 1975-04
4 1976-07
If this question is a duplicate please direct me to the original one, I wasn't able to find any suitable answer.
Thank you all in advance for your help
First, if an error is raised, some values obviously do not match the format. You can find them with the errors='coerce' parameter and Series.isna, because values that do not match are returned as missing values (NaT):
print (df)
DATE
0 81972
1 51967
2 101964
3 41975
4 171976 <-changed data
print (pd.to_datetime(df['DATE'],format='%m%Y', errors='coerce'))
0 1972-08-01
1 1967-05-01
2 1964-10-01
3 1975-04-01
4 NaT
Name: DATE, dtype: datetime64[ns]
print (df[pd.to_datetime(df['DATE'],format='%m%Y', errors='coerce').isna()])
DATE
4 171976
Solution with the changed data: convert to datetimes and then to monthly periods with Series.dt.to_period:
df['DATE'] = pd.to_datetime(df['DATE'],format='%m%Y', errors='coerce').dt.to_period('m')
print (df)
DATE
0 1972-08
1 1967-05
2 1964-10
3 1975-04
4 NaT
Solution with original data:
df['DATE'] = pd.to_datetime(df['DATE'],format='%m%Y', errors='coerce').dt.to_period('m')
print (df)
0 1972-08
1 1967-05
2 1964-10
3 1975-04
4 1976-07
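If the column was read as integers, leading zeros are lost (81972 instead of 081972), as in the printed frame above. A small sketch, assuming that is what happened, pads the values back to six characters before parsing so that the month is always two digits:

```python
import pandas as pd

# Hypothetical sample: MMYYYY values whose leading zeros were dropped on load.
raw = pd.Series([81972, 51967, 101964, 41975, 71976], name="DATE")

# Restore the fixed width MMYYYY layout, e.g. 81972 -> "081972".
padded = raw.astype(str).str.zfill(6)

# Parse with the explicit format, then keep only year and month as a Period.
periods = pd.to_datetime(padded, format="%m%Y", errors="coerce").dt.to_period("M")
print(periods)
```

Reading the column as text in the first place (for example with dtype=str in read_csv) would avoid the padding step.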
I would have done:
df['date_formatted'] = pd.to_datetime(
    dict(
        year=df['DATE'].str[2:],
        month=df['DATE'].str[:2],
        day=1
    )
)
Maybe this helps. Works for your sample data.

parse multiple date format pandas

I've got stuck with the following format:
0 2001-12-25
1 2002-9-27
2 2001-2-24
3 2001-5-3
4 200510
5 20078
What I need is the date in a format %Y-%m
What I tried was
def parse(date):
    if len(date)<=5:
        return "{}-{}".format(date[:4], date[4:5], date[5:])
    else:
        pass
df['Date']= parse(df['Date'])
However, I only succeeded in parsing 20078 to 2007-8; values like 2001-12-25 came out as None.
So, how can I do it? Thank you!
We can use pd.to_datetime with errors='coerce' to parse the dates in steps,
assuming your column is called date:
s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')
s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))
df['date_fixed'] = s
print(df)
date date_fixed
0 2001-12-25 2001-12-25
1 2002-9-27 2002-09-27
2 2001-2-24 2001-02-24
3 2001-5-3 2001-05-03
4 200510 2005-10-01
5 20078 2007-08-01
In steps,
first we cast the regular datetimes to a new series called s
s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')
print(s)
0 2001-12-25
1 2002-09-27
2 2001-02-24
3 2001-05-03
4 NaT
5 NaT
Name: date, dtype: datetime64[ns]
As you can see, we have two NaT (null datetime) values in our series; these correspond to the values that have no day and no separators.
We then call pd.to_datetime again with the other format, '%Y%m', and use the result to fill the missing values of s:
s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))
print(s)
0 2001-12-25
1 2002-09-27
2 2001-02-24
3 2001-05-03
4 2005-10-01
5 2007-08-01
then we re-assign to your dataframe.
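The same step-by-step idea can be wrapped in a small helper that tries a list of formats in order. This is only a sketch: parse_with_formats is a hypothetical name, not a pandas function, and the final strftime call is one way to get the %Y-%m form the question asks for.

```python
import pandas as pd

def parse_with_formats(values, formats):
    """Parse with the first format, then let later formats fill rows still NaT."""
    result = pd.to_datetime(values, format=formats[0], errors="coerce")
    for fmt in formats[1:]:
        result = result.fillna(pd.to_datetime(values, format=fmt, errors="coerce"))
    return result

# Sample values copied from the question.
df = pd.DataFrame({"date": ["2001-12-25", "2002-9-27", "2001-2-24",
                            "2001-5-3", "200510", "20078"]})

df["date_fixed"] = parse_with_formats(df["date"], ["%Y-%m-%d", "%Y%m"])

# Year-month strings, as requested in the question.
df["year_month"] = df["date_fixed"].dt.strftime("%Y-%m")
print(df)
```

Order matters here: the stricter full-date format goes first so that the looser year-month format only touches the rows it left unparsed.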
You could use a regex to pull out the year and month, and convert to datetime :
df = pd.read_clipboard(sep=r"\s{2,}",header=None,names=["Dates"])
pattern = r"(?P<Year>\d{4})[-]*(?P<Month>\d{1,2})"
df['Dates'] = pd.to_datetime([f"{year}-{month}" for year, month in df.Dates.str.extract(pattern).to_numpy()])
print(df)
Dates
0 2001-12-01
1 2002-09-01
2 2001-02-01
3 2001-05-01
4 2005-10-01
5 2007-08-01
Note that pandas automatically sets the day to 1, since only the year and month were supplied.

calculate date difference between today's date and pandas date series

Want to calculate the difference of days between pandas date series -
0 2013-02-16
1 2013-01-29
2 2013-02-21
3 2013-02-22
4 2013-03-01
5 2013-03-14
6 2013-03-18
7 2013-03-21
and today's date.
I tried but could not come up with logical solution.
Please help me with the code. Actually I am new to python and there are lot of syntactical errors happening while applying any function.
You could do something like
# generate time data
data = pd.to_datetime(pd.Series(["2018-09-1", "2019-01-25", "2018-10-10"]))
pd.to_datetime("now") > data
returns:
0 True
1 False
2 True
you could then use that to select the data
data[pd.to_datetime("now") > data]
Hope it helps.
Edit: I misread it but you can easily alter this example to calculate the difference:
data - pd.to_datetime("now")
returns:
0 -122 days +13:10:37.489823
1 24 days 13:10:37.489823
2 -83 days +13:10:37.489823
dtype: timedelta64[ns]
You can try as follows:
>>> from datetime import datetime
>>> df
col1
0 2013-02-16
1 2013-01-29
2 2013-02-21
3 2013-02-22
4 2013-03-01
5 2013-03-14
6 2013-03-18
7 2013-03-21
Make sure to convert the column to datetime with to_datetime:
>>> df['col1'] = pd.to_datetime(df['col1'], infer_datetime_format=True)
Set the current datetime in order to get the difference:
>>> curr_time = pd.to_datetime("now")
Now get the difference as follows:
>>> df['col1'] - curr_time
0 -2145 days +07:48:48.736939
1 -2163 days +07:48:48.736939
2 -2140 days +07:48:48.736939
3 -2139 days +07:48:48.736939
4 -2132 days +07:48:48.736939
5 -2119 days +07:48:48.736939
6 -2115 days +07:48:48.736939
7 -2112 days +07:48:48.736939
Name: col1, dtype: timedelta64[ns]
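The results above are negative and carry a time-of-day part because the series is subtracted from the current instant. If whole days elapsed are wanted, one possible sketch is to subtract the dates from today's midnight and read .dt.days. The helper name days_since and its today parameter are hypothetical, added only so the example gives a reproducible result:

```python
import pandas as pd

def days_since(dates, today=None):
    """Whole days between each date and `today` (defaults to the current date)."""
    # normalize() drops the time of day, so the result is a whole number of days.
    today = pd.Timestamp.now().normalize() if today is None else pd.Timestamp(today)
    return (today - dates).dt.days

# A few of the dates from the question.
dates = pd.to_datetime(pd.Series(["2013-02-16", "2013-01-29", "2013-03-21"]))

# Fixed reference date for a stable result; omit it to use the real current date.
print(days_since(dates, today="2013-03-22"))
```

Called without the second argument, days_since(dates) uses the actual current date.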
With numpy you can solve it as described in difference-two-dates-days-weeks-months-years-pandas-python-2. Bottom line:
df['diff_days'] = df['First dates column'] - df['Second Date column']
# use 'D' for days and 'W' for weeks; 'M' and 'Y' are not fixed-length units and may be rejected by recent pandas versions
df['diff_days']=df['diff_days']/np.timedelta64(1,'D')
print(df)
If you want whole days (floor division) instead of fractional days, use
df['diff_days']=df['diff_days']//np.timedelta64(1,'D')
From the pandas docs under Converting To Timestamps you will find:
"Converting to Timestamps To convert a Series or list-like object of date-like objects e.g. strings, epochs, or a mixture, you can use the to_datetime function"
I haven't used pandas before, but this suggests your pandas date series (a list-like object) is iterable and that each element can be converted to a timestamp with the pd.to_datetime function.
Assuming my assumptions are correct, the following function would take such a series and return a list of timedeltas (objects representing the difference between two datetime objects).
from datetime import datetime
import pandas as pd

def convert(pandas_series):
    # get the current date
    now = datetime.now()
    # Use a list comprehension and pd.to_datetime to calculate timedeltas.
    return [now - pd.to_datetime(pandas_element) for pandas_element in pandas_series]

# assuming 'some_pandas_series' is a list-like pandas series object
list_of_timedeltas = convert(some_pandas_series)

Error converting data type float to datetime format

I would like to convert the data type float below to datetime format:
df
Date
0 NaN
1 NaN
2 201708.0
4 201709.0
5 201700.0
6 201600.0
Name: Cred_Act_LstPostDt_U324123, dtype: float64
pd.to_datetime(df['Date'],format='%Y%m.0')
ValueError: time data 201700.0 does not match format '%Y%m.0' (match)
How could I transform these rows without month information as yyyy01 as default?
You can use str.replace on the string form of each value to clean up your month data:
s = [x.replace('00.0', '01.0') for x in df['Date'].astype(str)]
df['Date'] = pd.to_datetime(s, format='%Y%m.0', errors='coerce')
print(df)
Date
0 NaT
1 NaT
2 2017-08-01
4 2017-09-01
5 2017-01-01
6 2016-01-01
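A vectorized variant of the same cleanup might use Series.str.replace with a regex anchored at the end of the string. This is a sketch under the assumption that the floats always print as YYYYMM.0, as in the question's sample:

```python
import numpy as np
import pandas as pd

# Sample values copied from the question (floats, with NaN for missing rows).
raw = pd.Series([np.nan, np.nan, 201708.0, 201709.0, 201700.0, 201600.0], name="Date")

# 201700.0 -> "201700.0"; a trailing "00.0" means "no month", so default it to January.
text = raw.astype(str).str.replace(r"00\.0$", "01.0", regex=True)

# NaN became the string "nan", which does not match the format and turns into NaT.
dates = pd.to_datetime(text, format="%Y%m.0", errors="coerce")
print(dates)
```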
Create a string from the float using .astype(str), then split the string after the fourth character and use str.cat to insert a hyphen. Then you can use format='%Y-%m'.
However, this still fails for invalid month numbers such as month 00; with errors='coerce' those rows become NaT.
string = df['Date'].astype(str)
date = string.str[:4].str.cat(string.str[4:6], sep='-')
pd.to_datetime(date, format='%Y-%m', errors='coerce')

How can I create a datetime column without 'date' part?

I have a dataframe and there's a column named 'Time' in it like the below(HH:MM:SS:fffff).
>>> df['Time']
0 09:42:29:75284
1 09:42:29:95584
2 09:42:31:15036
3 09:42:35:15138
4 09:42:35:95491
5 09:42:43:55414
6 09:42:45:35866
7 09:42:46:74638
8 09:42:47:35582
9 09:42:47:74774
10 09:42:48:94582
...
Name: Time, Length: 18924, dtype: object
I want to change its type as datetime, in order to make it easier to calculate. Is it possible to change its type, using pandas.to_datetime, as datetime without date?
You can convert it to timedelta64[ns] dtype:
Source DF:
In [164]: df
Out[164]:
Time
0 09:42:29:75284
1 09:42:29:95584
2 09:42:31:15036
3 09:42:35:15138
4 09:42:35:95491
5 09:42:43:55414
6 09:42:45:35866
7 09:42:46:74638
8 09:42:47:35582
9 09:42:47:74774
10 09:42:48:94582
In [165]: df.dtypes
Out[165]:
Time object # <-------- NOTE!
dtype: object
Converted:
In [166]: df.Time = pd.to_timedelta(df.Time.str.replace(r'\:(\d+)$', r'.\1', regex=True),
errors='coerce')
In [167]: df
Out[167]:
Time
0 09:42:29.752840
1 09:42:29.955840
2 09:42:31.150360
3 09:42:35.151380
4 09:42:35.954910
5 09:42:43.554140
6 09:42:45.358660
7 09:42:46.746380
8 09:42:47.355820
9 09:42:47.747740
10 09:42:48.945820
In [168]: df.dtypes
Out[168]:
Time timedelta64[ns] # <-------- NOTE!
dtype: object
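Once the column is a timedelta, arithmetic works directly on it. As a small illustration of the "easier to calculate" goal, assuming the same HH:MM:SS:fffff layout as in the question, the gap between consecutive rows can be computed like this:

```python
import pandas as pd

# First three values from the question.
times = pd.Series(["09:42:29:75284", "09:42:29:95584", "09:42:31:15036"], name="Time")

# Turn the last ':' into '.', so the tail is read as a fraction of a second.
td = pd.to_timedelta(times.str.replace(r":(\d+)$", r".\1", regex=True), errors="coerce")

# Seconds elapsed between consecutive rows (the first row has no predecessor).
gaps = td.diff().dt.total_seconds()
print(gaps)
```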
Please refer to the pandas to_datetime documentation.
import pandas as pd
df = pd.DataFrame({'Time': ['09:42:29:75284','09:42:29:95584','09:42:31:15036']})
df
Out[]:
Time
0 09:42:29:75284
1 09:42:29:95584
2 09:42:31:15036
You can convert this into datetime format by specifying format as follows:
pd.to_datetime(df['Time'], format='%H:%M:%S:%f')
Out[]:
0 1900-01-01 09:42:29.752840
1 1900-01-01 09:42:29.955840
2 1900-01-01 09:42:31.150360
Name: Time, dtype: datetime64[ns]
but doing this will also add date 1900-01-01.
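If the placeholder date is unwanted, one option is to keep only the time component with .dt.time. This is a sketch: the result is an object column of datetime.time values, which display cleanly but do not support the vectorized arithmetic that a timedelta column does.

```python
import datetime

import pandas as pd

df = pd.DataFrame({"Time": ["09:42:29:75284", "09:42:29:95584", "09:42:31:15036"]})

# Parse with the explicit format (this attaches the default date 1900-01-01) ...
parsed = pd.to_datetime(df["Time"], format="%H:%M:%S:%f")

# ... then drop the date again by keeping only the time-of-day part.
only_time = parsed.dt.time
print(only_time)
```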
