How to convert to just a date - Pandas, Python [duplicate] - python

This question already has answers here:
In Pandas how do I convert a string of date strings to datetime objects and put them in a DataFrame?
(3 answers)
Closed 8 years ago.
I'm trying to convert a string to a date. I understand how to use the to_datetime function that comes with pandas, but I'd like to be able to do this without inserting a time.
I'm sure this is very simple but I'm a little new to this.

You don't need to supply a time component. Whether you use datetime.strptime or pd.to_datetime, the result of the conversion is the same:
In [10]:
import pandas as pd
df = pd.DataFrame({'date':['2012/04/06']})
df
Out[10]:
date
0 2012/04/06
In [11]:
import datetime as dt
df['date'].apply(lambda x: dt.datetime.strptime(x, '%Y/%m/%d'))
Out[11]:
0 2012-04-06
Name: date, dtype: datetime64[ns]
In [13]:
pd.to_datetime(df['date'])
Out[13]:
0 2012-04-06
Name: date, dtype: datetime64[ns]
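Both results above are still datetime64 values with the time set to midnight. If what you want is a value with no time part at all, one possible sketch (the column names date_only and midnight are made up for illustration) is to parse first and then take .dt.date, or use .dt.normalize() if you prefer to stay in datetime64:

```python
import pandas as pd

df = pd.DataFrame({'date': ['2012/04/06']})

# Parse the strings with an explicit format (datetime64[ns] result).
parsed = pd.to_datetime(df['date'], format='%Y/%m/%d')

# Hypothetical column names, chosen only for this example.
df['date_only'] = parsed.dt.date        # object column of datetime.date values
df['midnight'] = parsed.dt.normalize()  # still datetime64, time forced to 00:00

print(df)
```

Note that .dt.date gives an object column, so the vectorised .dt accessor is no longer available on it; .dt.normalize() keeps the datetime64 dtype.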

Related

How to convert datetime to strings in python [duplicate]

This question already has answers here:
Converting a datetime column to a string column
(4 answers)
Closed 3 days ago.
I have a dataframe which contains a column called period that has datetime values in it in the following format:
2020-03-01T00:00:00.000000000
I want to convert the datetime to strings with the format - 03/01/2020 (month, day, year)
How would I do this?
import pandas as pd
df = pd.DataFrame({'period': ['2020-03-01T00:00:00.000000000', '2020-04-01T00:00:00.000000000']})
df['period'] = pd.to_datetime(df['period'])
df['period'] = df['period'].dt.strftime('%m/%d/%Y')
print(df)
Output
period
0 03/01/2020
1 04/01/2020

How to obtain just the year from pandas data frame? [duplicate]

This question already has answers here:
python pandas extract year from datetime: df['year'] = df['date'].year is not working
(5 answers)
Closed 3 months ago.
So I wrote some code to turn a list of strings into date times:
s = pd.Series(["14 Nov 2020", "14/11/2020", "2020/11/14",
"Hello World", "Nov 14th, 2020"])
s_dates = pd.to_datetime(s, errors='coerce', exact=False)
print(s_dates)
It produced the following output:
0 2020-11-14
1 2020-11-14
2 2020-11-14
3 NaT
4 2020-11-14
dtype: datetime64[ns]
How would I obtain just the year from this?
Since your series s_dates has dtype datetime64[ns], you can directly use
Series.dt.year like:
print(s_dates.dt.year)
This returns a series containing only the year. The dtype is an integer type when every value parsed; here the NaT at index 3 becomes NaN, so the result is float64.
Check the documentation for more useful datetime transformations.
Assuming your years are always 4 digits, you can also use str.extract on the original strings (the result holds strings, not integers):
years = s.str.extract(r'(\d{4})')
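If the missing values matter, a small sketch of the .dt.year route with a nullable integer dtype might look like this. The two-element input is a shortened stand-in for the series in the question, and the Int64 conversion is one option rather than the only one:

```python
import pandas as pd

# Shortened stand-in for the question's data: one valid date, one junk string.
s = pd.Series(["2020-11-14", "not a date"])
s_dates = pd.to_datetime(s, errors="coerce")  # junk becomes NaT

years = s_dates.dt.year             # NaT turns into NaN here
years_int = years.astype("Int64")   # nullable integer, missing value stays <NA>

print(years_int)
```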

Convert date column (string) to datetime and match the format

I'm trying to convert the following date column (str) to datetime64, but I get an error saying the format doesn't match. Can anyone help me, please? :)
Column:
df["Date"]
0 15/7/21
...
2541 13/9/21
dtype: object
What I tried:
pd.to_datetime(df["Date"], format = "%d/%m/%Y")
ValueError: time data '15/7/21' does not match format '%d/%m/%Y' (match)
I also tried:
pd.to_datetime(df["Date"].astype("datetime64"), format='%d/%m/%Y')
This converts the column to datetime, but for some dates the day and the month are swapped.
Does anyone know what to do?
%Y expects a 4-digit year. Use %y for a 2-digit year (See the docs):
>>> import pandas as pd
>>> df = pd.DataFrame({'Date':['15/7/21','13/9/21']})
>>> df['Date']
0 15/7/21
1 13/9/21
Name: Date, dtype: object
>>> pd.to_datetime(df['Date'], format='%d/%m/%y')
0 2021-07-15
1 2021-09-13
Name: Date, dtype: datetime64[ns]
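Note that pandas can often guess the format on its own. The guess is only safe when the dates are unambiguous, though: a string such as 1/2/21 is read month-first by default, so passing format (or dayfirst=True) is the more reliable choice: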
>>> pd.to_datetime(df['Date'])
0 2021-07-15
1 2021-09-13
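To illustrate why the explicit format matters, here is a minimal sketch. The value '1/2/21' is an invented example added to the question's data, since it is the kind of string where day and month can be confused:

```python
import pandas as pd

# '15/7/21' is from the question; '1/2/21' is an invented ambiguous value.
dates = pd.Series(['15/7/21', '1/2/21'])

# An explicit day-first format with a 2-digit year removes the guesswork.
parsed = pd.to_datetime(dates, format='%d/%m/%y')

print(parsed.dt.day.tolist())    # day of month for each row
print(parsed.dt.month.tolist())  # month for each row
```

With the format spelled out, the second row is read as 1 February rather than 2 January.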

pandas to_datetime converts non-zero padded month and day into datetime

I am using pd.to_datetime to convert strings into datetime;
df = pd.DataFrame(data={'id':['DD-83']})
pd.to_datetime(df['id'].str.replace(r'\D+', '', regex=True), errors='coerce', format='%d%m')
%d%m defines zero-padded day and month, but the code still converts the above string into
0 1900-03-08
Name: id, dtype: datetime64[ns]
I am wondering how to avoid it being converted into datetime (e.g. convert to NaT in this case), if the month and day in a string are not 0-padded. So
DD0306
DD0706
DD-83
will convert to
1900-06-03
1900-06-07
NaT
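Filter out the strings that contain a hyphen and pass only the remaining ones to to_datetime. The rows you skip are left as NaT when the result is assigned back by index.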
Setup:
df = pd.DataFrame(data={'id':['DD-83', 'DD0706', 'DD0306']})
Code:
mask = ~df['id'].str.contains('-')  # True for ids without a hyphen
df['date'] = pd.to_datetime(df['id'].loc[mask].str.replace(r'\D+', '', regex=True), errors='coerce', format='%d%m')
Output:
id date
0 DD-83 NaT
1 DD0706 1900-06-07
2 DD0306 1900-06-03
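The hyphen check works for this data, but it would still let through an id such as 'DD83'. A stricter variant, sketched here under the assumption that a valid id is a non-digit prefix followed by exactly four digits (ddmm), is to extract the digits with a regular expression and let everything else become NaT:

```python
import pandas as pd

df = pd.DataFrame(data={'id': ['DD-83', 'DD0706', 'DD0306']})

# Assumption: valid ids end in exactly four digits (zero-padded day and month).
# Ids that do not match the pattern yield NaN, which to_datetime turns into NaT.
digits = df['id'].str.extract(r'^\D*?(\d{4})$', expand=False)
df['date'] = pd.to_datetime(digits, format='%d%m', errors='coerce')

print(df)
```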

Datetime and Timestamp equality in Python and Pandas

I've been playing around with datetimes and timestamps, and I've come across something that I can't understand.
import pandas as pd
import datetime
year_month = pd.DataFrame({'year':[2001,2002,2003], 'month':[1,2,3]})
year_month['date'] = [datetime.datetime.strptime(str(y) + str(m) + '1', '%Y%m%d') for y,m in zip(year_month['year'], year_month['month'])]
>>> year_month
month year date
0 1 2001 2001-01-01
1 2 2002 2002-02-01
2 3 2003 2003-03-01
I think the unique function is doing something to the timestamps that is changing them somehow:
first_date = year_month['date'].unique()[0]
>>> first_date == year_month['date'][0]
False
In fact:
>>> year_month['date'].unique()
array(['2000-12-31T16:00:00.000000000-0800',
'2002-01-31T16:00:00.000000000-0800',
'2003-02-28T16:00:00.000000000-0800'], dtype='datetime64[ns]')
My suspicions are that there is some sort of timezone difference underneath the functions, but I can't figure it out.
EDIT
I just checked the python commands list(set()) as an alternative to the unique function, and that works. This must be a quirk of the unique() function.
You have to convert to datetime64 to compare:
In [12]:
first_date == year_month['date'][0].to_datetime64()
Out[12]:
True
This is because unique has converted the dtype to datetime64:
In [6]:
first_date = year_month['date'].unique()[0]
first_date
Out[6]:
numpy.datetime64('2001-01-01T00:00:00.000000000+0000')
I think this is because unique returns a NumPy array, and NumPy has no dtype that represents a pandas Timestamp, so the values come back as numpy.datetime64. See: Converting between datetime, Timestamp and datetime64
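As a rough sketch of the two ways to line the types up, the snippet below builds a numpy.datetime64 by hand instead of calling unique(), because what unique() returns may differ between pandas versions. The variable names are illustrative only:

```python
import numpy as np
import pandas as pd

dates = pd.Series(pd.to_datetime(['2001-01-01', '2002-02-01', '2003-03-01']))

# Stand-in for the element that unique() returned in the answer above.
raw = np.datetime64('2001-01-01T00:00:00.000000000')

# Option 1: lift the NumPy value into a pandas Timestamp before comparing.
same_as_timestamp = pd.Timestamp(raw) == dates[0]

# Option 2: lower the Timestamp to numpy.datetime64, as in the answer.
same_as_datetime64 = raw == dates[0].to_datetime64()

print(same_as_timestamp, same_as_datetime64)
```

Either direction works; converting to pd.Timestamp is usually more convenient if the value is going to be used with other pandas objects afterwards.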
