This question already has answers here:
In Pandas how do I convert a string of date strings to datetime objects and put them in a DataFrame?
(3 answers)
Closed 8 years ago.
I'm trying to convert a string to a date. I understand how to use the to_datetime function that comes with pandas, but I'd like to be able to do this without including a time component.
I'm sure this is very simple but I'm a little new to this.
You don't need the time component. Whether you use datetime.strptime or to_datetime, the conversion is the same:
In [10]:
df = pd.DataFrame({'date':['2012/04/06']})
df
Out[10]:
date
0 2012/04/06
In [11]:
import datetime as dt
df['date'].apply(lambda x: dt.datetime.strptime(x, '%Y/%m/%d'))
Out[11]:
0 2012-04-06
Name: date, dtype: datetime64[ns]
In [13]:
pd.to_datetime(df['date'])
Out[13]:
0 2012-04-06
Name: date, dtype: datetime64[ns]
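If the goal is a value with no time part at all, one option is to drop it after parsing. The following is a minimal sketch, assuming the same single-column frame as above; the variable names parsed, as_dates and midnight are only illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2012/04/06']})

# Parse with an explicit format so no guessing is involved.
parsed = pd.to_datetime(df['date'], format='%Y/%m/%d')

# Option 1: plain datetime.date objects (the Series becomes object dtype).
as_dates = parsed.dt.date

# Option 2: stay in a datetime dtype, but with the time set to midnight.
midnight = parsed.dt.normalize()

print(as_dates)
print(midnight)
```

Option 1 loses the vectorised .dt accessor, so it is probably best kept for display or export; Option 2 keeps the column usable for date arithmetic.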
This question already has answers here:
Converting a datetime column to a string column
(4 answers)
Closed 3 days ago.
I have a dataframe which contains a column called period that has datetime values in it in the following format:
2020-03-01T00:00:00.000000000
I want to convert the datetime to strings with the format - 03/01/2020 (month, day, year)
How would I do this?
import pandas as pd
df = pd.DataFrame({'period': ['2020-03-01T00:00:00.000000000', '2020-04-01T00:00:00.000000000']})
df['period'] = pd.to_datetime(df['period'])
df['period'] = df['period'].dt.strftime('%m/%d/%Y')
print(df)
Output
period
0 03/01/2020
1 04/01/2020
This question already has answers here:
python pandas extract year from datetime: df['year'] = df['date'].year is not working
(5 answers)
Closed 3 months ago.
So I wrote some code to turn a list of strings into date times:
s = pd.Series(["14 Nov 2020", "14/11/2020", "2020/11/14",
"Hello World", "Nov 14th, 2020"])
s_dates = pd.to_datetime(s, errors='coerce', exact=False)
print(s_dates)
It produced the following output:
0 2020-11-14
1 2020-11-14
2 2020-11-14
3 NaT
4 2020-11-14
dtype: datetime64[ns]
How would I obtain just the year from this?
Since your Series s_dates has dtype datetime64[ns], you can directly use
Series.dt.year like:
print(s_dates.dt.year)
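This will return a Series containing only the year. Because s_dates contains a NaT, the result here has a float dtype with NaN in that position; with no missing values it would be an integer dtype.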
Check the documentation for more useful datetime transformations.
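When a datetime Series contains NaT, .dt.year yields floats with NaN for the missing entries. A minimal sketch of one way to get whole-number years while keeping the missing value, assuming a small made-up Series (the sample dates below are illustrative, not the ones from the question):

```python
import pandas as pd

# Two valid dates and one missing value, which becomes NaT.
s_dates = pd.to_datetime(pd.Series(['2020-11-14', None, '2021-01-02']))

# Float dtype here, because NaN is needed to represent the NaT row.
years = s_dates.dt.year

# Nullable integer dtype: whole numbers, with <NA> for the missing row.
years_int = years.astype('Int64')

print(years_int)
```

Alternatively, s_dates.dropna().dt.year avoids the missing value altogether if the unparseable rows are not needed.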
Assuming your years are always 4 digits, we can try using str.extract on the original strings:
years = s.str.extract(r'(\d{4})', expand=False)
I'm trying to convert the following date column (str) to datetime64, but I get an error saying that the format doesn't match. Can anyone help me, please? :)
Column:
df["Date"]
0 15/7/21
...
2541 13/9/21
dtype: object
What I tried:
pd.to_datetime(df["Date"], format = "%d/%m/%Y")
ValueError: time data '15/7/21' does not match format '%d/%m/%Y' (match)
I also tried:
pd.to_datetime(df["Date"].astype("datetime64"), format='%d/%m/%Y')
This converts the column to datetime, but for some dates the day and the month are swapped.
Does anyone know what to do?
%Y expects a 4-digit year. Use %y for a 2-digit year (See the docs):
>>> import pandas as pd
>>> df = pd.DataFrame({'Date':['15/7/21','13/9/21']})
>>> df['Date']
0 15/7/21
1 13/9/21
Name: Date, dtype: object
>>> pd.to_datetime(df['Date'], format='%d/%m/%y')
0 2021-07-15
1 2021-09-13
Name: Date, dtype: datetime64[ns]
Note that pandas is pretty good at guessing the format:
>>> pd.to_datetime(df['Date'])
0 2021-07-15
1 2021-09-13
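To make the difference between the two directives concrete, here is a small sketch using the two sample values from the question. The flag name four_digit_failed is purely illustrative.

```python
import pandas as pd

dates = pd.Series(['15/7/21', '13/9/21'])

# %y matches a two-digit year, so this parses as day/month/year.
parsed = pd.to_datetime(dates, format='%d/%m/%y')
print(parsed)

# %Y requires a four-digit year, so the same strings are rejected.
four_digit_failed = False
try:
    pd.to_datetime(dates, format='%d/%m/%Y')
except ValueError as exc:
    four_digit_failed = True
    print('ValueError:', exc)
```

Passing an explicit format also removes any day/month ambiguity, which is the problem the asker ran into when the format was not applied.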
I am using pd.to_datetime to convert strings into datetime:
df = pd.DataFrame(data={'id':['DD-83']})
pd.to_datetime(df['id'].str.replace(r'\D+', '', regex=True), errors='coerce', format='%d%m')
%d%m defines zero-padded day and month, but the code still converts the above string into
0 1900-03-08
Name: id, dtype: datetime64[ns]
I am wondering how to avoid it being converted into datetime (e.g. convert to NaT in this case), if the month and day in a string are not 0-padded. So
DD0306
DD0706
DD-83
will convert to
1900-06-03
1900-06-07
NaT
You need to look for - and only pass strings without -.
Setup:
df = pd.DataFrame(data={'id':['DD-83', 'DD0706', 'DD0306']})
Code:
df['date'] = pd.to_datetime(df['id'].loc[~df['id'].str.contains('-')].str.replace(r'\D+', '', regex=True), errors='coerce', format='%d%m')
Output:
id date
0 DD-83 NaT
1 DD0706 1900-06-07
2 DD0306 1900-06-03
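Filtering on the '-' character works for the sample data, but it does not check the padding itself. A possible alternative, sketched here under the assumption that a valid id is a non-digit prefix followed by exactly four digits, is to extract those four digits with a regular expression and parse only what matched. The pattern and the intermediate name digits are assumptions for illustration, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'id': ['DD-83', 'DD0706', 'DD0306']})

# Capture exactly four trailing digits; ids that do not fit yield NaN.
digits = df['id'].str.extract(r'^\D*(\d{4})$', expand=False)

# NaN entries become NaT; the rest are parsed as zero-padded day + month.
df['date'] = pd.to_datetime(digits, format='%d%m', errors='coerce')

print(df)
```

With this approach an id such as 'DD83' (no dash, but also no padding) would be turned into NaT as well, which the dash filter alone would not catch.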
I've been playing around with datetimes and timestamps, and I've come across something that I can't understand.
import pandas as pd
import datetime
year_month = pd.DataFrame({'year':[2001,2002,2003], 'month':[1,2,3]})
year_month['date'] = [datetime.datetime.strptime(str(y) + str(m) + '1', '%Y%m%d') for y,m in zip(year_month['year'], year_month['month'])]
>>> year_month
month year date
0 1 2001 2001-01-01
1 2 2002 2002-02-01
2 3 2003 2003-03-01
I think the unique function is doing something to the timestamps that is changing them somehow:
first_date = year_month['date'].unique()[0]
>>> first_date == year_month['date'][0]
False
In fact:
>>> year_month['date'].unique()
array(['2000-12-31T16:00:00.000000000-0800',
'2002-01-31T16:00:00.000000000-0800',
'2003-02-28T16:00:00.000000000-0800'], dtype='datetime64[ns]')
My suspicion is that there is some sort of timezone difference underneath the functions, but I can't figure it out.
EDIT
I just checked the python commands list(set()) as an alternative to the unique function, and that works. This must be a quirk of the unique() function.
You have to convert to datetime64 to compare:
In [12]:
first_date == year_month['date'][0].to_datetime64()
Out[12]:
True
This is because unique has converted the dtype to datetime64:
In [6]:
first_date = year_month['date'].unique()[0]
first_date
Out[6]:
numpy.datetime64('2001-01-01T00:00:00.000000000+0000')
I think this is because unique returns a NumPy array, and NumPy currently has no dtype that represents a pandas Timestamp. See: Converting between datetime, Timestamp and datetime64
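A version-tolerant way to compare such values is to wrap them in pd.Timestamp before comparing. The sketch below assumes a small made-up Series; depending on the pandas version, unique() may return a NumPy array or a pandas array, and pd.Timestamp accepts elements of either.

```python
import numpy as np
import pandas as pd

dates = pd.Series(pd.to_datetime(['2001-01-01', '2002-02-01', '2001-01-01']))

# An element taken from unique(); its exact type can vary by pandas version.
first = dates.unique()[0]

# to_numpy() gives a NumPy array, so this element is a numpy.datetime64.
raw = dates.to_numpy()[0]

# Wrapping in pd.Timestamp puts both on the same footing for comparison.
print(pd.Timestamp(first) == dates.iloc[0])
print(pd.Timestamp(raw) == dates.iloc[0])

# drop_duplicates() stays inside pandas, so elements remain Timestamps.
deduped = dates.drop_duplicates()
print(type(deduped.iloc[0]))
```

If only the distinct values are needed and they will be compared against other pandas values, drop_duplicates() sidesteps the conversion entirely.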