Convert Raw Date into Year / Month / Day of Week in Pandas - python

I have a Pandas dataframe with raw dates formatted as such "19990130". I want to convert these into new columns: 'year', 'month', and 'dayofweek'.
I tried using the following:
pd.to_datetime(df['date'], format='%Y%m%d', errors='ignore').values
Which does give me an array of datetime objects. However, the next step I tried was using .to_pydatetime() and then .year to try to get the year out, like this:
pd.to_datetime(df['date'], format='%Y%m%d', errors='ignore').values.to_pydatetime().year
This works when I test a single value, but with a Pandas dataframe I get:
'numpy.ndarray' object has no attribute 'to_pydatetime'
What's the easiest way to extract the year, month, and day of week from this data?

Try:
s = pd.to_datetime(df['date'], format='%Y%m%d', errors='coerce')
s.dt.year
# or
# s.dt.month, etc
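The attempt in the question fails because .values hands back a plain NumPy array, which has no datetime methods; keeping the result as a Series lets you use the .dt accessor. A slightly fuller sketch of the same idea, assuming a small made-up frame (the sample values, including the deliberately invalid one, are hypothetical):

```python
import pandas as pd

# Hypothetical sample data; the last value is deliberately invalid.
df = pd.DataFrame({'date': ['19990130', '20000229', 'not a date']})

# Keep the result as a Series (no .values) so the .dt accessor is available.
# errors='coerce' turns unparseable entries into NaT instead of raising.
s = pd.to_datetime(df['date'], format='%Y%m%d', errors='coerce')

df['year'] = s.dt.year
df['month'] = s.dt.month
df['dayofweek'] = s.dt.dayofweek  # Monday=0 ... Sunday=6

print(df)
```

Rows that could not be parsed end up as NaN in the new columns, so it may be worth checking s.isna() before relying on them.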

Related

Convert Month Year to YYYY-MM-DD date format Python Pandas

I want to convert a month and year to YYYY-MM-DD in a Pandas dataframe, where the date is the first day of that month.
I tried using this:
pd.to_datetime(df, format='%Y-%m-%d', errors='ignore')
I expected the result to be
Try with format '%b,%Y':
df['date']=pd.to_datetime(df['date'], format='%b,%Y', errors='coerce')
OR
Don't use format at all and let pandas infer it:
df['date']=pd.to_datetime(df['date'], errors='coerce')
For more information on format codes, see the docs.
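To illustrate the '%b,%Y' suggestion, here is a minimal sketch. The input strings are an assumption (the question's actual data is not shown), and %b expects abbreviated English month names under the default locale:

```python
import pandas as pd

# Hypothetical input: abbreviated month name, a comma, then the year.
df = pd.DataFrame({'date': ['Jan,2020', 'Feb,2021']})

# No day is given in the strings, so each value lands on the 1st of the month.
df['date'] = pd.to_datetime(df['date'], format='%b,%Y', errors='coerce')

# Optional: render as YYYY-MM-DD text (the column itself stays datetime64).
df['date_str'] = df['date'].dt.strftime('%Y-%m-%d')

print(df)
```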

Converting dates to datetime64 results in day and month places getting swapped

I am pulling a time series from a csv file which has dates in "mm/dd/yyyy" format
df = pd.read_csv('lib_file.csv')
df['Date'] = df['Date'].apply(lambda x:datetime.strptime(x,'%m/%d/%Y').strftime('%d/%m/%Y'))
below is the output
I then convert the dtype of ['Date'] from object to datetime64:
df['Date'] = pd.to_datetime(df['Date'])
but that changes my dates as well.
How do I fix it?
Try this:
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
This infers the format from the first non-NaN element, which is parsed correctly in your case, and applies it to the whole column instead of inferring a format separately for each row. Note that infer_datetime_format is deprecated since pandas 2.0, where a strict version of this behaviour is the default.
Just using the code below helped:
df = pd.read_csv('lib_file.csv')
df['Date'] = pd.to_datetime(df['Date'])
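If the file really holds mm/dd/yyyy strings, another option is to skip the strftime round trip and state the format explicitly, so nothing is left to inference. A minimal sketch, assuming made-up sample values in place of the CSV file:

```python
import pandas as pd

# Hypothetical stand-in for the 'Date' column read from the CSV (mm/dd/yyyy).
df = pd.DataFrame({'Date': ['01/02/2019', '12/25/2019']})

# Parse with the format the data is actually in; no guessing involved.
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# dd/mm/yyyy is only a display concern, so format it when printing or exporting.
shown = df['Date'].dt.strftime('%d/%m/%Y')

print(df['Date'])
print(shown)
```

Keeping the column as datetime64 and formatting only for display avoids re-parsing a day-first string with month-first defaults.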

Weird Date formats part of column in YY-mm-dd and the rest of columns YY-dd-mm

This is a strange one, but I have an original Excel file with dates like 10/11/2018, and the above problem happens when I convert the column to datetime using:
df.Date = pd.to_datetime(df['Date'])
So the date column reads 2018-01-11, and then, once the day and month are equal (for example 2018-11-11), the format swaps relative to the previous rows and the following rows are now
2018-11-12
2018-11-13
I've tried writing a for loop over each entry to change the series, but got an error that I can't change the series; then I tried the loop below, but got a KeyError on the Timestamp:
for date_ in jda.Date:
    jda.Date[date_] = jda.Date[date_].strftime('%Y-%m-%d')
KeyError: Timestamp('2019-05-17 00:00:00')
Below is a picture of where the format changes.
Thank you for your help
Solution if the dates are saved as strings:
I think the problem is wrongly parsed datetimes: by default 10/11/2018 is parsed as 11 October 2018, so if you need it parsed as 10 November 2018, add the dayfirst=True parameter to to_datetime:
df.Date = pd.to_datetime(df['Date'], dayfirst=True)
Or you can specify format e.g. %d/%m/%Y for DD/MM/YYYY:
df.Date = pd.to_datetime(df['Date'], format='%d/%m/%Y')
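A small sketch of the difference, using hypothetical day-first strings in the same shape as the question's example:

```python
import pandas as pd

# Hypothetical day-first strings: 10, 11 and 12 November 2018.
dates = pd.Series(['10/11/2018', '11/11/2018', '12/11/2018'])

# Default parsing reads these month-first: 11 Oct, 11 Nov, 11 Dec.
month_first = pd.to_datetime(dates)

# Either option below reads them day-first instead.
with_flag = pd.to_datetime(dates, dayfirst=True)
with_format = pd.to_datetime(dates, format='%d/%m/%Y')

print(month_first.dt.month.tolist())
print(with_format.dt.month.tolist())
```

An explicit format is the stricter of the two: a value that does not match it raises an error (or becomes NaT with errors='coerce') rather than being reinterpreted.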

Convert date string YYYY-MM-DD to YYYYMM in pandas

Is there a way in pandas to convert my column date which has the following format '1997-01-31' to '199701', without including any information about the day?
I tried solution of the following form:
df['DATE'] = df['DATE'].apply(lambda x: datetime.strptime(x, '%Y%m'))
but I obtain this error: 'ValueError: time data '1997-01-31' does not match format '%Y%m''
Probably the reason is that I am not including the day in the format. Is there a better way to go from YYYY-MM-DD format to YYYYMM in pandas?
One way is to convert the column to datetime and then use strftime. Just note that the result is a string, so you lose the datetime functionality of the column.
df = pd.DataFrame({'date':['1997-01-31' ]})
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y%m')
date
0 199701
You might not need to go through the datetime conversion if the data are sufficiently clean (no incorrect strings like 'foo' or '001231'):
df = pd.DataFrame({'date':['1997-01-31', '1997-03-31', '1997-12-18']})
df['date'] = [''.join(x.split('-')[0:2]) for x in df.date]
# date
#0 199701
#1 199703
#2 199712
Or if you have null values:
df['date'] = df.date.str.replace('-', '').str[0:6]
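To compare how the string-based and datetime-based routes treat missing values, here is a small sketch with a hypothetical column that contains a None:

```python
import pandas as pd

# Hypothetical data with one missing value.
df = pd.DataFrame({'date': ['1997-01-31', None, '1997-12-18']})

# String route: strip the dashes and keep the first six characters.
# The .str methods pass missing values through as missing.
via_str = df['date'].str.replace('-', '', regex=False).str[0:6]

# Datetime route: validates the values; bad or missing input becomes NaT,
# which strftime then returns as a missing value.
via_dt = pd.to_datetime(df['date'], errors='coerce').dt.strftime('%Y%m')

print(via_str.tolist())
print(via_dt.tolist())
```

The string route is cheaper but trusts the input; the datetime route costs a parse but catches malformed entries.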

Pandas to_datetime not formatting as expected

I have a data frame with a column 'Date' with data type datetime64. The values are in YYYY-MM-DD format.
How can I convert it to YYYY-MM format and still use it as a datetime64 object?
I tried converting my datetime object to a string in YYYY-MM format and then back to a datetime object in YYYY-MM format, but it didn't work.
Original data = 1988-01-01.
Converting the datetime object to a string in YYYY-MM format:
df['Date']=df['Date'].dt.strftime('%Y-%m')
This worked as expected, my column value became
1988-01
Converting the string back to datetime object in Y-m format
df['Date']=pd.to_datetime(df['Date'],format= '%Y-%m')
I was expecting the Date column in YYYY-MM format but it became YYYY-MM-DD format.
1988-01-01
Can you please let me know if I am missing something.
Thanks
This is expected behaviour: a datetime always requires year, month and day components.
If you want to drop the day, you need a monthly period, created with to_period:
# if the column is still datetime64:
df['Date'] = df['Date'].dt.to_period('M')
# or, if the column already holds 'YYYY-MM' strings:
df['Date'] = pd.to_datetime(df['Date'],format= '%Y-%m').dt.to_period('M')
Sample:
df = pd.DataFrame({'Date':pd.to_datetime(['1988-01-01','1999-01-15'])})
print (df)
Date
0 1988-01-01
1 1999-01-15
df['Date'] = df['Date'].dt.to_period('M')
print (df)
Date
0 1988-01
1 1999-01
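A short sketch of what the resulting period column can still do, using the same sample values as above; the extra column names Period and MonthStart are only for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Date': pd.to_datetime(['1988-01-01', '1999-01-15'])})

# Monthly periods: the dtype is period[M], not datetime64.
df['Period'] = df['Date'].dt.to_period('M')

# Date-like accessors remain available on the period column.
years = df['Period'].dt.year
months = df['Period'].dt.month

# Convert back to datetime64 when needed; this gives the start of each month.
df['MonthStart'] = df['Period'].dt.to_timestamp()

print(df.dtypes)
print(df)
```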
