I have a dataframe column with datetime data in 1980-12-11T00:00:00 format.
I need to convert the whole column to DD/MM/YYYY string format.
Is there any easy code for this?
Creating a working example:
import pandas as pd
df = pd.DataFrame({'date':['1980-12-11T00:00:00', '1990-12-11T00:00:00', '2000-12-11T00:00:00']})
print(df)
date
0 1980-12-11T00:00:00
1 1990-12-11T00:00:00
2 2000-12-11T00:00:00
Convert the column to datetime with pd.to_datetime(), then call .dt.strftime():
df['date_new']=pd.to_datetime(df.date).dt.strftime('%d/%m/%Y')
print(df)
date date_new
0 1980-12-11T00:00:00 11/12/1980
1 1990-12-11T00:00:00 11/12/1990
2 2000-12-11T00:00:00 11/12/2000
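A slightly more defensive variant, as a sketch: passing the exact input layout to pd.to_datetime avoids format inference, and errors='coerce' turns entries that don't match into NaT instead of raising. The bad value 'not a date' is made up purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({'date': ['1980-12-11T00:00:00', '1990-12-11T00:00:00',
                            '2000-12-11T00:00:00', 'not a date']})

# Parse with the exact ISO layout; entries that don't match become NaT
parsed = pd.to_datetime(df['date'], format='%Y-%m-%dT%H:%M:%S', errors='coerce')

# Format as DD/MM/YYYY strings; NaT rows come out as missing values
df['date_new'] = parsed.dt.strftime('%d/%m/%Y')
print(df)
```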
You can use pd.to_datetime to convert strings to datetime data:
pd.to_datetime(df['col'])
To get strings in a specific format, chain .dt.strftime():
pd.to_datetime(df['col']).dt.strftime('%d/%m/%Y')
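One thing worth keeping in mind: .dt.strftime() returns plain strings, so the result no longer supports date arithmetic or the .dt accessor. A common pattern, sketched here, is to keep the parsed datetime column and derive the string column only for display. The column name 'col' follows the snippet above; the sample values and the extra column names are made up.

```python
import pandas as pd

df = pd.DataFrame({'col': ['2000-12-11T00:00:00', '1980-12-11T00:00:00']})

# Keep a real datetime column for sorting, filtering and arithmetic
df['col_dt'] = pd.to_datetime(df['col'])
# Derive a DD/MM/YYYY string column purely for display
df['col_str'] = df['col_dt'].dt.strftime('%d/%m/%Y')

# Sorting on the datetime column gives chronological order
df = df.sort_values('col_dt')
print(df[['col_str']])
```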
When using pandas, try pandas.to_datetime:
import pandas as pd
df = pd.DataFrame({'date': ['1980-12-%sT00:00:00'%i for i in range(10,20)]})
df.date = pd.to_datetime(df.date).dt.strftime("%d/%m/%Y")
print(df)
date
0 10/12/1980
1 11/12/1980
2 12/12/1980
3 13/12/1980
4 14/12/1980
5 15/12/1980
6 16/12/1980
7 17/12/1980
8 18/12/1980
9 19/12/1980
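When the sample dates are consecutive days, as here, pd.date_range can generate them directly instead of building strings with % formatting; DatetimeIndex.strftime then produces the DD/MM/YYYY text. A small sketch reproducing the sample above:

```python
import pandas as pd

# Ten consecutive days starting at 1980-12-10
days = pd.date_range('1980-12-10', periods=10, freq='D')
df = pd.DataFrame({'date': days.strftime('%d/%m/%Y')})
print(df)
```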
Here is my problem.
I have a dataframe imported from a .xlsx file. It contains one column with dates but the problem is it is not presented as datetime.
For instance, the first row is the date (format DD/MM/YYYY, type str) and the 24 following rows are the hours (format xh, where h stands for hour and x is between 0 and 23). This is repeated for three years.
I would like to transform this column so that the cells are datetimes in the format YYYY-MM-DD HH:MM:SS.
First of all, I created a dataframe df2 containing the hours:
indexNames = df1[df1['Hour'].str.contains('/')].index
df2= df1.drop(indexNames)
Then I transformed it to get the time in HH:MM format:
# Conserving the number
new = df2["Hour"].str.split("h", n = 1, expand = True)
df2["new_Hour"] = new[0]
# Dropping the old Hour column
df2.drop(columns = ["Hour"], inplace = True)
# Transforming in datetime format
df2['new_Hour'] = pd.to_datetime(df2['new_Hour'], format="%H")
df2['new_Hour'] = df2['new_Hour'].astype(str)
nouv = df2['new_Hour'].str.split(' ', n=1, expand = True)
df2["Hour"]= nouv[1]
df2.drop(columns = ["new_Hour"], inplace = True)
Then, I created a second dataframe having the date and added separated columns for corresponding year, month and day:
df3= df1.loc[df1['Hour'].str.contains('/')].copy()
df3['Hour'] = pd.to_datetime(df3['Hour'], format="%d/%m/%Y")
df3['year'] = df3['Hour'].dt.year
df3['month'] = df3['Hour'].dt.month
df3['day'] = df3['Hour'].dt.day
Here is my problem:
df3's index starts at 0 and increases by 25 on each row, i.e. df3.index[0] = 0, df3.index[1] = 25, df3.index[2] = 50, etc.
df2's index starts at 1 and, more generally, it lacks exactly the indexes that df3 has.
I would like to add the corresponding date from df3 to the corresponding hours in df2.
After resetting the indexes of df2 and df3, I tried:
df4 = df2.copy()
df4['year'] = 2019
df4= df4.reset_index(drop = True)
for i in range(len(df3)-1):
    df4['year'].iloc[df3.index[i]:df3.index[i+1]] = df3['year'][i]
But I run into copy problems (chained assignment), and probably index problems too.
I hope you can help me, thanks.
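A loop-free way to attach each date to its hour rows, sketched under the assumption that df2 and df3 still carry df1's original index (i.e. before any reset_index): reindex df3's dates onto df1's full index, forward-fill them, then select df2's rows. The small df1 below is hypothetical and only mimics the layout described in the question.

```python
import pandas as pd

# Hypothetical df1 in the described layout: a date row, then "<x>h" hour rows
df1 = pd.DataFrame({'Hour': ["31/12/2013", "0h", "1h",
                             "01/01/2014", "0h", "1h"]})

is_date = df1['Hour'].str.contains('/')
df2 = df1[~is_date].copy()   # hour rows, original index kept
df3 = df1[is_date].copy()    # date rows, original index kept
df3['Hour'] = pd.to_datetime(df3['Hour'], format="%d/%m/%Y")

# Align the dates on df1's full index, fill each date down over the
# hour rows that follow it, then keep only the hour rows
dates = df3['Hour'].reindex(df1.index).ffill().loc[df2.index]
df2['date'] = dates
df2['year'] = dates.dt.year
print(df2)
```

Because every assignment targets a whole column of an explicit copy, this also sidesteps the chained-assignment pattern from the loop.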
You might want to start with a cleaner way of creating the datetime column, e.g.:
import pandas as pd
# dummy sample...
df = pd.DataFrame({'Hour': ["10/12/2013", "0", "1", "3",
"11/12/2013", "0", "1", "3"]})
# make a date column, forward-fill the dates
df['Datetime'] = pd.to_datetime(df['Hour'], format="%d/%m/%Y", errors='coerce').ffill()
# now we can add the hour
df['Datetime'] = df['Datetime'] + pd.to_timedelta(pd.to_numeric(df['Hour'], errors='coerce'), unit='h')
# and optionally drop nans in the Datetime column, i.e. where we had dates initially
df = df[df["Datetime"].notna()].reset_index(drop=True)
df
Hour Datetime
0 0 2013-12-10 00:00:00
1 1 2013-12-10 01:00:00
2 3 2013-12-10 03:00:00
3 0 2013-12-11 00:00:00
4 1 2013-12-11 01:00:00
5 3 2013-12-11 03:00:00
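The dummy sample above uses bare numbers for the hours, while the question's hour rows look like "0h", "1h". A sketch of the same forward-fill idea adapted to that layout, assuming every hour cell is one or two digits followed by "h"; str.extract pulls out the number and leaves the date rows as NaN.

```python
import pandas as pd

df = pd.DataFrame({'Hour': ["10/12/2013", "0h", "1h", "23h",
                            "11/12/2013", "0h", "1h", "23h"]})

# Date rows parse; hour rows become NaT and inherit the date above them
dates = pd.to_datetime(df['Hour'], format="%d/%m/%Y", errors='coerce').ffill()

# Pull the number out of "<x>h"; date rows don't match and become NaN
hours = pd.to_numeric(df['Hour'].str.extract(r'^(\d{1,2})h$', expand=False))

# Date + hour offset; NaN hours give NaT, which we then drop
df['Datetime'] = dates + pd.to_timedelta(hours, unit='h')
df = df[df['Datetime'].notna()].reset_index(drop=True)
print(df)
```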
I have followed the instructions from this thread, but have run into issues.
Converting month number to datetime in pandas
I think it may have to do with having an additional variable in my dataframe but I am not sure. Here is my dataframe:
0 Month Temp
1 0 2
2 1 4
3 2 3
What I want is:
0 Month Temp
1 1990-01 2
2 1990-02 4
3 1990-03 3
Here is what I have tried:
df= pd.to_datetime('1990-' + df.Month.astype(int).astype(str) + '-1', format = '%Y-%m')
And I get this error:
ValueError: time data 1990-0-1 doesn't match format specified
If I understand correctly, you can build the datetime manually and then format it as your expected output. Since Month is 0-based, add 1 to every value:
import pandas as pd
# Month is 0-based (0 = January), so shift every value up by one
m = df['Month'].add(1).astype(int).astype(str)
df['date'] = pd.to_datetime(
    "1990-" + m, format="%Y-%m"
).dt.strftime("%Y-%m")
print(df)
Month Temp date
0 0 2 1990-01
1 1 4 1990-02
2 2 3 1990-03
Use .dt.strftime() to control how the date is displayed: a datetime value has no stored display format, and by default pandas shows it as %Y-%m-%d (with the time appended when it is not midnight).
import pandas as pd
df= pd.DataFrame({'month':[1,2,3]})
df['date']=pd.to_datetime(df['month'], format="%m").dt.strftime('%Y-%m')
print(df)
You have to add 1 to the months yourself, because in your case they range from 0 to 11 rather than 1 to 12.
df=pd.DataFrame({'month':[11,1,2,3,0]})
df['date']=pd.to_datetime(df['month']+1, format='%m').dt.strftime('1990-%m')
Here is my solution for you
import pandas as pd
Data = {
'Month' : [1,2,3],
'Temp' : [2,4,3]
}
data = pd.DataFrame(Data)
data['Month'] = pd.to_datetime('1990-' + data.Month.astype(int).astype(str) + '-1', format='%Y-%m-%d').dt.to_period('M')
print(data)
Month Temp
0 1990-01 2
1 1990-02 4
2 1990-03 3
If you want a Month value of 0 to mean January, add 1 to the column before converting.
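Another option, sketched here: pd.to_datetime also accepts a DataFrame with year, month and day columns and assembles the dates from them, which avoids string concatenation entirely. This assumes, as in the question, that Month is 0-based (0 = January) and that the year is 1990.

```python
import pandas as pd

df = pd.DataFrame({'Month': [0, 1, 2], 'Temp': [2, 4, 3]})

# Assemble dates from components; shift 0-based months to 1-12
parts = pd.DataFrame({'year': 1990, 'month': df['Month'] + 1, 'day': 1})
df['date'] = pd.to_datetime(parts).dt.to_period('M')
print(df)
```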
I have a dataframe with a date column whose values look like 20,190,927, which means 2019/09/27.
I need to change the format to YYYY/MM/DD or something similar.
I thought of doing it manually like:
x = df_all['CREATION_DATE'].str[:2] + df_all['CREATION_DATE'].str[3:5] + "-" + \
df_all['CREATION_DATE'].str[5] + df_all['CREATION_DATE'].str[7] + "-" + df_all['CREATION_DATE'].str[8:]
print(x)
What's a cleaner way of doing this? Could it be done with the datetime module?
I believe this is what you want. First remove the commas so you get a yyyymmdd string, then convert it to datetime with pd.to_datetime, passing the matching format. As a one-liner:
df['dates'] = pd.to_datetime(df['dates'].str.replace(',',''),format='%Y%m%d')
Full explanation:
import pandas as pd
a = {'dates':['20,190,927','20,191,114'],'values':[1,2]}
df = pd.DataFrame(a)
print(df)
Output (the original dataframe):
dates values
0 20,190,927 1
1 20,191,114 2
df['dates'] = df['dates'].str.replace(',','')
df['dates'] = pd.to_datetime(df['dates'],format='%Y%m%d')
print(df)
print(df.info())
Output of the newly formatted dataframe:
dates values
0 2019-09-27 1
1 2019-11-14 2
Printing .info() to confirm that the column now has a datetime dtype:
dates 2 non-null datetime64[ns]
values 2 non-null int64
Hope this helps.
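import pandas as pd
date = ['20,190,927', '20,190,928', '20,190,929']
df3 = pd.DataFrame(date, columns=['Date'])
# Strip the thousands separators, then parse as yyyymmdd
df3['Date'] = df3['Date'].str.replace(',', '', regex=False)
df3['Date'] = pd.to_datetime(df3['Date'], format='%Y%m%d')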
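Values like 20,190,927 are often the integer 20190927 shown with thousands separators rather than actual text; in that case .str methods fail on the column. A sketch that handles both cases by converting to str first, then formatting as YYYY/MM/DD as asked. The helper name parse_yyyymmdd is made up for this example.

```python
import pandas as pd

def parse_yyyymmdd(s: pd.Series) -> pd.Series:
    """Parse yyyymmdd values that may be ints or strings with commas."""
    digits = s.astype(str).str.replace(',', '', regex=False)
    # Anything that isn't a valid yyyymmdd date becomes NaT
    return pd.to_datetime(digits, format='%Y%m%d', errors='coerce')

as_text = pd.Series(['20,190,927', '20,191,114'])
as_int = pd.Series([20190927, 20191114])

print(parse_yyyymmdd(as_text).dt.strftime('%Y/%m/%d').tolist())
print(parse_yyyymmdd(as_int).dt.strftime('%Y/%m/%d').tolist())
```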
I have a pandas dataframe in which one column is a string formatted as yyyymmdd, which should be a date. Is there an easy way to convert it to a recognizable date?
And which Python libraries should I then use to handle the dates?
Let's say, for example, that I would like to consider all the events (rows) whose date field is a working day (Mon-Fri). What is the smoothest way to handle such a task?
OK, so you want to select Monday to Friday. Do that by converting your column to datetime and checking whether dt.dayofweek is less than 5 (Monday to Friday are 0-4):
m = pd.to_datetime(df['date']).dt.dayofweek < 5
df2 = df[m]
Full example:
import pandas as pd
df = pd.DataFrame({
'date': [
'20180101',
'20180102',
'20180103',
'20180104',
'20180105',
'20180106',
'20180107'
],
'value': range(7)
})
m = pd.to_datetime(df['date']).dt.dayofweek < 5
df2 = df[m]
print(df2)
Returns:
date value
0 20180101 0
1 20180102 1
2 20180103 2
3 20180104 3
4 20180105 4
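If "working day" should also exclude public holidays, numpy.is_busday accepts a holidays list. A sketch reusing the sample above; the holiday list (just 2018-01-01) is an assumption for illustration, so supply your own calendar.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'date': ['20180101', '20180102', '20180103', '20180104',
             '20180105', '20180106', '20180107'],
    'value': range(7),
})

# Parse yyyymmdd explicitly and convert to day-resolution datetime64
days = pd.to_datetime(df['date'], format='%Y%m%d').to_numpy().astype('datetime64[D]')

# Mon-Fri and not in the (hypothetical) holiday list
holidays = np.array(['2018-01-01'], dtype='datetime64[D]')
mask = np.is_busday(days, holidays=holidays)
df2 = df[mask]
print(df2)
```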
When I read time data from an xlsx file into pandas, it comes in as a decimal value.
Example: 9:23:27 AM is read as .391284722
I can fix it in Excel by using Format Cells and selecting Time, but I would prefer to use pandas all the way through rather than Excel.
When I take the value and convert it into a datetime object:
df.TIME=pd.to_datetime(df.TIME)
it changes to the date 1970-01-01.
Desired time is 9:23:27 AM
Any help is greatly appreciated.
Thank you
Demo:
Read that column as a string:
df = pd.read_excel(filename, dtype={'col_name':str})
In [51]: df
Out[51]:
time
0 9:23:27 AM
1 12:59:59 AM
In [52]: df['time2'] = pd.to_timedelta(df['time'])
In [53]: df
Out[53]:
time time2
0 9:23:27 AM 09:23:27
1 12:59:59 AM 12:59:59
In [54]: df.dtypes
Out[54]:
time object
time2 timedelta64[ns]
dtype: object
UPDATE: to convert a float number read from Excel (a fraction of a day, so multiply by 86400 to get seconds), try the following:
Source DF:
In [85]: df
Out[85]:
time
0 0.391285
1 0.391285
2 0.391285
Solution:
In [94]: df['time2'] = pd.to_timedelta((df['time'] * 86400).round(), unit='s')
In [95]: df
Out[95]:
time time2
0 0.391285 09:23:27
1 0.391285 09:23:27
2 0.391285 09:23:27
In [96]: df.dtypes
Out[96]:
time float64
time2 timedelta64[ns]
dtype: object
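If what is needed is a time of day rather than a duration, the timedelta can be added to an arbitrary base date and the clock time extracted. A sketch; the base date pd.Timestamp(0) (1970-01-01) is an arbitrary choice that is discarded afterwards, and the column names time_obj and time_str are made up.

```python
import pandas as pd

df = pd.DataFrame({'time': [0.391284722]})

# Excel stores times as fractions of a day
td = pd.to_timedelta((df['time'] * 86400).round(), unit='s')

# Add to an arbitrary base date, then keep only the clock time
clock = td + pd.Timestamp(0)
df['time_obj'] = clock.dt.time                     # datetime.time objects
df['time_str'] = clock.dt.strftime('%I:%M:%S %p')  # 12-hour text
print(df)
```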
The question could use some clarification about the end purpose of the time column. For general purposes, though, try using the format keyword of to_datetime:
df.TIME=pd.to_datetime(df.TIME, format='%I:%M:%S %p')
See this website for formatting: http://strftime.org/
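The format '%I:%M:%S %p' (note the colon between minutes and seconds) parses text such as "9:23:27 AM", for example when the column has been read as str. pd.to_datetime attaches the default date 1900-01-01, so .dt.time is used here to keep only the clock time. A sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'TIME': ['9:23:27 AM', '12:59:59 AM', '3:05:00 PM']})

# %I = 12-hour clock, %p = AM/PM; 12:xx AM is just after midnight
parsed = pd.to_datetime(df['TIME'], format='%I:%M:%S %p')
df['TIME'] = parsed.dt.time
print(df)
```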