How to convert string datetime of a Dataframe into Datetime - python

I am currently working with a CSV file that contains datetimes and timestamps. The dataframe looks like this:
print(df[:10])
[0 '2019-10-10 21:59:17.074007' '2015-10-13 00:55:55.544607'
'2017-05-24 06:00:15.959202' '2016-12-07 09:01:04.729686'
'2019-05-29 11:16:44.130063' '2017-01-19 16:06:37.625964'
'2018-04-07 19:42:43.708620' '2016-06-28 03:13:58.266977'
'2015-03-21 00:03:07.704446']
Now I want to convert those strings into datetimes and find the earliest date among them. I don't have much experience with datetimes in dataframes, so I am not sure how to do it. Any suggestions?

You can convert the strings with pd.to_datetime, then take the min:
import pandas as pd
dates = ['2019-10-10 21:59:17.074007', '2015-10-13 00:55:55.544607',
'2017-05-24 06:00:15.959202', '2016-12-07 09:01:04.729686',
'2019-05-29 11:16:44.130063', '2017-01-19 16:06:37.625964',
'2018-04-07 19:42:43.708620', '2016-06-28 03:13:58.266977',
'2015-03-21 00:03:07.704446']
pd.to_datetime(dates).min()
Output:
Timestamp('2015-03-21 00:03:07.704446')
Update
If you want to do it across all columns of the dataframe:
df.apply(pd.to_datetime).min().min()
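As a minimal sketch of the all-columns variant, assuming a small frame whose columns (the names "created" and "updated" are made up for illustration) all hold timestamp strings:

```python
import pandas as pd

# Hypothetical two-column frame of timestamp strings (column names are assumptions)
df = pd.DataFrame({
    "created": ["2019-10-10 21:59:17.074007", "2015-10-13 00:55:55.544607"],
    "updated": ["2017-05-24 06:00:15.959202", "2015-03-21 00:03:07.704446"],
})

# Parse every column, then reduce twice: first per column, then across columns
parsed = df.apply(pd.to_datetime)
earliest = parsed.min().min()
print(earliest)
```

The first min() returns one Timestamp per column; the second min() picks the smallest of those.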

Let's call the list you mentioned l. You can iterate over it, parse each date with datetime.strptime, collect the results in a new list, and return the earliest:
from datetime import datetime
parsed_dates = []
for d in l:
    parsed_dates.append(datetime.strptime(d, "%Y-%m-%d %H:%M:%S.%f"))
print(min(parsed_dates))

Convert these values to datetime using the to_datetime() function. Here df is assumed to be a single column (a Series) of strings; with errors='coerce', anything that cannot be parsed becomes NaT:
df=pd.to_datetime(df,errors='coerce')
Now find the earliest date using the min() method:
earliest_date=df.min()
Or you can find the earliest date using the nsmallest() method (this works on a Series):
earliest_date=df.nsmallest(1)
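A small sketch of how errors='coerce' behaves, assuming the values live in a Series (the sample values, including the deliberately bad one, are made up):

```python
import pandas as pd

# Hypothetical Series with one unparseable entry
s = pd.Series([
    "2019-10-10 21:59:17.074007",
    "not a date",
    "2015-03-21 00:03:07.704446",
])

# Unparseable strings become NaT instead of raising an error
parsed = pd.to_datetime(s, errors="coerce")

earliest = parsed.min()         # NaT is skipped by default
smallest = parsed.nsmallest(1)  # one-row Series that keeps the original index label
print(earliest)
print(smallest)
```

min() returns a single Timestamp, while nsmallest(1) returns a Series, which is handy when you also need the row label of the earliest value.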

Related

Trying to convert '2020-12-28' to only month, for example 'December'

Trying to convert '2020-12-28' to only month, for example 'December'.
I already converted the column to datetime from object and then used the following code:
df['month'] = pd.DatetimeIndex(df['ArrivalDate']).month
But this code gives me the error
'Length of value does not match length of index'.
However, the column 'ArrivalDate' is not the index and I do not intend to make it either. I also have multiple values with the same dates and I want to aggregate them based on months.
You can use pandas.Series.dt.strftime() to convert datetimes to strings in a format you specify.
df['month'] = df['ArrivalDate'].dt.strftime('%B')
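Since the question also asks about aggregating by month, here is a hedged sketch. The "guests" column and its values are invented for illustration, and '%B' returns month names in the current locale (English names such as 'December' under a default locale):

```python
import pandas as pd

# Hypothetical data: repeated dates plus a made-up numeric column to aggregate
df = pd.DataFrame({
    "ArrivalDate": ["2020-12-28", "2020-12-28", "2020-11-03"],
    "guests": [2, 3, 1],
})

# Make sure the column really is datetime64 before using the .dt accessor
df["ArrivalDate"] = pd.to_datetime(df["ArrivalDate"])

# Full month name for each row, same length as the frame
df["month"] = df["ArrivalDate"].dt.strftime("%B")

# Aggregate the numeric column by month name
per_month = df.groupby("month")["guests"].sum()
print(per_month)
```

Working on the column through .dt keeps the result aligned with the frame's own index, which is one way to avoid the length mismatch that can appear when a separately built DatetimeIndex is assigned back.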

How to change string to date type using dask dataframes in python?

I am parsing JSON data from a POST request to my URL and want to extract a date and convert it to a date type in Python, and I am doing all of this in dask. How can I convert it?
Previously I have tried using pandas dataframes using:
datetime.strptime(i, '%Y-%m-%d').date().__str__()
If I understand correctly, the list somedates contains date strings in the format "2020-01-23", and you want to convert them to datetime objects.
import datetime as dt
myformat = "%Y-%m-%d"
for i in range(len(somedates)):
    somedates[i] = dt.datetime.strptime(somedates[i], myformat)
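If plain date objects (without a time part) are what you need, a list comprehension is one way to sketch it. The contents of somedates below are made up, and this builds a new list rather than modifying the original in place:

```python
from datetime import datetime

# Hypothetical input list of ISO-style date strings
somedates = ["2020-01-23", "2020-02-01"]

# Parse each string, then keep only the date part
parsed = [datetime.strptime(s, "%Y-%m-%d").date() for s in somedates]
print(parsed)
```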

Convert a date to a different format for an entire new column

I want to convert the date in a column in a dataframe to a different format. Currently, it has this format: '2019-11-20T01:04:18'. I want it to have this format: 20-11-19 1:04.
I think I need to develop a loop and generate a new column for the new date format. So essentially, in the loop, I would refer to the initial column and then generate the variable for the new column in the format I want.
Can someone help me out to complete this task?
The following code works for one occasion:
import datetime
d = datetime.datetime.strptime('2019-11-20T01:04:18', '%Y-%m-%dT%H:%M:%S')
print(d.strftime('%d-%m-%y %H:%M'))
This is adapted from a previous answer on this site and should help you; the comments explain each step.
You can read your data into pandas from csv or database or create some test data as shown below for testing.
>>> import pandas as pd
>>> df = pd.DataFrame({'column': {0: '26/1/2016', 1: '26/1/2016'}})
>>> # First convert the column to datetime datatype
>>> df['column'] = pd.to_datetime(df.column)
>>> # Then call the .dt.strftime() method and set the format codes you want here
>>> df['column'] = df['column'].dt.strftime('%Y-%m-%dT%H:%M:%S')
>>> df
column
0 2016-01-26T00:00:00
1 2016-01-26T00:00:00
NB: check that all values in your column use the same date string format.
You can achieve it like this:
from datetime import datetime
df['your_column'] = df['your_column'].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S').strftime('%d-%m-%y %H:%M'))
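A vectorized alternative that avoids the row-by-row lambda might look like the sketch below. The column name "new_column" and the second sample value are assumptions. Note that '%H' zero-pads the hour, so the result is '01:04' rather than the '1:04' shown in the question:

```python
import pandas as pd

# Hypothetical frame; the first value is taken from the question
df = pd.DataFrame({"your_column": ["2019-11-20T01:04:18", "2019-11-21T13:30:00"]})

# Parse the whole column at once with an explicit format
parsed = pd.to_datetime(df["your_column"], format="%Y-%m-%dT%H:%M:%S")

# Write the reformatted strings to a new column, leaving the original untouched
df["new_column"] = parsed.dt.strftime("%d-%m-%y %H:%M")
print(df)
```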

Pandas: select all dates with specific month and day

I have a dataframe full of dates and I would like to select all dates where the month==12 and the day==25 and replace the zero in the xmas column with a 1.
Is there any way to do this? The second line of my code raises an error.
df = DataFrame({'date':[datetime(2013,1,1).date() + timedelta(days=i) for i in range(0,365*2)], 'xmas':np.zeros(365*2)})
df[df['date'].month==12 and df['date'].day==25] = 1
Pandas Series with datetime now behaves differently. See .dt accessor.
This is how it should be done now:
df.loc[(df['date'].dt.day==25) & (df['date'].dt.month==12), 'xmas'] = 1
Basically what you tried won't work as you need to use the & to compare arrays, additionally you need to use parentheses due to operator precedence. On top of this you should use loc to perform the indexing:
df.loc[(df['date'].month==12) & (df['date'].day==25), 'xmas'] = 1
An update was needed in reply to this question. As of today, there's a slight difference in how you extract months from datetime objects in a pd.Series.
So from the very start, in case you have a raw date column, first convert it to datetime objects using a simple function:
import datetime as dt
def read_as_datetime(str_date):
    # replace %Y-%m-%d with your own date format
    return dt.datetime.strptime(str_date,'%Y-%m-%d')
Then apply this function to your dates column and save the results in a new column named datetime:
df['datetime'] = df.dates.apply(read_as_datetime)
Finally, to select dates by day and month, use the same code that @Shayan RC explained, applied to the new datetime column through the .dt accessor:
df.loc[(df['datetime'].dt.month==12) &(df['datetime'].dt.day==25),'xmas'] =1
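Putting the pieces together for the frame in the question, a runnable sketch could look like this. It assumes the date column starts out as plain datetime.date objects (as in the question), which is why it is converted with pd.to_datetime before the .dt accessor is used:

```python
from datetime import date, timedelta

import numpy as np
import pandas as pd

# Two years of consecutive days, as in the question
df = pd.DataFrame({
    "date": [date(2013, 1, 1) + timedelta(days=i) for i in range(365 * 2)],
    "xmas": np.zeros(365 * 2),
})

# datetime.date objects give an object column; convert so .dt is available
df["date"] = pd.to_datetime(df["date"])

# Element-wise comparison: use & (not `and`) and parenthesize each condition
mask = (df["date"].dt.month == 12) & (df["date"].dt.day == 25)
df.loc[mask, "xmas"] = 1

print(df[df["xmas"] == 1])
```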

Pivoting out Datetimes and then calling an operation in Pandas/Python

I've seen several articles about using datetime and dateutil to convert into datetime objects.
However, I can't seem to figure out how to convert a column into datetime objects so I can pivot out that column and perform operations against it.
I have a dataframe as such:
Col1 Col 2
a 1/1/2013
a 1/12/2013
b 1/5/2013
b 4/3/2013 ....etc
What I want is :
pivott = pivot_table( df, rows ='Col1', values='Col2', and then I want to get the range of dates for each value in Col1)
I am not sure how to correctly approach this. Even after using
df['Col2']= pd.to_datetime(df['Col2'])
I couldn't do operations against the dates since they are strings...
Any advice?
Use datetime.strptime
import pandas as pd
from datetime import datetime
df = pd.read_csv('somedata.csv')
convertdatetime = lambda d: datetime.strptime(d,'%d/%m/%Y')
converted = df['DATE_TIME_IN_STRING'].apply(convertdatetime)
converted[:10] # you should be getting dtype: datetime64[ns]
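For the "range of dates per value in Col1" part of the question, one possible sketch uses groupby instead of a pivot table. It assumes the strings are month/day/year (the question does not say which order is meant), and "span" is a made-up column name:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "Col1": ["a", "a", "b", "b"],
    "Col2": ["1/1/2013", "1/12/2013", "1/5/2013", "4/3/2013"],
})

# Assumption: month/day/year; swap to "%d/%m/%Y" if the data is day-first
df["Col2"] = pd.to_datetime(df["Col2"], format="%m/%d/%Y")

# Earliest and latest date per group, plus the span between them
ranges = df.groupby("Col1")["Col2"].agg(["min", "max"])
ranges["span"] = ranges["max"] - ranges["min"]
print(ranges)
```

Once the column is a real datetime64 column, the subtraction yields Timedelta values, so further arithmetic on the ranges works directly.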
