String dates into unixtime in a pandas dataframe

I have a dataframe with a column like this:
Date
3 mins
2 hours
9-Feb
13-Feb
Every value in the column is a string. What is the easiest way to convert these dates into integer Unix time?

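One idea is to convert the column to datetimes and to timedeltas (the year 2020 is assumed for the day-month values):
import numpy as np
import pandas as pd
# day-month strings: append the assumed year and parse; anything else becomes NaT
df['dates'] = pd.to_datetime(df['Date']+'-2020', format='%d-%b-%Y', errors='coerce')
# relative strings: rewrite as HH:MM:SS and parse as timedeltas; anything else becomes NaT
times = df['Date'].replace({r'(\d+)\s+mins': r'00:\1:00',
                            r'\s+hours': ':00:00'}, regex=True)
df['times'] = pd.to_timedelta(times, errors='coerce')
# remove rows that parsed as neither a date nor a time
df = df[df['dates'].notna() | df['times'].notna()].copy()
# pd.concat replaces Series.append, which was removed in pandas 2.0
df['all'] = pd.concat([df['dates'].dropna().astype(np.int64),
                       df['times'].dropna().astype(np.int64)])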
print (df)
Date dates times all
0 3 mins NaT 00:03:00 180000000000
1 2 hours NaT 02:00:00 7200000000000
2 9-Feb 2020-02-09 NaT 1581206400000000000
3 13-Feb 2020-02-13 NaT 1581552000000000000
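The `all` column above holds nanoseconds since the epoch, and for '3 mins' and '2 hours' it is only the length of the duration, not a point in time. If those strings mean "that long ago" (an assumption, as is the hypothetical reference time `ref` below), a minimal sketch that produces integer Unix seconds for every row could look like this. It reuses the answer's regexes and its assumed year 2020.

```python
import pandas as pd

s = pd.Series(['3 mins', '2 hours', '9-Feb', '13-Feb'])

# Hypothetical reference time that relative values count back from (assumption)
ref = pd.Timestamp('2020-02-14 12:00:00')

# Day-month strings, with the year assumed to be 2020 as in the answer above
dates = pd.to_datetime(s + '-2020', format='%d-%b-%Y', errors='coerce')

# Relative strings, rewritten to HH:MM:SS with the answer's regexes
times = s.replace({r'(\d+)\s+mins': r'00:\1:00',
                   r'\s+hours': ':00:00'}, regex=True)
deltas = pd.to_timedelta(times, errors='coerce')

# Assumption: "3 mins" means 3 minutes before ref
stamps = dates.fillna(ref - deltas)

# Whole seconds since 1970-01-01 (naive timestamps are treated as UTC)
unix = (stamps - pd.Timestamp('1970-01-01')) // pd.Timedelta(seconds=1)
print(unix)
```

Dividing by a one-second Timedelta keeps the result as integers, which matches the "integer unixtime" the question asks for.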

Related

How to homogenize date type in a pandas dataframe column?

I have a Date column in my dataframe containing dates in two different formats (YYYY-DD-MM 00:00:00 and YYYY-DD-MM):
Date
0 2023-01-10 00:00:00
1 2024-27-06
2 2022-07-04 00:00:00
3 NaN
4 2020-30-06
(you can use pd.read_clipboard(sep=r'\s\s+') after copying the dataframe above to load it in your notebook)
I would like a single YYYY-MM-DD format, so the result should be:
Date
0 2023-10-01
1 2024-06-27
2 2022-04-07
3 NaN
4 2020-06-30
How can I do this?

Use Series.str.replace to remove the time part, then to_datetime with the format parameter:
df['Date'] = pd.to_datetime(df['Date'].str.replace(' 00:00:00',''), format='%Y-%d-%m')
print (df)
Date
0 2023-10-01
1 2024-06-27
2 2022-04-07
3 NaT
4 2020-06-30
Another idea is to parse each format separately and combine the results:
d1 = pd.to_datetime(df['Date'], format='%Y-%d-%m', errors='coerce')
d2 = pd.to_datetime(df['Date'], format='%Y-%d-%m 00:00:00', errors='coerce')
df['Date'] = d1.fillna(d2)
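A self-contained sketch that builds the sample column directly (instead of via read_clipboard), runs both approaches, and optionally renders the result as YYYY-MM-DD strings. The names `a`, `b` and `out` are just for illustration.

```python
import pandas as pd

# Sample column from the question, built directly instead of via read_clipboard
df = pd.DataFrame({'Date': ['2023-01-10 00:00:00', '2024-27-06',
                            '2022-07-04 00:00:00', None, '2020-30-06']})

# Approach 1: strip the midnight suffix, then parse a single format
a = pd.to_datetime(df['Date'].str.replace(' 00:00:00', ''), format='%Y-%d-%m')

# Approach 2: parse each format on its own (non-matching rows become NaT), then merge
d1 = pd.to_datetime(df['Date'], format='%Y-%d-%m', errors='coerce')
d2 = pd.to_datetime(df['Date'], format='%Y-%d-%m 00:00:00', errors='coerce')
b = d1.fillna(d2)

# If strings are needed, format as YYYY-MM-DD; NaT becomes NaN
out = b.dt.strftime('%Y-%m-%d')
print(out)
```

Both approaches give the same datetimes; the second one is easier to extend if more formats turn up later.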

Subtracting timedelta in pandas

I have a dataframe with two columns (date and days).
df = pd.DataFrame({'date':['2020-01-31', '2020-01-21', '2020-01-11'], 'days':[1, 2, 3]})
I want a third column (date_2) that subtracts the number of days from the date, so date_2 would be [2020-01-30, 2020-01-19, 2020-01-08].
I know about timedelta(days=i), but I cannot pass the contents of df['days'] as i in pandas.

Use to_timedelta with unit='d' and subtract:
>>> pd.to_datetime(df['date']) - pd.to_timedelta(df['days'], unit='d')
0 2020-01-30
1 2020-01-19
2 2020-01-08
dtype: datetime64[ns]

Use to_datetime for the dates, then subtract the timedeltas created by to_timedelta with Series.sub:
df['new'] = pd.to_datetime(df['date']).sub(pd.to_timedelta(df['days'], unit='d'))
print (df)
date days new
0 2020-01-31 1 2020-01-30
1 2020-01-21 2 2020-01-19
2 2020-01-11 3 2020-01-08
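For comparison, a sketch of the row-wise version closer to the timedelta(days=i) idea in the question, next to the vectorized one from the answers. The row-wise form builds one Python timedelta per row and is slower on large frames; it is shown only to check that both give the same `date_2`.

```python
from datetime import timedelta

import pandas as pd

df = pd.DataFrame({'date': ['2020-01-31', '2020-01-21', '2020-01-11'],
                   'days': [1, 2, 3]})
df['date'] = pd.to_datetime(df['date'])

# Row-wise: one datetime.timedelta per row, as in timedelta(days=i)
rowwise = pd.Series([d - timedelta(days=int(n)) for d, n in zip(df['date'], df['days'])])

# Vectorized, as in the answers: convert the whole column to timedeltas at once
df['date_2'] = df['date'] - pd.to_timedelta(df['days'], unit='d')
print(df)
```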

Reformat Dataframe column to date only format

I have a dataframe (df) with a 'Date of birth' column:
Date of birth
0 1957-04-30 00:00:00
1 1966-11-10 00:00:00
2 1966-11-10 00:00:00
3 1962-03-28 00:00:00
4 1958-10-28 00:00:00
5 1958-06-04 00:00:00
How can I reformat the column to a date-only format, as shown below? Afterwards I'm going to work out age from a specific date.
Date of birth
0 1957-04-30
1 1966-11-10
2 1966-11-10
3 1962-03-28
4 1958-10-28
5 1958-06-04
I have tried using
df["Date of birth"] = pd.to_datetime(df['Date of birth'], format='%d%b%Y')
df["Date of birth"] = df["Date of birth"].dt.strftime('%m/%d/%Y')
but with no joy.

Convert the column to datetimes first, then use the .dt.date accessor to keep only the date part:
df["Date of birth"] = pd.to_datetime(df['Date of birth']).dt.date
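Note that .dt.date leaves Python date objects (object dtype). Since the next step is an age calculation, one option is Series.dt.normalize(), which keeps the datetime64 dtype with the time set to midnight. A sketch of whole-year ages, assuming a hypothetical reference date `ref`:

```python
import pandas as pd

df = pd.DataFrame({'Date of birth': ['1957-04-30 00:00:00', '1966-11-10 00:00:00',
                                     '1966-11-10 00:00:00', '1962-03-28 00:00:00',
                                     '1958-10-28 00:00:00', '1958-06-04 00:00:00']})
dob = pd.to_datetime(df['Date of birth'])

# Keep datetime64 dtype with the time set to midnight (unlike .dt.date)
df['Date of birth'] = dob.dt.normalize()

# Hypothetical reference date to measure age from
ref = pd.Timestamp('2020-06-01')

# Whole years: subtract one if the birthday has not yet come round in ref's year
not_yet = (dob.dt.month > ref.month) | ((dob.dt.month == ref.month) & (dob.dt.day > ref.day))
df['age'] = ref.year - dob.dt.year - not_yet.astype(int)
print(df)
```

Comparing month and day avoids dividing a day count by 365.25, which can be off by one around birthdays.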

Convert Dataframe column to time format in python

I have a dataframe column which looks like this:
The values are in M:S.MS format. How can I convert them into an M:S:MS time format so I can plot them as a time series graph?
If I plot the column as is, Python raises an "invalid literal for float()" error.
Note: this dataframe contains one hour's worth of data, with values between 0:0.0 and 59:59.9.

df = pd.DataFrame({'date':['00:02.0','00:05.0','00:08.1']})
print (df)
date
0 00:02.0
1 00:05.0
2 00:08.1
It is possible to convert to datetimes:
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f')
print (df)
date
0 1900-01-01 00:00:02.000
1 1900-01-01 00:00:05.000
2 1900-01-01 00:00:08.100
Or, starting again from the original strings, convert to timedeltas by prepending the missing hours field:
df['date'] = pd.to_timedelta(df['date'].radd('00:'))
print (df)
date
0 00:00:02
1 00:00:05
2 00:00:08.100000
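If the only goal is a numeric axis for plotting, the timedeltas can be turned into float seconds with Series.dt.total_seconds(). A small sketch using the sample values above:

```python
import pandas as pd

s = pd.Series(['00:02.0', '00:05.0', '00:08.1'])

# Prepend the missing hours field, then parse as durations
td = pd.to_timedelta('00:' + s)

# Plain float seconds, usable as a numeric x-axis
seconds = td.dt.total_seconds()
print(seconds)
```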
EDIT:
For a custom date (again starting from the original strings), use:
date = '2015-01-04'
td = pd.to_datetime(date) - pd.to_datetime('1900-01-01')
df['date'] = pd.to_datetime(df['date'], format='%M:%S.%f') + td
print (df)
date
0 2015-01-04 00:00:02.000
1 2015-01-04 00:00:05.000
2 2015-01-04 00:00:08.100
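A variant of the EDIT that skips the 1900-01-01 offset: parse the strings as timedeltas and add them directly to the chosen day. This is a sketch using the same sample values and date.

```python
import pandas as pd

df = pd.DataFrame({'date': ['00:02.0', '00:05.0', '00:08.1']})

# Parse M:S.f strings as durations by prepending the missing hours field
offsets = pd.to_timedelta('00:' + df['date'])

# Anchor the durations to the chosen day instead of the 1900-01-01 default
df['date'] = pd.Timestamp('2015-01-04') + offsets
print(df)
```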

Concatenate two dataframe columns as one timestamp

I'm working on a pandas dataframe where one column is a date (YYYYMMDD) and another is an hour (HH:MM). I would like to concatenate the two columns into one timestamp or datetime64 column, to use later as the index of a time series. Here is the situation:
Do you have any ideas? The classic pandas.to_datetime() seems to work only if each column holds a single component (hours only, days only, years only, etc.).

Setup
df
Out[1735]:
id date hour other
0 1820 20140423 19:00:00 8
1 4814 20140424 08:20:00 22
Solution
import datetime as dt
#convert date and hour to str, concatenate them and then convert them to datetime format.
df['new_date'] = df[['date','hour']].astype(str).apply(lambda x: dt.datetime.strptime(x['date'] + x['hour'], '%Y%m%d%H:%M:%S'), axis=1)
df
Out[1756]:
id date hour other new_date
0 1820 20140423 19:00:00 8 2014-04-23 19:00:00
1 4814 20140424 08:20:00 22 2014-04-24 08:20:00
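A vectorized alternative to the row-wise apply: join the two columns as strings and parse them in one pd.to_datetime call, then set the result as the index. This sketch assumes, like the setup above, that `date` holds integers such as 20140423 and `hour` holds HH:MM:SS strings; for plain HH:MM values the format would end in '%H:%M' instead.

```python
import pandas as pd

df = pd.DataFrame({'id': [1820, 4814],
                   'date': [20140423, 20140424],
                   'hour': ['19:00:00', '08:20:00'],
                   'other': [8, 22]})

# One string per row, e.g. '20140423 19:00:00', parsed in a single call
stamp = df['date'].astype(str) + ' ' + df['hour'].astype(str)
df['new_date'] = pd.to_datetime(stamp, format='%Y%m%d %H:%M:%S')

# Use the combined column as the time-series index
ts = df.set_index('new_date')
print(ts)
```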
