Python timestamp in dataframe - convert into date format

I have a dataframe in the following format:
> buyer_id purch_id timestamp
> buyer_2 purch_2 1330767282
> buyer_3 purch_3 1330771685
> buyer_3 purch_4 1330778269
> buyer_4 purch_5 1330780256
> buyer_5 purch_6 1330813517
How can I convert the timestamp column of the dataframe into datetime and then extract only the time of the event into a new column?
Thanks!

Assuming 'timestamp' is Unix time (seconds since the epoch), you can convert it with pd.to_datetime, passing the right unit ('s'), and then take the time part with .dt.time:
df['time'] = pd.to_datetime(df['timestamp'], unit='s').dt.time
df
Out[9]:
buyer_id purch_id timestamp time
0 buyer_2 purch_2 1330767282 09:34:42
1 buyer_3 purch_3 1330771685 10:48:05
2 buyer_3 purch_4 1330778269 12:37:49
3 buyer_4 purch_5 1330780256 13:10:56
4 buyer_5 purch_6 1330813517 22:25:17
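One thing worth knowing about the approach above: Unix seconds are interpreted as UTC. Here is a minimal, self-contained sketch using the first two rows of the question's data. The target time zone ("Europe/Berlin") is purely an assumption for illustration; the question does not say where the events happened.

```python
import pandas as pd

# First two rows of the sample data from the question
df = pd.DataFrame({
    "buyer_id": ["buyer_2", "buyer_3"],
    "purch_id": ["purch_2", "purch_3"],
    "timestamp": [1330767282, 1330771685],
})

# Unix seconds count from 1970-01-01 00:00:00 UTC, so parse them as UTC
utc = pd.to_datetime(df["timestamp"], unit="s", utc=True)
df["time_utc"] = utc.dt.time

# Assumption: the events should be shown in Berlin local time
df["time_local"] = utc.dt.tz_convert("Europe/Berlin").dt.time

print(df)
```

If the times only need to be in UTC, the tz_convert step can be dropped.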

Related

Offset date based on condition

I'm not getting the right answer for this task: the year 2061 is clearly wrong. Every year that is < 70 should be converted to 19XX instead of 20XX.
My dataframe date column: 2061-01-01, 2061-01-02, 2061-01-03, ...
Required answer: 1961-01-01, 1961-01-02, 1961-01-03, ...
My answer: 1983-05-06 19:59:05.224192, 1983-05-07 19:59:05.224192, 1983-05-08 19:59:05.224192, ...
My code (the dataframe name is data):
for i in pd.DatetimeIndex(data['DATE']).year:
    if i<2000:
        data['DATE']=data.DATE+pd.offsets.DateOffset(years=100)
Check it out:
from datetime import datetime , timedelta
data['DATE'] = data.apply(lambda row: datetime.strptime(row['DATE'], "%Y-%m-%d") - timedelta(days=100*365+25), axis = 1)
data will result in:
DATE
0 1961-01-01
1 1961-01-02
2 1961-01-03
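If the column is already a datetime column, a vectorized alternative is to shift only the rows that fall after some cutoff by a calendar offset of 100 years. This is only a sketch: the cutoff year (CUTOFF_YEAR = 2025) and the third sample row are assumptions made for illustration, not values from the question.

```python
import pandas as pd

data = pd.DataFrame(
    {"DATE": pd.to_datetime(["2061-01-01", "2061-01-02", "1999-12-31"])}
)

# Hypothetical cutoff: any year after this is treated as a mis-parsed 19XX year
CUTOFF_YEAR = 2025

too_late = data["DATE"].dt.year > CUTOFF_YEAR

# DateOffset(years=100) moves by calendar years, so leap days need no manual counting
shifted = data["DATE"] - pd.DateOffset(years=100)

# Keep the original value where the year is plausible, otherwise use the shifted one
data["DATE"] = data["DATE"].where(~too_late, shifted)

print(data)
```

Unlike a fixed timedelta of days, the calendar offset does not depend on how many leap years lie in between.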

How to convert python dataframe timestamp to datetime format

I have a dataframe with date information in one column.
The date visually appears in the dataframe in this format: 2019-11-24
but when you print the type it shows up as:
Timestamp('2019-11-24 00:00:00')
I'd like to convert each value in the dataframe to a format like this:
24-Nov
or
7-Nov
for single digit days.
I've tried using various datetime and strptime commands to convert but I am getting errors.
Here's a way to do it:
df = pd.DataFrame({'date': ["2014-10-23","2016-09-08"]})
df['date_new'] = pd.to_datetime(df['date'])
df['date_new'] = df['date_new'].dt.strftime("%d-%b")
date date_new
0 2014-10-23 23-Oct
1 2016-09-08 08-Sep
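Note that %d zero-pads the day, so the output above is 08-Sep rather than the 8-Sep style the question asks for. One portable way to drop the padding is to build the string from the day number and the abbreviated month name. This is a sketch, assuming an English locale for the month abbreviations:

```python
import pandas as pd

df = pd.DataFrame({"date": pd.to_datetime(["2014-10-23", "2016-09-08"])})

# The day as an integer has no leading zero; %b gives the abbreviated month name
df["date_new"] = df["date"].dt.day.astype(str) + "-" + df["date"].dt.strftime("%b")

print(df)
```

This avoids the platform-specific strftime flags for unpadded days.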

Parse timestamp having hour beyond 23 in python

I am learning Python and ran into an issue while trying to read a timestamp from a CSV file in the format below,
43:32.0
where 43 is in the hours position, and convert it to datetime format in pandas.
I tried this code:
df['time'] = df['time'].astype(str).str[:-2]
df['time'] = pd.to_datetime(df['time'], errors='coerce')
But this converts all values to NaT.
I need the output in the format mm/dd/yyyy hh:mm:ss.
I'm going to assume that this is a Date for 11-29-17 (today's date)?
I believe you need to add an extra 0: at the beginning of the string. Basic example:
import pandas as pd
# creating a dataframe of your string
df1 = pd.DataFrame({'A':['43:32.0']})
# adding '0:' to the front
df1['A'] = '0:' + df1['A'].astype(str)
# making new column to show the output
df1['B'] = pd.to_datetime(df1['A'], errors='coerce')
#output
A B
0 0:43:32.0 2017-11-29 00:43:32
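Another option is to treat the value as an elapsed time rather than a clock time, parse it with pd.to_timedelta, and add it to a date. This is a sketch under two assumptions: the values mean minutes:seconds (as in the answer above), and the base date (2017-11-29) is a stand-in for whatever day the data belongs to. The second sample value is made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["43:32.0", "05:07.5"]})

# Prepend an hours field so the string reads as hh:mm:ss.f
elapsed = pd.to_timedelta("00:" + df1["A"])

# Assumption: the date these times belong to
base_date = pd.Timestamp("2017-11-29")

# Add the elapsed time to the date and format as mm/dd/yyyy hh:mm:ss
df1["B"] = (elapsed + base_date).dt.strftime("%m/%d/%Y %H:%M:%S")

print(df1)
```

Keeping the timedelta around is handy if the values are really durations, since it avoids attaching an arbitrary date at all.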

How do I convert timestamp to datetime.date in pandas dataframe?

I need to merge two pandas dataframes on dates, but they currently have different date types. One is Timestamp (imported from Excel) and the other is datetime.date.
Any advice?
I've tried pd.to_datetime().date but this only works on a single item(e.g. df.ix[0,0]), it won't let me apply to the entire series (e.g. df['mydates']) or the dataframe.
I got some help from a colleague.
This appears to solve the problem posted above
pd.to_datetime(df['mydates']).apply(lambda x: x.date())
Much simpler than above:
df['mydates'].dt.date
For me this works:
from datetime import datetime
df[ts] = [datetime.fromtimestamp(x) for x in df[ts]]
You have to know whether the unit of the Unix timestamp is seconds or milliseconds. Assume that it is in seconds and that you have the following pandas DataFrame:
print(df.head())
And you get:
timestamp XETHZUSD
0 1609459200 730.85
1 1609545600 775.01
2 1609632000 979.86
3 1609718400 1042.52
4 1609804800 1103.41
You can convert the timestamp to datetime as follows:
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
print(df.head())
And we get:
timestamp XETHZUSD
0 2021-01-01 730.85
1 2021-01-02 775.01
2 2021-01-03 979.86
3 2021-01-04 1042.52
4 2021-01-05 1103.41
If the Unix timestamp was in milliseconds, then you should have typed
df['timestamp'] = pd.to_datetime(df['timestamp'], unit='ms')
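If you do not know the unit in advance, a rough heuristic is to look at the magnitude of the values. The helper below (guess_epoch_unit) is hypothetical, not part of pandas, and the threshold is an assumption: it only works for data from roughly 1973 onwards and will misclassify anything outside that range.

```python
import pandas as pd


def guess_epoch_unit(values: pd.Series) -> str:
    """Hypothetical helper: guess 's' or 'ms' from the size of the numbers.

    1e11 seconds is far in the future (around year 5138), while
    1e11 milliseconds is in 1973, so larger values are assumed to be ms.
    """
    return "ms" if values.abs().max() >= 1e11 else "s"


df = pd.DataFrame({"timestamp": [1609459200, 1609545600]})
unit = guess_epoch_unit(df["timestamp"])
df["timestamp"] = pd.to_datetime(df["timestamp"], unit=unit)

print(unit)
print(df)
```

When the source of the data is documented, prefer passing the known unit explicitly over guessing.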
Another question was marked as dupe pointing to this, but it didn't include this answer, which seems the most straightforward (perhaps this method did not yet exist when this question was posted/answered):
The pandas doc shows a pandas.Timestamp.to_pydatetime method to "Convert a Timestamp object to a native Python datetime object".
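A small sketch of that method on a single value, using the date from a question above as sample input:

```python
import pandas as pd

ts = pd.Timestamp("2019-11-24")

# Convert the pandas Timestamp to a plain datetime.datetime
py_dt = ts.to_pydatetime()

# From there, .date() gives a datetime.date
py_date = py_dt.date()

print(type(py_dt), py_dt)
print(type(py_date), py_date)
```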
Assume the time column holds integer timestamps in milliseconds.
1 day = 86400000 ms
Here you go (use one of the two conversions, not both in sequence, since each expects the original integer column):
day_divider = 86400000
df['time'] = df['time'].values.astype(dtype='datetime64[ms]') # keep millisecond resolution
df['time'] = (df['time'] // day_divider).values.astype(dtype='datetime64[D]') # truncate to whole days
If you need datetime.date objects, get them through the .dt.date accessor of the converted Series:
pd.to_datetime(df['mydates']).dt.date
I found the following to be the most effective when I ran into a similar issue. For instance, take a dataframe df with a series of timestamps in column ts.
from datetime import datetime
df.ts.apply(lambda x: datetime.fromtimestamp(x).date())
This does the conversion; you can leave out the .date() suffix to get datetimes. Then, to alter the column on the dataframe:
df.loc[:, 'ts'] = df.ts.apply(lambda x: datetime.fromtimestamp(x).date())
I was trying to convert a timestamp column to date/time, here is what I came up with:
df['Timestamp'] = df['Timestamp'].apply(lambda timestamp: datetime.fromtimestamp(timestamp))
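Coming back to the original goal of merging on dates: instead of converting one side to datetime.date, it may be simpler to bring both key columns to the same datetime64 dtype before merging. A minimal sketch with made-up frames (the names left, right, a and b are hypothetical):

```python
import datetime

import pandas as pd

# Frame whose key column already holds pandas Timestamps (e.g. read from Excel)
left = pd.DataFrame({
    "mydates": pd.to_datetime(["2021-01-01", "2021-01-02"]),
    "a": [1, 2],
})

# Frame whose key column holds plain datetime.date objects
right = pd.DataFrame({
    "mydates": [datetime.date(2021, 1, 2), datetime.date(2021, 1, 3)],
    "b": [10, 20],
})

# Bring both key columns to datetime64 and drop any time-of-day part
left["mydates"] = left["mydates"].dt.normalize()
right["mydates"] = pd.to_datetime(right["mydates"])

merged = left.merge(right, on="mydates", how="inner")
print(merged)
```

Keeping the keys as datetime64 rather than Python date objects also keeps the .dt accessor available afterwards.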

ValueError when converting String to datetime

I have a dataframe as follows, and I am trying to reduce the dataframe to only contain rows for which the Date is greater than a variable curve_enddate. The df['Date'] is in datetime and hence I'm trying to convert curve_enddate[i][0] which gives a string of the form 2015-06-24 to datetime but am getting the error ValueError: time data '2015-06-24' does not match format '%Y-%b-%d'.
Date Maturity Yield_pct Currency
0 2015-06-24 0.25 na CAD
1 2015-06-25 0.25 0.0948511020 CAD
The line where I get the Error:
df = df[df['Date'] > time.strptime(curve_enddate[i][0], '%Y-%b-%d')]
Thank You
You are using the wrong date format: %b is for named months (abbreviations like Jan or Feb, etc.); use %m for numbered months.
Code -
df = df[df['Date'] > time.strptime(curve_enddate[i][0], '%Y-%m-%d')]
You cannot compare a time.struct_time tuple, which is what time.strptime returns, to a Timestamp, so you need to change that as well as switching to '%Y-%m-%d', where %m is the month as a decimal number. You can use pd.to_datetime to create the object to compare:
df = df[df['Date'] > pd.to_datetime(curve_enddate[i][0], format='%Y-%m-%d')]
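For completeness, here is a self-contained sketch of that filter. The sample rows and the plain string curve_enddate are stand-ins for the question's data and for curve_enddate[i][0]:

```python
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2015-06-24", "2015-06-25", "2015-06-26"]),
    "Maturity": [0.25, 0.25, 0.25],
})

# Stand-in for curve_enddate[i][0] from the question
curve_enddate = "2015-06-24"

# Parse the string into a Timestamp so it can be compared with the Date column
cutoff = pd.to_datetime(curve_enddate, format="%Y-%m-%d")

filtered = df[df["Date"] > cutoff]
print(filtered)
```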
