Pandas Dataframe Time column has float values - python

I am cleaning my database. In one of the tables, the time column has values like 0.013391204, and I am unable to convert them to the time format [mm:ss]. Is there a function that converts these values to the required [mm:ss] format?
The head of the column:
0 20:00
1 0.013391204
2 0.013333333
3 0.012708333
4 0.012280093
Use the below reproducible data:
import pandas as pd
df = pd.DataFrame({"time": ["20:00", "0.013391204", "0.013333333", "0.012708333", "0.012280093"]})
I expect the output to be like the first row of the column values shown above.

What is the correct interpretation of, say, the first float entry? Is 0.013391204 meant to be 48 seconds?
Update with new information: if the value is a fraction of a day, the datetime module can convert the float into a time format:
import datetime
datetime.timedelta(days=0.013391204)
str(datetime.timedelta(days=0.013391204))
Output: '0:19:17.000026'
Hope this helps :))
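To make the two candidate interpretations concrete, here is a small sketch that uses only the standard library. It treats the same float once as a fraction of an hour and once as a fraction of a day; the variable names are illustrative and not part of the original answer.

```python
import datetime

value = 0.013391204

# Interpreted as a fraction of an hour: roughly 48 seconds.
as_hours = datetime.timedelta(hours=value)

# Interpreted as a fraction of a day: roughly 19 minutes 17 seconds,
# which is in the same range as the "20:00" in the first row.
as_days = datetime.timedelta(days=value)

print(as_hours)
print(as_days)
```

The day-fraction reading looks more plausible here, since it lands near the only value in the column that is already formatted.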

First convert the values with to_numeric and errors='coerce', which turns the non-float strings into missing values. Then fill those missing values with the original strings prefixed with 00: for the hours. Finally, convert everything with to_timedelta and unit='d':
df = pd.DataFrame({"time": ["20:00", "0.013391204", "0.013333333",
"0.012708333", "0.012280093"]})
s = pd.to_numeric(df['time'], errors='coerce').fillna(df['time'].radd('00:'))
df['new'] = pd.to_timedelta(s, unit='d')
print(df)
time new
0 20:00 00:20:00
1 0.013391204 00:19:17.000025
2 0.013333333 00:19:11.999971
3 0.012708333 00:18:17.999971
4 0.012280093 00:17:41.000035
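Depending on the pandas version, to_timedelta may refuse a unit argument when the input still contains strings. The sketch below sidesteps that by converting the numeric rows and the text rows separately, and it also builds the [mm:ss] string the question asks for. It assumes the floats are fractions of a day and the text rows are already mm:ss; the mmss column name is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"time": ["20:00", "0.013391204", "0.013333333",
                            "0.012708333", "0.012280093"]})

# Numeric strings become floats (fractions of a day); "20:00" becomes NaN.
as_days = pd.to_numeric(df["time"], errors="coerce")
is_text = as_days.isna()

# Assumption: non-numeric rows are "mm:ss", so prefix a zero hour field.
text_seconds = pd.to_timedelta("00:" + df.loc[is_text, "time"]).dt.total_seconds()

# One day has 86400 seconds; fill the gaps with the parsed text rows
# and round to whole seconds.
seconds = (as_days * 86400).fillna(text_seconds).round().astype("int64")

df["new"] = pd.to_timedelta(seconds, unit="s")
df["mmss"] = ((seconds // 60).astype(str).str.zfill(2)
              + ":" + (seconds % 60).astype(str).str.zfill(2))
print(df)
```

Rounding to whole seconds is a design choice here: it hides the sub-second noise that comes from the limited precision of the stored floats.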

Related

Cannot select rows from pandas dataframe after converting date to month using function .dt.to_period('M')

I have a DataFrame like this:
df = pd.DataFrame({"txn_id": ['A', 'B', 'C'], "txn_date": ['2019-04-01', '2020-06-01', '2021-05-01']})
I was trying to find the rows where the transaction month is 2021-05,
so what I did was:
import datetime
df['txn_month'] = df['txn_date'].dt.to_period('M')
df[df['txn_month'] == '2021-05']
However, the result is empty, even though I can see that the column "txn_month" contains "2021-05".
Could you please help? Thanks!
Use dt.to_period and compare with eq:
# convert to datetime if needed
df["txn_date"] = pd.to_datetime(df["txn_date"])
>>> df[df["txn_date"].dt.to_period("M").eq("2021-05")]
txn_id txn_date
2 C 2021-05-01
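If comparing the period column against a plain string gives an empty result, one option is to compare against an explicit pd.Period instead. This is a sketch under the assumption that txn_date can be parsed by pd.to_datetime; the names target and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "txn_id": ["A", "B", "C"],
    "txn_date": ["2019-04-01", "2020-06-01", "2021-05-01"],
})

# Parse the strings first; the .dt accessor only works on datetime columns.
df["txn_date"] = pd.to_datetime(df["txn_date"])
df["txn_month"] = df["txn_date"].dt.to_period("M")

# Compare Period with Period rather than Period with str.
target = pd.Period("2021-05", freq="M")
result = df[df["txn_month"] == target]
print(result)
```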

How to convert unix timestamp in ms to readable timestamp in a pandas array?

I have a pandas DataFrame with a column that contains Unix timestamps, but I think they are in milliseconds because each value has 3 extra 0's at the end. For example, the first data point is 1546300800000, when it should be just 1546300800. I need to convert this column to readable times, so right now I have:
df = pd.read_csv('data.csv')
df['Time'] = pd.to_datetime(df['Time'])
df.to_csv('data.csv', index=False)
Instead of giving me the correct time, it gives me a time in 1970. For example, 1546300800000 gives me 1970-01-01 00:25:46.301100 when it should be 2019-01-01 00:00:00. It does this for every timestamp in the column, which has over 20K rows.
Sample data:
df=pd.DataFrame({'UNIX':['1349720105','1546300800']})
Conversion:
df['UNIX']=pd.to_datetime(df['UNIX'], unit='s')
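The sample above uses second-resolution values. For the millisecond values described in the question, the same call with unit='ms' should apply. A minimal sketch, with a made-up two-row frame standing in for data.csv:

```python
import pandas as pd

# Hypothetical stand-in for the CSV: Unix timestamps in milliseconds.
df = pd.DataFrame({"Time": [1546300800000, 1546300860000]})

# Without a unit, integers are read as nanoseconds, which lands in 1970.
# unit="ms" tells pandas the values count milliseconds since the epoch.
df["Time"] = pd.to_datetime(df["Time"], unit="ms")
print(df)
```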

Drop certain character in Object before converting to Datetime column in Pandas

My DataFrame has a column that measures a time difference in the format HH:MM:SS.000.
The DataFrame is read from an Excel file, and the column that stores the time difference has dtype object. However, some entries have a negative time difference. The negative sign doesn't matter to me and needs to be removed, because a filter condition I have does not work with it.
Note: I only have the negative time differences there because of the issue I'm currently having.
I've tried a few functions, but I get errors because some of the values are just 00:00:00, some are 00:00:02.65, and some are 00:00:02.111.
First, how would I ensure that all data in this column is in the form 00:00:00.000? And then, how would I remove the '-' from some of the data?
Here's a sample of the time diff column. I can't convert this column to datetime because some of the entries don't have 3 digits after the decimal point. Is there a way to iterate through the column and add a 0 if the length of the value isn't equal to 12 characters?
00:00:02.97
00:00:03:145
00:00:00
00:00:12:56
28 days 03:05:23.439
It looks like you need to clean your input before you can parse to timedelta, e.g. with the following function:
import pandas as pd
def clean_td_string(s):
    # more than two colons means the last ':' should be a decimal point
    if s.count(':') > 2:
        return '.'.join(s.rsplit(':', 1))
    return s
Applied to a df's column, this looks like
df = pd.DataFrame({'Time Diff': ['00:00:02.97', '00:00:03:145', '00:00:00', '00:00:12:56', '28 days 03:05:23.439']})
df['Time Diff'] = pd.to_timedelta(df['Time Diff'].apply(clean_td_string))
# df['Time Diff']
# 0 0 days 00:00:02.970000
# 1 0 days 00:00:03.145000
# 2 0 days 00:00:00
# 3 0 days 00:00:12.560000
# 4 28 days 03:05:23.439000
# Name: Time Diff, dtype: timedelta64[ns]
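The question also asks about dropping the leading '-'. One way to sketch that is to strip the sign from the strings before cleaning and parsing them. The sample values with a minus sign below are invented for illustration, since the question does not show any.

```python
import pandas as pd

def clean_td_string(s):
    # More than two colons means the last ':' should be a decimal point.
    if s.count(":") > 2:
        return ".".join(s.rsplit(":", 1))
    return s

# Hypothetical sample including negative entries.
raw = pd.Series(["-00:00:02.97", "00:00:03:145", "00:00:00", "-00:00:12:56"])

# Drop any leading '-' first, then repair the separator and parse.
unsigned = raw.str.lstrip("-")
td = pd.to_timedelta(unsigned.apply(clean_td_string))
print(td)
```

Once the column is a timedelta, the number of digits after the decimal point in the original strings no longer matters, so no zero padding is needed.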

Parse timestamp having hour beyond 23 in python

I am learning Python and ran into an issue while trying to read a timestamp from a CSV file in the format below:
43:32.0
Here 43 is in the hours position. I want to convert it to a datetime in pandas.
I tried this code:
df['time'] = df['time'].astype(str).str[:-2]
df['time'] = pd.to_datetime(df['time'], errors='coerce')
But this converts all values to NaT.
I need the output to be in the format mm/dd/yyyy hh:mm:ss.
I'm going to assume that this is a date for 11-29-17 (today's date)?
I believe you need to add an extra 0: at the beginning of the string. Basic example:
import pandas as pd
# creating a dataframe of your string
df1 = pd.DataFrame({'A':['43:32.0']})
# adding '0:' to the front
df1['A'] = '0:' + df1['A'].astype(str)
# making new column to show the output
df1['B'] = pd.to_datetime(df1['A'], errors='coerce')
#output
A B
0 0:43:32.0 2017-11-29 00:43:32
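A variation that does not depend on the day the code is run is to parse the value as a duration and add it to an explicit date. This is only a sketch: it assumes, like the answer above, that 43:32.0 means 43 minutes and 32 seconds, and base_date is a made-up anchor date.

```python
import pandas as pd

df1 = pd.DataFrame({"A": ["43:32.0"]})

# Assumed anchor date; replace it with whatever date the data belongs to.
base_date = pd.Timestamp("2017-11-29")

# Prefix a zero hour field so the string parses as hh:mm:ss.f.
offset = pd.to_timedelta("00:" + df1["A"].astype(str))
df1["B"] = offset + base_date

# Render in the requested mm/dd/yyyy hh:mm:ss layout.
df1["formatted"] = df1["B"].dt.strftime("%m/%d/%Y %H:%M:%S")
print(df1)
```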

Sort and Filter data from a Panda Dataframe according to date range

My DataFrame has two columns: (i) a date column in string format and (ii) an int value. I would like to convert the date strings into date objects and then filter and sort the data according to a date range. Converting one string to a date worked fine with:
date = dateutil.parser.parse(date_string)
date = ("%02d:%02d:%02d" % (date.hour, date.minute, date.second))
How can I iterate over all the values in the DataFrame and apply the parsing, so that I can then use pandas on the df to filter and sort the data as follows?
df.sort(['etime'])
df[df['etime'].isin([begin_date, end_date])]
Sample of my dataframe data is below:
etime instantaneous_ops_per_sec
3 2016-06-15T15:30:09Z 26
4 2016-06-15T15:30:14Z 26
5 2016-06-15T15:30:19Z 24
6 2016-06-15T15:30:24Z 27
You want to use pd.to_datetime with a format that matches your strings:
df['etime'] = pd.to_datetime(df['etime'], format="%Y-%m-%dT%H:%M:%SZ")
Try this:
df['etime'] = pd.to_datetime(df['etime'], format="%Y-%m-%dT%H:%M:%SZ")
df[df['etime'].between(begin_date, end_date)]
Caution: your code parses a date but then keeps only the time and sorts on that, so the results may not be what you are after. You usually want to filter first and then sort, but the code in the question does the opposite.
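Putting the pieces together, a filter-then-sort pass over the sample rows could look like the sketch below. The begin_date and end_date values are invented for illustration, and the rows are shuffled on purpose so the sort has something to do.

```python
import pandas as pd

df = pd.DataFrame({
    "etime": ["2016-06-15T15:30:19Z", "2016-06-15T15:30:09Z",
              "2016-06-15T15:30:24Z", "2016-06-15T15:30:14Z"],
    "instantaneous_ops_per_sec": [24, 26, 27, 26],
})

# Parse the whole column at once; no explicit loop is needed.
df["etime"] = pd.to_datetime(df["etime"], format="%Y-%m-%dT%H:%M:%SZ")

# Hypothetical range bounds; between() is inclusive on both ends by default.
begin_date = pd.Timestamp("2016-06-15 15:30:10")
end_date = pd.Timestamp("2016-06-15 15:30:20")

# Filter first, then sort the smaller result.
result = df[df["etime"].between(begin_date, end_date)].sort_values("etime")
print(result)
```

Note that isin([begin_date, end_date]) from the question would only match rows equal to one of the two bounds, whereas between selects everything inside the range.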
