Hello,
I am trying to extract the date and time column from my Excel data. I get the column as a DataFrame of float values, and after using pandas.to_datetime the dates differ from the actual dates in Excel. For example, in Excel the starting date is 01.01.1901 00:00:00, but in Python I get 1971-01-03 00:00:00.000000.
How can I solve this problem?
I need the final output in a DataFrame as total seconds: the first cell should start at 0 s, and each following cell should hold the elapsed seconds (the time difference between consecutive cells is 15 min).
Thank you.
Your input is fractional days, so there's actually no need to convert to datetime if you want the duration in seconds relative to the first entry. Subtract that from the rest of the column and multiply by the number of seconds in a day:
import pandas as pd
df = pd.DataFrame({"Datum/Zeit": [367.0, 367.010417, 367.020833]})
df["totalseconds"] = (df["Datum/Zeit"] - df["Datum/Zeit"].iloc[0]) * 86400
df["totalseconds"]
0 0.0000
1 900.0288
2 1799.9712
Name: totalseconds, dtype: float64
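The fractional-day floats in this example only carry six decimal places, so the durations pick up sub-second noise (900.0288 instead of 900). If, as the question states, the samples really are exactly 15 minutes apart, a minimal sketch is to round the result to whole seconds. The rounding step is an addition on top of the answer, and it assumes the exact spacing is known to be whole seconds:

```python
import pandas as pd

# Same sample data as above: Excel serial dates stored as fractional days
df = pd.DataFrame({"Datum/Zeit": [367.0, 367.010417, 367.020833]})

# Seconds elapsed since the first row (86400 seconds per day)
elapsed = (df["Datum/Zeit"] - df["Datum/Zeit"].iloc[0]) * 86400

# Round away the sub-second noise left by the six-decimal floats
df["totalseconds"] = elapsed.round().astype(int)
print(df["totalseconds"].tolist())  # [0, 900, 1800]
```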
If you have to go through datetime, you need to subtract the first timestamp to get a timedelta (duration) and then take its total seconds, e.g.:
df["datetime"] = pd.to_datetime(df["Datum/Zeit"], unit="d")
# df["datetime"]
# 0 1971-01-03 00:00:00.000000
# 1 1971-01-03 00:15:00.028800
# 2 1971-01-03 00:29:59.971200
# Name: datetime, dtype: datetime64[ns]
# subtraction of datetime from datetime gives timedelta, which has total_seconds:
df["totalseconds"] = (df["datetime"] - df["datetime"].iloc[0]).dt.total_seconds()
# df["totalseconds"]
# 0 0.0000
# 1 900.0288
# 2 1799.9712
# Name: totalseconds, dtype: float64
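The 1971-01-03 in the question comes from pd.to_datetime(..., unit="d") counting days from the Unix epoch, 1970-01-01; 367 days after that is 1971-01-03. Excel's default (1900) date system effectively counts from 1899-12-30 for dates from March 1900 onward, so passing that as the origin parameter should reproduce the dates Excel shows. A sketch, assuming the workbook uses the 1900 date system (not the 1904 system used by some Mac workbooks):

```python
import pandas as pd

df = pd.DataFrame({"Datum/Zeit": [367.0, 367.010417, 367.020833]})

# Excel 1900 date system: serial day 0 effectively falls on 1899-12-30
# (this absorbs Excel's nonexistent 1900-02-29), so use it as the origin.
df["datetime"] = pd.to_datetime(df["Datum/Zeit"], unit="D", origin="1899-12-30")

# Optional: snap to whole seconds to remove the float noise
df["datetime"] = df["datetime"].dt.round("s")
print(df["datetime"].tolist())
```

With this origin, 367.0 maps to 1901-01-01 00:00:00, which matches the starting date the question reports from Excel.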
My instructions are as follows:
Read the date columns in as timestamps, convert them to YYYY/MM/DD hours:minutes:seconds format, where you set hours, minutes and seconds to random values appropriate to their range
Here is the column of the DataFrame we are supposed to convert to datetime:
Order date
11/12/2016
11/24/2016
6/12/2016
10/12/2016
...
And here is the datetime format I need:
2016/11/12 (random) hours:minutes:seconds
2016/11/24 (random) hours:minutes:seconds
...
My main question is how to get random hours, minutes and seconds. The rest I can figure out from the documentation.
You can generate random integers from 0 to 86399 (the number of seconds in a day minus 1) and convert them to a Timedelta with pandas.to_timedelta:
import numpy as np
import pandas as pd
# randint excludes its upper bound, so this draws 0..86399 seconds
time = pd.to_timedelta(np.random.randint(0, 60*60*24, size=len(df)), unit='s')
df['Order date'] = pd.to_datetime(df['Order date'], format='%m/%d/%Y').add(time)
Output:
Order date
0 2016-11-12 02:21:53
1 2016-11-24 13:26:00
2 2016-06-12 15:13:03
3 2016-10-12 14:45:12
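The instructions also ask for the YYYY/MM/DD hours:minutes:seconds text form. A self-contained sketch that adds the random times and then formats them with dt.strftime; it swaps np.random.randint for NumPy's newer Generator API (np.random.default_rng) with a fixed seed so the result is reproducible, and the "Order date (text)" column name is made up for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Order date": ["11/12/2016", "11/24/2016", "6/12/2016", "10/12/2016"]})

rng = np.random.default_rng(seed=0)  # seeded so reruns give the same times

# Parse month/day/year strings explicitly
dates = pd.to_datetime(df["Order date"], format="%m/%d/%Y")

# Random second of the day; integers() excludes the upper bound, so 0..86399
seconds = pd.Series(rng.integers(0, 24 * 60 * 60, size=len(df)), index=df.index)
df["Order date"] = dates + pd.to_timedelta(seconds, unit="s")

# Text form from the instructions: YYYY/MM/DD hours:minutes:seconds
df["Order date (text)"] = df["Order date"].dt.strftime("%Y/%m/%d %H:%M:%S")
print(df)
```

Keeping both columns lets you do date arithmetic on the datetime64 column and use the string column only for display or export.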
You're trying to read the data in '%Y-%m-%d' format, but the data is in '%m/%d/%Y' format. See https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior to find out how to convert the date to your desired format.
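To see the mismatch concretely, here is a small sketch using the order dates shown in the question. In a value like 11/24/2016, 24 can only be the day, so the matching pattern is '%m/%d/%Y'. errors="coerce" is used only to show that the wrong pattern parses nothing:

```python
import pandas as pd

s = pd.Series(["11/12/2016", "11/24/2016", "6/12/2016"])

# Wrong pattern: these strings are not year-month-day, so every value fails
wrong = pd.to_datetime(s, format="%Y-%m-%d", errors="coerce")
print(wrong.isna().all())  # True

# Matching pattern: month/day/year
right = pd.to_datetime(s, format="%m/%d/%Y")
print(right.dt.strftime("%Y/%m/%d").tolist())  # ['2016/11/12', '2016/11/24', '2016/06/12']
```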
I work with a variety of instruments, and one is particularly troublesome: its exported data is in XLS or XLSX format with multiple sheets and multiple columns. I only want some of the sheets and some of the columns, and I have already managed to read these into pandas.
I want to convert the time (see below) into decimal hours. The hours should be measured from the initial time (the first timestamp at the top of the column), so a timedelta is probably the more correct value. I am only concerned about this one column. How do I convert an entire column of data from one format to another?
date/time (absolute time), timestamp format YYYY-MM-DD HH:MM:SS
I have found quite a few answers, but they don't seem to apply to this particular case; most focus on individual cells or small, manually entered data sets. I have thousands of data files, each with as many as 500,000 lines, so something more automated is preferred. There is no upper limit to the number of hours.
A related question (someone asked me this): the data is already in a pandas DataFrame, so should it be converted before or after being read in?
This might seem an amateurish question, and it is. I've avoided writing code for years, and now I have to learn data wrangling for my job; it's frustrating, so go easy on me.
When I try to adapt the solutions I found to a whole column, I get errors.
This is the code that works:
import pandas as pd
import matplotlib.pyplot as plt
from pathlib import Path
from datetime import datetime # not used
import time # not used
import numpy as np # Not used
loc1 = r"path\file.xls"
pd.read_excel(loc1)
filename=Path(loc1).stem
str_1=filename
df = pd.concat(pd.read_excel(loc1, sheet_name=[3,4,5,6,7,8,9]), ignore_index=False)
# I NEED CODE HERE TO CONVERT THE TIMESTAMPS TO HOURS (decimal), most likely some form of timedelta
df.plot(x='Relative Time(h:min:s.ms)',y='Voltage(V)', color='blue')
plt.xlabel("relative time") # This is a specific value
plt.ylabel("voltage (V)")
plt.title(str_1) # filename is used in each sample as a graph title
plt.show()
Image of the relevant information (already described above)
You should provide a minimal reproducible example to help others understand exactly what issues you are facing.
Setup
Reading between the lines, here is a setup that hopefully exemplifies the kind of data you have:
import pandas as pd

vals = pd.Series([
    '2019-10-21 17:22:06',         # absolute date
    '2019-10-21 23:22:06.236',     # absolute date, with milliseconds
    '2019-10-21 12:00:00.236145',  # absolute date, with microseconds
    '5:10:10',                     # timedelta
    '40:10:10.123',                # timedelta, with milliseconds
    '345:10:10.123456',            # timedelta, with microseconds
])
Solution
Now we can use two great tools that pandas offers to quickly convert string Series into Timestamps (pd.to_datetime) and Timedeltas (pd.to_timedelta), for absolute date-times and durations, respectively.
In both cases, we use errors='coerce' to convert what is convertible, and leave the rest to NaN.
origin = pd.Timestamp('2019-01-01 00:00:00') # origin for absolute dates
a = pd.to_datetime(vals, format='%Y-%m-%d %H:%M:%S.%f', errors='coerce') - origin
b = pd.to_timedelta(vals, errors='coerce')
tdelta = a.where(~a.isna(), b)
hours = tdelta.dt.total_seconds() / 3600
With the above:
>>> hours
0 7049.368333
1 7055.368399
2 7044.000066
3 5.169444
4 40.169479
5 345.169479
dtype: float64
Explanation
Let's examine some of the pieces above. a handles absolute date-times. Before origin is subtracted to obtain Timedeltas, it is still a Series of Timestamps:
>>> pd.to_datetime(vals, format='%Y-%m-%d %H:%M:%S.%f', errors='coerce')
0 2019-10-21 17:22:06.000000
1 2019-10-21 23:22:06.236000
2 2019-10-21 12:00:00.236145
3 NaT
4 NaT
5 NaT
dtype: datetime64[ns]
b handles values that are already expressed as durations:
>>> b
0 NaT
1 NaT
2 NaT
3 0 days 05:10:10
4 1 days 16:10:10.123000
5 14 days 09:10:10.123456
dtype: timedelta64[ns]
tdelta is the merge of the non-NaN values of a and b:
>>> tdelta
0 293 days 17:22:06
1 293 days 23:22:06.236000
2 293 days 12:00:00.236145
3 0 days 05:10:10
4 1 days 16:10:10.123000
5 14 days 09:10:10.123456
dtype: timedelta64[ns]
Of course, you can change your origin to be any particular date of reference.
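For example, the question wants hours measured from the first timestamp at the top of the column. A minimal sketch using that value as the origin; the timestamps below are hypothetical stand-ins for one logger column:

```python
import pandas as pd

# Hypothetical absolute timestamps from one column
stamps = pd.Series(pd.to_datetime([
    "2019-10-21 12:00:00",
    "2019-10-21 13:30:00",
    "2019-10-23 12:15:00",
]))

origin = stamps.iloc[0]  # first row of the column as the reference time
hours = (stamps - origin).dt.total_seconds() / 3600
print(hours.tolist())  # [0.0, 1.5, 48.25]
```

Using stamps.min() instead of stamps.iloc[0] gives the same result when the column is sorted, and is safer when it might not be.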
Addendum
After clarifying comments, it seems that the main issue is how to adapt the solution above (or any similar existing example) to their specific problem.
Using the names seen in the images of the edited question, I would suggest:
# (...)
# df = pd.concat(pd.read_excel(loc1, sheet_name=[3,4,5,6,7,8,9]), ignore_index=False)
# note: if df['Absolute Time'] is still of dtype object (strings), then do this:
# (adapt format as needed; hard to be sure from the image)
df['Absolute Time'] = pd.to_datetime(
    df['Absolute Time'],
    format='%m.%d.%Y %H:%M:%S.%f',
    errors='coerce')
# origin of time; this may have to be taken over multiple sheets
# if all experiments share an absolute origin
origin = df['Absolute Time'].min()
df['Time in hours'] = (df['Absolute Time'] - origin).dt.total_seconds() / 3600
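When pd.concat receives the dict that pd.read_excel(..., sheet_name=[...]) returns, the sheet keys become the outer level of the index. If each sheet is a separate experiment with its own start time (an assumption, not something the question confirms), one option is to take the origin per sheet with groupby(level=0). The frame below is hypothetical and stands in for the concatenated sheets:

```python
import pandas as pd

# Hypothetical stand-in for the dict returned by pd.read_excel(..., sheet_name=[3, 4])
sheets = {
    3: pd.DataFrame({"Absolute Time": pd.to_datetime(
        ["2019-10-21 10:00:00", "2019-10-21 10:30:00"])}),
    4: pd.DataFrame({"Absolute Time": pd.to_datetime(
        ["2019-10-22 08:00:00", "2019-10-22 11:00:00"])}),
}
df = pd.concat(sheets)  # outer index level = sheet key

# Earliest timestamp within each sheet, broadcast back to every row of that sheet
origin = df.groupby(level=0)["Absolute Time"].transform("min")
df["Time in hours"] = (df["Absolute Time"] - origin).dt.total_seconds() / 3600
print(df["Time in hours"].tolist())  # [0.0, 0.5, 0.0, 3.0]
```

If all sheets share one absolute origin instead, the single df['Absolute Time'].min() from the addendum is the right choice.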
I have a DataFrame with a lot of columns and rows; the first column, date_time, contains datetime objects.
date_time column1 column2
10-10-2010 00:00:00 1 10
10-10-2010 00:00:03 1 10
10-10-2010 00:00:06 1 10
Now I want to calculate the time difference between the first and last datetime objects. To get them, I use:
start = df["date_time"].head(1)
stop = df["date_time"].tail(1)
However, I now want to extract the datetime values themselves so that I can use .total_seconds() to calculate the number of seconds between the two datetime objects, something like:
delta_t_seconds = (start - stop).total_seconds()
However, this doesn't give the desired result, since start and stop are still Series with only one element each.
Please help.
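head(1) and tail(1) return one-row Series, and subtracting two Series aligns them on their index labels (0 and 2 here), so the result is NaT; a Series also has no .total_seconds() method, only .dt.total_seconds(). A sketch that pulls out scalar Timestamps with .iloc instead. The sample rows come from the question, and the day-month-year format string is an assumption (10-10 reads the same either way):

```python
import pandas as pd

df = pd.DataFrame({
    "date_time": ["10-10-2010 00:00:00", "10-10-2010 00:00:03", "10-10-2010 00:00:06"],
    "column1": [1, 1, 1],
    "column2": [10, 10, 10],
})
# Assumption: day-month-year order in the source strings
df["date_time"] = pd.to_datetime(df["date_time"], format="%d-%m-%Y %H:%M:%S")

start = df["date_time"].iloc[0]   # a single Timestamp, not a Series
stop = df["date_time"].iloc[-1]

# Timestamp - Timestamp gives a Timedelta, which has total_seconds()
delta_t_seconds = (stop - start).total_seconds()
print(delta_t_seconds)  # 6.0
```

Note the order stop - start, so that the result is positive. If the datetimes are in the index instead, df.index[0] and df.index[-1] work the same way.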
My DataFrame has a column that measures time difference in the format HH:MM:SS.000.
The DataFrame is built from an Excel file, and the column that stores the time difference has dtype object. Some entries have a negative time difference; the sign doesn't matter to me and needs to be removed, because otherwise the values don't satisfy a filter condition I have.
Note: I only have the negative time difference there because of the issue I'm currently having.
I've tried a few functions, but I get errors because some of the time-difference values are just 00:00:00, some look like 00:00:02.65, and some like 00:00:02.111.
First, how can I make sure all values in this column have the 00:00:00.000 format? And then, how can I remove the '-' from some of the values?
Here's a sample of the time diff column. I can't convert this column to datetime because some of the entries don't have 3 digits after the decimal point. Is there a way to iterate through the column and add a 0 if the length of the value isn't 12 characters?
00:00:02.97
00:00:03:145
00:00:00
00:00:12:56
28 days 03:05:23.439
It looks like you need to clean your input before you can parse to timedelta, e.g. with the following function:
import pandas as pd

def clean_td_string(s):
    # A third colon is taken to be the decimal separator: '00:00:03:145' -> '00:00:03.145'
    if s.count(':') > 2:
        return '.'.join(s.rsplit(':', 1))
    return s
Applied to a df's column, this looks like
df = pd.DataFrame({'Time Diff': ['00:00:02.97', '00:00:03:145', '00:00:00', '00:00:12:56', '28 days 03:05:23.439']})
df['Time Diff'] = pd.to_timedelta(df['Time Diff'].apply(clean_td_string))
# df['Time Diff']
# 0 0 days 00:00:02.970000
# 1 0 days 00:00:03.145000
# 2 0 days 00:00:00
# 3 0 days 00:00:12.560000
# 4 28 days 03:05:23.439000
# Name: Time Diff, dtype: timedelta64[ns]
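Once the column is timedelta64, the padding question goes away (every value is stored at nanosecond precision), and the negative sign from the question can be dropped with Series.abs(). A sketch, reusing clean_td_string from the answer; the negative sample values and the 3-second threshold are made up for illustration, and the last colon is assumed to be a mistyped decimal point, as in the answer:

```python
import pandas as pd

def clean_td_string(s):
    # A third colon is taken to be the decimal separator: '00:00:03:145' -> '00:00:03.145'
    if s.count(':') > 2:
        return '.'.join(s.rsplit(':', 1))
    return s

# Hypothetical sample that includes negative durations
df = pd.DataFrame({'Time Diff': ['-00:00:02.97', '00:00:03:145', '-00:00:12:56', '00:00:00']})

# Parse, then drop the sign by taking the absolute value of each Timedelta
df['Time Diff'] = pd.to_timedelta(df['Time Diff'].apply(clean_td_string)).abs()

# Filtering now compares durations directly, e.g. everything longer than 3 s
longer = df[df['Time Diff'] > pd.Timedelta(seconds=3)]
print(df['Time Diff'].dt.total_seconds().tolist())
```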