My instructions are as follows:
Read the date columns in as timestamps and convert them to YYYY/MM/DD hours:minutes:seconds format, setting the hours, minutes, and seconds to random values appropriate to their range.
Here is the column of the DataFrame we are supposed to convert to datetime:
Order date
11/12/2016
11/24/2016
6/12/2016
10/12/2016
...
And here is the datetime format I need:
2016/11/12 (random) hours:minutes:seconds
2016/11/24 (random) hours:minutes:seconds
...
My main question is how to get random hours, minutes, and seconds. The rest I can figure out from the documentation.
You can generate random integers from 0 to 86399 (the number of seconds in a day minus 1) and convert them to a Timedelta with pandas.to_timedelta. The upper bound of np.random.randint is exclusive, so pass 60*60*24 to make 86399 reachable:
import numpy as np
import pandas as pd
time = pd.to_timedelta(np.random.randint(0, 60*60*24, size=len(df)), unit='s')
df['Order date'] = pd.to_datetime(df['Order date'], format='%m/%d/%Y').add(time)
Output:
Order date
0 2016-11-12 02:21:53
1 2016-11-24 13:26:00
2 2016-06-12 15:13:03
3 2016-10-12 14:45:12
You're trying to read the data in '%Y-%m-%d' format, but the data is in '%m/%d/%Y' format (month first: 11/24/2016 cannot have a month of 24). See https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior to find out how to convert the date to your desired format.
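Putting the pieces together, here is a minimal end-to-end sketch for the question, assuming the sample values shown above and that the whole column is month/day/year. It swaps in NumPy's newer default_rng generator for np.random.randint (seeded only so runs are reproducible) and uses dt.strftime to produce the YYYY/MM/DD HH:MM:SS strings the instructions ask for.

```python
import numpy as np
import pandas as pd

# Sample of the 'Order date' column from the question (month/day/year strings)
df = pd.DataFrame({"Order date": ["11/12/2016", "11/24/2016", "6/12/2016", "10/12/2016"]})

# Parse explicitly as month/day/year so values like "11/24/2016" are unambiguous
dates = pd.to_datetime(df["Order date"], format="%m/%d/%Y")

# Draw a random second-of-day in [0, 86399]; integers() excludes the upper bound by default
rng = np.random.default_rng(seed=0)  # seed is only for reproducibility
offsets = pd.to_timedelta(rng.integers(0, 24 * 60 * 60, size=len(df)), unit="s")

# Add the random time of day and render as YYYY/MM/DD HH:MM:SS strings
df["Order date"] = (dates + offsets).dt.strftime("%Y/%m/%d %H:%M:%S")
print(df)
```

Because every offset is less than one day, the calendar date never changes; only the time of day is random.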
Hello,
I am trying to extract the date and time column from my Excel data. The column arrives in a DataFrame as float values, and after using pandas.to_datetime I get a different date than the actual one in Excel. For example, in Excel the starting date is 01.01.1901 00:00:00, but in Python I get 1971-01-03 00:00:00.000000.
How can I solve this problem?
I need the final output as total seconds in a DataFrame: the first cell should be 0 s, and each following cell should hold the elapsed seconds (consecutive cells are 15 min apart).
Thank you.
Your input is fractional days, so there's actually no need to convert to datetime if you want the duration in seconds relative to the first entry. Subtract that from the rest of the column and multiply by the number of seconds in a day:
import pandas as pd
df = pd.DataFrame({"Datum/Zeit": [367.0, 367.010417, 367.020833]})
df["totalseconds"] = (df["Datum/Zeit"] - df["Datum/Zeit"].iloc[0]) * 86400
df["totalseconds"]
0 0.0000
1 900.0288
2 1799.9712
Name: totalseconds, dtype: float64
If you have to use datetime, you'll need to convert to timedelta (duration) to do the same, for example:
df["datetime"] = pd.to_datetime(df["Datum/Zeit"], unit="d")
# df["datetime"]
# 0 1971-01-03 00:00:00.000000
# 1 1971-01-03 00:15:00.028800
# 2 1971-01-03 00:29:59.971200
# Name: datetime, dtype: datetime64[ns]
# subtraction of datetime from datetime gives timedelta, which has total_seconds:
df["totalseconds"] = (df["datetime"] - df["datetime"].iloc[0]).dt.total_seconds()
# df["totalseconds"]
# 0 0.0000
# 1 900.0288
# 2 1799.9712
# Name: totalseconds, dtype: float64
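The 1971 date appears because unit="d" counts from the Unix epoch (1970-01-01). If the floats are Excel serial dates in Excel's default 1900 date system (an assumption; workbooks using the 1904 date system need a different origin), passing origin="1899-12-30" should recover the dates shown in Excel. A sketch:

```python
import pandas as pd

df = pd.DataFrame({"Datum/Zeit": [367.0, 367.010417, 367.020833]})

# In Excel's 1900 date system, serials from 61 onward count days from 1899-12-30;
# unit="D" treats the floats as (fractional) days from that origin
dt = pd.to_datetime(df["Datum/Zeit"], unit="D", origin="1899-12-30")

# The serials carry only about 6 decimals, so round away the sub-second noise
df["datetime"] = dt.dt.round("s")
df["totalseconds"] = (df["datetime"] - df["datetime"].iloc[0]).dt.total_seconds()
print(df)
```

Rounding to whole seconds turns the 900.0288 / 1799.9712 values above into exact 15-minute steps.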
I have a column containing dates in this format:
2021-09-02 06:00:10.474000+00:00
However, I need to convert this column into a 13-digit timestamp.
I have tried...
df['date_timestamp'] = df[['date']].apply(lambda x: x[0].timestamp(), axis=1).astype(int)
...but this produces a 10-digit timestamp, not a 13-digit one.
How can I get a 13-digit timestamp?
You can parse to datetime, take the int64 representation and divide it by 1e6 to get Unix time in milliseconds since the epoch (1970-01-01 UTC). For example:
import numpy as np
import pandas as pd
# string to datetime
s = pd.to_datetime(["2021-09-02 06:00:10.474000+00:00"])
# datetime to Unix time in milliseconds
unix = s.view(np.int64)/1e6
print(unix[0])
# 1630562410473.9998
The int64 representation is in nanoseconds, so divide by 1e3 instead if you need microseconds.
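To avoid the floating-point residue in 1630562410473.9998, here is a sketch that stays in integer arithmetic: subtract the epoch and floor-divide by one millisecond. It assumes the strings are parsed as UTC (utc=True), so the epoch Timestamp and the column share the same time zone.

```python
import pandas as pd

# Parse the ISO strings as UTC-aware datetimes
s = pd.to_datetime(pd.Series(["2021-09-02 06:00:10.474000+00:00"]), utc=True)

# Subtracting the epoch gives exact timedeltas; floor-dividing by 1 ms yields int64
epoch = pd.Timestamp("1970-01-01", tz="UTC")
ms = (s - epoch) // pd.Timedelta(milliseconds=1)
print(ms.iloc[0])  # 1630562410474
```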
I have an object-type column with times in HH:MM:SS AM/PM format. The output I need is a column with these times converted to seconds.
For example:
import pandas as pd
df={'time_col':['10:10:10 PM','02:00:05 AM'],'time_seconds':[79810,7205]}
df2=pd.DataFrame(df)
I tried different approaches, but the result adds 1900-01-01 to some rows and not to others.
Convert time string to datetime (to account for AM/PM), take the string of the time component (ignore date), and convert that to timedelta. Now you can extract the seconds.
import pandas as pd
df = pd.DataFrame({'time_col':['10:10:10 PM','02:00:05 AM']})
# make sure we have time objects; %I is the 12-hour clock hour and %p the AM/PM marker
df['time_col'] = pd.to_datetime(df['time_col'], format='%I:%M:%S %p').dt.time
# time column to string, then to timedelta and extract seconds from that
df['time_seconds'] = pd.to_timedelta(df['time_col'].astype(str)).dt.total_seconds()
df['time_seconds']
0 79810.0
1 7205.0
Name: time_seconds, dtype: float64
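Alternatively, here is a sketch that skips the string round-trip: parse with an explicit 12-hour format and combine the hour, minute and second fields. The 1900-01-01 mentioned in the question is the placeholder date attached when only a time is parsed; this approach never uses it.

```python
import pandas as pd

df = pd.DataFrame({"time_col": ["10:10:10 PM", "02:00:05 AM"]})

# %I is the 12-hour clock hour and %p the AM/PM marker; the parsed values
# carry a placeholder date (1900-01-01), which is ignored below
t = pd.to_datetime(df["time_col"], format="%I:%M:%S %p")

# Seconds since midnight, built from the hour, minute and second fields
df["time_seconds"] = t.dt.hour * 3600 + t.dt.minute * 60 + t.dt.second
print(df)
```

This yields integer seconds (79810 and 7205) rather than floats.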
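If you can start a PySpark session, this could also work and supplements @MrFuppes's answer. It assumes an existing SparkSession named spark and a session time zone of UTC, since to_timestamp places the parsed time on 1970-01-01 in the session time zone:
from pyspark.sql import functions as F
df1 = spark.createDataFrame(df2)
timeFmt = "yyyy-MM-dd'T'HH:mm:ss.SSS"
df1.select("time_col", F.unix_timestamp(F.to_timestamp('time_col', 'hh:mm:ss a'), timeFmt).cast("long").alias("time")).show()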
+-----------+-----+
| time_col| time|
+-----------+-----+
|10:10:10 PM|79810|
|02:00:05 AM| 7205|
+-----------+-----+
I am having an issue converting the epoch timestamp 1585542406929 (milliseconds) into a date-and-time format like 2020-09-14 hours:minutes:seconds.
I tried running this, but it gives me an error:
from datetime import datetime
DATETIME_FORMAT = '%Y-%m-%d %H:%M:%S'
datetime.utcfromtimestamp(df2.timestamp_ms).strftime('%Y-%m-%d %H:%M:%S')
Error: cannot convert the series to <class 'int'>
What am I not understanding about this datetime function? Is there a better function that I should be using?
Edit: I should mention that timestamp_ms is a column in my DataFrame, df.
Thanks to @chepner for helping me understand the format this is in.
A quick solution is the following:
# make a helper column holding the Unix epoch (1970-01-01), as @ForceBru mentioned
start_date = '1970-01-01'
df3['helper'] = pd.to_datetime(start_date)
# convert the millisecond timestamps to (fractional) days
df3['timestamp_ms'] = df3['timestamp_ms'].apply(lambda x: (((x/1000)/60)/60/24))
# add a day adder column
df3['time_added'] = pd.to_timedelta(df3['timestamp_ms'],'d')
# add the two columns together
df3['actual_time'] = df3['helper'] + df3['time_added']
Note that Unix timestamps are in UTC, so you may need to convert to your local time. For instance, I sent my message at 10:40 am Central Time (Midwest USA), but the timestamp showed 3:40 pm.
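The original error occurs because datetime.utcfromtimestamp takes a single number of seconds, not a pandas Series. A shorter sketch lets pandas convert the whole column at once with unit="ms" and then shifts it to local time; the America/Chicago zone is an assumption based on the Central Time mention above, so substitute your own.

```python
import pandas as pd

df = pd.DataFrame({"timestamp_ms": [1585542406929]})

# unit="ms" interprets the integers as milliseconds since 1970-01-01 UTC
utc = pd.to_datetime(df["timestamp_ms"], unit="ms", utc=True)

# Assumed target zone: US Central time (handles daylight saving automatically)
local = utc.dt.tz_convert("America/Chicago")

df["utc"] = utc.dt.strftime("%Y-%m-%d %H:%M:%S")
df["central"] = local.dt.strftime("%Y-%m-%d %H:%M:%S")
print(df)
```

The 5-hour gap between the two columns matches the 10:40 am vs 3:40 pm difference described above.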
I have a DataFrame named 'train' with a column ID that represents a date in an unusual way.
For example, the ID value 2013043002 represents the date 30/04/2013 02:00:00.
The first 4 digits represent the year, the next two pairs of digits the month and day, and the last two digits the hour.
I want to convert this into a proper datetime format to perform time series analysis.
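Use to_datetime with the format parameter (see http://strftime.org/ for the directives). Converting the integers to strings first keeps the parsing unambiguous:
import pandas as pd
df = pd.DataFrame({'ID':[2013043002,2013043002]})
df['ID'] = pd.to_datetime(df['ID'].astype(str), format='%Y%m%d%H')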
print(df)
ID
0 2013-04-30 02:00:00
1 2013-04-30 02:00:00
print(df['ID'].dtype)
datetime64[ns]
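Use the datetime module for date and time manipulations (strptime needs a string, so convert the ID first):
from datetime import datetime
d = str(2013043002)
datetime.strptime(d, "%Y%m%d%H").strftime("%d/%m/%Y %H:%M:%S")
# '30/04/2013 02:00:00'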
If the ID will ALWAYS have the same layout, you could simply slice it as a string:
Id = str(2013043002)
Year = Id[0:4]
Month = Id[4:6]
Day = Id[6:8]
Time = Id[8:10]
DateFormat = "{}-{}-{}".format(Day, Month, Year)
TimeFormat = "{}:00:00".format(Time)
print(DateFormat, TimeFormat)
Output:
30-04-2013 02:00:00
You could then wrap this in a function and apply it to every ID in your data.
Of course, if you don't know the incoming ID format in advance, you should use the other time-module options and handle the string formatting to show it in the order you want.
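Here is a sketch of wrapping the slicing approach in a function and applying it to a whole column. split_id is a hypothetical helper name, and the second ID is made up for illustration. Series.map calls the function once per value, so no explicit loop is needed.

```python
import pandas as pd

def split_id(raw_id):
    """Hypothetical helper: turn a YYYYMMDDHH ID into 'DD-MM-YYYY HH:00:00'."""
    s = str(raw_id)
    year, month, day, hour = s[0:4], s[4:6], s[6:8], s[8:10]
    return "{}-{}-{} {}:00:00".format(day, month, year, hour)

# First ID is from the question; the second is an illustrative made-up value
train = pd.DataFrame({"ID": [2013043002, 2013050113]})
train["ID_text"] = train["ID"].map(split_id)
print(train)
```

This only reformats text; for time series analysis, the to_datetime answer above gives a real datetime64 column.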
By using the datetime module you can do this easily with the strptime function:
import datetime
my_date = datetime.datetime.strptime(str(ID), "%Y%m%d%H")
"%Y%m%d%H" is the format of your date: %Y is the year, %m is the month (zero-padded), %d is the day (zero-padded) and %H is the hour (24-hour clock, zero-padded). See http://strftime.org/ for more.