Convert date column in a DataFrame to Unix time in Python

I would like to convert a date column in a Dataframe to unixtime in a new column.
Index Date Second Measurement
0 0 2020-02-24 10:52:38 0.000 0.001155460021
1 1 2020-02-24 10:52:39 0.109 0.001124729984
2 2 2020-02-24 10:52:40 0.203 0.001119069988
I have tried a lot, but I always get an error. None of these work:
laser['unixtime'] = laser['Date'].arrow.timestamp()
laser['unixtime'] = laser['Date'].timestamp()
laser['unixtime'] = time.mktime(laser['Date'].timetuple())
Can anyone help me out?
greets

Solution with two examples (one when the Date column is a string and one when it is not).
import pandas as pd
from datetime import timezone
# Suppress scientific notation in Pandas
pd.set_option('display.float_format', lambda x: f"{x:.0f}")
df = pd.DataFrame()
df["Date"] = pd.date_range(start='2014-08-01 09:00', freq='H', periods=3, tz='Europe/Berlin')
df["DateUnix"] = df.Date.map(lambda x: x.replace(tzinfo=timezone.utc).timestamp())
df
df2 = pd.DataFrame({"Date": ["2020-02-24 10:52:38"]})
# Convert from object to datetime
df2.Date = pd.to_datetime(df2.Date)
df2["DateUnix"] = df2.Date.map(lambda x: x.replace(tzinfo=timezone.utc).timestamp())
df2
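A vectorized alternative, as a sketch: if the naive timestamps are assumed to represent UTC, the epoch seconds can be computed for the whole column at once, without map():

```python
import pandas as pd

df = pd.DataFrame({"Date": ["2020-02-24 10:52:38", "2020-02-24 10:52:39"]})
df["Date"] = pd.to_datetime(df["Date"])

# Treat the naive timestamps as UTC: subtract the epoch and floor-divide
# by one second to get integer Unix timestamps.
df["unixtime"] = (df["Date"] - pd.Timestamp("1970-01-01")) // pd.Timedelta("1s")
print(df["unixtime"].tolist())  # [1582541558, 1582541559]
```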

Related

Cannot find index of corresponding date in pandas DataFrame

I have the following DataFrame with a Date column,
0 2021-12-13
1 2021-12-10
2 2021-12-09
3 2021-12-08
4 2021-12-07
...
7990 1990-01-08
7991 1990-01-05
7992 1990-01-04
7993 1990-01-03
7994 1990-01-02
I am trying to find the index for a specific date in this DataFrame using the following code,
# import raw data into DataFrame
df = pd.DataFrame.from_records(data['dataset']['data'])
df.columns = data['dataset']['column_names']
df['Date'] = pd.to_datetime(df['Date'])
# sample date to search for
sample_date = dt.date(2021,12,13)
print(sample_date)
# return index of sample date
date_index = df.index[df['Date'] == sample_date].tolist()
print(date_index)
The output of the program is,
2021-12-13
[]
I can't understand why. I have cast the Date column in the DataFrame to a DateTime and I'm doing a like-for-like comparison.
I have reproduced your DataFrame with a minimal sample. If you change the way you build the value you compare against, the lookup works as shown below.
import pandas as pd
import datetime as dt
df = pd.DataFrame({'Date':['2021-12-13','2021-12-10','2021-12-09','2021-12-08']})
df['Date'] = pd.to_datetime(df['Date'].astype(str), format='%Y-%m-%d')
sample_date = dt.datetime.strptime('2021-12-13', '%Y-%m-%d')
date_index = df.index[df['Date'] == sample_date].tolist()
print(date_index)
output:
[0]
The searched date was found at index 0 of the DataFrame.
Please let me know if this has any issues.
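As a side note, the empty result in the question likely comes from comparing a datetime64 column against a plain datetime.date object; a minimal sketch of two comparisons that do match:

```python
import datetime as dt
import pandas as pd

df = pd.DataFrame({"Date": pd.to_datetime(["2021-12-13", "2021-12-10", "2021-12-09"])})

# Option 1: compare against a pandas Timestamp (implicitly midnight).
idx1 = df.index[df["Date"] == pd.Timestamp(2021, 12, 13)].tolist()

# Option 2: reduce the column to plain dates before comparing.
idx2 = df.index[df["Date"].dt.date == dt.date(2021, 12, 13)].tolist()

print(idx1, idx2)  # both find the row at index 0
```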

pandas insert new first row and calculate timestamp based on minimum date

Hi, I am looking for a more elegant solution than my code. I have a given df which looks like this:
import pandas as pd
from datetime import date, timedelta
from pandas.tseries.offsets import DateOffset
sdate = date(2021, 1, 31)
edate = date(2021, 8, 30)
date_range = pd.date_range(sdate, edate - timedelta(days=1), freq='M')
df_test = pd.DataFrame({ 'Datum': date_range})
I take this df and have to insert a new first row based on the minimum date:
import numpy as np
data_perf_indexed_vv = df_test.copy()
minimum_date = df_test['Datum'].min()
data_perf_indexed_vv = data_perf_indexed_vv.reset_index()
df1 = pd.DataFrame([[np.nan] * len(data_perf_indexed_vv.columns)],
                   columns=data_perf_indexed_vv.columns)
data_perf_indexed_vv = df1.append(data_perf_indexed_vv, ignore_index=True)
data_perf_indexed_vv['Datum'].iloc[0] = minimum_date - DateOffset(months=1)
data_perf_indexed_vv.drop(['index'], axis=1)
Does somebody have a shorter or more elegant solution? Thanks.
Instead of writing such a big second block of code, just use:
df_test.loc[len(df_test)+1, 'Datum'] = df_test['Datum'].min() - DateOffset(months=1)
Then use the sort_values() method:
df_test = df_test.sort_values(by='Datum', ignore_index=True)
Now if you print df_test you will get the desired output:
#output
Datum
0 2020-12-31
1 2021-01-31
2 2021-02-28
3 2021-03-31
4 2021-04-30
5 2021-05-31
6 2021-06-30
7 2021-07-31
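An alternative sketch using pd.concat, which also works on pandas 2.x where DataFrame.append was removed (sample dates hard-coded for illustration):

```python
import pandas as pd
from pandas.tseries.offsets import DateOffset

df_test = pd.DataFrame({"Datum": pd.to_datetime(["2021-01-31", "2021-02-28", "2021-03-31"])})

# Build a one-row frame holding the minimum date minus one month and prepend it.
new_row = pd.DataFrame({"Datum": [df_test["Datum"].min() - DateOffset(months=1)]})
df_test = pd.concat([new_row, df_test], ignore_index=True)
print(df_test)
```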

timestamp is not read by Pandas

I have a timestamp that looks like this: "1994-10-01:00:00:00". When I try to read this dataset with pd.read_csv or pd.read_table, everything is imported except the date column ([0]), which does not come through even as an object. This is part of my code:
namevar = ['timestamp', 'nsub',
'sub_cms', # var 1 [cms]
'sub_gwflow', # var 2 [cfs]
'sub_interflow', # var 3 [cfs]
'sub_sroff', # var 4 [cfs]
....
'subinc_sroff', # var 13
'subinc_tavgc'] # var 14
df = pd.read_csv(root.anima, delimiter='\t', skiprows=1, header=avar+6,
                 index_col=0, names=namevar, infer_datetime_format=True,
                 parse_dates=[0])
print(df)
Results in:
nsub sub_cms ... subinc_sroff subinc_tavgc
timestamp
1994-10-01:00:00:00 1 4.4180 ... 0.0 59.11000
1994-10-01:00:00:00 2 2.6690 ... 0.0 89.29000
1994-10-01:00:00:00 3 4.3170 ... 0.0 77.02000
...
2000-09-30:00:00:00 2 2.3580 ... 0.0 0.19570
2000-09-30:00:00:00 3 2.2250 ... 0.0 0.73340
2000-09-30:00:00:00 4 0.8876 ... 0.0 0.07124
[8768 rows x 15 columns]
print(df.dtypes)
Results in:
nsub int64
sub_cms float64
sub_gwflow float64
sub_interflow float64
sub_sroff float64
subinc_actet float64
...
subinc_sroff float64
subinc_tavgc float64
dtype: object
My ultimate goal is, once the timestamp is in the dataframe, to modify it by getting rid of the time component, with:
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y%m%d', infer_datetime_format=True)
but when I run this now, it raises KeyError: 'timestamp'.
Any help in getting the timestamp in the dataframe is much appreciated.
As highlighted by @s13wr81, the way to bring 'timestamp' into the dataframe as a column was to remove index_col=0 from the statement.
In order to edit the timestamp properly, I needed to remove the :Hr:Min:Sec portion by splitting on the first colon:
df['timestamp'] = df['timestamp'].str.split(':', n=1).str[0]
and then, to convert timestamp to a pandas datetime, I used:
df['timestamp'] = pd.to_datetime(df['timestamp'], format='%Y-%m-%d')
I am not sure about the data, but I think timestamp is not a column but the index; this kind of problem sometimes happens after grouping.
Try:
"timestamp" in df.columns
If the result is False, then:
df = df.reset_index()
Next, to strip the time from timestamp, try:
df['timestamp'] = pd.to_datetime(df.timestamp,unit='ns')
df = pd.read_csv(root.anima, delimiter='\t', skiprows=1, header=avar+6,
                 index_col=0, names=namevar, infer_datetime_format=True,
                 parse_dates=[0])
I think you are explicitly telling pandas to treat column 0 as the index, which happens to be your datetime column.
Kindly try removing index_col=0 from pd.read_csv() and I think it will work.
I think the issue is that the timestamp is in a non-standard format. There is a colon between the date and time parts. Here is a way to convert the value in the example:
import datetime
# note ':' between date part and time part
raw_timestamp = '1994-10-01:00:00:00'
format_string = '%Y-%m-%d:%H:%M:%S'
result = datetime.datetime.strptime(raw_timestamp, format_string)
print(result)
1994-10-01 00:00:00
You could use pd.to_datetime() with the format_string in this example, to process an entire column of timestamps.
UPDATE
Here is an example that uses a modified version of the original data (timestamp + one column; every entry is unique):
from io import StringIO
import pandas as pd
data = '''timestamp nsub
1994-10-01:00:00:00 1
1994-10-02:00:00:00 2
1994-10-03:00:00:00 3
2000-09-28:00:00:00 4
2000-09-29:00:00:00 5
2000-09-30:00:00:00 6
'''
df = pd.read_csv(StringIO(data), sep=r'\s+')
df['timestamp'] = pd.to_datetime(df['timestamp'],
                                 format='%Y-%m-%d:%H:%M:%S',
                                 errors='coerce')
print(df, end='\n\n')
print(df.dtypes)
timestamp nsub
0 1994-10-01 1
1 1994-10-02 2
2 1994-10-03 3
3 2000-09-28 4
4 2000-09-29 5
5 2000-09-30 6
timestamp datetime64[ns]
nsub int64
dtype: object
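Once the column is parsed as datetime64, the asker's follow-up goal of dropping the time-of-day can be done without string surgery; a sketch using dt.normalize(), which keeps the datetime64 dtype but sets the time to midnight:

```python
import pandas as pd

# A sample timestamp with a nonzero time, in the same non-standard format.
df = pd.DataFrame({"timestamp": ["1994-10-01:13:45:10"]})
df["timestamp"] = pd.to_datetime(df["timestamp"], format="%Y-%m-%d:%H:%M:%S")

# normalize() zeroes out the time component; .dt.date would instead
# produce Python date objects (object dtype).
df["timestamp"] = df["timestamp"].dt.normalize()
print(df["timestamp"].iloc[0])  # 1994-10-01 00:00:00
```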

Create new date column in python pandas

I'm trying to create a new date column based on an existing date column in my dataframe. I want to take all the dates in the first column and make them the first of the month in the second column so:
03/15/2019 = 03/01/2019
I know I can do this using:
df['newcolumn'] = pd.to_datetime(df['oldcolumn'], format='%Y-%m-%d').apply(lambda dt: dt.replace(day=1)).dt.date
My issue is that some of the data in the old column are not valid dates; there is text in some of the rows. So I'm trying to figure out how to clean up the data beforehand, along the lines of:
if oldcolumn isn't a date then make it 01/01/1990 else oldcolumn
Or, is there a way to do this with try/except?
Any assistance would be appreciated.
First we generate some sample data:
import pandas as pd
import datetime as dt
df = pd.DataFrame([['2019-01-03'], ['asdf'], ['2019-11-10']], columns=['Date'])
This can be safely converted to datetime:
df['Date'] = pd.to_datetime(df['Date'], errors='coerce')
mask = df['Date'].isnull()
df.loc[mask, 'Date'] = dt.datetime(1990, 1, 1)
Now you don't need the slow apply. Note that pd.offsets.MonthBegin(-1) would roll a date that is already on the first back a whole month (1990-01-01 would become 1989-12-01), so normalizing through a monthly period is safer:
df['New'] = df['Date'].dt.to_period('M').dt.to_timestamp()
Try with the argument errors='coerce'.
This will return NaT for the text values.
df['newcolumn'] = pd.to_datetime(df['oldcolumn'],
                                 format='%Y-%m-%d',
                                 errors='coerce').apply(lambda dt: dt.replace(day=1)).dt.date
For example
# We have this dataframe
ID Date
0 111 03/15/2019
1 133 01/01/2019
2 948 Empty
3 452 02/10/2019
# We convert Date column to datetime
df['Date'] = pd.to_datetime(df.Date, format='%m/%d/%Y', errors='coerce')
Output
ID Date
0 111 2019-03-15
1 133 2019-01-01
2 948 NaT
3 452 2019-02-10
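To cover the question's "if it isn't a date, make it 01/01/1990" requirement, the NaT values produced by errors='coerce' can be filled with a sentinel before building the first-of-month column; a sketch:

```python
import pandas as pd

df = pd.DataFrame({"Date": ["03/15/2019", "Empty", "02/10/2019"]})
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y", errors="coerce")

# Unparseable rows are NaT now; replace them with the sentinel date.
df["Date"] = df["Date"].fillna(pd.Timestamp(1990, 1, 1))

# First of the month for every row, as in the question.
df["New"] = df["Date"].apply(lambda d: d.replace(day=1))
```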

Python data-frame using pandas

I have a dataset which looks like below
[25/May/2015:23:11:15 000]
[25/May/2015:23:11:15 000]
[25/May/2015:23:11:16 000]
[25/May/2015:23:11:16 000]
Now I have made this into a DataFrame, where df[0] holds [25/May/2015:23:11:15 and df[1] holds 000]. I want to write all rows that end with the same seconds value to their own file. In the example above the rows end with 15 and 16 seconds, so all rows ending in 15 should go into one file, those ending in 16 into another, and so on.
I have tried the below code
import pandas as pd
data = pd.read_csv('apache-access-log.txt', sep=" ", header=None)
df = pd.DataFrame(data)
print(df[0],df[1].str[-2:])
Converting that column to a datetime makes it easier to work with. Note that minutes are %M, not %m (which is the month), and the abbreviated month name is %b:
df['date'] = pd.to_datetime(df['date'], format='%d/%b/%Y:%H:%M:%S')
Then you can simply iterate over a groupby(), e.g.:
In []:
for k, frame in df.groupby(df['date'].dt.second):
    #frame.to_csv('file{}.csv'.format(k))
    print('{}\n{}\n'.format(k, frame))
Out[]:
15
                 date  value
0 2015-05-25 23:11:15      0
1 2015-05-25 23:11:15      0
16
                 date  value
2 2015-05-25 23:11:16      0
3 2015-05-25 23:11:16      0
You can set your datetime as the index of the dataframe, and then use the loc and to_csv Pandas functions. Obviously, as other answers point out, you should convert your date to datetime while reading your dataframe.
Example:
df = df.set_index(['date'])
df.loc['25/05/2015 23:11:15':'25/05/2015 23:11:15'].to_csv('df_data.csv')
Try this out:
## Derive a new column holding the seconds value
df['seconds'] = df.apply(lambda row: row[0].split(":")[3].split(" ")[0], axis=1)
for sec in df['seconds'].unique():
    ## filter by seconds
    print("Result ", df[df['seconds'] == sec])
