I've been playing around with datetimes and timestamps, and I've come across something that I can't understand.
import pandas as pd
import datetime
year_month = pd.DataFrame({'year':[2001,2002,2003], 'month':[1,2,3]})
year_month['date'] = [datetime.datetime.strptime(str(y) + str(m) + '1', '%Y%m%d') for y,m in zip(year_month['year'], year_month['month'])]
>>> year_month
   month  year       date
0      1  2001 2001-01-01
1      2  2002 2002-02-01
2      3  2003 2003-03-01
I think the unique function is doing something to the timestamps that is changing them somehow:
first_date = year_month['date'].unique()[0]
>>> first_date == year_month['date'][0]
False
In fact:
>>> year_month['date'].unique()
array(['2000-12-31T16:00:00.000000000-0800',
'2002-01-31T16:00:00.000000000-0800',
'2003-02-28T16:00:00.000000000-0800'], dtype='datetime64[ns]')
My suspicion is that there is some sort of timezone difference hiding underneath these functions, but I can't figure it out.
EDIT
I just tried the plain Python list(set(...)) as an alternative to the unique function, and that works. This must be a quirk of the unique() function.
You have to convert to datetime64 to compare:
In [12]:
first_date == year_month['date'][0].to_datetime64()
Out[12]:
True
This is because unique has converted the dtype to datetime64:
In [6]:
first_date = year_month['date'].unique()[0]
first_date
Out[6]:
numpy.datetime64('2001-01-01T00:00:00.000000000+0000')
I think this is because unique returns a NumPy array, and there is currently no NumPy dtype that represents a pandas Timestamp: Converting between datetime, Timestamp and datetime64
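A minimal sketch (assuming the year_month frame from the question) showing both directions of the conversion, so the two values share a type before comparing:
import pandas as pd

first_date = year_month['date'].unique()[0]   # numpy.datetime64

# Wrap the numpy value back into a pandas Timestamp ...
print(pd.Timestamp(first_date) == year_month['date'][0])    # True

# ... or convert the pandas value down to datetime64, as above
print(first_date == year_month['date'][0].to_datetime64())  # True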
Related
I have data in the form yyyymm (e.g. 202201) in a CSV file. I want to import it into pandas and find the range of the time period. I want to apply datetime functions to it but am unable to convert it into an appropriate format.
test['YEAR_MONTH'] = pd.to_datetime(
    test['YEARMONTH'], format='%Y%m', errors='coerce').dropna()
I tried using this, but to no avail.
>>> import pandas as pd
>>> import datetime
# Sample data as per OP's format
>>> test = pd.DataFrame({'YEARMONTH':['202201','202202','202203']})
>>> test
YEARMONTH
0 202201
1 202202
2 202203
# Using strptime to convert to datetime object
>>> test_mod = test['YEARMONTH'].apply(lambda x: datetime.datetime.strptime(x,'%Y%m'))
>>> test_mod
0 2022-01-01
1 2022-02-01
2 2022-03-01
Name: YEARMONTH, dtype: datetime64[ns]
# Note: by default the day is set to the first of each month
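As a vectorized alternative (a sketch assuming the same test frame as above), pd.to_datetime with format='%Y%m' converts the whole column in one call, and min()/max() then give the range of the time period asked about:
# Vectorized conversion; the day defaults to the first of each month
test['YEAR_MONTH'] = pd.to_datetime(test['YEARMONTH'], format='%Y%m')

# Range of the time period
print(test['YEAR_MONTH'].min(), test['YEAR_MONTH'].max())
# 2022-01-01 00:00:00 2022-03-01 00:00:00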
I get a date in my data which looks like this: "2014-12-19T05:00:00". I want to convert it to a Date or String object and get something like "01-04-2018", i.e. "dd-MM-YYYY", in the dataframe. How can I do it?
The result will be used for a time series. So far, my time series plot looks like this, perhaps because the date format isn't detected (the x-axis is not in datetime).
Date column:
For a pandas dataframe column/series:
Convert a string column (dtype of object) to a datetime column (dtype of datetime64[ns]) using to_datetime. Then if you want another column with your datetimes back in a string format of your choosing, use dt.strftime.
An example:
df = pd.DataFrame({
    "Date": ["2014-12-19T05:00:00", "2014-12-20T05:00:00", "2014-12-21T05:00:00"],
    "Value": [0, 2, 4]})
df['DateTime'] = pd.to_datetime(df['Date'])
df['MyDateTimeString'] = df['DateTime'].dt.strftime('%Y-%m-%d')
print(df)
# Date Value DateTime MyDateTimeString
# 0 2014-12-19T05:00:00 0 2014-12-19 05:00:00 2014-12-19
# 1 2014-12-20T05:00:00 2 2014-12-20 05:00:00 2014-12-20
# 2 2014-12-21T05:00:00 4 2014-12-21 05:00:00 2014-12-21
In general:
To read your strings into datetime objects, use strptime:
import datetime
d = datetime.datetime.strptime("2014-12-19T05:00:00", "%Y-%m-%dT%H:%M:%S")
Then to get a string representation of those datetime objects, use strftime:
d.strftime("%d-%m-%Y")
For more general string-to-datetime parsing, the dateparser library is handy:
import dateparser
dateparser.parse("2014-12-19T05:00:00").strftime("%d-%m-%Y")
# '19-12-2014'
dateparser.parse("December 19, 2014 at 5am").strftime("%d-%m-%Y")
# '19-12-2014'
I recommend using https://pypi.org/project/python-dateutil/
(Install with pip install python-dateutil.)
>>> import dateutil.parser
>>> d = dateutil.parser.isoparse('2014-12-19T05:00:00')
>>> print(d.strftime('%m-%d-%Y'))
12-19-2014
This is my code:
print(df.loc[df.DATE == '2016-02-05'])
I am trying to compare this date with the dates in my pandas dataframe, but it returns an empty dataframe. What should I do?
Edit: Original dataframe:
Just convert your string to datetime (I suppose that your dataframe also contains datetimes, rather than strings) and do the comparison you wanted to do:
from datetime import datetime

if __name__ == "__main__":
    t = datetime.strptime('2016-02-05', '%Y-%m-%d')
    print(t)
Hope the answer will help, feel free to ask questions.
If your DATE column holds strings rather than datetimes, convert them to datetimes the same way.
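For example, a minimal sketch assuming the DATE column holds date strings:
import pandas as pd

df = pd.DataFrame({'DATE': ['2016-02-05', '2016-03-05'], 'value': [1, 2]})

# Convert the string column to datetime64[ns] once ...
df['DATE'] = pd.to_datetime(df['DATE'])

# ... then the comparison returns the matching rows instead of an empty frame
print(df.loc[df.DATE == pd.Timestamp('2016-02-05')])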
You need to convert the string to a datetime object as well.
from datetime import datetime

print(df)
datetime_str = '2015/02/04'
print("({}){}".format(type(datetime_str), datetime_str))
datetime_object = datetime.strptime(datetime_str, '%Y/%m/%d')
print("({}){}".format(type(datetime_object), datetime_object))
value = df.loc[df.DATE == datetime_object]
print("value =", value)
OUTPUT:
year month day DATE
0 2015 2 4 2015-02-04
1 2016 3 5 2016-03-05
(<class 'str'>)2015/02/04
(<class 'datetime.datetime'>)2015-02-04 00:00:00
value = year month day DATE
0 2015 2 4 2015-02-04
I want to calculate the difference in days between a pandas date series -
0 2013-02-16
1 2013-01-29
2 2013-02-21
3 2013-02-22
4 2013-03-01
5 2013-03-14
6 2013-03-18
7 2013-03-21
and today's date.
I tried but could not come up with a logical solution.
Please help me with the code. I am new to Python and I keep running into syntax errors when applying any function.
You could do something like
# generate time data
data = pd.to_datetime(pd.Series(["2018-09-1", "2019-01-25", "2018-10-10"]))
pd.to_datetime("now") > data
returns:
0 False
1 True
2 False
You could then use that to select the data:
data[pd.to_datetime("now") > data]
Hope it helps.
Edit: I misread the question, but you can easily alter this example to calculate the difference:
data - pd.to_datetime("now")
returns:
0 -122 days +13:10:37.489823
1 24 days 13:10:37.489823
2 -83 days +13:10:37.489823
dtype: timedelta64[ns]
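If you only need the difference as whole days rather than full timedeltas, the .dt.days accessor on the result gives integers (a small sketch on the same data; the sign shows whether the date lies in the past or the future relative to now):
(data - pd.to_datetime("now")).dt.days
# 0   -122
# 1     24
# 2    -83
# dtype: int64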
You can try as follows:
>>> from datetime import datetime
>>> df
col1
0 2013-02-16
1 2013-01-29
2 2013-02-21
3 2013-02-22
4 2013-03-01
5 2013-03-14
6 2013-03-18
7 2013-03-21
Make sure to convert the column to datetime using to_datetime:
>>> df['col1'] = pd.to_datetime(df['col1'], infer_datetime_format=True)
Set the current datetime in order to then get the difference:
>>> curr_time = pd.to_datetime("now")
Now get the difference as follows:
>>> df['col1'] - curr_time
0 -2145 days +07:48:48.736939
1 -2163 days +07:48:48.736939
2 -2140 days +07:48:48.736939
3 -2139 days +07:48:48.736939
4 -2132 days +07:48:48.736939
5 -2119 days +07:48:48.736939
6 -2115 days +07:48:48.736939
7 -2112 days +07:48:48.736939
Name: col1, dtype: timedelta64[ns]
With numpy you can solve it as in difference-two-dates-days-weeks-months-years-pandas-python-2. Bottom line:
import numpy as np

df['diff_days'] = df['First dates column'] - df['Second Date column']
# for days use 'D', for weeks use 'W', for months use 'M' and for years use 'Y'
df['diff_days'] = df['diff_days'] / np.timedelta64(1, 'D')
print(df)
If you want the days as int rather than float, use floor division in the conversion step instead:
df['diff_days'] = df['diff_days'] // np.timedelta64(1, 'D')
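A self-contained sketch of that recipe with hypothetical column names ('start' and 'end' are made up here) to make the unit conversion concrete:
import numpy as np
import pandas as pd

# Hypothetical frame with two datetime columns
df = pd.DataFrame({
    'start': pd.to_datetime(['2013-02-16', '2013-03-01']),
    'end':   pd.to_datetime(['2013-03-18', '2013-03-21'])})

# Subtract to get timedeltas, then divide by one day to get floats
df['diff_days'] = (df['end'] - df['start']) / np.timedelta64(1, 'D')
print(df)
#        start        end  diff_days
# 0 2013-02-16 2013-03-18       30.0
# 1 2013-03-01 2013-03-21       20.0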
From the pandas docs under Converting To Timestamps you will find:
"Converting to Timestamps To convert a Series or list-like object of date-like objects e.g. strings, epochs, or a mixture, you can use the to_datetime function"
I haven't used pandas before but this suggests your pandas date series (a list-like object) is iterable and each element of this series is an instance of a class which has a to_datetime function.
Assuming my assumptions are correct, the following function would take such a series and return a list of timedeltas (objects representing the difference between two datetime objects).
from datetime import datetime

def convert(pandas_series):
    # get the current date
    now = datetime.now()
    # use a list comprehension and the element-wise to_datetime method to calculate timedeltas
    # (on newer pandas versions this method is called to_pydatetime)
    return [now - pandas_element.to_datetime() for pandas_element in pandas_series]

# assuming 'some_pandas_series' is a list-like pandas series object
list_of_timedeltas = convert(some_pandas_series)
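Note that the loop is not strictly needed: if some_pandas_series is already a datetime64[ns] Series (an assumption here), the subtraction is vectorized. A sketch:
import pandas as pd

# Vectorized equivalent: one subtraction yields a Series of timedelta64[ns]
deltas = pd.Timestamp.now() - some_pandas_series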
I am working with python 3.5.2, pandas 0.18.1 and sqlite3.
In my database, I have a column unix_time with INTs for seconds since 1970. Ideally I want to read my dataframe from sqlite, then create a time column corresponding to the datetime or pandas.tslib.Timestamp conversion of the unix_time column, which I would only use for some processing and then drop before saving the dataframe back.
The issue is that when parsing the unix_time column using:
df = pd.read_sql_query("SELECT * FROM test", con, parse_dates=['unix_time'])
I obtain pandas.tslib.Timestamp types, which is fine for my processing, but then I have to recreate my original unix_time column using:
df['unix_time'][i] = (df['unix_time'][i] - datetime(1970,1,1)).total_seconds()
which is really 'dirty'.
First question: Do you have a better way?
I thought about giving up the unix time format and only using the datetime format, but pandas' to_datetime in fact returns pandas.tslib.Timestamp ... And anyway, doing so would force me to iterate over all rows, which is a bad solution. (It seems impossible to apply to_datetime to anything other than a view over a single cell of the dataframe.)
Second question: Is it possible to apply it to a Series?
My last try was directly using df['time'] = datetime.datetime.fromtimestamp(df['unix_time']), but surprisingly it also returns pandas.tslib.Timestamp.
In the end, knowing that I can only save unix timestamps or datetimes, my only choices for the moment are:
- parse them, but then have to convert them back to unix timestamps one by one, or
- not parse them, but have to convert them to pandas.tslib.Timestamp one by one.
It would be great if I could convert a whole series.
Last question: Is there a way to convert a series of unix timestamps to datetime (or at least pandas.tslib.Timestamp), or a series of pandas.tslib.Timestamp (or datetime) to unix timestamps?
Thanks
EDIT:
During my processing, I extract a row that I want to append to my dataset. Apparently, the conversion to pandas.tslib.Timestamp happens implicitly when passing from DataFrame to Series:
import numpy as np
import pandas as pd

df = pd.DataFrame({'UNX':pd.date_range('2016-01-01', freq='9999S', periods=10).astype(np.int64)//10**9})
df['Date'] = pd.to_datetime(df.UNX, unit='s')
print(df.Date.dtypes)
print(type(df['Date'][0]))
test = df.iloc[0]
print(type(test.Date))
new_df = test.to_frame().transpose()  # from here, new_df.to_sql("test", con) is impossible because the type of 'Date' is not supported
print(new_df.Date.dtypes)
returns
datetime64[ns]
<class 'pandas.tslib.Timestamp'>
<class 'pandas.tslib.Timestamp'>
object
Is there a way to convert the 'Date' in new_df from pandas.tslib.Timestamp to datetime64[ns] or datetime.datetime (or simply str)?
IIUC you can do it this way:
In [96]: df = pd.DataFrame({'UNX':pd.date_range('2016-01-01', freq='9999S', periods=10).astype(np.int64)//10**9})
In [97]: df
Out[97]:
UNX
0 1451606400
1 1451616399
2 1451626398
3 1451636397
4 1451646396
5 1451656395
6 1451666394
7 1451676393
8 1451686392
9 1451696391
Convert UNIX epoch to Python datetime:
In [98]: df['Date'] = pd.to_datetime(df.UNX, unit='s')
In [99]: df
Out[99]:
UNX Date
0 1451606400 2016-01-01 00:00:00
1 1451616399 2016-01-01 02:46:39
2 1451626398 2016-01-01 05:33:18
3 1451636397 2016-01-01 08:19:57
4 1451646396 2016-01-01 11:06:36
5 1451656395 2016-01-01 13:53:15
6 1451666394 2016-01-01 16:39:54
7 1451676393 2016-01-01 19:26:33
8 1451686392 2016-01-01 22:13:12
9 1451696391 2016-01-02 00:59:51
Convert datetime to UNIX epoch:
In [100]: df['UNX2'] = df.Date.astype('int64')//10**9
In [101]: df
Out[101]:
UNX Date UNX2
0 1451606400 2016-01-01 00:00:00 1451606400
1 1451616399 2016-01-01 02:46:39 1451616399
2 1451626398 2016-01-01 05:33:18 1451626398
3 1451636397 2016-01-01 08:19:57 1451636397
4 1451646396 2016-01-01 11:06:36 1451646396
5 1451656395 2016-01-01 13:53:15 1451656395
6 1451666394 2016-01-01 16:39:54 1451666394
7 1451676393 2016-01-01 19:26:33 1451676393
8 1451686392 2016-01-01 22:13:12 1451686392
9 1451696391 2016-01-02 00:59:51 1451696391
Check:
In [102]: df.UNX.eq(df.UNX2).all()
Out[102]: True
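As for the EDIT in the question: once a row has round-tripped through .to_frame().transpose(), the 'Date' column comes back with object dtype, but it can be coerced back before calling to_sql. A small sketch, continuing from new_df in the question (whether to_sql then accepts it depends on the driver):
# Coerce the object-dtype column back to datetime64[ns] ...
new_df['Date'] = pd.to_datetime(new_df['Date'])
print(new_df.Date.dtypes)        # datetime64[ns]

# ... or, if plain strings are enough for storage:
# new_df['Date'] = new_df['Date'].astype(str)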
Round trip between Pandas Timestamp and Unix Seconds (since 1970-01-01):
date_in = pd.to_datetime("2021-04-07")
# type(date_in) is: pandas._libs.tslibs.timestamps.Timestamp
unix_seconds = date_in.value//10**9
date_out = pd.to_datetime(unix_seconds, unit="s")
Output:
date_in
Out[1]: Timestamp('2021-04-07 00:00:00')
unix_seconds
Out[2]: 1617753600
date_out
Out[3]: Timestamp('2021-04-07 00:00:00')