Parse DateTime from Object with Pandas - python

I would like to parse the year column to datetime.
name id nametype recclass mass (g) fall year
0 Aachen 1 Valid L5 21.0 Fell 01/01/1880 12:00:00 AM
... reclat reclong GeoLocation
... 50.77500 6.08333 (50.775000, 6.083330)
I tried df['year'].apply(dateutil.parser.parse), and that parses as 1880-01-01 00:00:00, but I can't use that for selecting dates.
Does anyone have a tip for me?

I think you need to_datetime:
import pandas as pd
df = pd.DataFrame({'year':['01/01/1880 12:00:00 AM']})
df['year'] = pd.to_datetime(df['year'])
print (df)
year
0 1880-01-01

Since you have converted the year column to datetime, you can use:
df['year'] = df['year'].dt.date
# year
# 1880-01-01
However, for datetime casting, note that pandas has a built-in datetime parser (pd.to_datetime) that is easier to use than dateutil, IMO.
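Since the original goal was selecting dates, here is a minimal sketch of what that could look like once the column is datetime64. The rows other than Aachen are made up for illustration, and the explicit format string assumes the dates are month-first, which the question does not confirm.

```python
import pandas as pd

# Hypothetical sample data in the same string layout as the question.
df = pd.DataFrame({
    "name": ["Aachen", "Sample B", "Sample C"],
    "year": [
        "01/01/1880 12:00:00 AM",
        "01/01/1951 12:00:00 AM",
        "01/01/2003 12:00:00 AM",
    ],
})

# Assumption: month/day/year order with a 12-hour clock.
df["year"] = pd.to_datetime(df["year"], format="%m/%d/%Y %I:%M:%S %p")

# Boolean masks work directly on the datetime64 column.
after_1900 = df[df["year"] >= "1900-01-01"]
only_1880 = df[df["year"].dt.year == 1880]

print(after_1900)
print(only_1880)
```

Note that this keeps the column as datetime64; converting it with .dt.date as shown above turns it back into an object column, which loses the .dt accessor.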

Related

Pandas DateTime for Month

I have month column with values formatted as: 2019M01
To find the seasonality I need this formatted into Pandas DateTime format.
How to format 2019M01 into datetime so that I can use it for my seasonality plotting?
Thanks.
Use to_datetime with format parameter:
print (df)
date
0 2019M01
1 2019M03
2 2019M04
df['date'] = pd.to_datetime(df['date'], format='%YM%m')
print (df)
date
0 2019-01-01
1 2019-03-01
2 2019-04-01
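For the seasonality part, one possible follow-up is to group on the month number after parsing. This is only a sketch: the value column and its numbers are invented, and a per-month mean is just one crude way to look at a seasonal profile.

```python
import pandas as pd

# Hypothetical monthly values; only the "2019M01" label format comes from the question.
df = pd.DataFrame({
    "date": ["2019M01", "2019M03", "2020M01", "2020M03"],
    "value": [10.0, 30.0, 14.0, 34.0],
})
df["date"] = pd.to_datetime(df["date"], format="%YM%m")

# Average per calendar month across years.
seasonal = df.groupby(df["date"].dt.month)["value"].mean()
print(seasonal)
```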

How do I convert 2014-12-19T05:00:00 to a proper datetime, 2014-12, in Python

I get a date in my data which looks like this: "2014-12-19T05:00:00". I want to convert it to a Date or String object and get something like "01-04-2018", that is, "dd-MM-YYYY", in a dataframe. How can I do it?
The result will be used for a time series. So far, my time series result is like this, perhaps because it doesn't detect the date format (the x-axis is not in datetime).
Date column:
For a pandas dataframe column/series:
Convert a string column (dtype of object) to a datetime column (dtype of datetime64[ns]) using to_datetime. Then if you want another column with your datetimes back in a string format of your choosing, use dt.strftime.
An example:
df = pd.DataFrame({
"Date": ["2014-12-19T05:00:00", "2014-12-20T05:00:00", "2014-12-21T05:00:00"],
"Value": [0, 2, 4]})
df['DateTime'] = pd.to_datetime(df['Date'])
df['MyDateTimeString'] = df['DateTime'].dt.strftime('%Y-%m-%d')
print(df)
# Date Value DateTime MyDateTimeString
# 0 2014-12-19T05:00:00 0 2014-12-19 05:00:00 2014-12-19
# 1 2014-12-20T05:00:00 2 2014-12-20 05:00:00 2014-12-20
# 2 2014-12-21T05:00:00 4 2014-12-21 05:00:00 2014-12-21
In general:
To read your strings into datetime objects, use strptime:
import datetime
d = datetime.datetime.strptime("2014-12-19T05:00:00", "%Y-%m-%dT%H:%M:%S")
Then to get a string representation of those datetime objects, use strftime:
d.strftime("%d-%m-%Y")
For more general string-to-datetime parsing, the dateparser library is handy:
import dateparser
dateparser.parse("2014-12-19T05:00:00").strftime("%d-%m-%Y")
# '19-12-2014'
dateparser.parse("December 19, 2014 at 5am").strftime("%d-%m-%Y")
# '19-12-2014'
I recommend using https://pypi.org/project/python-dateutil/
(Install with pip install python-dateutil.)
>>> import dateutil.parser
>>> d = dateutil.parser.isoparse('2014-12-19T05:00:00')
>>> print(d.strftime('%m-%d-%Y'))
12-19-2014
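Because the result is meant for a time series, it may help to keep a real datetime index and only build the dd-mm-YYYY strings for display. A rough sketch, reusing the sample frame from the answer above (the names ts and labels are my own):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2014-12-19T05:00:00", "2014-12-20T05:00:00", "2014-12-21T05:00:00"],
    "Value": [0, 2, 4],
})

# Keep a real datetime index so plotting and resampling see dates, not text.
ts = df.assign(Date=pd.to_datetime(df["Date"])).set_index("Date")["Value"]

# Derive the dd-mm-YYYY strings separately, for display only.
labels = ts.index.strftime("%d-%m-%Y")

print(ts)
print(list(labels))
```

If the formatted strings replace the datetime column, the x-axis is treated as text again, which is probably what happened in the question.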

How to compare pandas date column with hardcoded date

This is my code:
print(df.loc[df.DATE == '2016-02-05'])
I am trying to compare this date with the date column of my pandas dataframe. It returns an empty dataframe. What should I do?
Edit: Original dataframe:
Just convert your string to a datetime (I assume that your dataframe also contains datetimes rather than strings) and do the comparison you wanted:
from datetime import datetime
if __name__ == "__main__":
    t = datetime.strptime('2016-02-05', '%Y-%m-%d')
    print(t)
Hope the answer will help, feel free to ask questions.
If your DATE df column is not datetimes, but just strings, convert them to datetimes the same way.
You need to convert the string to a datetime object as well.
from datetime import datetime
print(df)
datetime_str = '2015/02/04'
print("({}){}".format(type(datetime_str), datetime_str))
datetime_object = datetime.strptime(datetime_str, '%Y/%m/%d')
print("({}){}".format(type(datetime_object), datetime_object))
value = df.loc[df.DATE == datetime_object]
print("value =", value)
OUTPUT:
year month day DATE
0 2015 2 4 2015-02-04
1 2016 3 5 2016-03-05
(<class 'str'>)2015/02/04
(<class 'datetime.datetime'>)2015-02-04 00:00:00
value = year month day DATE
0 2015 2 4 2015-02-04
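To illustrate why the original comparison can come back empty, here is a small sketch. It assumes, hypothetically, that DATE holds strings in a day/month/year layout; the question does not show the actual dtype or format.

```python
import pandas as pd

# Hypothetical frame whose DATE column holds strings in a non-ISO layout.
df = pd.DataFrame({"DATE": ["05/02/2016", "06/03/2016"], "value": [1, 2]})

# A plain string comparison finds nothing, because the text differs.
no_match = df.loc[df["DATE"] == "2016-02-05"]

# After conversion, compare against a Timestamp instead.
df["DATE"] = pd.to_datetime(df["DATE"], format="%d/%m/%Y")
match = df.loc[df["DATE"] == pd.Timestamp("2016-02-05")]

print(no_match.empty)
print(match)
```

Checking df.dtypes first shows whether DATE is object (strings) or datetime64, which decides which side needs converting.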

Split Date Time string (not in usual format) and pull out month

I have a dataframe that has a date time string but is not in traditional date time format. I would like to separate out the date from the time into two separate columns. And then eventually also separate out the month.
This is what the date/time string looks like: 2019-03-20T16:55:52.981-06:00
>>> df.head()
Date Score
2019-03-20T16:55:52.981-06:00 10
2019-03-07T06:16:52.174-07:00 9
2019-06-17T04:32:09.749-06:00 1
I tried this but got a type error:
df['Month'] = pd.DatetimeIndex(df['Date']).month
This can be done just using pandas itself. You can first convert the Date column to datetime by passing utc = True:
df['Date'] = pd.to_datetime(df['Date'], utc = True)
And then just extract the month using dt.month:
df['Month'] = df['Date'].dt.month
Output:
Date Score Month
0 2019-03-20 22:55:52.981000+00:00 10 3
1 2019-03-07 13:16:52.174000+00:00 9 3
2 2019-06-17 10:32:09.749000+00:00 1 6
From the documentation of pd.to_datetime you can see a parameter:
utc : boolean, default None
Return UTC DatetimeIndex if True (converting any tz-aware datetime.datetime objects as well).
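The question also asked for the date and the time in separate columns. A minimal sketch of that, under the same utc=True approach (the column names DateOnly and TimeOnly are my own):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2019-03-20T16:55:52.981-06:00", "2019-03-07T06:16:52.174-07:00"],
    "Score": [10, 9],
})

# Mixed UTC offsets are normalized to UTC, as in the answer above.
parsed = pd.to_datetime(df["Date"], utc=True)

df["DateOnly"] = parsed.dt.date   # datetime.date objects (UTC calendar date)
df["TimeOnly"] = parsed.dt.time   # datetime.time objects (UTC wall time)
df["Month"] = parsed.dt.month

print(df)
```

Keep in mind that these are UTC dates and times, so a value late in the local evening can land on the next calendar day.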

Extracting just Month and Year separately from Pandas Datetime column

I have a Dataframe, df, with the following column:
df['ArrivalDate'] =
...
936 2012-12-31
938 2012-12-29
965 2012-12-31
966 2012-12-31
967 2012-12-31
968 2012-12-31
969 2012-12-31
970 2012-12-29
971 2012-12-31
972 2012-12-29
973 2012-12-29
...
The elements of the column are pandas.tslib.Timestamp.
I want to just include the year and month. I thought there would be simple way to do it, but I can't figure it out.
Here's what I've tried:
df['ArrivalDate'].resample('M', how = 'mean')
I got the following error:
Only valid with DatetimeIndex or PeriodIndex
Then I tried:
df['ArrivalDate'].apply(lambda(x):x[:-2])
I got the following error:
'Timestamp' object has no attribute '__getitem__'
Any suggestions?
Edit: I sort of figured it out.
df.index = df['ArrivalDate']
Then, I can resample another column using the index.
But I'd still like a method for reconfiguring the entire column. Any ideas?
If you want new columns showing year and month separately you can do this:
df['year'] = pd.DatetimeIndex(df['ArrivalDate']).year
df['month'] = pd.DatetimeIndex(df['ArrivalDate']).month
or...
df['year'] = df['ArrivalDate'].dt.year
df['month'] = df['ArrivalDate'].dt.month
Then you can combine them or work with them just as they are.
The df['date_column'] has to be in date time format.
df['month_year'] = df['date_column'].dt.to_period('M')
You could also use D for Day, 2M for 2 Months etc. for different sampling intervals, and in case one has time series data with time stamp, we can go for granular sampling intervals such as 45Min for 45 min, 15Min for 15 min sampling etc.
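A short sketch of the to_period approach on made-up dates. The month_start column is my own addition to show that periods can be turned back into timestamps.

```python
import pandas as pd

# Hypothetical dates; 'date_column' follows the naming used above.
df = pd.DataFrame(
    {"date_column": pd.to_datetime(["2012-12-29", "2012-12-31", "2013-01-02"])}
)

df["month_year"] = df["date_column"].dt.to_period("M")

# Periods sort chronologically and convert back to timestamps when needed.
df["month_start"] = df["month_year"].dt.to_timestamp()

print(df)
```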
You can directly access the year and month attributes, or request a datetime.datetime:
In [15]: t = pandas.tslib.Timestamp.now()
In [16]: t
Out[16]: Timestamp('2014-08-05 14:49:39.643701', tz=None)
In [17]: t.to_pydatetime() #datetime method is deprecated
Out[17]: datetime.datetime(2014, 8, 5, 14, 49, 39, 643701)
In [18]: t.day
Out[18]: 5
In [19]: t.month
Out[19]: 8
In [20]: t.year
Out[20]: 2014
One way to combine year and month is to make an integer encoding them, such as: 201408 for August, 2014. Along a whole column, you could do this as:
df['YearMonth'] = df['ArrivalDate'].map(lambda x: 100*x.year + x.month)
or many variants thereof.
I'm not a big fan of doing this, though, since it makes date alignment and arithmetic painful later and especially painful for others who come upon your code or data without this same convention. A better way is to choose a day-of-month convention, such as final non-US-holiday weekday, or first day, etc., and leave the data in a date/time format with the chosen date convention.
The calendar module is useful for obtaining the number value of certain days such as the final weekday. Then you could do something like:
import calendar
import datetime
df['AdjustedDateToEndOfMonth'] = df['ArrivalDate'].map(
    lambda x: datetime.datetime(
        x.year,
        x.month,
        max(calendar.monthcalendar(x.year, x.month)[-1][:5])
    )
)
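As a possible alternative to the calendar-based mapping, pandas date offsets can snap dates to a month-end convention. This is a sketch using pd.offsets.MonthEnd and pd.offsets.BMonthEnd rather than the method above; like the calendar version, BMonthEnd knows nothing about holidays.

```python
import pandas as pd

df = pd.DataFrame({"ArrivalDate": pd.to_datetime(["2012-12-29", "2013-06-10"])})

# An offset of 0 rolls forward to the anchor unless the date is already on it.
df["MonthEnd"] = df["ArrivalDate"] + pd.offsets.MonthEnd(0)
df["LastWeekday"] = df["ArrivalDate"] + pd.offsets.BMonthEnd(0)

print(df)
```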
If you happen to be looking for a way to solve the simpler problem of just formatting the datetime column into some stringified representation, for that you can just make use of the strftime function from the datetime.datetime class, like this:
In [5]: df
Out[5]:
date_time
0 2014-10-17 22:00:03
In [6]: df.date_time
Out[6]:
0 2014-10-17 22:00:03
Name: date_time, dtype: datetime64[ns]
In [7]: df.date_time.map(lambda x: x.strftime('%Y-%m-%d'))
Out[7]:
0 2014-10-17
Name: date_time, dtype: object
If you want the month year unique pair, using apply is pretty sleek.
df['mnth_yr'] = df['date_column'].apply(lambda x: x.strftime('%B-%Y'))
Outputs month-year in one column.
Don't forget to convert the column to datetime first (I usually forget):
df['date_column'] = pd.to_datetime(df['date_column'])
SINGLE LINE: Adding a column with 'year-month' pairs:
(pd.to_datetime first converts the column dtype to datetime before the operation)
df['yyyy-mm'] = pd.to_datetime(df['ArrivalDate']).dt.strftime('%Y-%m')
Accordingly for an extra 'year' or 'month' column:
df['yyyy'] = pd.to_datetime(df['ArrivalDate']).dt.strftime('%Y')
df['mm'] = pd.to_datetime(df['ArrivalDate']).dt.strftime('%m')
Extracting the year from a date column with values like '2018-03-04':
df['Year'] = pd.DatetimeIndex(df['date']).year
Assigning to df['Year'] creates a new column. If you want to extract the month instead, just use .month.
You can first convert your date strings with pandas.to_datetime, which gives you access to all of the numpy datetime and timedelta facilities. For example:
df['ArrivalDate'] = pandas.to_datetime(df['ArrivalDate'])
df['Month'] = df['ArrivalDate'].values.astype('datetime64[M]')
@KieranPC's solution is the correct approach for Pandas, but it is not easily extensible to arbitrary attributes. For this, you can use getattr within a generator expression and combine the results using pd.concat:
# input data
list_of_dates = ['2012-12-31', '2012-12-29', '2012-12-30']
df = pd.DataFrame({'ArrivalDate': pd.to_datetime(list_of_dates)})
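# define list of attributes required
# (note: 'weekofyear' was removed from the .dt accessor in pandas 2.0; drop it from the list there)
L = ['year', 'month', 'day', 'dayofweek', 'dayofyear', 'weekofyear', 'quarter']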
# define generator expression of series, one for each attribute
date_gen = (getattr(df['ArrivalDate'].dt, i).rename(i) for i in L)
# concatenate results and join to original dataframe
df = df.join(pd.concat(date_gen, axis=1))
print(df)
ArrivalDate year month day dayofweek dayofyear weekofyear quarter
0 2012-12-31 2012 12 31 0 366 1 4
1 2012-12-29 2012 12 29 5 364 52 4
2 2012-12-30 2012 12 30 6 365 52 4
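In recent pandas releases, as far as I know, weekofyear is no longer available on the .dt accessor, so the getattr loop would fail on that name. A sketch of the same idea that takes the week number from isocalendar() instead (the names attrs and parts are mine):

```python
import pandas as pd

list_of_dates = ['2012-12-31', '2012-12-29', '2012-12-30']
df = pd.DataFrame({'ArrivalDate': pd.to_datetime(list_of_dates)})

# Attributes still exposed directly on the .dt accessor.
attrs = ['year', 'month', 'day', 'dayofweek', 'dayofyear', 'quarter']
parts = pd.concat(
    [getattr(df['ArrivalDate'].dt, a).rename(a) for a in attrs], axis=1
)

# The ISO week number comes from isocalendar(), which returns year/week/day columns.
parts['weekofyear'] = df['ArrivalDate'].dt.isocalendar().week.astype(int)

df = df.join(parts)
print(df)
```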
Thanks to jaknap32, I wanted to aggregate the results according to Year and Month, so this worked:
df_join['YearMonth'] = df_join['timestamp'].apply(lambda x:x.strftime('%Y%m'))
Output was neat:
0 201108
1 201108
2 201108
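For the aggregation itself, grouping on a monthly period avoids the string key altogether. A sketch with invented data; df_join and timestamp follow the names used above, while the amount column is hypothetical.

```python
import pandas as pd

# Hypothetical data; only the frame and column names come from the answer above.
df_join = pd.DataFrame({
    "timestamp": pd.to_datetime(["2011-08-03", "2011-08-19", "2011-09-01"]),
    "amount": [5, 7, 11],
})

# Group on a monthly Period rather than a 'YYYYMM' string.
monthly = df_join.groupby(df_join["timestamp"].dt.to_period("M"))["amount"].sum()

print(monthly)
```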
There are two steps to extract the year for the whole dataframe without using the apply method.
Step 1
Convert the column to datetime:
df['ArrivalDate'] = pd.to_datetime(df['ArrivalDate'], format='%Y-%m-%d')
Step 2
Extract the year or the month using pd.DatetimeIndex:
pd.DatetimeIndex(df['ArrivalDate']).year
df['Month_Year'] = df['Date'].dt.to_period('M')
Result :
Date Month_Year
0 2020-01-01 2020-01
1 2020-01-02 2020-01
2 2020-01-03 2020-01
3 2020-01-04 2020-01
4 2020-01-05 2020-01
df['year_month']=df.datetime_column.apply(lambda x: str(x)[:7])
This worked fine for me. I didn't think pandas would interpret the resulting string as a date, but when I did the plot, the year_month strings were ordered properly, since yyyy-mm strings sort chronologically... gotta love pandas!
Then I tried:
df['ArrivalDate'].apply(lambda(x):x[:-2])
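I think the proper input here should be a string (in Python 3 the lambda argument takes no parentheses, and dropping the last three characters leaves yyyy-mm):
df['ArrivalDate'].astype(str).apply(lambda x: x[:-3])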
