I have a dataset which includes a column for dates. The format for this column is dd.mm.yyyy.
I tried the recommended methods for filtering the dates to restrict the range to December 2014. However, none of them seem to work. I am considering rearranging the dates into the format yyyy.mm.dd, but I'm not sure how to go about doing this. Can someone help?
Code such as
(df['date']>'1-12-2014')&(df['date']<='31-12-2014') doesn't seem to work.
The problem is that your dates are strings that pandas doesn't recognize as dates. You need to convert them to datetime objects first. There are a couple of ways to do this:
from datetime import datetime
df['date'] = df['date'].apply(lambda d: datetime.strptime(d, '%d.%m.%Y'))
or
df['date'] = pd.to_datetime(df['date'], format='%d.%m.%Y')
In both cases, the key is using a format string that matches your data. Then, you can filter how you want:
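df[(df['date'] >= pd.Timestamp(2014, 12, 1)) & (df['date'] <= pd.Timestamp(2014, 12, 31))]
(Compare against pd.Timestamp values rather than datetime.date objects; recent pandas versions do not allow ordering comparisons between a datetime64 column and a plain datetime.date.)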
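A self-contained sketch of the whole workflow, assuming a column named date holding dd.mm.yyyy strings (the sample values are made up). It also shows an alternative to the range comparison that selects December 2014 through the .dt accessor:

```python
import pandas as pd

# Made-up sample data in the dd.mm.yyyy format from the question
df = pd.DataFrame({"date": ["30.11.2014", "01.12.2014", "15.12.2014", "31.12.2014", "01.01.2015"]})

# Parse the strings into datetime64 values using an explicit format
df["date"] = pd.to_datetime(df["date"], format="%d.%m.%Y")

# Option 1: range comparison against Timestamps
in_range = df[(df["date"] >= pd.Timestamp(2014, 12, 1)) & (df["date"] <= pd.Timestamp(2014, 12, 31))]

# Option 2: compare the year and month components directly
december_2014 = df[(df["date"].dt.year == 2014) & (df["date"].dt.month == 12)]

print(december_2014)
```

With a date-only column both filters return the same rows. If the column also carried times, `<= pd.Timestamp(2014, 12, 31)` would drop anything after midnight on 31 December, which the year/month version avoids.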
I have 2 columns, month and day, in my dataframe, both of dtype object. I want to sort them in ascending order (Jan, Feb, Mar), but to do that I need to convert them to a date format. I tried the following code, and a few other things, but nothing seems to work.
ff['month'] = dt.datetime.strptime(ff['month'],format='%b')
and
ff['month'] = pd.to_datetime(ff['month'], format="%b")
Any help would be appreciated. Thank you
This works to convert month names to integers:
import datetime as dt
ff['month'] = [dt.datetime.strptime(m, "%b").month for m in ff['month']]
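(strptime parses one string at a time, so passing it a whole Series fails; the list comprehension simply feeds it the values one by one.)
You can then manipulate (e.g. sort) them.
Working with the DataFrame directly (convert to datetimes, sort, then convert back to abbreviations):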
ff['month'] = ff['month'].apply(lambda x: dt.datetime.strptime(x, "%b"))
ff = ff.sort_values(by=['month'])
ff['month'] = ff['month'].apply(lambda x: x.strftime("%b"))
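If you would rather keep the original strings and sort by both columns at once, the key parameter of sort_values (pandas 1.1 or later) can map the month names to numbers only for the comparison. A minimal sketch, assuming the columns are named month and day and hold abbreviations like "Jan" and plain day numbers; the sample data and the helper name month_order are hypothetical:

```python
import pandas as pd

# Hypothetical frame with abbreviated month names and day numbers
ff = pd.DataFrame({"month": ["Mar", "Jan", "Feb", "Jan"], "day": [3, 20, 2, 5]})

def month_order(col):
    # Map month abbreviations to 1..12; leave other columns (here 'day') unchanged
    if col.name == "month":
        return pd.to_datetime(col, format="%b").dt.month
    return col

# The key is applied to each column listed in `by` independently
ff_sorted = ff.sort_values(by=["month", "day"], key=month_order)
print(ff_sorted)
```

The month column keeps its "Jan"/"Feb"/"Mar" strings; only the sort order uses the parsed month numbers.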
I am currently working with a CSV file that contains datetimes and timestamps. The dataframe looks like this:
print(df[:10])
[0 '2019-10-10 21:59:17.074007' '2015-10-13 00:55:55.544607'
'2017-05-24 06:00:15.959202' '2016-12-07 09:01:04.729686'
'2019-05-29 11:16:44.130063' '2017-01-19 16:06:37.625964'
'2018-04-07 19:42:43.708620' '2016-06-28 03:13:58.266977'
'2015-03-21 00:03:07.704446']
Now I want to convert those strings to datetimes and find the earliest date. I don't have much experience with datetimes in dataframes, so I am not sure how to do it. Any suggestions?
You can convert the strings with pd.to_datetime, then take the min:
dates = ['2019-10-10 21:59:17.074007', '2015-10-13 00:55:55.544607',
'2017-05-24 06:00:15.959202', '2016-12-07 09:01:04.729686',
'2019-05-29 11:16:44.130063', '2017-01-19 16:06:37.625964',
'2018-04-07 19:42:43.708620', '2016-06-28 03:13:58.266977',
'2015-03-21 00:03:07.704446']
pd.to_datetime(dates).min()
Output:
Timestamp('2015-03-21 00:03:07.704446')
Update
If you want to do it across all columns of the dataframe:
df.apply(pd.to_datetime).min().min()
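A small sketch of the column-wise version, assuming a made-up frame with two columns of timestamp strings (the real file's column layout isn't shown). The first min() returns one minimum per column; the second picks the smallest of those:

```python
import pandas as pd

# Made-up frame: two columns of timestamp strings
df = pd.DataFrame({
    "a": ["2019-10-10 21:59:17.074007", "2017-05-24 06:00:15.959202"],
    "b": ["2015-10-13 00:55:55.544607", "2015-03-21 00:03:07.704446"],
})

# Convert each column to datetime64, then reduce per column and across columns
earliest = df.apply(pd.to_datetime).min().min()
print(earliest)  # 2015-03-21 00:03:07.704446
```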
Let's call the list you mentioned l. You can iterate over it, parse each date with datetime.strptime, collect the results in a new list and return the earliest:
from datetime import datetime
parsed_dates = []
for d in l:
    parsed_dates.append(datetime.strptime(d, "%Y-%m-%d %H:%M:%S.%f"))
print(min(parsed_dates))
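The same idea can be written more compactly with a generator expression, or with min(..., key=...) if you want the original string back rather than a datetime. A sketch using a shortened copy of the list; because these particular strings are zero-padded and in year-month-day order, comparing them as plain strings happens to give the same result, but that shortcut relies on the fixed-width format and does not work for formats such as dd.mm.yyyy:

```python
from datetime import datetime

# The list from the question, shortened
l = ["2019-10-10 21:59:17.074007", "2015-10-13 00:55:55.544607", "2015-03-21 00:03:07.704446"]

FMT = "%Y-%m-%d %H:%M:%S.%f"

# Parse every string and keep the smallest datetime
earliest = min(datetime.strptime(d, FMT) for d in l)

# Keep the original string, but compare by its parsed value
earliest_str = min(l, key=lambda d: datetime.strptime(d, FMT))

print(earliest)      # 2015-03-21 00:03:07.704446
print(earliest_str)  # 2015-03-21 00:03:07.704446
```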
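Convert these values to datetime with the to_datetime() function (this works on a list, 1-D array or Series; if df is actually a DataFrame, pass the relevant column instead). errors='coerce' turns anything that can't be parsed into NaT:
df = pd.to_datetime(df, errors='coerce')
Now find the earliest date with the min() method:
earliest_date = df.min()
Or you can find the earliest date with the nsmallest() method (this works on a Series and returns a one-element Series rather than a single value):
earliest_date = df.nsmallest(1)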
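A sketch of the difference between the two results, using made-up values that include one unparseable entry: min() returns a single Timestamp, while nsmallest(1) returns a Series that also tells you which index label holds the earliest date. Both skip the NaT produced by errors='coerce':

```python
import pandas as pd

# Made-up values, including one that cannot be parsed
s = pd.Series(["2019-10-10 21:59:17.074007", "not a date", "2015-03-21 00:03:07.704446"])

# errors="coerce" turns unparseable entries into NaT instead of raising
parsed = pd.to_datetime(s, errors="coerce")

earliest_value = parsed.min()       # a single Timestamp; NaT is skipped
earliest_row = parsed.nsmallest(1)  # a one-element Series that keeps the index label

print(earliest_value)
print(earliest_row.index[0])  # 2
```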
I have the following datatable, which I would like to filter by dates greater than "2019-01-01". The problem is that the dates are strings.
dt_dates = dt.Frame({"days_date": ['2019-01-01','2019-01-02','2019-01-03']})
This is my best attempt.
dt_dates[f.days_date > datetime.strptime(f.days_date, "2019-01-01")]
This returns the error:
TypeError: strptime() argument 1 must be str, not Expr
What is the best way to filter dates in Python's datatable?
Reference
python datatable
f-expressions
Your strptime call is incorrect. Its first argument is the string to parse and its second argument is the format of that string:
datetime.strptime("2019-01-01", "%Y-%m-%d")
Note also that strptime only accepts a plain Python str, so it cannot be applied to an f-expression such as f.days_date; that is what the TypeError is telling you.
However, let's take a step back, because this isn't the right way to do it.
First, we should convert all the dates in your Frame to datetimes. I'll be honest, I've never used datatable, but the syntax looks extremely similar to a pandas DataFrame.
In a pandas DataFrame, we can do the following:
df_date['days_date'] = df_date['days_date'].apply(lambda x: datetime.strptime(x, '%Y-%m-%d'))
This goes through each row of the 'days_date' column and converts each string into a datetime.
From there, we can use a filter to get the relevant rows:
df_date = df_date[df_date['days_date'] > datetime.strptime("2019-01-01", "%Y-%m-%d")]
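For the pandas route, a runnable sketch using the three dates from the question; it swaps the row-by-row apply for pd.to_datetime, which converts the whole column in one call:

```python
import pandas as pd

# The same three dates as in the question, as a pandas DataFrame
df_date = pd.DataFrame({"days_date": ["2019-01-01", "2019-01-02", "2019-01-03"]})

# Convert the whole column at once instead of row by row
df_date["days_date"] = pd.to_datetime(df_date["days_date"], format="%Y-%m-%d")

# Keep rows strictly after 2019-01-01
after = df_date[df_date["days_date"] > pd.Timestamp("2019-01-01")]
print(after)
```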
datatable version 1.0.0 introduced native support for date and time data types. Note the difference between these two ways to initialize data:
dt_dates = dt.Frame({"days_date": ['2019-01-01','2019-01-02','2019-01-03']})
dt_dates.stypes
> (stype.str32,)
and
dt_dates = dt.Frame({"days_date": ['2019-01-01','2019-01-02','2019-01-03']}, stype="date32")
dt_dates.stypes
> (stype.date32,)
The latter frame contains a days_date column of type datatable.Type.date32 that represents a calendar date. Then one can filter by date as follows:
import datetime
import datatable as dt
split_date = datetime.datetime.strptime("2019-01-01", "%Y-%m-%d")
dt_split_date = dt.time.ymd(split_date.year, split_date.month, split_date.day)
dt_dates[dt.f.days_date > dt_split_date, :]
I'm trying to write a program that prints a list of sorted dates, but it keeps sorting by the day instead of the full date (day, month, year).
I'm very new to Python, so there's probably a lot I'm doing wrong, but any help would be greatly appreciated.
I have it set up so that you can view the list over two pages.
The dates sort like this:
12/03/2004
13/08/2001
15/10/2014
but I need them sorted by the full date.
df = pd.read_csv('Employee.csv')
df = df.sort_values('Date of Employment.')
List1 = df.iloc[:50, 1:]
List2 = df.iloc[50:99, 1:]
The datetime data type has to be used for the dates to be sorted correctly.
You need to use one of these approaches to convert the dates to datetime objects:
Approach 1
pd.to_datetime + DataFrame.sort_values (the dates are dd/mm/yyyy, so give the format explicitly; otherwise 12/03/2004 may be read as 3 December):
df['Date of Employment.'] = pd.to_datetime(df['Date of Employment.'], format='%d/%m/%Y')
Approach 2
You can parse the dates at the same time that the pandas DataFrame is being loaded (dayfirst=True tells the parser that the day comes before the month):
df = pd.read_csv('Employee.csv', parse_dates=['Date of Employment.'], dayfirst=True)
This is equivalent to the first approach with the exception that everything is done in one step.
Next, sort the datetime values in ascending or descending order. sort_values returns a new DataFrame, so assign the result:
Ascending:
`df = df.sort_values('Date of Employment.')`
Descending:
`df = df.sort_values('Date of Employment.', ascending=False)`
You need to convert Date of Employment. to a date before sorting:
df['Date of Employment.'] = pd.to_datetime(df['Date of Employment.'], format='%d/%m/%Y')
Otherwise the values are just strings, and Python compares them character by character, which means the day is compared first.
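A small sketch contrasting the two sort orders, using the three dates from the question in a one-column frame (the real Employee.csv has more columns):

```python
import pandas as pd

# The three dates from the question, stored as dd/mm/yyyy strings
df = pd.DataFrame({"Date of Employment.": ["15/10/2014", "12/03/2004", "13/08/2001"]})

# Sorting the strings compares them character by character, i.e. by day first
as_strings = df.sort_values("Date of Employment.")["Date of Employment."].tolist()

# Parsing with an explicit day-first format makes the sort chronological
df["Date of Employment."] = pd.to_datetime(df["Date of Employment."], format="%d/%m/%Y")
df = df.sort_values("Date of Employment.")
chronological = df["Date of Employment."].dt.strftime("%d/%m/%Y").tolist()

print(as_strings)     # ['12/03/2004', '13/08/2001', '15/10/2014']
print(chronological)  # ['13/08/2001', '12/03/2004', '15/10/2014']
```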
I have a dataframe full of dates, and I would like to select all dates where the month == 12 and the day == 25 and replace the zero in the xmas column with a 1.
Is there any way to do this? The second line of my code errors out.
df = DataFrame({'date':[datetime(2013,1,1).date() + timedelta(days=i) for i in range(0,365*2)], 'xmas':np.zeros(365*2)})
df[df['date'].month==12 and df['date'].day==25] = 1
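Pandas Series with datetime values now behave differently: the date parts are reached through the .dt accessor. Since the date column in the question holds datetime.date objects (dtype object), convert it with pd.to_datetime first.
This is how it should be done now:
df['date'] = pd.to_datetime(df['date'])
df.loc[(df['date'].dt.day == 25) & (df['date'].dt.month == 12), 'xmas'] = 1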
Basically, what you tried won't work because `and` cannot combine boolean arrays; you need to use `&`, and you need parentheses around each comparison because `&` binds more tightly than `==`. On top of this, you should use loc to perform the indexing (in current pandas, month and day are accessed through .dt):
df.loc[(df['date'].dt.month == 12) & (df['date'].dt.day == 25), 'xmas'] = 1
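A runnable sketch that rebuilds the frame from the question (with the imports it needs) and applies the .dt-based version; the date range is the question's own two years starting 2013-01-01:

```python
from datetime import date, timedelta

import numpy as np
import pandas as pd

# Rebuild the question's frame: two years of daily dates starting 2013-01-01
df = pd.DataFrame({
    "date": [date(2013, 1, 1) + timedelta(days=i) for i in range(365 * 2)],
    "xmas": np.zeros(365 * 2),
})

# The column holds datetime.date objects (dtype object); convert it so .dt works
df["date"] = pd.to_datetime(df["date"])

# Element-wise conditions combined with &, each wrapped in parentheses
is_xmas = (df["date"].dt.month == 12) & (df["date"].dt.day == 25)
df.loc[is_xmas, "xmas"] = 1

print(df.loc[is_xmas, "date"].dt.strftime("%Y-%m-%d").tolist())  # ['2013-12-25', '2014-12-25']
```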
An update was needed in reply to this question: as of today, there's a slight difference in how you extract months from datetime objects in a pd.Series.
So, from the very start, in case you have a raw (string) date column, first convert it to datetime objects using a simple function:
import datetime as dt
def read_as_datetime(str_date):
    # replace %Y-%m-%d with your own date format
    return dt.datetime.strptime(str_date, '%Y-%m-%d')
Then apply this function to your dates column and save the result in a new column named datetime:
df['datetime'] = df.dates.apply(read_as_datetime)
Finally, to select dates by day and month, use the same piece of code that @Shayan RC explained, applied to the new datetime column through the .dt accessor:
df.loc[(df['datetime'].dt.month == 12) & (df['datetime'].dt.day == 25), 'xmas'] = 1