I have a DataFrame whose values look like this:
Date Value
1 2020-04-12 A
2 2020-05-12 B
3 2020-07-12 C
4 2020-10-12 D
5 2020-11-12 E
and I need to create a new DataFrame containing only the dates from today (7.12) onward (in this example only rows 3, 4 and 5).
I use this code:
df1= df[df["Date"] >= date.today()]
but it gives me TypeError: Invalid comparison between dtype=datetime64[ns] and date
What am I doing wrong? Thank you!
Use the .dt.date accessor on the df['Date'] column, so that you are comparing dates with dates:
df1 = df.loc[df['Date'].dt.date >= date.today()]
This will give you:
Date Value
3 2020-12-07 C
4 2020-12-10 D
5 2020-12-11 E
Also make sure that your date format is actually correct, for example by printing df['Date'].dt.month and checking that it gives 12 for every row. If it does not, the date strings were not parsed correctly. In that case, use df['Date'] = pd.to_datetime(df['Date'], format="%Y-%d-%m") to convert the Date column to the correct datetime values after creating the DataFrame.
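As an alternative to .dt.date, the comparison can stay in datetime64 by putting a pandas Timestamp on the right-hand side. The following is a minimal sketch rather than part of the original answer: the frame is rebuilt from the question's sample, and `today` is pinned to 7 December 2020 as an assumption so the result is reproducible (in real code it would be pd.Timestamp.today().normalize()).

```python
import pandas as pd

# Sample data from the question; the strings are in YYYY-DD-MM order.
df = pd.DataFrame(
    {
        "Date": ["2020-04-12", "2020-05-12", "2020-07-12", "2020-10-12", "2020-11-12"],
        "Value": ["A", "B", "C", "D", "E"],
    },
    index=range(1, 6),
)

# Parse with an explicit format so the day and month are not swapped.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%d-%m")

# Assumption: a fixed "today" for reproducibility.
# In real code: today = pd.Timestamp.today().normalize()
today = pd.Timestamp("2020-12-07")

# Timestamp vs datetime64[ns] column compares without a TypeError.
df1 = df[df["Date"] >= today]
print(df1)
```

Calling normalize() on the real "today" drops the time of day, so rows dated today are kept by the >= comparison.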
Could you please try the following. This assumes that your dates are stored as strings in YYYY-DD-MM format; if they use another format, change the format string passed to strftime accordingly.
import pandas as pd
# pd.datetime was removed in pandas 2.0; pd.Timestamp.today() is the equivalent call
today = pd.Timestamp.today().strftime("%Y-%d-%m")
df.loc[df['Date'] >= today]
Sample run of the solution above: let's say we have the following test DataFrame.
Date Value
1 2020-04-12 A
2 2020-05-12 B
3 2020-07-12 C
4 2020-11-12 D
5 2020-12-12 E
Now when we run the solution above we get the following output:
Date Value
3 2020-07-12 C
4 2020-11-12 D
5 2020-12-12 E
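One caveat about comparing YYYY-DD-MM strings: text comparison goes character by character, so the day is compared before the month, and the result is not chronological in general. The sketch below uses made-up values (not from the question) to show a case where the string comparison and the parsed-date comparison disagree.

```python
import pandas as pd

# Hypothetical values in YYYY-DD-MM order: 1 Dec 2020 and 7 Nov 2020.
s = pd.Series(["2020-01-12", "2020-07-11"])
cutoff = "2020-05-12"  # 5 Dec 2020 in the same layout

# Plain string comparison looks at the day before the month.
as_text = s >= cutoff

# Parsing both sides first gives the chronological answer.
fmt = "%Y-%d-%m"
as_dates = pd.to_datetime(s, format=fmt) >= pd.to_datetime(cutoff, format=fmt)

print(as_text.tolist())   # string-based result
print(as_dates.tolist())  # date-based result
```

Here 7 Nov 2020 passes the string test even though it is before the 5 Dec cutoff, so parsing to datetime before filtering is the safer route.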
Related
I need to convert a DataFrame column of integers to the following format in the current year: YYYY-MM-DD HH:MM:SS. I have a DataFrame that looks like this:
Date LT Mean
0 7 5.491916
1 8 5.596823
2 9 5.793934
3 10 7.501096
4 11 8.152358
5 12 8.426322
And, I need it to look like this using the current year 2020:
Date LT Mean
0 2020-07-01 5.491916
1 2020-08-01 5.596823
2 2020-09-01 5.793934
3 2020-10-01 7.501096
4 2020-11-01 8.152358
5 2020-12-01 8.426322
I have not been able to find a reference for converting a single integer that represents the date into the yyyy-mm-dd hh:mm:ss format I need. Thank you.
You can use the pandas to_datetime function. Assuming your Date column represents the month, you can use it like this:
df['Date'] = pandas.to_datetime(df["Date"], format='%m').apply(lambda dt: dt.replace(year=2020))
Then, if you need to transform the column to a string in the specified format (note %M for minutes and %S for seconds; %m is the month):
df['Date'] = df['Date'].dt.strftime('%Y-%m-%d %H:%M:%S')
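Another way to build the dates, sketched here under the assumption that the integer column is named Date and holds month numbers, is to pass year, month and day columns to pd.to_datetime. The `parts` frame and the hard-coded year 2020 are illustrative choices, not part of the original answer.

```python
import pandas as pd

# Sample data modelled on the question (column names are an assumption).
df = pd.DataFrame(
    {
        "Date": [7, 8, 9, 10, 11, 12],
        "Mean": [5.491916, 5.596823, 5.793934, 7.501096, 8.152358, 8.426322],
    }
)

# Assumption: the target year is fixed at 2020.
# For the current year use: pd.Timestamp.today().year
year = 2020

# to_datetime assembles dates from columns named year, month and day.
parts = pd.DataFrame({"year": year, "month": df["Date"], "day": 1})
df["Date"] = pd.to_datetime(parts)

# Optional: render as text with hours, minutes and seconds.
formatted = df["Date"].dt.strftime("%Y-%m-%d %H:%M:%S")
print(formatted)
```

This avoids the per-row lambda, which may matter on larger frames.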
I am trying to convert a column of date strings to datetime. In the original dataframe the data type is a string and the dataset has shape = (28000000, 26). Importantly, the format of the date is MMYYYY only. Here's a data sample:
DATE
Out[3] 0 081972
1 051967
2 101964
3 041975
4 071976
I tried:
df['DATE'].apply(pd.to_datetime(format='%m%Y'))
and
pd.to_datetime(df['DATE'],format='%m%Y')
I got a runtime error both times.
Then
df['DATE'].apply(pd.to_datetime)
It worked for the other columns not shown here (which are in DDMMYYYY format), but generated future dates for df['DATE'] because it reads the dates as MMDDYY instead of MMYYYY.
DATE
0 1972-08-19
1 2067-05-19
2 2064-10-19
3 1975-04-19
4 1976-07-19
Expected output:
DATE
0 1972-08
1 1967-05
2 1964-10
3 1975-04
4 1976-07
If this question is a duplicate please direct me to the original one, I wasn't able to find any suitable answer.
Thank you all in advance for your help
First, if an error is raised, some values evidently do not match the format. You can test this with the errors='coerce' parameter and Series.isna, because values that do not match are returned as missing values (NaT):
print (df)
DATE
0 81972
1 51967
2 101964
3 41975
4 171976 <-changed data
print (pd.to_datetime(df['DATE'],format='%m%Y', errors='coerce'))
0 1972-08-01
1 1967-05-01
2 1964-10-01
3 1975-04-01
4 NaT
Name: DATE, dtype: datetime64[ns]
print (df[pd.to_datetime(df['DATE'],format='%m%Y', errors='coerce').isna()])
DATE
4 171976
Solution with output from the changed data, converting to datetimes and then to month periods with Series.dt.to_period:
df['DATE'] = pd.to_datetime(df['DATE'],format='%m%Y', errors='coerce').dt.to_period('M')
print (df)
DATE
0 1972-08
1 1967-05
2 1964-10
3 1975-04
4 NaT
Solution with the original data:
df['DATE'] = pd.to_datetime(df['DATE'],format='%m%Y', errors='coerce').dt.to_period('M')
print (df)
0 1972-08
1 1967-05
2 1964-10
3 1975-04
4 1976-07
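Putting the two steps of this answer together, a minimal self-contained sketch might look like the following. The sample values keep their leading zeros as in the question, the last one is deliberately invalid, and the MONTH column name is an assumption added for illustration.

```python
import pandas as pd

# MMYYYY strings; "171976" is deliberately invalid (there is no month 17).
df = pd.DataFrame({"DATE": ["081972", "051967", "101964", "041975", "171976"]})

# errors="coerce" turns values that do not match the format into NaT.
parsed = pd.to_datetime(df["DATE"], format="%m%Y", errors="coerce")

# Rows that failed to parse, for inspection.
bad_rows = df[parsed.isna()]
print(bad_rows)

# Month periods display as YYYY-MM.
df["MONTH"] = parsed.dt.to_period("M")
print(df)
```

Checking `bad_rows` first is useful on a large dataset, since a single malformed value is enough to make the strict conversion fail.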
I would have done:
df['date_formatted'] = pd.to_datetime(
    dict(
        year=df['DATE'].str[2:],
        month=df['DATE'].str[:2],
        day=1
    )
)
Maybe this helps. It works for your sample data.
I've got stuck with the following format:
0 2001-12-25
1 2002-9-27
2 2001-2-24
3 2001-5-3
4 200510
5 20078
What I need is the date in the format %Y-%m.
What I tried was:
def parse(date):
    if len(date)<=5:
        return "{}-{}".format(date[:4], date[4:5], date[5:])
    else:
        pass
df['Date']= parse(df['Date'])
However, I only succeeded in parsing 20078 to 2007-8; dates like 2001-12-25 came out as None.
So, how can I do it? Thank you!
We can use pd.to_datetime with errors='coerce' to parse the dates in steps,
assuming your column is called date:
s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')
s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))
df['date_fixed'] = s
print(df)
date date_fixed
0 2001-12-25 2001-12-25
1 2002-9-27 2002-09-27
2 2001-2-24 2001-02-24
3 2001-5-3 2001-05-03
4 200510 2005-10-01
5 20078 2007-08-01
In steps:
First we parse the regular dates into a new series called s:
s = pd.to_datetime(df['date'],errors='coerce',format='%Y-%m-%d')
print(s)
0 2001-12-25
1 2002-09-27
2 2001-02-24
3 2001-05-03
4 NaT
5 NaT
Name: date, dtype: datetime64[ns]
As you can see, we have two NaT values (null datetimes) in our series; these correspond to the entries that are missing a day.
We then apply the same to_datetime call with the other format, and use the result to fill the missing values of s:
s = s.fillna(pd.to_datetime(df['date'],format='%Y%m',errors='coerce'))
print(s)
0 2001-12-25
1 2002-09-27
2 2001-02-24
3 2001-05-03
4 2005-10-01
5 2007-08-01
Then we re-assign to your dataframe.
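The question asked for a %Y-%m result, which the parsed datetimes do not show by default. As a small sketch on already-parsed values (the series below is typed in directly rather than produced by the two-step parse), either strftime or to_period gives that form:

```python
import pandas as pd

# A datetime series like the one produced above (values typed in directly).
s = pd.Series(pd.to_datetime(["2001-12-25", "2002-09-27", "2005-10-01", "2007-08-01"]))

# Option 1: plain strings in YYYY-MM form.
as_text = s.dt.strftime("%Y-%m")

# Option 2: monthly periods, which still behave like dates for sorting and grouping.
as_period = s.dt.to_period("M")

print(as_text.tolist())
print(as_period.tolist())
```

Strings are convenient for display or export, while periods are usually the better choice if the column will be used in further date arithmetic.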
You could use a regex to pull out the year and month, and convert to datetime:
df = pd.read_clipboard(sep=r"\s{2,}",header=None,names=["Dates"])
pattern = r"(?P<Year>\d{4})[-]*(?P<Month>\d{1,2})"
df['Dates'] = pd.to_datetime([f"{year}-{month}" for year, month in df.Dates.str.extract(pattern).to_numpy()])
print(df)
Dates
0 2001-12-01
1 2002-09-01
2 2001-02-01
3 2001-05-01
4 2005-10-01
5 2007-08-01
Note that pandas automatically sets the day to 1, since only the year and month were supplied.
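If only the %Y-%m text is needed, the same regex idea can be kept entirely in string form. This is a sketch under the assumption that every value starts with a four-digit year followed by an optional dash and a one- or two-digit month; the names `dates`, `parts` and `year_month` are made up for the example.

```python
import pandas as pd

# The mixed-format values from the question.
dates = pd.Series(["2001-12-25", "2002-9-27", "2001-2-24", "2001-5-3", "200510", "20078"])

# Named groups become the columns "year" and "month".
parts = dates.str.extract(r"(?P<year>\d{4})-?(?P<month>\d{1,2})")

# Zero-pad the month so that 9 becomes 09.
year_month = parts["year"] + "-" + parts["month"].str.zfill(2)
print(year_month.tolist())
```

Because no date parsing happens here, invalid months such as 13 would pass through unchecked, so this is only suitable when the input is known to be clean.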
I have a dataframe (df) that looks like:
DATES
0 NaT
1 01/08/2003
2 NaT
3 NaT
4 04/08/2003
5 NaT
6 30/06/2003
7 01/03/2004
8 18/05/2003
9 NaT
10 NaT
11 31/10/2003
12 NaT
13 NaT
I am struggling to work out how to transform the DataFrame to remove the NaT values so that the final output looks like:
DATES
0
1 01/08/2003
2
3
4 04/08/2003
5
6 30/06/2003
7 01/03/2004
8 18/05/2003
9
10
11 31/10/2003
12
13
I have tried :
df["DATES"].fillna("", inplace = True)
but with no success.
For information, the column is in a datetime format set with
df["DATES"] = pd.to_datetime(df["DATES"],errors='coerce').dt.strftime('%d/%m/%Y')
What can I do to resolve this?
The problem is that the NaT values are strings, so you need:
df["DATES"] = df["DATES"].replace('NaT', '')
df.fillna() works on missing values such as numpy.nan. Your "NaT" entries are probably strings. So you can do the following
if you want to use fillna():
df["DATES"] = df["DATES"].replace("NaT", np.nan)
df.fillna("", inplace=True)
Otherwise, you can just replace them with your desired string:
df["DATES"] = df["DATES"].replace("NaT", "")
Convert the column to object and then use Series.where:
df['DATES'] = df['DATES'].astype(object).where(df['DATES'].notnull(), np.nan)
Or whatever you want np.nan to be.
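Whether the formatted column ends up holding the string 'NaT' or a real missing value can depend on the pandas version, so one way to sidestep the question is to blank out the entries based on the parsed datetimes themselves. The following is a sketch with a shortened version of the question's data; `raw`, `dates` and `formatted` are illustrative names.

```python
import pandas as pd

# Shortened sample: None stands in for the missing dates.
raw = pd.Series([None, "01/08/2003", None, "04/08/2003", "30/06/2003"])

# Parse day-first strings; missing or unparseable values become NaT.
dates = pd.to_datetime(raw, format="%d/%m/%Y", errors="coerce")

# Format to text, then blank out every position where the parsed date is missing.
formatted = dates.dt.strftime("%d/%m/%Y").where(dates.notna(), "")
print(formatted.tolist())
```

Using the datetime series for the mask means the result does not depend on how strftime renders a missing value.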
Your conversion to datetime did not work properly on the NaTs.
You can check this before calling the fillna by printing out df['DATES'][0] and seeing that you get a 'NaT' (string) and not NaT (your wanted format)
Instead, use (for example): df['DATES'] = df['DATES'].apply(pd.Timestamp)
This example worked for me as is, but notice that it's not datetime but rather pd.Timestamp (it's another time format, but it's an easy one to use). You do not need to specify your time format with this, your current format is understood by pd.Timestamp.
All variables are object dtype. I'd like to convert the dtype from object to datetime.
df = pd.DataFrame(
{"ID":['A','B','C','D','E'],
"date":['4/12/2017','4/27/2017','4/28/2017','4/29/2017','4/210/2017'],
})
I've tried 3 different approaches. First,
df['date'] = pd.to_datetime(df['date'], format="%m-%d-%Y")
This didn't work out, giving me value error saying "time data '4/12/2017' doesn't match format specified"
Second,
parse(exerciseCsv['mostRecentExerciseDate'], dayfirst=True)
Third,
[datetime.strptime(x, '%m/%d/%Y') for x in exerciseCsv['mostRecentExerciseDate']]
None of the above worked. This looks like a simple task; can anyone help me get this done and explain why this isn't working?
I think you need to add the parameter errors='coerce' if there may be some bad data, which is then converted to NaT (the datetime equivalent of NaN):
df['date'] = pd.to_datetime(df['date'], errors='coerce')
print (df)
ID date
0 A 2017-04-12
1 B 2017-04-27
2 C 2017-04-28
3 D 2017-04-29
4 E NaT
If it is only a typo in the last value:
df = pd.DataFrame(
{"ID":['A','B','C','D','E'],
"date":['4/12/2017','4/27/2017','4/28/2017','4/29/2017','4/21/2017'],
})
df['date'] = pd.to_datetime(df['date'])
print (df)
ID date
0 A 2017-04-12
1 B 2017-04-27
2 C 2017-04-28
3 D 2017-04-29
4 E 2017-04-21
As Stephen Rauch mentioned in a comment, the format pattern needs to match the data, so use / instead of -:
df['date'] = pd.to_datetime(df['date'], format="%m/%d/%Y")
print (df)
ID date
0 A 2017-04-12
1 B 2017-04-27
2 C 2017-04-28
3 D 2017-04-29
4 E 2017-04-21
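The two fixes can also be combined: use the matching format and errors='coerce' together, then list the values that still fail. This is a minimal sketch using the question's original data, including the malformed '4/210/2017'; the names `parsed` and `bad_values` are assumptions for the example.

```python
import pandas as pd

# Original data from the question, including the malformed last value.
df = pd.DataFrame(
    {
        "ID": ["A", "B", "C", "D", "E"],
        "date": ["4/12/2017", "4/27/2017", "4/28/2017", "4/29/2017", "4/210/2017"],
    }
)

# The format uses "/" to match the data; coerce turns non-matching values into NaT.
parsed = pd.to_datetime(df["date"], format="%m/%d/%Y", errors="coerce")

# Collect the offending strings before overwriting the column.
bad_values = df.loc[parsed.isna(), "date"]
print(bad_values.tolist())

df["date"] = parsed
print(df.dtypes)
```

Inspecting `bad_values` before assigning makes it clear whether the NaT entries come from typos in the data or from a wrong format string.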