When querying rows in a dataframe based on a date column's value, comparing against just the year works, but comparing against a full date does not.
fil1.query('Date>2019')
This works fine. However, when I give a complete date, it fails.
fil1.query('Date>01-01-2019')
#fil1.query('Date>1-1-2019') # fails as well
TypeError: Invalid comparison between dtype=datetime64[ns] and int
What is the correct way to use dates in the query function? The docs don't seem to help.
There are two errors in your code: the default date format is YYYY-MM-DD, and the value must be wrapped in quotation marks ("") inside the query string.
fil1.query('Date>"2019-01-01"')
Query on dates works for me
df = pd.DataFrame({'Dates': ['2022-01-01', '2022-01-02','2022-01-03','2022-01-04']})
df
Out[101]:
Dates
0 2022-01-01
1 2022-01-02
2 2022-01-03
3 2022-01-04
df.query('Dates>"2022-01-02"')
Out[102]:
Dates
2 2022-01-03
3 2022-01-04
If you filter only on the year, it is probably being compared as an integer, so this works:
fil1.query('year > 2019')
A full date-string must be filtered with quotation marks, e.g.
fil1.query('date > "2019-01-01"')
It's a bit like SQL: there you also cannot filter with WHERE date > 1-1-2019; you need WHERE date > '2019-01-01'.
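To illustrate both working forms, here is a minimal sketch; fil1 is a hypothetical frame standing in for the question's data. Besides the quoted date literal, query also accepts a Python variable referenced with @:

```python
import pandas as pd

# Hypothetical frame standing in for the question's fil1
fil1 = pd.DataFrame({'Date': pd.to_datetime(['2018-06-01', '2019-03-15', '2020-07-20'])})

# A quoted date string inside the query is parsed and compared as a date:
print(fil1.query('Date > "2019-01-01"'))

# Equivalently, reference a Python variable with @:
cutoff = pd.Timestamp('2019-01-01')
print(fil1.query('Date > @cutoff'))
```

Both calls return the same rows; the @ form is handy when the cutoff is computed rather than hard-coded.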
I would like to create a YrWeek column in YYYY-WW format, e.g. 2022-01, based on a Date column in YYYY-MM-DD format. I would like to keep the YrWeek column in datetime format, so it will make my life easier when I plot it out.
Below are the steps I tried.
First, convert Date to datetime64:
df['Date'] = pd.to_datetime(df.Date, format='%Y-%m-%d')
Then I tried the following snippets, gathered from here and there, but I still cannot figure out a way to create the YrWeek column in YYYY-WW format as datetime64:
df['YrWeek'] = df.Date.dt.to_period('M')  # this shows 2021-01-01 to 2021-01-06 in the column and in the plot later
df['YrWeek'] = pd.to_datetime(df.Date.apply(lambda x: '{0}-{1}'.format(x.year, x.isocalendar().week)), format='%Y-%w', errors='coerce')  # this returns "NaT" in the column ('%w' is a weekday, not a week number)
df['YrWeek'] = pd.to_datetime(df.Date.dt.year.astype(str) + '-' + df.Date.dt.isocalendar().week.astype(str), format='%Y-%w')  # this also fails
Thanks in advance for your help. I am quite sure I've seen this somewhere, but I can't recall it or get my head around the issue at the moment.
From the documentation of the datetime object:
A datetime object is a single object containing all the information from a date object and a time object.
A datetime64 value cannot itself carry a YYYY-WW representation, but you can use strftime() (see its documentation) to create an explicit format string.
Here is the sample code:
import pandas as pd
df = pd.DataFrame({'date_time': pd.date_range(start='2022-01-01', end='2022-01-31')})
df['YrWeek'] = df['date_time'].dt.strftime('%Y-%W')
df.head(5) # print sample result
date_time YrWeek
0 2022-01-01 2022-00
1 2022-01-02 2022-00
2 2022-01-03 2022-01
3 2022-01-04 2022-01
4 2022-01-05 2022-01
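If the goal is mainly plotting, a week column can also stay in datetime64 form by snapping each date to the start of its week instead of formatting it as a string. A sketch, reusing the frame from the example above:

```python
import pandas as pd

df = pd.DataFrame({'date_time': pd.date_range(start='2022-01-01', end='2022-01-31')})

# Map every date to the Monday that starts its week; the dtype stays
# datetime64[ns], so it plots on a proper time axis.
df['YrWeekStart'] = df['date_time'].dt.to_period('W').dt.start_time
print(df.head(3))
```

Each week is then represented by its start date rather than a YYYY-WW label, which is a design choice: labels read nicer on an axis, but only a real datetime column keeps the spacing between weeks correct in the plot.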
I have a table with a Date column and several other country-specific columns (see the picture below). I want to create a heatmap in Seaborn but for that I need the Date column to be a datetime object. How can I change the dates from the current format -i.e. 2021Q3 - to 2021-09-01 (YYYY-MM-DD)?
I have tried the solution below, which works for monthly data (to_date = lambda x: pd.to_datetime(x['Date'], format='%YM%m')), but it does not work for quarterly data. I get a ValueError: 'q' is a bad directive in format '%YQ%q'. I could not find any solution to this error online.
# loop to transform the Date column's format
to_date = lambda x: pd.to_datetime(x['Date'], format='%YQ%q')
df_eurostat_reg_bank_x = df_eurostat_reg_bank.assign(Date=to_date)
I have also tried this solution, but I get the first month of the quarter in return, whereas I want the last month of the quarter:
df_eurostat_reg_bank['Date'] = df_eurostat_reg_bank['Date'].str.replace(r'(\d+)(Q\d)', r'\1-\2')
df_eurostat_reg_bank['Date'] = pd.PeriodIndex(df_eurostat_reg_bank.Date, freq='Q').to_timestamp()
df_eurostat_reg_bank.Date = df_eurostat_reg_bank.Date.dt.strftime('%m/%d/%Y')
df_eurostat_reg_bank = df_eurostat_reg_bank.set_index('Date')
Thank you in advance!
I assume that your example of 2022Q3 is a string on the grounds that it's not a date format that I recognise.
Thus, simple arithmetic and f-strings will avoid the use of any external modules:
def convdate(d):
    # Quarter n ends in month 3*n, e.g. Q3 -> 09, which gives the
    # last month of the quarter that the question asks for
    return f'{d[:4]}-{int(d[5]) * 3:02d}-01'

for d in ['2022Q1', '2022Q2', '2022Q3', '2022Q4']:
    print(convdate(d))
Output:
2022-03-01
2022-06-01
2022-09-01
2022-12-01
Note:
There is no attempt to ensure that the input string to convdate() is valid
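As an alternative, pandas's own quarterly periods can produce the last month of each quarter directly, which is what the question asked for. A sketch, assuming the input strings look like '2022Q3':

```python
import pandas as pd

s = pd.Series(['2022Q1', '2022Q2', '2022Q3', '2022Q4'])

# Parse as quarterly periods, move to the last month of each quarter,
# then take the first day of that month as a timestamp.
quarters = pd.PeriodIndex(s, freq='Q')
last_month_start = quarters.asfreq('M', how='end').to_timestamp()
print(list(last_month_start))
```

On a real frame this would be pd.PeriodIndex(df_eurostat_reg_bank['Date'], freq='Q') applied to the relevant column.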
When using pd.to_datetime on my data frame I get this error:
Out of bounds nanosecond timestamp: 30-04-18 00:00:00
Now from looking on StackO I know I can simply use the coerce option:
pd.to_datetime('13000101', format='%Y%m%d', errors='coerce')
But I was wondering if anyone had an idea on how I might replace these values with a fixed value? Say 1900-01-01 00:00:00 (or maybe 1955-11-12 for anyone who gets the reference!)
Reason being that this data frame is part of a process that handles thousands and thousands of JSONs per day. I want to be able to see in the dataset easily the incorrect ones by filtering for said fixed date.
It is just as invalid for the JSON to contain any date before 2010 so using an earlier date is fine and it is also perfectly acceptable to have a blank (NA) date value so I can't rely on just blanking the data.
Replace values by a default datetime using Series.mask, but only for the values that to_datetime with errors='coerce' failed to parse:
df = pd.DataFrame({'date': [np.nan, '20180101', '20-20-0']})
t = pd.to_datetime('1900-01-01')
date = pd.to_datetime(df['date'], format='%Y%m%d', errors='coerce')
df['date'] = date.mask(date.isna() & df['date'].notna(), t)
print (df)
date
0 NaT
1 2018-01-01
2 1900-01-01
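Once the sentinel is in place, the incorrect records are easy to pull out later by filtering on it. A small sketch repeating the setup above:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'date': [np.nan, '20180101', '20-20-0']})
t = pd.Timestamp('1900-01-01')

parsed = pd.to_datetime(df['date'], format='%Y%m%d', errors='coerce')
# Apply the sentinel only where parsing failed on a non-missing input,
# so genuinely blank dates stay NaT
df['date'] = parsed.mask(parsed.isna() & df['date'].notna(), t)

# Later in the pipeline, the bad records are one filter away:
bad_rows = df[df['date'] == t]
print(bad_rows)
```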
I have a dataframe with a date column:
data['Date']
0 1/1/14
1 1/8/14
2 1/15/14
3 1/22/14
4 1/29/14
...
255 11/21/18
256 11/28/18
257 12/5/18
258 12/12/18
259 12/19/18
But, when I try to get the max date out of that column, I get:
test_data.Date.max()
'9/9/15'
Any idea why this would happen?
Clearly the column is of type object. You should try using pd.to_datetime() and then applying the max() aggregation:
data['Date'] = pd.to_datetime(data['Date'], errors='coerce')  # you might need to pass format=
print(data['Date'].max())
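The underlying reason is that an object-dtype column is compared lexicographically, character by character. Plain Python strings show the same behaviour:

```python
# '9/9/15' sorts after both other entries because '9' > '1'
# at the very first character
dates_as_strings = ['1/1/14', '9/9/15', '12/19/18']
print(max(dates_as_strings))  # '9/9/15'
```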
The .max() understands it as a date (like you want) if it is a datetime object. Building on Seshadri's answer, try:
type(data['Date'][1])
If it is a datetime object, this returns:
pandas._libs.tslibs.timestamps.Timestamp
If not, you can make that column a datetime object like so:
data['Date'] = pd.to_datetime(data['Date'],format='%m/%d/%y')
The format argument makes sure you get the right formatting. See the full list of formatting options in the Python docs.
Your date may be stored as a string. First convert the column from string to datetime. Then, max() should work.
test = pd.DataFrame(['1/1/2010', '2/1/2011', '3/4/2020'], columns=['Dates'])
Dates
0 1/1/2010
1 2/1/2011
2 3/4/2020
pd.to_datetime(test['Dates'], format='%m/%d/%Y').max()
Timestamp('2020-03-04 00:00:00')
That timestamp can be cleaned up using .dt.date:
pd.to_datetime(test['Dates'], format='%m/%d/%Y').dt.date.max()
datetime.date(2020, 3, 4)
See the to_datetime format-argument table in the Python docs and the pandas to_datetime documentation.
I am trying to convert a column in my dataframe to dates, which are meant to be birthdays. The data was captured manually over a period of years in different formats, and I can't get Pandas to format the whole column correctly.
formats include:
YYYYMMDD
DDMMYYYY
DD/MM/YYYY
DD-MMM-YYYY (eg JAN)
I have tried
dates['BIRTH-DATE(MAIN)'] = pd.to_datetime(dates['BIRTH-DATE(MAIN)'])
but I get the error
ValueError: year 19670314 is out of range
How can I get it to handle multiple date formats?
You could create your own function to handle this. For example, something like:
df = pd.DataFrame({'date': {0: '20180101', 1: '01022018', 2: '01/02/2018', 3: '01-JAN-2018'}})
def fix_date(series, patterns=['%Y%m%d', '%d%m%Y', '%d/%m/%Y', '%d-%b-%Y']):
    # Try each candidate format; rows that don't match become NaT
    datetimes = []
    for pat in patterns:
        datetimes.append(pd.to_datetime(series, format=pat, errors='coerce'))
    # For each row, keep the last format that matched
    return pd.concat(datetimes, axis=1).ffill(axis=1).iloc[:, -1]
df['fixed_dates'] = fix_date(df['date'])
[out]
print(df)
date fixed_dates
0 20180101 2018-01-01
1 01022018 2018-02-01
2 01/02/2018 2018-02-01
3 01-JAN-2018 2018-01-01
In my eyes pandas is really good at converting dates, but it is nearly impossible to always guess the right format automatically. Use pd.to_datetime with errors='coerce' and check the dates that were not converted by hand.
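A small sketch of that checking step, assuming a single expected format ('%Y%m%d' here) and some hypothetical inputs:

```python
import pandas as pd

dates = pd.Series(['20180101', '01-JAN-2018', 'not a date'])

# Coerce parse failures to NaT, then pull out the rows that did not
# convert so they can be inspected by hand
converted = pd.to_datetime(dates, format='%Y%m%d', errors='coerce')
failed = dates[converted.isna() & dates.notna()]
print(failed)
```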