Why does pd.to_datetime not take the year into account? - python

I've searched for 2 hours but can't find an answer for this that works.
I have this dataset I'm working with and I'm trying to find the latest date, but it seems like my code is not taking the year into account. Here are some of the dates that I have in the dataset.
Date
01/09/2023
12/21/2022
12/09/2022
11/19/2022
Here's a snippet from my code
import pandas as pd
df=pd.read_csv('test.csv')
df['Date'] = pd.to_datetime(df['Date'])
st.write(df['Date'].max())
st.write gives me 12/21/2022 as the output instead of 01/09/2023 as it should be. So it seems like the code is not taking the year into account and just looking at the month and date.
I tried changing the format to
df['Date'] = df['Date'].dt.strftime('%Y%m%d').astype(int) but that didn't change anything.

pandas.read_csv allows you to designate column for conversion into dates, let test.csv content be
Date
01/09/2023
12/21/2022
12/09/2022
11/19/2022
then
import pandas as pd
df = pd.read_csv('test.csv', parse_dates=["Date"])
print(df['Date'].max())
gives output
2023-01-09 00:00:00
Explanation: I provide list of names of columns holding dates, which then read_csv parses.
(tested in pandas 1.5.2)

Related

How can I change the Date column format in pandas?

I need to convert the date to Day, Month and Year. I tried some alternatives, but I was unsuccessful.
import pandas as pd
df = pd.read_excel(r"C:\__Imagens e Planilhas Python\Instagram\Postagem.xlsx")
print(df)
It's very confusing, because you're using two different formats between the image and the expected result (and you write you want the same).
Clarify that data is a date with:
df['data']= = pd.to_datetime(df['data'])
Once you have this, just change the format with:
my_format = '%m-%d-%Y'
df['data'] = df['data'].dt.strftime(my_format)

Sort pandas dataframe by date or column

The solutions I have found in a similar question are not working for me. I have a pandas DataFrame including mock sales data. I want to sort by date since they are currently out of order. I have tried converting to a datetime object. I also tried creating a Month and Day column and sorting by them but that did not work either. Date is in YYYY-MM-DD format
Here is my solution:
import pandas as pd
import datetime
data = pd.read_csv(path)
# sort by date (not working)
data['OrderDate'] = pd.to_datetime(data['OrderDate'])
data.sort_values(by='OrderDate')
data.reset_index(inplace=True)
# sort by month then day (not working)
data.sort_values(by='Month')
data.sort_values(by='Day')
data.reset_index(inplace=True)
# export csv
data.to_csv(fileName, index=False)

How to find missing dates in an excel file by python

I'm a beginner in python. I have an excel file. This file shows the rainfall amount between 2016-1-1 and 2020-6-30. It has 2 columns. The first column is date, another column is rainfall. Some dates are missed in the file (The rainfall didn't estimate). For example there isn't a row for 2016-05-05 in my file. This a sample of my excel file.
Date rainfall (mm)
1/1/2016 10
1/2/2016 5
.
.
.
12/30/2020 0
I want to find the missing dates but my code doesn't work correctly!
import pandas as pd
from datetime import datetime, timedelta
from matplotlib import dates as mpl_dates
from matplotlib.dates import date2num
df=pd.read_excel ('rainfall.xlsx')
a= pd.date_range(start = '2016-01-01', end = '2020-06-30' ).difference(df.index)
print(a)
Here' a beginner friendly way of doing it.
First you need to make sure, that the Date in your dataframe is really a date and not a string or object.
Type (or print) df.info().
The date column should show up as datetime64[ns]
If not, df['Date'] = pd.to_datetime(df['Date'], dayfirst=False)fixes that. (Use dayfirst to tell if the month is first or the day is first in your date string because Pandas doesn't know. Month first is the default, if you forget, so it would work without...)
For the tasks of finding missing days, there's many ways to solve it. Here's one.
Turn all dates into a series
all_dates = pd.Series(pd.date_range(start = '2016-01-01', end = '2020-06-30' ))
Then print all dates from that series which are not in your dataframe "Date" column. The ~ sign means "not".
print(all_dates[~all_dates.isin(df['Date'])])
Try:
df = pd.read_excel('rainfall.xlsx', usecols=[0])
a = pd.date_range(start = '2016-01-01', end = '2020-06-30').difference([l[0] for l in df.values])
print(a)
And the date in the file must like 2016/1/1
To find the missing dates from a list, you can apply Conditional Formatting function in Excel. 4. Click OK > OK, then the position of the missing dates are highlighted. Note: The last date in the date list will be highlighted.
this TRICK Is not with python,a NORMAL Trick

Change date '01-Sept-20' to '01-Sep-20' using pandas dataframe

I have a huge .csv file with date as one of the column and I'm trying to plot it on a graph but I'm getting this error
"time data '01-Sept-20' does not match format '%d-%b-%y' (match)"
I'm using this line of code to convert it into datetime format
df['Date'] = pd.to_datetime(df['Date'], format="%d-%b-%y")
I think this error is because 'Sept' should be 'Sep'
What can I do to make Sept to Sep?
I'm using this dataset: covid19 api
As #Mayank pointed out in the comment you could replace the "Sept" string. And it works.
However, in your dataset is a column named Date_YMD which will give you the date without string replacement.
A complete example:
import pandas as pd
df = pd.read_csv('covid.csv')
df['Date_YMD'] = pd.to_datetime(df['Date_YMD'])
df['Date'] = pd.to_datetime(df['Date'].str.replace('Sept', 'Sep'), format='%d-%b-%y')
I think the main point here is to familiarize yourself with the data before searching for a technical solution.

Importing excel data with pandas showing date-time despite being date value

I've just started using pandas and I'm trying to import an excel file but I get Date-Time values like 01/01/2019 00:00:00 instead of the 01/01/2019 format. The source data is Date by the way, not Date-Time.
I'm using the following code
import pandas as pd
df = pd.read_excel (r'C:\Users\abcd\Desktop\KumulData.xlsx')
print(df)
The columns that have date in them are "BDATE", "BVADE" and "AKTIVASYONTARIH" which correspond to 6th, 7th and 11th columns.
What code can I use to see the dates as Date format in Pandas Dataframe?
Thanks.
If they're already datetimes then you can extract the date part and reassign the columns:
df[["BDATE", "BVADE", "AKTIVASYONTARIH"]] = df[["BDATE", "BVADE", "AKTIVASYONTARIH"]].apply(lambda x: x.dt.date)
solution updated..
For the sake of completeness, your goal can be achieved by:
df[["BDATE", "BVADE", "AKTIVASYONTARIH"]].astype("datetime64[D]")

Categories