Sort pandas dataframe by date or column - python

The solutions I found in a similar question are not working for me. I have a pandas DataFrame containing mock sales data. I want to sort it by date, since the rows are currently out of order. I have tried converting the column to datetime. I also tried creating Month and Day columns and sorting by them, but that did not work either. The dates are in YYYY-MM-DD format.
Here is my code:
import pandas as pd
import datetime
data = pd.read_csv(path)
# sort by date (not working)
data['OrderDate'] = pd.to_datetime(data['OrderDate'])
data.sort_values(by='OrderDate')
data.reset_index(inplace=True)
# sort by month then day (not working)
data.sort_values(by='Month')
data.sort_values(by='Day')
data.reset_index(inplace=True)
# export csv
data.to_csv(fileName, index=False)
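One likely reason the attempt above has no effect: sort_values returns a new DataFrame and leaves the original untouched unless the result is assigned back (or inplace=True is passed). Below is a minimal sketch with made-up sample rows; the Units column and the values are assumptions for illustration only. It also shows that sorting by month and then day needs a single call with both keys, because two separate calls simply re-sort by the last key.

```python
import pandas as pd

# Hypothetical sample data standing in for the CSV file
data = pd.DataFrame({
    "OrderDate": ["2020-03-15", "2019-12-01", "2020-01-20"],
    "Units": [5, 2, 9],
})

# Convert to datetime, then keep the sorted result by assigning it back
data["OrderDate"] = pd.to_datetime(data["OrderDate"])
data = data.sort_values(by="OrderDate").reset_index(drop=True)

# Sorting by month, then day: pass both keys in one call
data["Month"] = data["OrderDate"].dt.month
data["Day"] = data["OrderDate"].dt.day
by_month_day = data.sort_values(by=["Month", "Day"]).reset_index(drop=True)

print(data)
print(by_month_day)
```

Using reset_index(drop=True) avoids adding the old index as an extra column in the exported CSV.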

Related

Why does pd.to_datetime not take the year into account?

I've searched for 2 hours but can't find an answer for this that works.
I have this dataset I'm working with and I'm trying to find the latest date, but it seems like my code is not taking the year into account. Here are some of the dates that I have in the dataset.
Date
01/09/2023
12/21/2022
12/09/2022
11/19/2022
Here's a snippet from my code
import pandas as pd
df=pd.read_csv('test.csv')
df['Date'] = pd.to_datetime(df['Date'])
st.write(df['Date'].max())
st.write gives me 12/21/2022 as the output instead of 01/09/2023 as it should be. So it seems like the code is not taking the year into account and just looking at the month and date.
I tried changing the format to
df['Date'] = df['Date'].dt.strftime('%Y%m%d').astype(int) but that didn't change anything.
pandas.read_csv lets you designate columns to be parsed as dates. Let the content of test.csv be
Date
01/09/2023
12/21/2022
12/09/2022
11/19/2022
then
import pandas as pd
df = pd.read_csv('test.csv', parse_dates=["Date"])
print(df['Date'].max())
gives output
2023-01-09 00:00:00
Explanation: I pass a list of the names of the columns holding dates, which read_csv then parses.
(tested in pandas 1.5.2)
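A related sketch, assuming the dates are month/day/year (values such as 12/21/2022 only make sense in that order): passing an explicit format to pd.to_datetime removes any guessing. It also shows what happens when the column is still text, where max() compares strings character by character. The inline CSV text is a stand-in for test.csv.

```python
import io
import pandas as pd

# Inline stand-in for test.csv
csv_text = "Date\n01/09/2023\n12/21/2022\n12/09/2022\n11/19/2022\n"

# Left as strings, max() is a text comparison, so "12/..." beats "01/..."
raw = pd.read_csv(io.StringIO(csv_text))
text_max = raw["Date"].max()

# Parsed with an explicit month/day/year format, max() is chronological
df = pd.read_csv(io.StringIO(csv_text))
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")
latest = df["Date"].max()

print(text_max)  # 12/21/2022
print(latest)    # 2023-01-09 00:00:00
```

If the output still looks like a text comparison, checking df['Date'].dtype is a quick way to confirm whether the conversion actually took effect.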

How can I add a new column to a dataframe that adds to the dates in another column?

I want to automatically get rows with dates that are 90 days away from expiration and send me an email with the rows of what is expiring.
import pandas as pd
import numpy as np
from datetime import date
today = date.today()
fromtoday = pd.DateOffset(days=89)
90days_away = today + fromtoday
The expiration date column in my dataframe:
expirations = df[df['Ad Expiration'].notnull()]
What I'm trying to do now is create a column that adds 90days_away to my expirations column.
I think I somehow need to apply 90days_away to all rows, but I can't do that manually.
Also worth noting: I've only been studying Python for about a week and a half, so I still don't know the best way to do things. Thank you!
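One possible approach, sketched under assumptions: pandas applies date arithmetic to a whole column at once, so no manual loop is needed. The helper name expiring_within, the Ad and Notice Date columns, and the sample rows are hypothetical. Note also that a Python name cannot start with a digit, so 90days_away is replaced by cutoff here. Sending the email is left out.

```python
import pandas as pd

def expiring_within(df, today, days=90, column="Ad Expiration"):
    """Return rows whose date in `column` falls between today and today + days."""
    today = pd.Timestamp(today).normalize()
    cutoff = today + pd.Timedelta(days=days)
    dates = pd.to_datetime(df[column])
    # Rows with a missing date are excluded
    mask = dates.notna() & (dates >= today) & (dates <= cutoff)
    return df[mask]

# Hypothetical sample data
df = pd.DataFrame({
    "Ad": ["A", "B", "C", "D"],
    "Ad Expiration": ["2024-02-15", "2024-08-01", None, "2024-03-31"],
})
df["Ad Expiration"] = pd.to_datetime(df["Ad Expiration"])

# Column arithmetic works on every row at once
df["Notice Date"] = df["Ad Expiration"] - pd.Timedelta(days=90)

# A fixed date keeps the example reproducible
expiring = expiring_within(df, today="2024-01-01")
print(expiring)
```

In real use, today=pd.Timestamp.today() would replace the fixed date.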

How to find the last monday's data only from a dataframe in python?

I have a dataframe that contains one year of weekly OHLC data.
What do I need?
I want to list only the last Monday's data of each month. For example, May has 5 weeks, and I want to list the data for the last Monday of May and discard the rest. Here's the code I tried. I'm able to list the data on a weekly basis, but I got stuck there.
Any help would be appreciated!
import pandas as pd
import yfinance as yf
import datetime
from datetime import date, timedelta
periods=pd.date_range(start='2021-4-30',periods=60,freq='W')
start = periods[0].strftime('%Y-%m-%d')
end = periods[-1].strftime('%Y-%m-%d')
symbol="^NSEI"
df=yf.download(symbol,start,end,interval="1wk",index=periods)
You can use groupby(pd.Grouper()) to group by month and get the latest record.
# reset index to flatten columns
df = df.reset_index()
# copy date column to label last monday of a month
df['last_monday_of_month'] = df['Date']
# group by month and keep the latest record
# (pandas 2.2+ deprecates freq='M' in favour of freq='ME')
df.groupby(pd.Grouper(freq='M', key='Date')).last().reset_index()
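An alternative sketch that keeps the original rows instead of relabelling them with month-end dates: filter to Mondays, group by calendar month, and take the last row of each group. It uses synthetic weekly data rather than a yfinance download; the Close values are made up, and a Date column is assumed.

```python
import pandas as pd

# Synthetic weekly data, one row per Monday (stand-in for the downloaded OHLC frame)
dates = pd.date_range("2021-05-03", periods=9, freq="W-MON")
df = pd.DataFrame({"Date": dates, "Close": range(100, 109)})

# Keep Mondays only (dayofweek == 0), in chronological order
mondays = df[df["Date"].dt.dayofweek == 0].sort_values("Date")

# Group by calendar month and keep the final row of each group
month = mondays["Date"].dt.to_period("M")
last_mondays = mondays.groupby(month).tail(1).reset_index(drop=True)

print(last_mondays)
```

Here May 2021 has five Mondays, and only the row for 2021-05-31 survives for that month.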

Change date '01-Sept-20' to '01-Sep-20' using pandas dataframe

I have a huge .csv file with date as one of the columns, and I'm trying to plot it on a graph, but I'm getting this error:
"time data '01-Sept-20' does not match format '%d-%b-%y' (match)"
I'm using this line of code to convert it into datetime format
df['Date'] = pd.to_datetime(df['Date'], format="%d-%b-%y")
I think this error occurs because 'Sept' should be 'Sep'.
What can I do to change Sept to Sep?
I'm using this dataset: covid19 api
As @Mayank pointed out in the comments, you could replace the "Sept" string, and that works.
However, your dataset has a column named Date_YMD, which gives you the date without any string replacement.
A complete example:
import pandas as pd
df = pd.read_csv('covid.csv')
df['Date_YMD'] = pd.to_datetime(df['Date_YMD'])
df['Date'] = pd.to_datetime(df['Date'].str.replace('Sept', 'Sep'), format='%d-%b-%y')
I think the main point here is to familiarize yourself with the data before searching for a technical solution.
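For a quick check without the full file, here is a small sketch of the replacement on an inline Series; the sample values are made up. regex=False treats 'Sept' as a literal string, and month-name parsing with %b assumes an English locale.

```python
import pandas as pd

# Made-up sample values in the same layout as the Date column
dates = pd.Series(["31-Aug-20", "01-Sept-20", "02-Sept-20", "01-Oct-20"])

# Literal replacement of the four-letter abbreviation, then strict parsing
cleaned = dates.str.replace("Sept", "Sep", regex=False)
parsed = pd.to_datetime(cleaned, format="%d-%b-%y")

print(parsed)
```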

Importing excel data with pandas showing date-time despite being date value

I've just started using pandas and I'm trying to import an Excel file, but I get date-time values like 01/01/2019 00:00:00 instead of the 01/01/2019 format. The source data is Date, by the way, not Date-Time.
I'm using the following code
import pandas as pd
df = pd.read_excel (r'C:\Users\abcd\Desktop\KumulData.xlsx')
print(df)
The columns that have date in them are "BDATE", "BVADE" and "AKTIVASYONTARIH" which correspond to 6th, 7th and 11th columns.
What code can I use to see the dates as Date format in Pandas Dataframe?
Thanks.
If they're already datetimes then you can extract the date part and reassign the columns:
df[["BDATE", "BVADE", "AKTIVASYONTARIH"]] = df[["BDATE", "BVADE", "AKTIVASYONTARIH"]].apply(lambda x: x.dt.date)
For the sake of completeness, your goal can be achieved by:
df[["BDATE", "BVADE", "AKTIVASYONTARIH"]].astype("datetime64[D]")
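A small sketch of two display options, using made-up values and only two of the column names from the question. It assumes the columns are already datetime64 after read_excel, and that 01/01/2019 means day/month/year, which the question leaves ambiguous. dt.date yields Python date objects, while dt.strftime yields strings in whatever layout is requested.

```python
import pandas as pd

# Made-up values standing in for the imported Excel data
df = pd.DataFrame({
    "BDATE": pd.to_datetime(["2019-01-01", "2019-02-15"]),
    "BVADE": pd.to_datetime(["2019-03-01", "2019-04-20"]),
})
date_cols = ["BDATE", "BVADE"]

# Option 1: Python date objects (columns become object dtype)
as_dates = df[date_cols].apply(lambda col: col.dt.date)

# Option 2: formatted strings, useful for display or export only
as_text = df[date_cols].apply(lambda col: col.dt.strftime("%d/%m/%Y"))

print(as_dates)
print(as_text)
```

Both options give up the datetime64 dtype, so date arithmetic and the .dt accessor are no longer available on the converted columns; it may be preferable to convert only at the point of display.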
