How can I change the Date column format in pandas? - python

I need to convert the date to Day, Month and Year. I tried some alternatives, but I was unsuccessful.
import pandas as pd
df = pd.read_excel(r"C:\__Imagens e Planilhas Python\Instagram\Postagem.xlsx")
print(df)

This is confusing, because the image and the expected result use two different formats, yet you write that you want them to be the same.
First, make sure the data column is parsed as dates:
df['data'] = pd.to_datetime(df['data'])
Once you have this, just change the format with:
my_format = '%m-%d-%Y'
df['data'] = df['data'].dt.strftime(my_format)
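As a minimal sketch of the two steps above (the sample values are made up, and the day/month/year format string is an assumption about what the asker wants), note that the formatted result holds strings, not dates:

```python
import pandas as pd

# Hypothetical sample data; the column name 'data' follows the answer above.
df = pd.DataFrame({"data": ["2023-01-09", "2022-12-21"]})

# Step 1: parse the text into real datetime values.
df["data"] = pd.to_datetime(df["data"])

# Step 2: render as day/month/year. The result is text, not a datetime column.
formatted = df["data"].dt.strftime("%d/%m/%Y")

print(formatted.tolist())  # ['09/01/2023', '21/12/2022']
```

Keeping the parsed column and storing the formatted text separately leaves the real dates available for sorting and comparison.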

Related

Why does pd.to_datetime not take the year into account?

I've searched for 2 hours but can't find an answer for this that works.
I have this dataset I'm working with and I'm trying to find the latest date, but it seems like my code is not taking the year into account. Here are some of the dates that I have in the dataset.
Date
01/09/2023
12/21/2022
12/09/2022
11/19/2022
Here's a snippet from my code
import pandas as pd
df=pd.read_csv('test.csv')
df['Date'] = pd.to_datetime(df['Date'])
st.write(df['Date'].max())
st.write gives me 12/21/2022 as the output instead of 01/09/2023 as it should be. So it seems like the code is not taking the year into account and just looking at the month and date.
I tried changing the format to
df['Date'] = df['Date'].dt.strftime('%Y%m%d').astype(int) but that didn't change anything.
pandas.read_csv lets you designate columns to be converted into dates. Let the content of test.csv be
Date
01/09/2023
12/21/2022
12/09/2022
11/19/2022
then
import pandas as pd
df = pd.read_csv('test.csv', parse_dates=["Date"])
print(df['Date'].max())
gives output
2023-01-09 00:00:00
Explanation: I pass a list of the names of the columns holding dates, which read_csv then parses.
(tested in pandas 1.5.2)
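To illustrate why the year can appear to be ignored, here is a small sketch that compares the same values as text and as parsed dates. It reads the CSV content from an in-memory string instead of test.csv, and the explicit format '%m/%d/%Y' is an assumption based on the sample rows:

```python
import io
import pandas as pd

csv_text = "Date\n01/09/2023\n12/21/2022\n12/09/2022\n11/19/2022\n"
raw = pd.read_csv(io.StringIO(csv_text))

# Without parsing, max() compares the text character by character,
# so the leading month digits decide the result.
string_max = raw["Date"].max()

# With an explicit format, max() compares real dates.
parsed = pd.to_datetime(raw["Date"], format="%m/%d/%Y")
date_max = parsed.max()

print(string_max)  # 12/21/2022
print(date_max)    # 2023-01-09 00:00:00
```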

Dataframe date sorting is reversed. How to fix it?

So, I have a dataframe (mean_df) with a very messy date column. It's messy because it is in this format: 1/1/2018, 1/2/2018, 1/3/2018... when it should be 01/01/2018, 02/01/2018, 03/01/2018... Not only is the format wrong, but the column is sorted by the first day of every month, then the second day of every month, and so on...
So I wrote this code to fix the format:
mean_df["Date"] = mean_df["Date"].astype('datetime64[ns]')
mean_df["Date"] = mean_df["Date"].dt.strftime('%d-%m-%Y')
Then, from displaying this:
It's now showing this (I have to run the same cell 3 times to make it work; it always throws an error the first time):
Finally, for the last few hours I've been trying to sort the 'Date' column in ascending order, but it keeps sorting it the wrong way:
mean_df = mean_df.sort_values(by='Date') # I tried this
But this is the output:
As you can see, it is still ascending prioritizing days.
Can someone guide me in the right direction?
Thank you in advance!
Convert it to the right format first:
mean_df["sort_date"] = pd.to_datetime(mean_df["Date"],format = '%d/%m/%Y')
mean_df = mean_df.sort_values(by='sort_date') # Try this now
You should sort the dates right after converting them to datetime, since dt.strftime converts datetimes to strings:
mean_df["Date"] = pd.to_datetime(mean_df["Date"], dayfirst=True)
mean_df = mean_df.sort_values(by='Date')
mean_df["Date"] = mean_df["Date"].dt.strftime('%d-%m-%Y')
Here is my sample code.
import pandas as pd
df = pd.DataFrame()
df['Date'] = "1/1/2018, 1/2/2018, 1/3/2018".split(", ")
df['Date1'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')
df['Date2'] = df['Date1'].dt.strftime('%d/%m/%Y')
df.sort_values(by='Date1')  # sort on the datetime column, not on the formatted strings
First, I convert Date to datetime format. As I observed, your data follows the '%d/%m/%Y' format. If you want to show the data in another form, try the following line, for example
df['Date2'] = df['Date1'].dt.strftime('%d/%m/%Y')
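The point made in the answers above, sort while the column is still datetime and format only afterwards, can be sketched as follows. The sample rows and the extra column name Display are made up for illustration:

```python
import pandas as pd

# Day/month/year strings in scrambled order (hypothetical sample data).
df = pd.DataFrame({"Date": ["1/2/2018", "2/1/2018", "1/1/2018", "3/1/2018"]})

# Parse, then sort on the real dates.
df["Date"] = pd.to_datetime(df["Date"], format="%d/%m/%Y")
df = df.sort_values(by="Date").reset_index(drop=True)

# Format for display only after sorting.
df["Display"] = df["Date"].dt.strftime("%d-%m-%Y")

print(df["Display"].tolist())
# ['01-01-2018', '02-01-2018', '03-01-2018', '01-02-2018']

# Sorting the formatted strings instead would order by day first:
print(sorted(df["Display"]))
# ['01-01-2018', '01-02-2018', '02-01-2018', '03-01-2018']
```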

Change date '01-Sept-20' to '01-Sep-20' using pandas dataframe

I have a huge .csv file with date as one of the column and I'm trying to plot it on a graph but I'm getting this error
"time data '01-Sept-20' does not match format '%d-%b-%y' (match)"
I'm using this line of code to convert it into datetime format
df['Date'] = pd.to_datetime(df['Date'], format="%d-%b-%y")
I think this error is because 'Sept' should be 'Sep'
What can I do to make Sept to Sep?
I'm using this dataset: covid19 api
As @Mayank pointed out in the comments, you could replace the "Sept" string, and that works.
However, your dataset has a column named Date_YMD, which gives you the date without any string replacement.
A complete example:
import pandas as pd
df = pd.read_csv('covid.csv')
df['Date_YMD'] = pd.to_datetime(df['Date_YMD'])
df['Date'] = pd.to_datetime(df['Date'].str.replace('Sept', 'Sep'), format='%d-%b-%y')
I think the main point here is to familiarize yourself with the data before searching for a technical solution.
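For readers without the dataset at hand, here is a self-contained sketch of the replacement approach on a few made-up values. It assumes an English locale, since %b matches locale-dependent month abbreviations:

```python
import pandas as pd

# Hypothetical sample values in the same shape as the question's column.
s = pd.Series(["01-Sept-20", "15-Aug-20", "02-Sept-20"])

# Replace the non-standard abbreviation literally (no regex needed).
cleaned = s.str.replace("Sept", "Sep", regex=False)

# %b expects three-letter month abbreviations such as 'Sep'.
dates = pd.to_datetime(cleaned, format="%d-%b-%y")

print(dates.dt.strftime("%Y-%m-%d").tolist())
# ['2020-09-01', '2020-08-15', '2020-09-02']
```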

Converting a pandas dataframe column to date type with a particular format of date?

This question is different from the existing questions and answers on Stack Overflow because I do not want to change my data type to string in order to obtain the desired output.
I find this very confusing and have not been able to find a proper solution to my problem.
I read an Excel file which has one column, as follows:
Date
9/20/2017 7:27:30 PM
9/20/2017 7:27:30 PM
11/21/2018 8:28:30 AM
7/18/2019 9:30:08 PM
.
.
.
I am reading this data from the Excel sheet into a dataframe:
df = pd.read_excel("data.xlsx")
First, I want to remove the time from this column. I am doing it like this:
df['Date'] = pd.to_datetime(df['Date'])
df['Date'] = pd.to_datetime(df['Date'], errors='ignore', format='%d/%b/%Y').dt.date
It produces the following output, with the data type datetime.date:
Date
20/9/2017
20/9/2017
21/11/2018
18/7/2019
.
.
.
But I want it in the following form without changing it into a string, because I want to store this data in another Excel file, and this column must behave as a date column when filtering is applied in the Excel sheet.
Date
20/Sep/2017
20/Sep/2017
21/Nov/2018
18/Jul/2019
.
.
.
I can produce the above output with
df['Date'] = df['Date'].apply(lambda x: x.strftime('%d/%b/%Y'))
But again, this changes the date column into strings, and I do not want strings. I want a datetime type that excludes the time values from each cell.
A possible solution, after converting it from string back to datetime, is as follows, but it adds the time values again:
df['Date'] = pd.to_datetime(df['Date'])
After executing the above two steps, the column again includes the time as 12:00:00 AM or 00:00:00 along with the date value.
I hope that is clear.
How can I obtain the desired result with the final column values as a date type?
But I want it as following type without changing it into string
No, that is not possible. If you want datetimes without times, the only pattern available in Python/pandas is YYYY-MM-DD.
#datetimes with no times
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y %I:%M:%S %p').dt.floor('d')
#python dates
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y %I:%M:%S %p').dt.date
For any custom format, the datetimes have to be converted to strings, like:
df['Date'] = df['Date'].dt.strftime('%d/%b/%Y')
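A short sketch of the trade-off described above, using two made-up rows in the question's format. It uses dt.normalize(), which, like dt.floor('d'), sets the time part to midnight while keeping the datetime type. It assumes an English locale for %p and %b:

```python
import pandas as pd

# Hypothetical sample rows shaped like the question's column.
df = pd.DataFrame({"Date": ["9/20/2017 7:27:30 PM", "11/21/2018 8:28:30 AM"]})
parsed = pd.to_datetime(df["Date"], format="%m/%d/%Y %I:%M:%S %p")

# Still a datetime column; the time part is set to midnight.
midnight = parsed.dt.normalize()

# A custom display format produces text, not dates.
as_text = parsed.dt.strftime("%d/%b/%Y")

print(midnight.tolist())
print(as_text.tolist())  # ['20/Sep/2017', '21/Nov/2018']
```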
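You can set the date format in the ExcelWriter. Note that it takes Excel number-format strings (such as 'dd/mmm/yyyy'), not strftime codes; datetime_format applies to datetime columns and date_format to datetime.date values.
writer = pd.ExcelWriter("pandas_datetime.xlsx",
                        engine='xlsxwriter',
                        date_format='dd/mmm/yyyy',
                        datetime_format='dd/mmm/yyyy')
df.to_excel(writer)
writer.close()  # write the file to disk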
I think I am a bit late here, but as a workaround:
Do not format the date column; leave it as a regular datetime column in the dataframe, save the Excel workbook, then open it again and format that column range using the openpyxl module.
import openpyxl
# file_path is the path of the workbook saved earlier
workbook = openpyxl.load_workbook(file_path)
sheet = workbook['Sheet1']  # get the sheet by name
# assuming that the column is M and the data starts at M2
last_line_end = 'M' + str(len(df) + 1)
for row in sheet['M2:' + last_line_end]:
    for cell in row:
        cell.number_format = "DD/MM/YY"
workbook.save(file_path)  # save the workbook back to the same file
workbook.close()

Convert a date to a different format for an entire new column

I want to convert the date in a column in a dataframe to a different format. Currently, it has this format: '2019-11-20T01:04:18'. I want it to have this format: 20-11-19 1:04.
I think I need to develop a loop and generate a new column for the new date format. So essentially, in the loop, I would refer to the initial column and then generate the variable for the new column in the format I want.
Can someone help me out to complete this task?
The following code works for one occasion:
import datetime
d = datetime.datetime.strptime('2019-11-20T01:04:18', '%Y-%m-%dT%H:%M:%S')
print(d.strftime('%d-%m-%y %H:%M'))
Based on a previous answer on this site, this should help you; the comments give the explanation.
You can read your data into pandas from a CSV file or a database, or create some test data as shown below.
>>> import pandas as pd
>>> df = pd.DataFrame({'column': {0: '26/1/2016', 1: '26/1/2016'}})
>>> # First convert the column to datetime datatype
>>> df['column'] = pd.to_datetime(df.column)
>>> # Then call the .dt.strftime() method, setting the format codes you want here
>>> df['column'] = df['column'].dt.strftime('%Y-%m-%dT%H:%M:%S')
>>> df
column
0 2016-01-26T00:00:00
1 2016-01-26T00:00:00
NB: Check that all values in your column use the same date string format.
You can also achieve it like this:
from datetime import datetime
df['your_column'] = df['your_column'].apply(lambda x: datetime.strptime(x, '%Y-%m-%dT%H:%M:%S').strftime('%d-%m-%y %H:%M'))
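No explicit loop is needed for this. As a sketch of a vectorized alternative (the column names your_column and new_column are placeholders, and the second row is made up), parse once with pd.to_datetime and format with .dt.strftime:

```python
import pandas as pd

# Hypothetical data in the question's input format.
df = pd.DataFrame({"your_column": ["2019-11-20T01:04:18", "2019-12-01T13:45:00"]})

# Parse the whole column at once, then format into a new column.
parsed = pd.to_datetime(df["your_column"], format="%Y-%m-%dT%H:%M:%S")
df["new_column"] = parsed.dt.strftime("%d-%m-%y %H:%M")

print(df["new_column"].tolist())  # ['20-11-19 01:04', '01-12-19 13:45']
```

Note that %H zero-pads the hour, so this gives 01:04 where the question shows 1:04.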
