I have a date column in pandas, stored as strings, like the following:
df['Date']
Dec/2018
Mar/2017
Sep/2019
I want the date column in pandas to look like the following, with a datetime datatype rather than a string datatype:
df['Date']
`December-2018`
`March-2017`
`September-2019`
from datetime import datetime
import pandas as pd
# define sample data to create a dataframe from
data = {'Date': ['Dec/2018', 'Mar/2017', 'Sep/2019']}
df = pd.DataFrame(data)
# define a function to convert dates
def format_date(el):
    return datetime.strptime(el, '%b/%Y').strftime('%B-%Y')
# apply conversion to desired column and store output in new column
df['Date_formatted'] = df['Date'].apply(format_date)
# print the dataframe to check result
print(df)
This outputs the following dataframe:
Date Date_formatted
0 Dec/2018 December-2018
1 Mar/2017 March-2017
2 Sep/2019 September-2019
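The pandas approach using pd.to_datetime (note that .dt.strftime returns strings, so the resulting column has dtype object, as the output shows):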
df['Date'] = pd.to_datetime(df['Date'], format='%b/%Y').dt.strftime('%B-%Y')
0 December-2018
1 March-2017
2 September-2019
Name: Date, dtype: object
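Strictly speaking, a pandas datetime column can't carry a display format such as December-2018: formatting with strftime always produces strings. A common pattern, sketched here as one option, is to keep a parsed datetime column for computation and a separate string column for display. The column names Date_dt and Date_label are illustrative, not from the question.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['Dec/2018', 'Mar/2017', 'Sep/2019']})

# Parse once into a real datetime column; the day defaults to the 1st of the month
df['Date_dt'] = pd.to_datetime(df['Date'], format='%b/%Y')

# Derive a display-only column; strftime returns strings, not datetimes
df['Date_label'] = df['Date_dt'].dt.strftime('%B-%Y')

print(df.dtypes)
print(df)
```

Sorting, filtering, and date arithmetic would use Date_dt; Date_label is only for presentation.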
import datetime
x = datetime.date.today()
# %B is the full month name and %Y the 4-digit year, separated by a hyphen
print(x.strftime("%B-%Y"))
I hope it will help you.
I have a date in the format YYYY-MM-DD (2022-11-01). I want to convert it to the 'YYYYMMDD' format (without hyphens). Please help.
I tried the following, but with no luck:
df['ConvertedDate'] = df['DateOfBirth'].dt.strftime('%m/%d/%Y')
If I understand correctly, the format mask you should be using with strftime is %Y%m%d:
df["ConvertedDate"] = df["DateOfBirth"].dt.strftime('%Y%m%d')
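Besides the format mask ('%m/%d/%Y' would produce 11/01/2022, not 20221101), one possible reason for "no luck" is that DateOfBirth still holds strings; this is an assumption, since the error isn't shown. The .dt accessor only works on datetime-like columns, so a defensive sketch parses first only when needed:

```python
import pandas as pd

df = pd.DataFrame({'DateOfBirth': ['2021-01-14', '2022-11-01']})

# .dt raises AttributeError on string columns, so parse them first if necessary
if not pd.api.types.is_datetime64_any_dtype(df['DateOfBirth']):
    df['DateOfBirth'] = pd.to_datetime(df['DateOfBirth'], format='%Y-%m-%d')

# %Y%m%d: four-digit year, two-digit month, two-digit day, no separators
df['ConvertedDate'] = df['DateOfBirth'].dt.strftime('%Y%m%d')
print(df)
```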
pandas itself can convert strings to datetime in a DataFrame and then format them however you want:
df['ConvertedDate'] = pd.to_datetime(df['DateOfBirth'], format='%Y-%m-%d').dt.strftime('%Y%m%d')
Full example:
import pandas as pd
values = {'DateOfBirth': ['2021-01-14', '2022-11-01', '2022-11-01']}
df = pd.DataFrame(values)
df['ConvertedDate'] = pd.to_datetime(df['DateOfBirth'], format='%Y-%m-%d').dt.strftime('%Y%m%d')
print(df)
Output:
DateOfBirth ConvertedDate
0 2021-01-14 20210114
1 2022-11-01 20221101
2 2022-11-01 20221101
Using only the standard library also works:
from datetime import datetime
initial = "2022-11-01"
time = datetime.strptime(initial, "%Y-%m-%d")
print(time.strftime("%Y%m%d"))
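If the column is guaranteed to hold strings already in YYYY-MM-DD form, plain string manipulation gives the same result without parsing. This is a sketch under that assumption; unlike pd.to_datetime, it won't reject invalid dates such as 2022-13-45.

```python
import pandas as pd

df = pd.DataFrame({'DateOfBirth': ['2021-01-14', '2022-11-01']})

# Remove the hyphens literally (regex=False treats '-' as plain text)
df['ConvertedDate'] = df['DateOfBirth'].str.replace('-', '', regex=False)
print(df)
```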
I have a dataframe that contains arrival dates for vessels, and I want Python to recognize the current month and year and remove all entries prior to them.
I have a column with the date itself in the format '%d/%b/%Y', plus separate columns for month and year if needed.
For instance, if today is 01/01/2022, I'd like to remove everything from Dec/2021 and earlier.
Using pandas periods and boolean indexing:
import pandas as pd
# set up example
df = pd.DataFrame({'date': ['01/01/2022', '08/02/2022', '09/03/2022'], 'other_col': list('ABC')})
# keep dates in this month or later
keep = (pd.to_datetime(df['date'], dayfirst=False)
.dt.to_period('M')
.ge(pd.Timestamp('today').to_period('M'))
)
# filter
out = df[keep]
Output (depends on the date the code is run):
date other_col
1 08/02/2022 B
2 09/03/2022 C
from datetime import datetime
import pandas as pd
df = ...
# assuming your date column is named 'date'
t = datetime.utcnow()
# compare with the first day of the current month so the whole current month is kept
df = df[pd.to_datetime(df.date) >= datetime(t.year, t.month, 1)]
Let us consider this example dataframe:
import pandas as pd
import datetime
data = [['nao victoria', '21/Feb/2012'], ['argo', '6/Jun/2022'], ['kon tiki', '23/Aug/2022']]
df = pd.DataFrame(data, columns=['Vessel', 'Date'])
You can convert your dates to datetimes using pandas' to_datetime method; for instance, you can save the output in a new Series (column):
df['Datetime']=pd.to_datetime(df['Date'], format='%d/%b/%Y')
You end up with the following dataframe:
Vessel Date Datetime
0 nao victoria 21/Feb/2012 2012-02-21
1 argo 6/Jun/2022 2022-06-06
2 kon tiki 23/Aug/2022 2022-08-23
You can then drop rows whose datetime is earlier than the current date and time, obtained with datetime's now method:
df = df[df.Datetime > datetime.datetime.now()]
This returns:
Vessel Date Datetime
2 kon tiki 23/Aug/2022 2022-08-23
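Comparing with datetime.datetime.now() works at the level of the current moment, so it also drops earlier days of the current month (for example, argo's 6/Jun/2022 would be removed if run on 15 June 2022). Since the question asks to remove only entries before the current month, here is a sketch that compares monthly periods instead. The fixed reference date is hypothetical, used only to make the result reproducible; real code would use pd.Timestamp('today').

```python
import pandas as pd

df = pd.DataFrame({'Vessel': ['nao victoria', 'argo', 'kon tiki'],
                   'Date': ['21/Feb/2012', '6/Jun/2022', '23/Aug/2022']})

# Hypothetical fixed "today" for reproducibility; use pd.Timestamp('today') in practice
reference = pd.Timestamp('2022-06-15')

# Compare at month granularity: convert both sides to monthly periods
months = pd.to_datetime(df['Date'], format='%d/%b/%Y').dt.to_period('M')
out = df[months >= reference.to_period('M')]
print(out)
```

Here argo is kept because it falls in the reference month, even though its day is earlier than the 15th.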
My CSV data looks like this:
Date Time
1/12/2019 12:04AM
1/12/2019 12:09AM
1/12/2019 12:14AM
and so on.
I am trying to read this file using pandas in the following way:
import pandas as pd
import numpy as np
data = pd.read_csv('D 2019.csv',parse_dates=[['Date','Time']])
print(data['Date_Time'].dt.month)
When I access the year through the dt accessor, it prints correctly as 2019.
But when I print the day or the month, the values are completely wrong. The month starts off as 1 and ends up as 12, when it should be 12 all the time.
The day starts off as 12 and ends up at 31, when it should start at 1 and end at 31. The file has a total of 8867 entries. Where am I going wrong?
pandas assumes MM/DD by default, while your dates are DD/MM.
The simplest solution is to set the dayfirst parameter of read_csv:
dayfirst : DD/MM format dates, international and European format (default False)
data = pd.read_csv('D 2019.csv', parse_dates=[['Date', 'Time']], dayfirst=True)
# -------------
>>> data['Date_Time'].dt.month
# 0 12
# 1 12
# 2 12
# Name: Date_Time, dtype: int64
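The symptom in the question follows from reading day-first strings month-first: 1/12/2019 becomes 12 January instead of 1 December. A small sketch with hand-picked sample strings (not from the actual file) compares the two readings using explicit formats:

```python
import pandas as pd

raw = pd.Series(['1/12/2019', '2/12/2019', '31/12/2019'])

# Month-first reading swaps day and month; '31/12/2019' is not even valid
# this way, so only the first two values are parsed here
month_first = pd.to_datetime(raw.iloc[:2], format='%m/%d/%Y')

# Day-first reading matches how the data was written
day_first = pd.to_datetime(raw, format='%d/%m/%Y')

print(month_first.dt.month.tolist())  # [1, 2]
print(day_first.dt.month.tolist())    # [12, 12, 12]
```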
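Try passing the format argument to pd.to_datetime. Without parse_dates, read_csv keeps Date and Time as separate columns, so combine them first. Use %I (12-hour clock) rather than %H, because %p only affects the parsed hour when used with %I:
import pandas as pd
df = pd.read_csv('D 2019.csv')
df["Date_Time"] = pd.to_datetime(df["Date"] + " " + df["Time"], format='%d/%m/%Y %I:%M%p')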
You need to check the data types in your dataframe and convert the "Date" column to datetime:
df["Date"] = pd.to_datetime(df["Date"])
Afterwards, you can access the day, month, or year using:
dt.day
dt.month
dt.year
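Note: make sure you know whether your dates are D/M/Y or M/D/Y; for D/M/Y dates like the ones in the question, pass dayfirst=True.
Full code:
import pandas as pd
import numpy as np
data = pd.read_csv('D 2019.csv')
data["Date"] = pd.to_datetime(data["Date"], dayfirst=True)  # dates in the question are D/M/Y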
print(data["Date"].dt.day)
print(data["Date"].dt.month)
print(data["Date"].dt.year)
I am working with parquet and I need to use date32[day] objects for my dates but I am unclear how to use pandas to generate this exact datatype, rather than a timestamp.
Consider this example:
from datetime import datetime, date
import pyarrow.parquet as pq
import pandas as pd
df1 = pd.DataFrame({'date': [date.today()]})
df1.to_parquet('testdates.parquet')
pq.read_table("testdates.parquet") # date32[day]
# pandas version
df2 = pd.DataFrame({'date': [pd.to_datetime('2022-04-07')]})
df2.to_parquet('testdates2.parquet')
pq.read_table("testdates2.parquet") # timestamp[us]
From the pyarrow documentation on pandas integration:
import pandas as pd
import pyarrow as pa
from datetime import date
df2 = pd.Series({'date':[date(2022,4,7)]})
df2_dat32 = pa.array(df2)
print("dataframe:", df2)
print("value of dataframe:", df2_dat32[0])
print("datatype:", df2_dat32.type)
Output
dataframe: date [2022-04-07]
dtype: object
value of dataframe: [datetime.date(2022, 4, 7)]
datatype: list<item: date32[day]>
Edit: If you have an entire column of dates, you first need to convert the datetimes to dates and then use the same method as above. See the example below:
import pandas as pd
import pyarrow as pa
from datetime import date
# create a pandas DataFrame with one column of five
# datetime strings, through a dictionary
datetime_df = pd.DataFrame({'DateTime': ['2021-01-15 20:02:11',
'1989-05-24 20:34:11',
'2020-01-18 14:43:24',
'2021-01-15 20:02:10',
'1999-04-04 20:34:11']})
datetime_df['Date'] = pd.to_datetime(datetime_df['DateTime']).dt.date
date_series = pd.Series(datetime_df['Date'])
print(date_series)
Output:
0 2021-01-15
1 1989-05-24
2 2020-01-18
3 2021-01-15
4 1999-04-04
Name: Date, dtype: object
Then use pyarrow for conversion:
df2_dat32 = pa.array(date_series)
print("datatype of values in the dataframe with dates:", type(date_series[0]))
print("value of dataframe after converting using pyarrow:", df2_dat32[0])
print("datatype after converting using pyarrow :", df2_dat32.type)
Output:
datatype of values in the dataframe with dates: <class 'datetime.date'>
value of dataframe after converting using pyarrow: 2021-01-15
datatype after converting using pyarrow : date32[day]
I have a date column in a dataset where the dates are in a format like 'Apr-12', 'Jan-12'. I would like to change them to 04-2012, 01-2012. I am looking for a function that can do this.
I think I know one guy with the same name. Jokes aside, here is the solution to your problem.
Python has a built-in function, strptime(), which parses a string into a datetime object; you can then render that object in the format you want with strftime().
You need to import datetime first, since both are part of Python's datetime module. There's no need to install anything; just import it.
It works like this: datetime.strptime(your_string, format_your_string_is_in), followed by .strftime(format_you_want)
# You can also do this: from datetime import * (this imports everything the datetime module exports)
from datetime import datetime
date_string = 'Apr-12'
date_object = datetime.strptime(date_string, '%b-%y')  # abbreviated month name, 2-digit year
print(date_object.strftime('%m-%Y'))  # 04-2012
I hope this will work for you. Happy coding :)
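Since the question asks for a function, the same two steps can be wrapped in a small helper. The name reformat_month_year is hypothetical, and the sketch assumes English month abbreviations (the default C locale), because %b is locale-dependent.

```python
from datetime import datetime


def reformat_month_year(value: str) -> str:
    """Hypothetical helper: turn 'Apr-12' (abbreviated month, 2-digit year) into '04-2012'."""
    # strptime parses using the format the input is in...
    parsed = datetime.strptime(value, '%b-%y')
    # ...and strftime renders it in the format you want
    return parsed.strftime('%m-%Y')


print([reformat_month_year(v) for v in ['Apr-12', 'Jan-12']])
```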
You can do the following:
import pandas as pd
df = pd.DataFrame({
'date': ['Apr-12', 'Jan-12', 'May-12']
})
pd.to_datetime(df['date'], format='%b-%y')
This will output:
0 2012-04-01
1 2012-01-01
2 2012-05-01
Name: date, dtype: datetime64[ns]
This means you can update your date column directly (the values become datetime64, with the day set to the 1st of each month):
df['date'] = pd.to_datetime(df['date'], format='%b-%y')
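If the column mixes abbreviated and full month names (e.g. 'Apr-12' and 'March-13'), a single '%b-%y' format raises on the full names. One possible sketch, with sample values: parse with each format using errors='coerce' (unmatched values become NaT) and fill the gaps from the second attempt.

```python
import pandas as pd

dates = pd.Series(['Apr-12', 'Jan-12', 'March-13', 'June-14'])

# First pass: abbreviated month names ('Apr'); full names become NaT
parsed = pd.to_datetime(dates, format='%b-%y', errors='coerce')

# Second pass: full month names ('March'); used only where the first pass failed
parsed = parsed.fillna(pd.to_datetime(dates, format='%B-%y', errors='coerce'))

print(parsed.dt.strftime('%m-%Y').tolist())
```

Values matching neither format stay NaT, which makes unexpected spellings easy to spot with parsed.isna().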
You can chain a couple of pandas methods together to get the desired output:
import pandas as pd
df = pd.DataFrame({'date_fmt':['Apr-12','Jan-12']})
df
Input dataframe:
date_fmt
0 Apr-12
1 Jan-12
Use pd.to_datetime chained with the .dt accessor and strftime:
pd.to_datetime(df['date_fmt'], format='%b-%y').dt.strftime('%m-%Y')
Output:
0 04-2012
1 01-2012
Name: date_fmt, dtype: object