Pandas date column: problem with date conversion - python

I have a column date in a Covid data set. The dates appear in this format 20211030 (year - month - day).
However, when converting that column, everything appears with 1970.
This is my code:
df["FECHA"] = pd.to_datetime(df["FECHA"], unit='s')
The result is this:
0 MI PERU 1970-08-22 21:58:27
1 SAN JUAN DE LURIGANCHO 1970-08-22 19:27:09
2 YANAHUARA 1970-08-22 19:22:01
3 CUSCO 1970-08-22 22:08:41
4 PANGOA 1970-08-22 21:58:36
Thank you in advance for your help, big hug.
I get this error:
ValueError: The 'datetime64' dtype has no unit. Please pass in 'datetime64[ns]' instead.
my complete code
import pandas as pd
import numpy as np
import matplotlib.pyplot as
plt from datetime import datetime
dataset_covid = "datasetcovid.csv"
df = pd.read_csv(dataset_covid, sep=";", usecols=["DISTRITO", "FECHA_RESULTADO"])
df['FECHA_RESULTADO'] = df['FECHA_RESULTADO'].astype('datetime64')
also try this other code
df['FECHA_RESULTADO'] = df['FECHA_RESULTADO''].astype(str).astype('datetime64')
ParserError: year 20210307 is out of range: 20210307.0

In your case, you don't need pd.to_datetime IF column contains strings:
df = pd.DataFrame({'FECHA': ['20211030']})
print(df)
# Output:
FECHA
0 20211030
Use astype:
df['FECHA'] = df['FECHA'].astype('datetime64')
print(df)
# Output:
FECHA
0 2021-10-30
BUT if the dtype of your column FECHA is integer, you have to cast your column to string before:
df['FECHA'] = df['FECHA'].astype(str).astype('datetime64')
print(df)
# Output:
FECHA
0 2021-10-30

As noted in the comments, the result is caused by the parameters you are inputing in the to_datetime function. To fix this you should :
drop the unit parameter which is not related to your formating
add a format parameter which correspond to the date format you are using.
Hence, your code should go from:
df["FECHA"] = pd.to_datetime(df["FECHA"], unit='s')
To this:
df["FECHA"] = pd.to_datetime(df["FECHA"], format='%Y%m%d')
In order to find the proper formating you can lookup the values that correspond within this documentation. Docs related to the to_datetime function can be found here.
In our scenario the %Y corresponds to a year with century as a decimal number.
The %m to a padded month (with a starting zero). And the %d to the day in the month. This should match the 20211030 (year - month - day) given.

Related

Python - Remove lines prior to current month and year

I have a dataframe that contain arrival dates for vessels and I'd want to make python recognize the current year and month that we are at the moment and remove all entries that are prior to the current month and year.
I have a column with the date itself in the format '%d/%b/%Y' and columns for month and year separatly if needed.
For instance, if today is 01/01/2022. I'd like to remove everything that is from dec/2021 and prior.
Using pandas periods and boolean indexing:
# set up example
df = pd.DataFrame({'date': ['01/01/2022', '08/02/2022', '09/03/2022'], 'other_col': list('ABC')})
# find dates equal or greater to this month
keep = (pd.to_datetime(df['date'], dayfirst=False)
.dt.to_period('M')
.ge(pd.Timestamp('today').to_period('M'))
)
# filter
out = df[keep]
Output:
date other_col
1 08/02/2022 B
2 09/03/2022 C
from datetime import datetime
import pandas as pd
df = ...
# assuming your date column is named 'date'
t = datetime.utcnow()
df = df[pd.to_datetime(df.date) >= datetime(t.year, t.month, t.day)]
Let us consider this example dataframe:
import pandas as pd
import datetime
df = pd.DataFrame()
data = [['nao victoria', '21/Feb/2012'], ['argo', '6/Jun/2022'], ['kon tiki', '23/Aug/2022']]
df = pd.DataFrame(data, columns=['Vessel', 'Date'])
You can convert your dates to datetimes, by using pandas' to_datetime method; for instance, you may save the output into a new Series (column):
df['Datetime']=pd.to_datetime(df['Date'], format='%d/%b/%Y')
You end up with the following dataframe:
Vessel Date Datetime
0 nao victoria 21/Feb/2012 2012-02-21
1 argo 6/Jun/2022 2022-06-06
2 kon tiki 23/Aug/2022 2022-08-23
You can then reject rows containing datetime values that are smaller than today's date, defined using datetime's now method:
df = df[df.Datetime > datetime.datetime.now()]
This returns:
Vessel Date Datetime
2 kon tiki 23/Aug/2022 2022-08-23

Pandas dt accessor returns wrong day and month

My CSV data looks like this -
Date Time
1/12/2019 12:04AM
1/12/2019 12:09AM
1/12/2019 12:14AM
and so on
And I am trying to read this file using pandas in the following way -
import pandas as pd
import numpy as np
data = pd.read_csv('D 2019.csv',parse_dates=[['Date','Time']])
print(data['Date_Time'].dt.month)
When I try to access the year through the dt accessor the year prints out fine as 2019.
But when I try to print the day or the month it is completely incorrect. In the case of month it starts off as 1 and ends up as 12 when the right value should be 12 all the time.
With the day it starts off as 12 and ends up at 31 when it should start at 1 and end in 31. The file has total of 8867 entries. Where am I going wrong ?
The default format is MM/DD, while yours is DD/MM.
The simplest solution is to set the dayfirst parameter of read_csv:
dayfirst : DD/MM format dates, international and European format (default False)
data = pd.read_csv('D 2019.csv', parse_dates=[['Date', 'Time']], dayfirst=True)
# -------------
>>> data['Date_Time'].dt.month
# 0 12
# 1 12
# 2 12
# Name: Date_Time, dtype: int64
Try assigning format argument of pd.to_datetime
df = pd.read_csv('D 2019.csv')
df["Date_Time"] = pd.to_datetime(df["Date_Time"], format='%d/%m/%Y %H:%M%p')
You need to check the data type of your dataframe and convert the column "Date" into datetime
df["Date"] = pd.to_datetime(df["Date"])
After you can access the day, month, or year using:
dt.day
dt.month
dt.year
Note: Make sure the format of the date (D/M/Y or M/D/Y)
Full Code
import pandas as pd
import numpy as np
data = pd.read_csv('D 2019.csv')
data["Date"] = pd.to_datetime(data["Date"])
print(data["Date"].dt.day)
print(data["Date"].dt.month)
print(data["Date"].dt.year)

Create date from one year with string and int error - PYTHON

I have the following problem. I want to create a date from another. To do this, I extract the year from the database date and then create the chosen date (day = 30 and month = 9) being the year extracted from the database.
The code is the following
bbdd20Q3['year']=(pd.DatetimeIndex(bbdd20Q3['datedaymonthyear']).year)
y=(bbdd20Q3['year'])
m=int(9)
d=int(30)
bbdd20Q3['mydate']=dt.datetime(y,m,d)
But error message is this
"cannot convert the series to <class 'int'>"
I think dt mean datetime, so the line 'dt.datetime(y,m,d)' create datetime object type.
bbdd20Q3['mydate'] should get int?
If so, try to think of another way to store the date (8 numbers maybe).
hope I helped :)
I assume that you did import datetime as dt then by doing:
bbdd20Q3['year']=(pd.DatetimeIndex(bbdd20Q3['datedaymonthyear']).year)
y=(bbdd20Q3['year'])
m=int(9)
d=int(30)
bbdd20Q3['mydate']=dt.datetime(y,m,d)
You are delivering series as first argument to datetime.datetime, when it excepts int or something which can be converted to int. You should create one datetime.datetime for each element of series not single datetime.datetime, consider following example
import datetime
import pandas as pd
df = pd.DataFrame({"year":[2001,2002,2003]})
df["day"] = df["year"].apply(lambda x:datetime.datetime(x,9,30))
print(df)
Output:
year day
0 2001 2001-09-30
1 2002 2002-09-30
2 2003 2003-09-30
Here's a sample code with the required logic -
import pandas as pd
df = pd.DataFrame.from_dict({'date': ['2019-12-14', '2020-12-15']})
print(df.dtypes)
# convert the date in string format to datetime object,
# if the date column(Series) is already a datetime object then this is not required
df['date'] = pd.to_datetime(df['date'])
print(f'after conversion \n {df.dtypes}')
# logic to create a new data column
df['new_date'] = pd.to_datetime({'year':df['date'].dt.year,'month':9,'day':30})
#eollon I see that you are also new to Stack Overflow. It would be better if you can add a simple sample code, which others can tryout independently
(keeping the comment here since I don't have permission to comment :) )

Python datetime delta format

I am attempting to find records in my dataframe that are 30 days old or older. I pretty much have everything working but I need to correct the format of the Age column. Most everything in the program is stuff I found on stack overflow, but I can't figure out how to change the format of the delta that is returned.
import pandas as pd
import datetime as dt
file_name = '/Aging_SRs.xls'
sheet = 'All'
df = pd.read_excel(io=file_name, sheet_name=sheet)
df.rename(columns={'SR Create Date': 'Create_Date', 'SR Number': 'SR'}, inplace=True)
tday = dt.date.today()
tdelta = dt.timedelta(days=30)
aged = tday - tdelta
df = df.loc[df.Create_Date <= aged, :]
# Sets the SR as the index.
df = df.set_index('SR', drop = True)
# Created the Age column.
df.insert(2, 'Age', 0)
# Calculates the days between the Create Date and Today.
df['Age'] = df['Create_Date'].subtract(tday)
The calculation in the last line above gives me the result, but it looks like -197 days +09:39:12 and I need it to just be a positive number 197. I have also tried to search using the python, pandas, and datetime keywords.
df.rename(columns={'Create_Date': 'SR Create Date'}, inplace=True)
writer = pd.ExcelWriter('output_test.xlsx')
df.to_excel(writer)
writer.save()
I can't see your example data, but IIUC and you're just trying to get the absolute value of the number of days of a timedelta, this should work:
df['Age'] = abs(df['Create_Date'].subtract(tday)).dt.days)
Explanation:
Given a dataframe with a timedelta column:
>>> df
delta
0 26523 days 01:57:59
1 -1601 days +01:57:59
You can extract just the number of days as an int using dt.days:
>>> df['delta']dt.days
0 26523
1 -1601
Name: delta, dtype: int64
Then, all you need to do is wrap that in a call to abs to get the absolute value of that int:
>>> abs(df.delta.dt.days)
0 26523
1 1601
Name: delta, dtype: int64
here is what i worked out for basically the same issue.
# create timestamp for today, normalize to 00:00:00
today = pd.to_datetime('today', ).normalize()
# match timezone with datetimes in df so subtraction works
today = today.tz_localize(df['posted'].dt.tz)
# create 'age' column for days old
df['age'] = (today - df['posted']).dt.days
pretty much the same as the answer above, but without the call to abs().

ValueError when converting String to datetime

I have a dataframe as follows, and I am trying to reduce the dataframe to only contain rows for which the Date is greater than a variable curve_enddate. The df['Date'] is in datetime and hence I'm trying to convert curve_enddate[i][0] which gives a string of the form 2015-06-24 to datetime but am getting the error ValueError: time data '2015-06-24' does not match format '%Y-%b-%d'.
Date Maturity Yield_pct Currency
0 2015-06-24 0.25 na CAD
1 2015-06-25 0.25 0.0948511020 CAD
The line where I get the Error:
df = df[df['Date'] > time.strptime(curve_enddate[i][0], '%Y-%b-%d')]
Thank You
You are using wrong date format, %b is for the named months (abbreviations like Jan or Feb , etc), use %m for the numbered months.
Code -
df = df[df['Date'] > time.strptime(curve_enddate[i][0], '%Y-%m-%d')]
You cannot compare a time.struct_time tuple which is what time.strptime returns to a Timestamp so you also need to change that as well as using '%Y-%m-%d' using m which is the month as a decimal number. You can use pd.to_datetime to create the object to compare:
df = df[df['Date'] > pd.to_datetime(curve_enddate[i][0], '%Y-%m-%d')]

Categories