datetime strptime converts 10 to 01? - python

I am working with a date column in this form:
Date
1871.01
1871.02
...
1871.10
1871.11
So to convert the column to a datetimeindex, I use:
df["Date"].apply(lambda x: datetime.strptime(str(x), "%Y.%m"))
However, the column is converted to:
Date
1871-01-01
1871-02-01
...
1871-01-01
1871-11-01
Does anyone have any idea of what causes this, where all "10"s convert to "01"s? Is there a better way to do this given my inputs are floats?

If the column is stored as floats, 1871.10 and 1871.1 are exactly the same number, so str() produces the shortest form, "1871.1", and strptime then reads the month as 1 (January).
So you should convert to a string while forcing two decimal digits:
df["Date"].apply(lambda x: datetime.strptime("{:.2f}".format(x), "%Y.%m"))
Note: storing dates as floats is a very bad format. The real solution is to correct it at the source, e.g. when you read the input file, tell the read function that the column is a date, not a float.
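To make the float problem concrete, here is a minimal sketch using only the standard library. The variable names are illustrative and not from the question; it simply compares the default string conversion with two-decimal formatting for the value 1871.10.

```python
from datetime import datetime

value = 1871.10  # a float cannot remember its trailing zero

# str() gives the shortest representation, so the month looks like "1"
as_january = datetime.strptime(str(value), "%Y.%m")

# Formatting with two decimals restores the two-digit month
fixed = datetime.strptime("{:.2f}".format(value), "%Y.%m")

print(str(value))                      # 1871.1
print(as_january.month, fixed.month)   # 1 10
```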

Related

Split date column into YYYY.MM.DD

I have a dataframe column in the format of 20180531.
I need to split this properly i.e. I can get 2018/05/31.
This is a dataframe column that I have and I need to deal with it in a datetime format.
Currently this column is identified as int64 type
I'm not sure how efficient it'll be, but you can convert it to a string and then use pd.to_datetime with format=..., e.g.:
df['actual_datetime'] = pd.to_datetime(df['your_column'].astype(str), format='%Y%m%d')
As Emma points out - the astype(str) is redundant here and just:
df['actual_datetime'] = pd.to_datetime(df['your_column'], format='%Y%m%d')
will work fine.
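Assuming the integer dates are always a fixed width of 8 digits, you may try the following (note that it produces strings, not datetimes, and that recent pandas versions need regex=True for a pattern replacement):
df['dt'] = df['dt_int'].astype(str).str.replace(r'(\d{4})(\d{2})(\d{2})', r'\1-\2-\3', regex=True)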
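Putting the answers above together, a small end-to-end sketch might look like this. The sample values and the as_text column are made up for illustration; the string conversion is kept only to make the parsing step explicit.

```python
import pandas as pd

# Hypothetical sample data mirroring the int64 column from the question
df = pd.DataFrame({"your_column": [20180531, 20190101]})

# Parse the 8-digit integers as dates
df["actual_datetime"] = pd.to_datetime(
    df["your_column"].astype(str), format="%Y%m%d"
)

# Render with slashes only when a string is needed for display
df["as_text"] = df["actual_datetime"].dt.strftime("%Y/%m/%d")
print(df["as_text"].tolist())  # ['2018/05/31', '2019/01/01']
```

Keeping the datetime column and formatting only for display means date arithmetic and filtering stay available.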

Wrong string format when converting my date column

I am trying to convert my date column, df['CO date'], which is in the format 3/02/21, meaning day/month/year. The problem arises when I parse it and then convert it back to a string, like this:
df['CO date'] = pd.to_datetime(df['CO date']).dt.strftime("%d/%m/%y")
For some reason, after converting from datetime to string with the format shown, the date comes back in American order, like 02/03/21. I don't understand why this happens. The only thing I can think of is that Python only has the %d directive, which shows days as 01, 02, 03, 04, etc., whereas the day in my df is originally "3" (no zero padding).
Does anybody know how I can solve this problem?
Many thanks in advance
Your formatting looks right. The only way you get that result is if your data frame contains wrong or corrupted data. You can do a sanity check with:
pd.to_datetime("2021-03-02").strftime("%d/%m/%y")
>>>
'02/03/21'
I think you are converting with the wrong format at the beginning, in the pd.to_datetime(df['CO date']) part. If you know the exact format, you should pass format to pd.to_datetime, like:
pd.to_datetime("2021-02-03", format="%Y-%d-%m").strftime("%d/%m/%y")
>>>
'02/03/21'
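Applied to the column from the question, that advice could be sketched as follows. The sample rows are invented; the format string assumes the data really is day/month/two-digit-year, as the asker describes.

```python
import pandas as pd

# Hypothetical sample in day/month/year order, without zero padding
df = pd.DataFrame({"CO date": ["3/02/21", "15/11/21"]})

# An explicit format removes any guessing about day/month order
parsed = pd.to_datetime(df["CO date"], format="%d/%m/%y")

df["CO date"] = parsed.dt.strftime("%d/%m/%y")
print(df["CO date"].tolist())  # ['03/02/21', '15/11/21']
```

If the exact format is not known, pd.to_datetime also accepts dayfirst=True, which is a hint about the order rather than a strict format.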
Parse the dates inside a try/except block to find which values in the dataframe column are invalid. You can also check the ranges of the day, month, and year, and raise a custom error if they are exceeded.
print(date.day)
print(date.month)
print(date.year)
from datetime import datetime
def date_check(date):
    # Returns True only if the string is a valid day/month/year date
    try:
        datetime.strptime(date, '%d/%m/%Y')
        return True
    except ValueError:
        return False
or
if pd.to_datetime(df['date'], format='%d-%b-%Y', errors='coerce').notnull().all():
    print("every date is valid")
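To actually locate the offending rows rather than just learn that some exist, one option is to keep the coerced result and filter on it. This is a sketch with made-up data; the column name and format are assumptions.

```python
import pandas as pd

# Hypothetical column with one impossible date (31 February)
df = pd.DataFrame({"date": ["03/02/2021", "31/02/2021", "15/11/2021"]})

# errors="coerce" turns unparseable values into NaT instead of raising
parsed = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")

# Rows where parsing failed are the ones to inspect
bad_rows = df[parsed.isna()]
print(bad_rows)
```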

converting integer to datetime values pandas

I have a column with both datetime and integer values. However, I want to convert the integer values to datetime as well.
StartTime EndTime Source
2170-01-01 00:00:00 2170-01-01 00:00:00 NA
1.60405e+18 1.60405e+18 Arm_dearm
I tried using
pd.to_datetime(site_list['StartTime'])
but it yields the same result.
Ultimately, I am not able to run the following line:
np.where(site_list["StartTime"].dt.date!=site_list["EndTime"].dt.date,"armed next day",site_list["Source"])
which throws the following error:
mixed datetimes and integers in passed array
As the error states, the issue is that the column contains mixed types, so it will not be converted.
I'm going to assume it is a UNIX timestamp.
Check this answer and see if maybe this is the time format you are dealing with.
We can get around this by doing something like this:
import datetime
def convert(time):
    return datetime.datetime.fromtimestamp(time)
site_list["StartTime"] = site_list["StartTime"].apply(lambda x: convert(x) if isinstance(x, int) else x)
This checks each value in the StartTime column to see whether it is an integer. If it is, the value is converted to the datetime type; if it is not, it is left alone. Note that fromtimestamp expects seconds since the epoch, and that a float such as 1.60405e+18 does not pass the isinstance(x, int) check, so the values may need scaling and a wider type check first.
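The magnitude of 1.60405e+18 suggests nanoseconds since the Unix epoch rather than seconds, but that is an assumption the question does not confirm. Under that assumption, a per-value conversion could be sketched like this; to_timestamp and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical column mixing a datetime string with an epoch value
site_list = pd.DataFrame({"StartTime": ["2170-01-01 00:00:00", 1.60405e18]})

def to_timestamp(value):
    # Leave missing values as NaT
    if pd.isna(value):
        return pd.NaT
    # ASSUMPTION: numbers are nanoseconds since the Unix epoch
    if isinstance(value, (int, float)):
        return pd.Timestamp(int(value))
    # Everything else is parsed as a datetime string
    return pd.Timestamp(value)

site_list["StartTime"] = pd.to_datetime(site_list["StartTime"].apply(to_timestamp))
print(site_list["StartTime"].dt.date.tolist())
```

Once the column has a real datetime dtype, the .dt.date comparison from the question should no longer hit the mixed-types error.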

convert a generic date into datetime in python (pandas)

Good Afternoon,
I have a huge dataset where time information is stored as a float64 (or integer) in one column of the dataframe, in the format 'ddmmyyyy' (e.g. 20 January 2020 would be the float 20012020.0). I need to convert it into a datetime like 'dd-mm-yyyy'. I saw the function to_datetime, but I can't really manage to obtain what I want. Does someone know how to do it?
Massimo
You could try converting to a string first and then to the date format you want, like this:
# The first step is to change the type of the column.
# To get rid of the .0, convert first to int and then to string.
df["date"] = df["date"].astype(int)
df["date"] = df["date"].astype(str)
for i, row in df.iterrows():
    x = str(df["date"].iloc[i])
    # In case the date looks like 20012020 (8 digits)
    if len(x) == 8:
        df.at[i, 'date'] = "{}-{}-{}".format(x[:2], x[2:4], x[4:])
    # In case the date looks like 1012020 (the day lost its leading zero)
    else:
        df.at[i, 'date'] = "0{}-{}-{}".format(x[0], x[1:3], x[3:])
Edit:
As you said, this solution only works if the month always comes in 2 digits.
Added the missing variable in the loop.
Added the column type changes before entering the loop.
Let me know if this helps!
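For a huge dataset, a vectorized variant of the same idea avoids the row loop. This is a sketch under the same assumption as above (the month is always two digits, only the day can lose its leading zero); the sample values are invented.

```python
import pandas as pd

# Hypothetical float column in ddmmyyyy form; 1012020.0 lost its leading zero
df = pd.DataFrame({"date": [20012020.0, 1012020.0]})

# Drop the ".0", then left-pad to 8 characters so the day is always two digits
as_text = df["date"].astype(int).astype(str).str.zfill(8)

# Parse the whole column at once instead of looping over rows
df["date"] = pd.to_datetime(as_text, format="%d%m%Y")
print(df["date"].dt.strftime("%d-%m-%Y").tolist())  # ['20-01-2020', '01-01-2020']
```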

Python - convert date string from YYYY-MM-DD to DD-MMM-YYYY using datetime?

So I have read a number of threads on this, and am still stumped. Any help would be sincerely appreciated.
I have a column in a dataframe that contains strings of dates, or nothing. Strings are in this format: 2017-10-17, i.e. YYYY-MM-DD.
I want to convert these to DD-MMM-YYYY, so the above would read 17-Oct-2017.
Here is the code that I have, which seems to do nothing. It doesn't error out, but it doesn't actually modify the dates; they are the same as before. I'm calling this in the beginning: import datetime
df_combined['VisitDate'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%d').strftime('%d-%b-%Y') if x != "" else "")
I expected this to return a string in a different format than the format the original string was in when it's read from the column.
You probably just need to assign the result back to the column itself:
df_combined['VisitDate'] = df_combined['VisitDate'].apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%d').strftime('%d-%b-%Y') if x != "" else "")
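There is no need for apply; you can use pd.to_datetime to parse the existing format and then strftime to produce the new one:
df_combined['VisitDate'] = pd.to_datetime(df_combined['VisitDate'], errors='coerce', format='%Y-%m-%d').dt.strftime('%d-%b-%Y').fillna('')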
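As a quick check of the parse-then-format approach, here is a small sketch on invented data that includes an empty entry. The month abbreviation produced by %b depends on the active locale; the output shown assumes an English or C locale.

```python
import pandas as pd

# Hypothetical sample: ISO date strings plus an empty entry
df_combined = pd.DataFrame({"VisitDate": ["2017-10-17", "", "2018-01-05"]})

# Parse with the format the strings are actually in; "" becomes NaT
parsed = pd.to_datetime(df_combined["VisitDate"], format="%Y-%m-%d", errors="coerce")

# Format for display, then restore the empty string for missing dates
df_combined["VisitDate"] = parsed.dt.strftime("%d-%b-%Y").fillna("")
print(df_combined["VisitDate"].tolist())  # ['17-Oct-2017', '', '05-Jan-2018']
```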
