Wrong string format when converting my date column - python

I am trying to convert my date column, df['CO date'], which is in the format 3/02/21, meaning day/month/year. The problem arises when I parse it and then convert it back to a string, like this:
df['CO date'] = pd.to_datetime(df['CO date']).dt.strftime("%d/%m/%y")
For some reason, after converting from datetime to string with the format shown, it returns my date in an American format, like 02/03/21. I don't understand why this happens. The only thing I can think of is that Python only has the format code %d, which shows days as 01, 02, 03, 04, etc., whereas the day in my df is originally "3" (no zero padding).
Does anybody know how I can solve this problem?
Many thanks in advance

Your formatting looks right. The only way you would get that result is if your data frame contains wrong or corrupted data. You can do a sanity check with:
pd.to_datetime("2021-03-02").strftime("%d/%m/%y")
>>>
'02/03/21'
I think you are parsing with the wrong format at the start, in the pd.to_datetime(df['CO date']) part. If you know the exact format, you should pass it to pd.to_datetime with the format argument, like:
pd.to_datetime("2021-02-03", format="%Y-%d-%m").strftime("%d/%m/%y")
>>>
'02/03/21'
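A minimal sketch of the fix applied to the question's own layout, assuming every value in 'CO date' is day/month/two-digit-year like "3/02/21" (the sample frame is made up). Without a format, pd.to_datetime reads "3/02/21" month-first as 2 March 2021, which is why the output looks swapped; passing format="%d/%m/%y" or dayfirst=True reads it as 3 February 2021. When parsing, %d and %m also accept values without zero padding.

```python
import pandas as pd

# Hypothetical sample in the question's day/month/year layout
df = pd.DataFrame({"CO date": ["3/02/21", "15/11/20"]})

# Explicit format: %d and %m accept unpadded values such as "3" when parsing
by_format = pd.to_datetime(df["CO date"], format="%d/%m/%y")

# Alternative: let pandas infer the layout, but tell it the day comes first
by_dayfirst = pd.to_datetime(df["CO date"], dayfirst=True)

# Convert back to day/month/year strings (strftime always zero-pads %d)
df["CO date"] = by_format.dt.strftime("%d/%m/%y")
print(df["CO date"].tolist())  # ['03/02/21', '15/11/20']
```

The explicit format is the stricter of the two: a value that does not match it raises an error instead of being silently guessed.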

Wrap the date conversion in a try/except block and see whether you can find the values in the dataframe column that have an invalid date by triggering an error. Check the ranges for day, month and year, and raise a custom error if they are exceeded.
print(date.day)
print(date.month)
print(date.year)
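from datetime import datetime

def date_check(date):
    # True if the string parses as day/month/two-digit year, e.g. "3/02/21"
    try:
        datetime.strptime(date, '%d/%m/%y')
        return True
    except ValueError:
        return False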
or
if pd.to_datetime(df['date'], format='%d-%b-%Y', errors='coerce').notnull().all():
    # Every value in df['date'] parsed with this format (no NaT produced)
    print("All dates are valid")
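Along the same lines, a minimal sketch that uses errors='coerce' to list the values that do not parse, assuming the question's "%d/%m/%y" layout. The sample values are made up, and "31/02/21" is included as an impossible date.

```python
import pandas as pd

df = pd.DataFrame({"CO date": ["3/02/21", "15/11/20", "31/02/21"]})

# Values that don't match the format, or aren't real dates, become NaT
parsed = pd.to_datetime(df["CO date"], format="%d/%m/%y", errors="coerce")

# Show the original strings that failed to parse
invalid = df.loc[parsed.isna(), "CO date"]
print(invalid.tolist())  # ['31/02/21']
```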

Related

Pandas.to_datetime doesn't recognize the format of the string to be converted to datetime

I am trying to load some data from a .txt file into a dataframe to use it for some analysis.
The data in the .txt file looks as follows:
DATE_TIME VELOC MEASURE
[m/s] [l/h]
A 09.01.2023 12:45:20 ??? ???
A 09.01.2023 12:46:20 0,048 52,67
A 09.01.2023 12:47:20 0,049 53,77
A 09.01.2023 12:48:20 0,050 54,86
I load the data into a dataframe with no problem, and I convert the str values of the measurements to float etc.; everything is good, as shown in the image.
The problem comes when I try to convert the date time column, which is a string, to pandas datetime format using this line of code:
volume_flow['DATE_TIME'] = pd.to_datetime(volume_flow['DATE_TIME'], format = '%d.%m.%Y %H:%M:S')
and I get the following error:
ValueError: time data '09.01.2023 12:46:20' does not match format '%d.%m.%Y %H:%M:S' (match)
but I don't see how the format is off.
I am really lost as to why this happens, as I have used the same code with other datetime formats before without any problem.
Furthermore, I tried format = '%dd.%mm.%yyyy %H:%M:S' as well, with the same result, and when I let pandas.to_datetime convert it automatically, it confuses the day and the month. The data is between 09.01 and 12.01, so you can't really tell which is the month and which is the day just from the values.
I think you should go from this
(..., format='%d.%m.%Y %H:%M:S')
to this
(..., format='%d.%m.%Y %H:%M:%S')
You forgot the percent sign before S!
Check the documentation for the correct format codes; you will see that the directive %S represents seconds:
Second as a decimal number [00,61].
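A minimal check with the corrected format, using two values copied from the question's sample (the header lines and the "???" row of the .txt file are left out here):

```python
import pandas as pd

volume_flow = pd.DataFrame({"DATE_TIME": ["09.01.2023 12:46:20", "09.01.2023 12:47:20"]})

# %S (with the percent sign) is the seconds directive
volume_flow["DATE_TIME"] = pd.to_datetime(
    volume_flow["DATE_TIME"], format="%d.%m.%Y %H:%M:%S"
)

# With an explicit format, 09.01 is read as 9 January, not 1 September
print(volume_flow["DATE_TIME"].iloc[0])  # 2023-01-09 12:46:20
```

Passing the format also settles the day/month ambiguity mentioned in the question, since nothing is left for pandas to guess.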

reformatting the timestamp in my dataset to have it as datetime

I want to reformat the timestamp in my dataset to have it as a date + time.
Here is my dataset:
I tried this:
data1 = pd.read_excel(r"C:\Users\user\Desktop\Consumption.xlsx")
data1['Timestamp']= pd.to_datetime(['Timestamp'], unit='s')
and I got this error
ValueError: non convertible value Timestamp with the unit 's'
I also tried not passing unit to pd.to_datetime, and it also gave an error.
The dtype of the Timestamp column is object. Any help would be appreciated.
The values are not Unix timestamps, so unit='s' raises an error. You can split the values on ';', select the second part with str[1], and then convert to datetimes:
data1['Timestamp']= pd.to_datetime(data1['Timestamp'].str.split(';').str[1])
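A minimal sketch of the split approach, assuming each value looks like "<something>;<date time>". The dataset was only shown as an image, so the sample strings and their ISO-style date part are hypothetical. Separately, note that the question's code passes the literal list ['Timestamp'] rather than the column data1['Timestamp'], which is why the error message names the value Timestamp.

```python
import pandas as pd

# Hypothetical values: an ID, then the date and time after the ';'
data1 = pd.DataFrame({"Timestamp": ["1;2021-03-02 10:15:00", "2;2021-03-02 10:30:00"]})

# Keep the part after ';' and parse it (pass the column, not ['Timestamp'])
data1["Timestamp"] = pd.to_datetime(data1["Timestamp"].str.split(";").str[1])

print(data1["Timestamp"].iloc[0])  # 2021-03-02 10:15:00
```

If the real date part is day-first (e.g. 02/03/2021), adding an explicit format such as '%d/%m/%Y %H:%M:%S' avoids the day/month ambiguity.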
I would suggest you check the documentation of the function here
If you also want the time, you can use a format like this:
format='%d/%m/%Y %H:%M:%S'
Try this:
data1['Date'] = pd.to_datetime(data1['Timestamp'], format='%d/%m/%Y')

Python shows me an Error (Out of Bounds) when running the DateTime

I am trying to convert the columns 'check in' and 'check out' to datetime.
When I run the code below, I get the error message 'Out of bounds nanosecond' (picture attached as well). Can anybody help me get rid of this problem, please?
Thank you in advance!
expedia2013_2014["srch_co"] = pd.to_datetime(expedia2013_2014["srch_co"])
expedia2013_2014["srch_ci"] = pd.to_datetime(expedia2013_2014["srch_ci"])
Timestamps with the default nanosecond resolution can only hold dates roughly in the years 1677 to 2262.
You are trying to convert a date from the year 2557, so it is out of range.
You have to use a different date type.
One possible option: if you have the source date as a string,
e.g. src='2557-08-17', you can run:
import datetime
result = datetime.datetime.strptime(src, '%Y-%m-%d')
If you have a DataFrame with source column named Dat (as string), you can
convert it running:
df.Dat = df.Dat.apply(lambda x: datetime.datetime.strptime(x, '%Y-%m-%d').date())
When you print df, it looks just the same, but when you read a value with
df.loc[some_index, 'Dat'], the result will be
datetime.date(2557, 8, 17).
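Another option, swapped in here rather than taken from the answer above: the pandas user guide suggests Period values for spans outside the Timestamp bounds, and a daily Period can hold a date like 2557-08-17. A minimal sketch, assuming the check-in/check-out strings are in YYYY-MM-DD form (the sample values are made up):

```python
import pandas as pd

expedia = pd.DataFrame({"srch_ci": ["2014-05-01", "2557-08-17"]})

# Each string becomes a daily Period, which is not limited to 1677-2262
expedia["srch_ci"] = expedia["srch_ci"].apply(lambda s: pd.Period(s, freq="D"))

# Period objects still expose date fields such as year and month
print(expedia["srch_ci"].iloc[1].year)  # 2557
```

Whether a year like 2557 is real data or an entry error is worth checking separately; Period only makes it storable.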

Parsing chopped-down datetimes

I have been stumped for the past few hours trying to solve the following.
In a large data set I have from an automated system, there is a DATE_TIME value which, for rows at midnight, doesn't have the full two-digit hour, like: 12-MAY-2017 0:16:20
When I try to convert this to a date (so that it's usable for conversions) as follows:
df['DATE_TIME'].astype('datetime64[ns]')
I get the following error:
Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
I tried writing some regex to pull out each piece but couldn't get anything working, given that the hour can be either one or two characters. It also doesn't seem like an ideal solution to write a regex for each piece.
Any ideas on this?
Try using the pandas.to_datetime() function:
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'], errors='coerce')
The errors='coerce' parameter turns strings that can't be converted to datetime dtype into NaT instead of raising an error.
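A small illustration of what errors='coerce' does, using the value from the question plus a made-up unparseable string: the bad value becomes NaT instead of raising.

```python
import pandas as pd

df = pd.DataFrame({"DATE_TIME": ["12-MAY-2017 0:16:20", "not a date"]})

# The valid string is parsed; the invalid one becomes NaT
df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], errors="coerce")
print(df["DATE_TIME"].isna().tolist())  # [False, True]
```

Because coerced values are silently dropped to NaT, it can be worth counting them afterwards (e.g. with isna().sum()) to make sure no real data was lost.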
I think you need pandas.to_datetime only:
df = pd.DataFrame({'DATE_TIME':['12-MAY-2017 0:16:20','12-MAY-2017 0:16:20']})
print (df)
DATE_TIME
0 12-MAY-2017 0:16:20
1 12-MAY-2017 0:16:20
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'])
print (df)
DATE_TIME
0 2017-05-12 00:16:20
1 2017-05-12 00:16:20
Converting in NumPy via astype seems problematic, because it needs strings in ISO 8601 date or datetime format:
df['DATE_TIME'].astype('datetime64[ns]')
ValueError: Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
EDIT:
If the datetimes are broken (some strings or ints), use MaxU's answer.
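If the layout is known, an explicit format avoids relying on format inference. A minimal sketch, assuming all values follow the "12-MAY-2017 0:16:20" pattern and an English month-name locale (the second sample value is made up): when parsing, %b matches month abbreviations regardless of case, and %H accepts a single-digit hour.

```python
import pandas as pd

df = pd.DataFrame({"DATE_TIME": ["12-MAY-2017 0:16:20", "12-MAY-2017 13:05:00"]})

# %b: abbreviated month name ("MAY"); %H: 24-hour clock, padded or not
df["DATE_TIME"] = pd.to_datetime(df["DATE_TIME"], format="%d-%b-%Y %H:%M:%S")
print(df)
```

This can be combined with errors='coerce' when some rows are known to be broken.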

Getting columns with datetime format such as (2017-02-12 10:23:55 AM)[YYYY-MM-dd hh:mm:ss AM/PM] using pandas

I recently asked a question about identifying all the columns that are datetime. Here it is: Get all columns with datetime type using pandas?
The answer was correct for a proper date time format, however, I now realize my data isn't proper date time, it is a string formatted like "2017-02-12 10:23:55 AM" and I was advised to create a new question.
I have a huge dataframe with an unknown number of date time columns, where I do not know their names nor their position. How do I identify the column names of the date time columns which have the date of format such as YYYY-MM-dd hh:mm:ss AM/PM?
One way to do this would be to test for successful conversion:
import pandas as pd

def is_datetime(datetime_string):
    try:
        pd.to_datetime(datetime_string)
        return True
    except (ValueError, TypeError):
        return False
With this:
dt_columns = [c for c in df.columns if is_datetime(df[c].iloc[0])]
Note: This tests for any string that can be converted to a datetime, and it only checks the first row of each column.
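A stricter variant, sketched under the assumption that the target layout is exactly "2017-02-12 10:23:55 AM": pass an explicit format (%I is the 12-hour clock, %p the AM/PM marker) and require every non-null value in a column to parse, instead of only the first row. The function name datetime_columns and the sample frame are hypothetical.

```python
import pandas as pd

FMT = "%Y-%m-%d %I:%M:%S %p"  # e.g. "2017-02-12 10:23:55 AM"


def datetime_columns(df: pd.DataFrame, fmt: str = FMT) -> list:
    """Return names of columns whose non-null values all parse with fmt."""
    cols = []
    for col in df.columns:
        values = df[col].dropna()
        if values.empty:
            continue  # nothing to test in an all-null column
        # Non-matching values (numbers, free text) become NaT
        parsed = pd.to_datetime(values.astype(str), format=fmt, errors="coerce")
        if parsed.notna().all():
            cols.append(col)
    return cols


df = pd.DataFrame({
    "created": ["2017-02-12 10:23:55 AM", "2017-02-13 01:05:00 PM"],
    "amount": [10, 20],
    "note": ["hello", "2017-02-12"],
})
print(datetime_columns(df))  # ['created']
```

Checking every value costs more than checking one row, but it avoids flagging a text column whose first entry happens to look like a date.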
