How to properly use a time series using pandas (python) - python

I am attempting to create a time series index using pandas. Currently this is the code I am running:
date_string = df3["Date"]
date_times = pd.to_datetime(date_string, yearfirst=True, format='%Y%m%d%H%M')
df3_i = df3.set_index(date_times)
Yet I am getting constant errors. Can anyone explain?
Error:
ValueError: time data '2017-03-08 13:53' does not match format '%Y%m%d%H:%M' (match)

That's because the format should be '%Y-%m-%d %H:%M'.
Format strings use special character combinations (directives) to represent the numeric components of the date and time. The table of strftime()/strptime() format codes in Python's datetime documentation is a good reference.
Your time string is '2017-03-08 13:53', as your error message shows. From that reference you'll find that:
4 digit year is '%Y'
2 digit month is '%m'
2 digit day is '%d'
2 digit hour is '%H'
2 digit minute is '%M'
So you still need to represent the other parts of the string: the dashes, the space, and the colon.
Thus the format is '%Y-%m-%d %H:%M'.
Use this instead:
date_string = df3["Date"]
date_times = pd.to_datetime(date_string, yearfirst=True, format='%Y-%m-%d %H:%M')
df3_i = df3.set_index(date_times)
If that doesn't work, then you have inconsistent date formats and my first course of action would be to yell at whoever created the thing I'm trying to parse.
If that happens to be your scenario, ask another question... Or I might.
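To make the fix concrete, here is a minimal sketch of the whole flow. The sample frame and its "value" column are made up for illustration; only the "Date" column name and the timestamp layout come from the question.

```python
import pandas as pd

# Hypothetical stand-in for df3; the real frame is not shown in the question.
df3 = pd.DataFrame({
    "Date": ["2017-03-08 13:53", "2017-03-08 13:54", "2017-03-08 13:55"],
    "value": [1.0, 2.0, 3.0],
})

# The format spells out the dashes, the space and the colon literally.
date_times = pd.to_datetime(df3["Date"], format="%Y-%m-%d %H:%M")

# Use the parsed datetimes as the index and drop the original string column.
df3_i = df3.set_index(date_times).drop(columns="Date")

print(df3_i.index)
```

With a DatetimeIndex in place, rows can be looked up by timestamp, for example df3_i.loc[pd.Timestamp("2017-03-08 13:54")].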

Related

Pandas.to_datetime doesn't recognize the format of the string to be converted to datetime

I am trying to convert some data from a .txt file to a dataframe to use it for some analysis.
The data in the .txt file looks as follows:
DATE_TIME VELOC MEASURE
[m/s] [l/h]
A 09.01.2023 12:45:20 ??? ???
A 09.01.2023 12:46:20 0,048 52,67
A 09.01.2023 12:47:20 0,049 53,77
A 09.01.2023 12:48:20 0,050 54,86
I load the data into a dataframe without problems and convert the string values of the measurements to float; everything looks good, as shown in the image.
The problem appears when I try to convert the date-time column from string to the pandas datetime format using this line of code:
volume_flow['DATE_TIME'] = pd.to_datetime(volume_flow['DATE_TIME'], format = '%d.%m.%Y %H:%M:S')
and I get the following error:
ValueError: time data '09.01.2023 12:46:20' does not match format '%d.%m.%Y %H:%M:S' (match)
but I don't see how the format is off.
I am really lost as to what causes this, as I have used the same code with different datetime formats before with no problem.
Furthermore, I tried using format = '%dd.%mm.%yyyy %H:%M:S' as well, with the same result, and when I let pandas.to_datetime convert it automatically, it confuses the day and the month. The data is between 09.01 and 12.01, so you can't really tell which one is the month and which is the day just from the values.
I think you should go from this
(..., format='%d.%m.%Y %H:%M:S')
to this
(..., format='%d.%m.%Y %H:%M:%S')
You forgot the percent sign!
Check the documentation for the correct format codes. You will note that the directive %S represents the seconds:
Second as a decimal number [00,61].
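A small sketch of the corrected call, using two rows copied from the question as a hypothetical sample. Because the format is explicit, pandas has no need to guess whether 09.01 means 9 January or 1 September.

```python
import pandas as pd

# Hypothetical sample built from the rows shown in the question.
volume_flow = pd.DataFrame({
    "DATE_TIME": ["09.01.2023 12:46:20", "09.01.2023 12:47:20"],
})

# %S (with the percent sign) consumes the seconds; day comes first, then month.
volume_flow["DATE_TIME"] = pd.to_datetime(
    volume_flow["DATE_TIME"], format="%d.%m.%Y %H:%M:%S"
)

print(volume_flow["DATE_TIME"].dt.month.tolist())
print(volume_flow["DATE_TIME"].dt.day.tolist())
```

If you would rather let pandas infer the format, passing dayfirst=True is the documented way to tell it that the day comes before the month, but an explicit format is stricter.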

Change date format of these string using Python

I have a string from a PDF that I want to convert to a date format I can work with later.
The string is
05Dec22
how can I change it to 12/05/2022?
import datetime
date1 = '05Dec22'
date1 = datetime.datetime.strptime(date1, '%d%m%Y').strftime('%m/%d/%y')
date1 = str(date1)
This is what I tried so far.
If you execute the code, you'll get the following error:
ValueError: time data '05Dec22' does not match format '%d%m%Y'
This is because your time string is not in the format you specified ('%d%m%Y'). Tables listing the placeholders for each part of a date are easy to find online. According to them, the format of your string is '%d%b%y': the %b placeholder represents the abbreviated month name and the %y placeholder is the year without century, just as in your example string. Now, if you fix that in your code,
import datetime
date1 = '05Dec22'
date1 = datetime.datetime.strptime(date1, '%d%b%y').strftime('%m/%d/%Y')
date1 = str(date1)
you'll get the desired result.
Note that you also have to change the output format in strftime. As I said before, the %y placeholder is the year without century. For you to get the year including the century, you have to use %Y.
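The same parse-then-format step can be wrapped in a small helper. This is only a sketch: reformat_date is a made-up name, and it assumes English month abbreviations, since %b depends on the locale.

```python
from datetime import datetime


def reformat_date(text, in_fmt="%d%b%y", out_fmt="%m/%d/%Y"):
    """Parse text using in_fmt, then render it using out_fmt."""
    return datetime.strptime(text, in_fmt).strftime(out_fmt)


result = reformat_date("05Dec22")
print(result)  # 12/05/2022
```

strftime already returns a str, so no extra str() call is needed on the result.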

I am having trouble formatting a date using datetime

I am getting a date from an API that I am trying to pass to my template after formatting using datetime but I keep getting this error:
time data 2021-03-09T05:00:00.000Z does not match format %Y-%m-%d, %I:%M:%S;%f
I know I have to use strftime and then strptime, but I can't get past that error.
I would like to split it into two variables, one for the date and one for the time, which will be shown in the user's timezone.
date = game['schedule']['date']
game_date = datetime.strptime(date, '%Y-%m-%d, %I:%M:%S;%f')
You have a slightly wrong time format:
game_date = datetime.strptime(date, '%Y-%m-%dT%H:%M:%S.%fZ')
Or (if you remove the last Z character from the date string) you can also use datetime.fromisoformat:
game_date = datetime.fromisoformat(date[:-1])
And then you can extract date and time this way:
date = game_date.date()
time = game_date.time()
time_with_timezone = game_date.timetz()
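Both variants above give a naive datetime, because the trailing Z is either matched literally or cut off. If the time should be shown in the user's timezone, one option is to parse the Z as UTC and convert afterwards. A minimal sketch, assuming Python 3.7+ (where %z accepts a literal Z) and a hypothetical viewer at UTC-5; a real application would take the zone from the user's settings:

```python
from datetime import datetime, timedelta, timezone

raw = "2021-03-09T05:00:00.000Z"  # timestamp from the error message above

# %z parses the trailing "Z" as UTC, so the result is timezone-aware.
game_utc = datetime.strptime(raw, "%Y-%m-%dT%H:%M:%S.%f%z")

# Hypothetical fixed offset standing in for the user's timezone.
user_tz = timezone(timedelta(hours=-5))
game_local = game_utc.astimezone(user_tz)

game_day = game_local.date()   # date part in the user's zone
game_time = game_local.time()  # time part in the user's zone

print(game_day.isoformat(), game_time.isoformat())
```

A fixed offset ignores daylight saving time; the standard-library zoneinfo module (Python 3.9+) handles named zones if that matters.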

Time data does not match format - ValueError

I am trying to change string to datetime like below:
max_datetime = datetime.strptime(max_date,'%y-%m-%d %H:%M:%S')
However, I am getting the below-mentioned error:
ValueError: time data '2008-05-15 11:26:40' does not match format '%y-%m-%d %H:%M:%S'
Any help will be appreciated!
The datetime documentation says that %y (with a lowercase y) represents a two-digit year, while the error message shows that your input, max_date, has a four-digit year. A four-digit year is represented by %Y (with an uppercase Y), so this is the source of your error. Since the rest looks fine,
max_datetime = datetime.strptime(max_date, "%Y-%m-%d %H:%M:%S")
should do the job.

Passing chopped down datetimes

I have been stumped for the past few hours trying to solve the following.
In a large data set I have from an automated system, there is a DATE_TIME value which, for rows just after midnight, doesn't have a two-digit hour, like: 12-MAY-2017 0:16:20
When I try to convert this to a date (so that it's usable for conversions) as follows:
df['DATE_TIME'].astype('datetime64[ns]')
I get the following error:
Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
I tried writing some regex to pull out each piece but couldn't get anything working, given that the hour could be either one or two characters. Writing a regex for each piece also doesn't seem like an ideal solution.
Any ideas on this?
Try using the pandas.to_datetime() method:
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'], errors='coerce')
The parameter errors='coerce' takes care of strings that can't be converted to the datetime dtype by turning them into NaT.
I think you only need pandas.to_datetime:
df = pd.DataFrame({'DATE_TIME':['12-MAY-2017 0:16:20','12-MAY-2017 0:16:20']})
print (df)
DATE_TIME
0 12-MAY-2017 0:16:20
1 12-MAY-2017 0:16:20
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'])
print (df)
DATE_TIME
0 2017-05-12 00:16:20
1 2017-05-12 00:16:20
Converting in numpy with astype seems problematic, because it needs strings in ISO 8601 date or datetime format:
df['DATE_TIME'].astype('datetime64[ns]')
ValueError: Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
EDIT:
If the datetimes are broken (some are arbitrary strings or ints), then use MaxU's answer.
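The two answers can be combined: give to_datetime an explicit format and let errors='coerce' flag whatever does not fit. This is a sketch under assumptions: the third row is an invented bad value, and it relies on %H accepting a single-digit hour and %b matching month abbreviations case-insensitively when parsing, which is how Python's strptime behaves and, as far as I know, pandas as well.

```python
import pandas as pd

# Hypothetical sample: two valid rows in the question's layout and one invented bad value.
df = pd.DataFrame({
    "DATE_TIME": ["12-MAY-2017 0:16:20", "12-MAY-2017 13:05:00", "not a date"],
})

# Explicit format: day, abbreviated month name, four-digit year, then the time.
parsed = pd.to_datetime(df["DATE_TIME"], format="%d-%b-%Y %H:%M:%S", errors="coerce")

# Keep the rows that failed to parse, with their original strings, for inspection.
bad_rows = df[parsed.isna()]

df["DATE_TIME"] = parsed
print(df)
print(bad_rows)
```

Checking bad_rows before overwriting the column makes it easy to see which source values were turned into NaT.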
