I want to generate time/date format strings from the input data I got.
Is there an easy way to do this?
My input data looks like this:
'01.12.2016 23:30:59,123'
So my code should generate the following format string:
'%d.%m.%Y %H:%M:%S,%f'
Background:
I use pandas.to_datetime() to generate datetime objects for further processing. This works great, but the function gets slow with a lot of data (more than roughly 50k rows) because it falls back to dateutil.parser.parse. At the moment I hardcode the format string above in my code to speed up to_datetime(), which also works great. Now I want to generate the format string in code to be more flexible regarding the input data.
edit (because the first two answers do not fit to my question):
I want to generate the format string not the datetime string.
edit2:
New approach to formulating the question: I'm reading in a file with a lot of data. Every line of data has a timestamp in the following format: '01.12.2016 23:30:59,123'. I want to convert these timestamps into datetime objects. For this I'm using pandas.to_datetime() at the moment. This function works perfectly, but it gets slow since some of my files have over 50k records. To speed this up I pass a format string to the function: pandas.to_datetime(format='%d.%m.%Y %H:%M:%S,%f'). This speeds up the process but is less flexible. Therefore I want to determine the format string from the first record only and reuse it for the remaining 50k or more records.
How is this possible?
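You can try the infer_datetime_format parameter, but be aware that pd.to_datetime() uses dayfirst=False by default. (In pandas 2.0 and later this parameter is deprecated, because a strict form of format inference is now the default.)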
Demo:
In [422]: s
Out[422]:
0 01.12.2016 23:30:59,123
1 23.12.2016 03:30:59,123
2 31.12.2016 13:30:59,123
dtype: object
In [423]: pd.to_datetime(s, infer_datetime_format=True)
Out[423]:
0 2016-01-12 23:30:59.123
1 2016-12-23 03:30:59.123
2 2016-12-31 13:30:59.123
dtype: datetime64[ns]
In [424]: pd.to_datetime(s, infer_datetime_format=True, dayfirst=True)
Out[424]:
0 2016-12-01 23:30:59.123
1 2016-12-23 03:30:59.123
2 2016-12-31 13:30:59.123
dtype: datetime64[ns]
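To make the day/month ambiguity in the demo concrete, here is a minimal standard-library sketch. It parses the first sample value with two explicit formats; the variable names are only illustrative.

```python
from datetime import datetime

text = '01.12.2016 23:30:59,123'

# Same string, two readings: only the order of %d and %m differs
day_first = datetime.strptime(text, '%d.%m.%Y %H:%M:%S,%f')
month_first = datetime.strptime(text, '%m.%d.%Y %H:%M:%S,%f')

print(day_first)    # 2016-12-01 23:30:59.123000
print(month_first)  # 2016-01-12 23:30:59.123000
```

Both calls succeed, which is why the first row of the demo cannot be disambiguated without dayfirst=True or an explicit format.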
Use "datetime" to return the date and time. I think this will help you.
import datetime
print(datetime.datetime.now().strftime('%d.%m.%Y %H:%M:%S,%f'))
You can use datetime.strptime() from the datetime module, which returns a datetime.datetime object.
In your case you should do something like:
datetime.strptime('01.12.2016 23:30:59,123', '%d.%m.%Y %H:%M:%S,%f').
After you have the datetime.datetime object, you can use datetime.strftime() function to get the datetime in the desired string format.
You should probably have a look here: https://github.com/humangeo/DateSense/
From its documentation:
>>> import DateSense
>>> print DateSense.detect_format( ["15 Dec 2014", "9 Jan 2015"] )
%d %b %Y
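If an extra dependency is not wanted, one possible alternative is to try a short list of candidate formats against the first value and reuse whichever one parses. This is only a sketch: guess_format and CANDIDATE_FORMATS are hypothetical names, and the candidate list is an assumption that has to be adapted to the layouts you actually expect.

```python
from datetime import datetime

# Hypothetical candidate list; extend it with the layouts you expect to see.
CANDIDATE_FORMATS = [
    '%Y-%m-%d %H:%M:%S',
    '%d.%m.%Y %H:%M:%S,%f',
    '%d.%m.%Y %H:%M:%S',
    '%m/%d/%Y %H:%M:%S',
]

def guess_format(sample, candidates=CANDIDATE_FORMATS):
    """Return the first candidate format that parses `sample`, else None."""
    for fmt in candidates:
        try:
            datetime.strptime(sample, fmt)
        except ValueError:
            continue  # this candidate does not match, try the next one
        return fmt
    return None

fmt = guess_format('01.12.2016 23:30:59,123')
print(fmt)  # %d.%m.%Y %H:%M:%S,%f
```

The returned string can then be passed as the format argument of pandas.to_datetime() for the whole column. A single sample cannot tell day-first from month-first when both readings are valid dates, so the order of the candidates decides in that case.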
I am trying to convert some data from a .txt file to a dataframe to use it for some analysis.
The data in the .txt file looks as follows:
DATE_TIME VELOC MEASURE
[m/s] [l/h]
A 09.01.2023 12:45:20 ??? ???
A 09.01.2023 12:46:20 0,048 52,67
A 09.01.2023 12:47:20 0,049 53,77
A 09.01.2023 12:48:20 0,050 54,86
I load the data into a dataframe without problems and convert the string values of the measurements to float; everything is fine up to that point.
The problem appears when I try to convert the date-time column from string to the pandas datetime format using this line of code:
volume_flow['DATE_TIME'] = pd.to_datetime(volume_flow['DATE_TIME'], format = '%d.%m.%Y %H:%M:S')
and I get the following error:
ValueError: time data '09.01.2023 12:46:20' does not match format '%d.%m.%Y %H:%M:S' (match)
but I don't see how the format is off.
I am really lost as to why this happens, as I have used the same code with different datetime formats before with no problem.
Furthermore, I tried format = '%dd.%mm.%yyyy %H:%M:S' as well, with the same result, and when I let pandas.to_datetime convert it automatically it confuses the day and the month. The data is between 09.01 and 12.01, so you can't really tell which part is the month and which is the day just from the values.
I think you should go from this
(..., format='%d.%m.%Y %H:%M:S')
to this
(..., format='%d.%m.%Y %H:%M:%S')
You forgot the percentage character!
Check the documentation for the correct time format. You will see that the directive %S represents the seconds:
Second as a decimal number [00,61].
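The difference between the literal character S and the directive %S can be checked with the standard library alone. A minimal sketch, using one of the values from the question (the variable names are illustrative):

```python
from datetime import datetime

value = '09.01.2023 12:46:20'

# Without the percent sign, "S" is a literal character that must appear in the input
try:
    datetime.strptime(value, '%d.%m.%Y %H:%M:S')
    literal_s_ok = True
except ValueError:
    literal_s_ok = False

# With %S the seconds are parsed as a number
parsed = datetime.strptime(value, '%d.%m.%Y %H:%M:%S')

print(literal_s_ok)  # False
print(parsed)        # 2023-01-09 12:46:20
```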
The dates in my CSV are in dd/mm/yyyy format. When I convert the column to datetime with df['Date'] = pd.to_datetime(df['Date']), the format changes to mm/dd/yyyy.
Then I used df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%d/%m/%Y')
to convert to dd/mm/yyyy, but the values are then strings (object dtype). However, I need them in datetime format. When I apply df['Date'] = pd.to_datetime(df['Date']) again, it goes back to the previous format. I need your help.
You can use the parse_dates and dayfirst arguments of pd.read_csv, see: the docs for read_csv()
df = pd.read_csv('myfile.csv', parse_dates=['Date'], dayfirst=True)
This will read the Date column as datetime values, correctly taking the first part of the date input as the day. Note that in general you will want your dates to be stored as datetime objects.
Then, if you need to output the dates as a string you can call dt.strftime():
df['Date'].dt.strftime('%d/%m/%Y')
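A small self-contained sketch of the approach above, reading from an in-memory string instead of 'myfile.csv'. The CSV content and the Value column are made up for illustration.

```python
import io
import pandas as pd

# Made-up CSV content; the first date is ambiguous (1 February vs 2 January)
csv_text = "Date,Value\n01/02/2021,1\n25/12/2020,2\n"

df = pd.read_csv(io.StringIO(csv_text), parse_dates=['Date'], dayfirst=True)

# The column holds datetime values, read day-first
print(df['Date'].dt.month.tolist())  # [2, 12]

# A string representation is produced only when needed for output
print(df['Date'].dt.strftime('%d/%m/%Y').tolist())  # ['01/02/2021', '25/12/2020']
```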
When I use again this: df['Date'] = pd.to_datetime(df['Date']), it gets back to the previous format.
No, you cannot simultaneously have the string format of your choice and keep your series of type datetime. As remarked here:
datetime series are stored internally as integers. Any
human-readable date representation is just that, a representation,
not the underlying integer. To access your custom formatting, you can
use methods available in Pandas. You can even store such a text
representation in a pd.Series variable:
formatted_dates = df['datetime'].dt.strftime('%m/%d/%Y')
The dtype of formatted_dates will be object, which indicates
that the elements of your series point to arbitrary Python types. In
this case, those arbitrary types happen to be all strings.
Lastly, I strongly recommend you do not convert a datetime series
to strings until the very last step in your workflow. This is because
as soon as you do so, you will no longer be able to use efficient,
vectorised operations on such a series.
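The distinction between the stored datetime values and their text representation can be seen in a short sketch. The sample dates and the names df and as_text are assumptions for illustration only.

```python
import pandas as pd

df = pd.DataFrame({'Date': ['25/12/2020', '01/02/2021']})

# Parse once with an explicit day-first format; the column becomes a datetime series
df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y')

# Formatting produces a separate series of plain strings
as_text = df['Date'].dt.strftime('%d/%m/%Y')
print(as_text.tolist())  # ['25/12/2020', '01/02/2021']

# Date arithmetic only works on the datetime series, not on the strings
gap_days = (df['Date'].iloc[1] - df['Date'].iloc[0]).days
print(gap_days)  # 38
```

Keeping df['Date'] as datetime and formatting only at output time gives both the arithmetic and the dd/mm/yyyy display.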
This solution will work for all cases where a column has mixed date formats. Add more conditions to the function if needed. Pandas to_datetime() function was not working for me, but this seems to work well.
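import datetime
import pandas as pd

def format(val):
    # Normalise any parseable value to an 'mm/dd/YYYY' string first
    a = pd.to_datetime(val, errors='coerce', cache=False).strftime('%m/%d/%Y')
    try:
        # Prefer the day-first reading when it is a valid date
        date_time_obj = datetime.datetime.strptime(a, '%d/%m/%Y')
    except ValueError:
        # Otherwise fall back to month-first
        date_time_obj = datetime.datetime.strptime(a, '%m/%d/%Y')
    return date_time_obj.date()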
Saving the changes to the same column.
df['Date'] = df['Date'].apply(lambda x: format(x))
Saving as CSV.
df.to_csv(f'{file_name}.csv', index=False, date_format='%s')
2018-02-06T14:45:03.0040554Z
Unable to convert this string to datetime.
datetime.strptime('2018-02-06T14:45:03.0040554Z', '%Y-%m-%dT%H:%M:%fZ')
I am trying this and it doesn't work.
datetime.strptime('2018-02-06T14:45:03Z', '%Y-%m-%dT%H:%M:%fZ')
This works for the time format above.
But I'm getting the time string from a third-party API, so I can't change its format.
If you can use the dateutil module you can do this.
from dateutil import parser
print(parser.parse('2018-02-06T14:45:03.0040554Z'))
Output:
2018-02-06 14:45:03.004055+00:00
Using datetime: it looks like you missed the seconds directive (%S), and %f accepts at most 6 fractional digits while your string has 7.
import datetime
print(datetime.datetime.strptime('2018-02-06T14:45:03.004055Z', '%Y-%m-%dT%H:%M:%S.%fZ'))
print(datetime.datetime.strptime('2018-02-06T14:45:03Z', '%Y-%m-%dT%H:%M:%SZ'))
Output:
2018-02-06 14:45:03.004055
2018-02-06 14:45:03
One way is to use pandas:
import pandas as pd
pd.to_datetime('2018-02-06T14:45:03.0040554Z').to_pydatetime()
'2018-02-06 14:45:03.004055'
Your input format is unusual, as it has 7 digits in the fractional seconds.
Take a look at this old question: How do I convert a date string containing 7 digits in milliseconds into a date in Python
You can either split and rejoin the string and use '%Y-%m-%dT%H:%M:%S.%fZ', or use dateutil.parser.parse.
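A possible standard-library version of the trimming approach is sketched below. Instead of splitting and rejoining, it uses a regular expression to cut the fraction to six digits before calling strptime. The helper name parse_utc_timestamp is hypothetical, and attaching UTC as the time zone is an assumption based on the trailing "Z".

```python
import re
from datetime import datetime, timezone

def parse_utc_timestamp(text):
    """Parse 'YYYY-MM-DDTHH:MM:SS[.fraction]Z', tolerating more than 6 fractional digits."""
    # Keep at most six fractional digits; extra digits are dropped (truncated, not rounded)
    trimmed = re.sub(r'(\.\d{6})\d+', r'\1', text)
    fmt = '%Y-%m-%dT%H:%M:%S.%fZ' if '.' in trimmed else '%Y-%m-%dT%H:%M:%SZ'
    # Assumption: the literal "Z" means UTC, so the result is made timezone-aware
    return datetime.strptime(trimmed, fmt).replace(tzinfo=timezone.utc)

print(parse_utc_timestamp('2018-02-06T14:45:03.0040554Z'))  # 2018-02-06 14:45:03.004055+00:00
print(parse_utc_timestamp('2018-02-06T14:45:03Z'))          # 2018-02-06 14:45:03+00:00
```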
I have been stumped for the past few hours trying to solve the following.
In a large data set from an automated system there is a DATE_TIME value which, for rows just after midnight, does not have a zero-padded hour, like: 12-MAY-2017 0:16:20
When I try to convert this to a date (so that it is usable for conversions) as follows:
df['DATE_TIME'].astype('datetime64[ns]')
I get the following error:
Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
I tried writing a regex to pull out each piece but couldn't get anything working, given that the hour can be either one or two characters. Writing a regex for each piece also doesn't seem like an ideal solution.
Any ideas on this?
Try to use pandas.to_datetime() method:
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'], errors='coerce')
The parameter errors='coerce' takes care of strings that can't be converted to the datetime dtype by turning them into NaT.
I think you need pandas.to_datetime only:
df = pd.DataFrame({'DATE_TIME':['12-MAY-2017 0:16:20','12-MAY-2017 0:16:20']})
print (df)
DATE_TIME
0 12-MAY-2017 0:16:20
1 12-MAY-2017 0:16:20
df['DATE_TIME'] = pd.to_datetime(df['DATE_TIME'])
print (df)
DATE_TIME
0 2017-05-12 00:16:20
1 2017-05-12 00:16:20
Converting with astype seems problematic, because it needs strings in ISO 8601 date or datetime format:
df['DATE_TIME'].astype('datetime64[ns]')
ValueError: Error parsing datetime string "12-MAY-2017 0:16:20" at position 3
EDIT:
If some of the values are broken (arbitrary strings or ints), use MaxU's answer.
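For reference, the single-digit hour and the upper-case month are not a problem for an explicit format string either. The sketch below uses the standard library and assumes English month abbreviations (the default "C" locale); FMT is just an illustrative name.

```python
from datetime import datetime

# Assumes English month abbreviations (the default "C" locale)
FMT = '%d-%b-%Y %H:%M:%S'

# %b matches month abbreviations case-insensitively; %H accepts one or two digits
parsed = datetime.strptime('12-MAY-2017 0:16:20', FMT)
print(parsed)  # 2017-05-12 00:16:20
```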
I am trying to do a simple test of pandas' capabilities for handling dates and formats.
For that I have created a dataframe with the values below:
df = pd.DataFrame({'date1' : ['10-11-11','12-11-12','10-10-10','12-11-11',
'12-12-12','11-12-11','11-11-11']})
Here I am assuming that the values are dates. And I am converting it into proper format using pandas' to_datetime function.
df['format_date1'] = pd.to_datetime(df['date1'])
print(df)
Out[3]:
date1 format_date1
0 10-11-11 2011-10-11
1 12-11-12 2012-12-11
2 10-10-10 2010-10-10
3 12-11-11 2011-12-11
4 12-12-12 2012-12-12
5 11-12-11 2011-11-12
6 11-11-11 2011-11-11
Here, pandas reads the dates as "MM/DD/YY" and converts them to its native format (i.e. YYYY-MM-DD). I want to tell pandas that the date format is actually "YY/MM/DD" and let it convert that into its native format. This would change the value of row 5. To do this, I ran the following code, but it gives me an error.
df3['format_date2'] = pd.to_datetime(df3['date1'], format='%Y/%m/%d')
ValueError: time data '10-10-10' does not match format '%Y/%m/%d' (match)
I have seen a solution of sorts here, but I was hoping for a simpler, crisper answer.
%Y in the format specifier matches a 4-digit year (e.g. 2016). %y matches a 2-digit year (e.g. 16, meaning 2016). Change the %Y to %y and it should work.
Also, your format specifier uses slashes, but the data uses dashes. You need to change your format to %y-%m-%d
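The effect of the corrected format on row 5 of the question ('11-12-11') can be checked with the standard library, which uses the same directives. A minimal sketch with illustrative variable names:

```python
from datetime import datetime

value = '11-12-11'  # row 5 in the question

# Year first, as intended by the asker
as_yy_mm_dd = datetime.strptime(value, '%y-%m-%d')
# Month first, which is how the value was read without an explicit format
as_mm_dd_yy = datetime.strptime(value, '%m-%d-%y')

print(as_yy_mm_dd.date())  # 2011-12-11
print(as_mm_dd_yy.date())  # 2011-11-12
```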