I am new to Pandas and Python. I want to do some datetime operations in my script. I am getting datetime information from a CSV file in the following format: PT12M20S
How can I convert it into pandas datetime format? Something like: 12:20
During conversion I get the error: Unknown string format: PT13M20S
Try this:
pd.to_datetime("PT12M20S", format="PT%MM%SS")
In the format string, %M and %S are the minute and second directives; the M and S that follow them are literal characters. Replace the single string "PT12M20S" with your dataframe column.
EDIT: to reflect additional information given in the comments:
You can use errors="coerce" to turn every value that does not match the first format into NaT, and then fill those missing values using their respective formats.
import pandas as pd
df = pd.DataFrame(data=["PT20M23S", "PT45M", "PT25S"], columns=["date"])
df["date"] = pd.to_datetime(df["date"], format="PT%MM%SS", errors="coerce"). \
    fillna(pd.to_datetime(df["date"], format="PT%MM", errors="coerce")). \
    fillna(pd.to_datetime(df["date"], format="PT%SS", errors="coerce"))
output:
                 date
0 1900-01-01 00:20:23
1 1900-01-01 00:45:00
2 1900-01-01 00:00:25
Alternatively you can design your own function and apply it to the original string column.
from datetime import datetime
def format_date(val):
    # try each known layout in turn; a tuple keeps the order fixed
    for f in ("PT%MM%SS", "PT%MM", "PT%SS"):
        try:
            return datetime.strptime(val, f)
        except ValueError:
            continue
    return None
df["date"].apply(format_date)
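Since PT12M20S is an ISO 8601 duration rather than a point in time, another option is to parse it into a timedelta. The following is a minimal sketch using only the standard library; the helper names parse_duration and as_clock are made up for illustration, and it assumes the values only ever contain whole hour, minute and second parts after the PT prefix.

```python
import re
from datetime import timedelta

# PT, then optional hour, minute and second parts, e.g. PT1H2M3S, PT45M, PT25S
DURATION_RE = re.compile(r"^PT(?:(\d+)H)?(?:(\d+)M)?(?:(\d+)S)?$")

def parse_duration(text):
    """Return a timedelta for strings like 'PT12M20S', or None if they don't match."""
    match = DURATION_RE.match(text)
    if match is None or not any(match.groups()):
        return None
    hours, minutes, seconds = (int(part or 0) for part in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)

def as_clock(delta):
    """Format a timedelta as MM:SS (minutes are not wrapped at 60)."""
    minutes, seconds = divmod(int(delta.total_seconds()), 60)
    return f"{minutes:02d}:{seconds:02d}"

print(as_clock(parse_duration("PT12M20S")))  # 12:20
```

On a dataframe column this could be applied with df["date"].apply(parse_duration), provided the column still holds the original strings.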
Related
I have a date in the format YYYY-MM-DD (2022-11-01). I want to convert it to 'YYYYMMDD' format (without hyphens). Please help.
I tried this, but no luck:
df['ConvertedDate']= df['DateOfBirth'].dt.strftime('%m/%d/%Y')
If I understand correctly, the format mask you should be using with strftime is %Y%m%d:
df["ConvertedDate"] = df["DateOfBirth"].dt.strftime('%Y%m%d')
Pandas itself can convert the strings in a DataFrame column to datetime and then format them as desired:
df['ConvertedDate'] = pd.to_datetime(df['DateOfBirth'], format='%Y-%m-%d').dt.strftime('%Y%m%d')
Referenced Example:
import pandas as pd
values = {'DateOfBirth': ['2021-01-14', '2022-11-01', '2022-11-01']}
df = pd.DataFrame(values)
df['ConvertedDate'] = pd.to_datetime(df['DateOfBirth'], format='%Y-%m-%d').dt.strftime('%Y%m%d')
print (df)
Output:
DateOfBirth ConvertedDate
0 2021-01-14 20210114
1 2022-11-01 20221101
2 2022-11-01 20221101
This works
from datetime import datetime
initial = "2022-11-01"
time = datetime.strptime(initial, "%Y-%m-%d")
print(time.strftime("%Y%m%d"))
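If the column still holds plain strings (dtype object), the .dt accessor is not available, which is one possible reason the original attempt failed. As a sketch, assuming every value is already a well-formed YYYY-MM-DD string, the hyphens can also be stripped without parsing at all. Unlike the to_datetime route, this does not validate the dates.

```python
import pandas as pd

df = pd.DataFrame({"DateOfBirth": ["2021-01-14", "2022-11-01"]})

# plain string replacement; regex=False treats "-" as a literal character
df["ConvertedDate"] = df["DateOfBirth"].str.replace("-", "", regex=False)

print(df["ConvertedDate"].tolist())  # ['20210114', '20221101']
```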
I am trying to convert a column in string format to datetime format. However, I am getting the following error. Could somebody please help?
The error: time data '42:53.700' does not match format '%H:%M:%S.%f' (match)
Code:
Merge_df['Time'] = pd.to_datetime(Merge_df['Time'], format='%H:%M:%S.%f')
You'll need to clean the data to get a common format before you can parse to data type 'datetime'. For example you can remove the colons and fill with zeros, then parse with the appropriate directive:
import pandas as pd
df = pd.DataFrame({'time': ["1:45.333", "45:22.394", "4:55:23.444", "23:44:01.004"]})
df['time'] = pd.to_datetime(df['time'].str.replace(':', '').str.zfill(10), format="%H%M%S.%f")
df['time']
0 1900-01-01 00:01:45.333
1 1900-01-01 00:45:22.394
2 1900-01-01 04:55:23.444
3 1900-01-01 23:44:01.004
Name: time, dtype: datetime64[ns]
Since the data actually looks more like a duration to me, here's how to convert it to data type 'timedelta'. You'll need to ensure HH:MM:SS.fff format, which is a bit more work:
# ensure common string length
df['time'] = df['time'].str.zfill(12)
# ensure HH:MM:SS.fff format
df['time'] = df['time'].str[:2] + ":" + df['time'].str[3:5] + ":" + df['time'].str[6:]
df['timedelta'] = pd.to_timedelta(df['time'])
df['timedelta']
0 0 days 00:01:45.333000
1 0 days 00:45:22.394000
2 0 days 04:55:23.444000
3 0 days 23:44:01.004000
Name: timedelta, dtype: timedelta64[ns]
The advantage of using timedelta is that you can now also handle hours greater than 23.
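To illustrate the point about hours above 23, here is a small standard-library sketch that pads missing leading fields and builds a timedelta directly. The function name to_duration is hypothetical, and it assumes the fields are always ordered hours:minutes:seconds, with the hours (and possibly minutes) omitted when zero.

```python
from datetime import timedelta

def to_duration(text):
    """Parse '[[H:]M:]S.fff' into a timedelta, treating missing fields as 0."""
    parts = text.split(":")
    # left-pad with zeros so there are always three fields: hours, minutes, seconds
    parts = ["0"] * (3 - len(parts)) + parts
    hours, minutes, seconds = parts
    return timedelta(hours=int(hours), minutes=int(minutes), seconds=float(seconds))

for raw in ["1:45.333", "4:55:23.444", "27:15:00.000"]:
    print(raw, "->", to_duration(raw))
```

The last value has 27 hours, which the %H directive would reject, while timedelta simply carries it over into one day plus three hours and fifteen minutes.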
I am dealing with a csv file containing a column called startTime, containing times.
When opening this file with Excel, the times appear as AM/PM times in the formula bar, although the timestamps in the column appear improperly formatted:
startTime
16:02.0
17:45.0
18:57.0
20:23.0
When reading this file using pandas' read_csv, I am unable to format these timestamps properly:
import pandas as pd
df = pd.read_csv('example_file.csv')
print(df.startTime)
Simply yields:
0 16:02.0
1 17:45.0
2 18:57.0
3 20:23.0
I first attempted to convert the output Series using pd.to_datetime(df.startTime,format=" %H%M%S") but this yields the following error message:
time data '16:02.0' does not match format ' %H%M%S' (match)
I then tried pd.to_datetime(df.startTime,format=" %I:%M:%S %p") based on this answer, in order to account for the AM/PM convention, but this returned the same error message.
How can I use pandas to format these timestamps like Excel automatically does?
Your csv file has text, not datetime, so you need to first convert text stored in this column to pandas datetime object, then you can convert this pandas datetime object to the kind of format that you want via a strftime method:
pd.to_datetime(df['startTime']).dt.strftime(date_format = '%I:%M:%S %p')
Outputs:
0 04:02:00 PM
1 05:45:00 PM
2 06:57:00 PM
3 08:23:00 PM
Note: these values are string values, not datetime.
Edit for this specific issue:
A quick fix is to prepend 00: as the hour before converting, so the values are read as minutes and seconds just after midnight:
pd.to_datetime(df['startTime'].apply(lambda x: f'00:{x}')).dt.strftime(date_format = '%I:%M:%S %p')
Outputs (%I is the 12-hour clock, so hour 0 is shown as 12):
0 12:16:02 AM
1 12:17:45 AM
2 12:18:57 AM
3 12:20:23 AM
Try:
>>> pd.to_datetime(df['startTime'].str.strip(), format='%H:%M.%S')
0 1900-01-01 16:02:00
1 1900-01-01 17:45:00
2 1900-01-01 18:57:00
3 1900-01-01 20:23:00
Name: startTime, dtype: datetime64[ns]
Coerce to datetime and extract the time using dt.strftime
df['startTime']=pd.to_datetime(df['startTime']).dt.strftime('%I:%M.%S%p')
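The answers above read 16:02.0 in different ways: as hours and minutes, or as minutes and seconds. A short standard-library sketch makes the difference explicit; which reading is right depends on how the file was produced, so treat the choice of format as an assumption to check against the source data.

```python
from datetime import datetime

raw = "16:02.0"

# reading 1: hours:minutes.seconds
as_hm = datetime.strptime(raw, "%H:%M.%S").time()

# reading 2: minutes:seconds.fraction (the hour defaults to 0)
as_ms = datetime.strptime(raw, "%M:%S.%f").time()

print(as_hm)  # 16:02:00
print(as_ms)  # 00:16:02
```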
I have a CSV file of a year's worth of time series data where the timestamps look like the sample below. One thing to mention about the data: it is 30-year averaged hourly weather data, so there isn't a year specified in the timestamp.
Date
01-01T01:00:00
01-01T02:00:00
01-01T03:00:00
01-01T04:00:00
01-01T05:00:00
01-01T06:00:00
01-01T07:00:00
01-01T08:00:00
01-01T09:00:00
01-01T10:00:00
01-01T11:00:00
01-01T12:00:00
01-01T13:00:00
01-01T14:00:00
01-01T15:00:00
01-01T16:00:00
01-01T17:00:00
01-01T18:00:00
01-01T19:00:00
01-01T20:00:00
01-01T21:00:00
01-01T22:00:00
01-01T23:00:00
I can read the csv file just fine:
df = pd.read_csv('weather_cleaned.csv', index_col='Date', parse_dates=True)
If I do a pd.to_datetime(df) this will error out:
ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
Would anyone have any tips to convert my df to datetime?
You can pass date_parser argument (check docs), e.g.
import pandas as pd
from datetime import datetime
df = pd.read_csv('weather_cleaned.csv', index_col='Date', parse_dates=['Date'],
                 date_parser=lambda x: datetime.strptime(x, '%d-%mT%H:%M:%S'))
print(df.head())
output
Empty DataFrame
Columns: []
Index: [1900-01-01 01:00:00, 1900-01-01 02:00:00, 1900-01-01 03:00:00, 1900-01-01 04:00:00, 1900-01-01 05:00:00]
Of course you can define a different function, maybe specify a different year, etc.
E.g. if you want year 2020 instead of 1900, use
date_parser=lambda x: datetime.strptime(x, '%d-%mT%H:%M:%S').replace(year=2020)
Note: I assume it's day-month format; change the format string accordingly.
EDIT: Change my example to reflect that Date column should be used as index.
One thing you can do is to append a default year:
pd.to_datetime('2020-' + df['Date'])
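A fuller sketch of the default-year idea, which also avoids date_parser (deprecated in recent pandas releases): read the column as text, prepend a year, and parse with an explicit format. The inline CSV, the temp column, and the month-day order are assumptions for illustration; 2020 is used because it is a leap year, so a 29 February row, if present, would still parse.

```python
import io
import pandas as pd

# stand-in for weather_cleaned.csv; the 'temp' column is made up
csv_text = """Date,temp
01-01T01:00:00,1.5
01-01T02:00:00,1.7
"""

df = pd.read_csv(io.StringIO(csv_text))
# prepend a year, then parse as year-month-day (swap %m and %d for day-month data)
df["Date"] = pd.to_datetime("2020-" + df["Date"], format="%Y-%m-%dT%H:%M:%S")
df = df.set_index("Date")

print(df.index[:2])
```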
I am reading data from a text file with more than 14000 rows, and there is a column which has eight-digit numbers in it. The format for some of the rows is like:
01021943
02031944
00041945
00001946
The problem is that when I use the to_date function, it converts the datatype of the date from object to int64, but I want it to be datetime. Second, when I use the to_datetime function, dates like
00041945 become 41945
00001946 become 1946, and hence I cannot format them properly.
You can pass the dtype parameter to read_csv to read column col as strings, and then use to_datetime with the format parameter to specify the layout and errors='coerce' so that invalid dates are converted to NaT:
import pandas as pd
import io
temp=u"""col
01021943
02031944
00041945
00001946"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp), dtype={'col': 'str'})
df['col'] = pd.to_datetime(df['col'], format='%d%m%Y', errors='coerce')
print (df)
col
0 1943-02-01
1 1944-03-02
2 NaT
3 NaT
print (df.dtypes)
col datetime64[ns]
dtype: object
Thanks Jon Clements for another solution:
import pandas as pd
import io
temp=u"""col_name
01021943
02031944
00041945
00001946"""
# after testing, replace io.StringIO(temp) with the filename
df = pd.read_csv(io.StringIO(temp),
                 converters={'col_name': lambda dt: pd.to_datetime(dt, format='%d%m%Y', errors='coerce')})
print (df)
col_name
0 1943-02-01
1 1944-03-02
2 NaT
3 NaT
print (df.dtypes)
col_name datetime64[ns]
dtype: object
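If the column has already been read as integers and the leading zeros are gone (00041945 showing up as 41945), they can be restored before parsing. A minimal sketch, assuming every value was originally eight digits wide:

```python
import pandas as pd

df = pd.DataFrame({"col": [1021943, 2031944, 41945, 1946]})

# back to text, left-padded with zeros to the original eight digits
padded = df["col"].astype(str).str.zfill(8)
# day 00 or month 00 is not a valid date, so those rows become NaT
df["date"] = pd.to_datetime(padded, format="%d%m%Y", errors="coerce")

print(padded.tolist())  # ['01021943', '02031944', '00041945', '00001946']
print(df["date"].isna().tolist())  # [False, False, True, True]
```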
As a first guess solution you could just parse it as a string into a datetime instance. Something like:
from datetime import datetime
EXAMPLE = u'01021943'
dt = datetime(int(EXAMPLE[4:]), int(EXAMPLE[2:4]), int(EXAMPLE[:2]))
...not caring very much about performance issues.
import datetime
def to_date(num_str):
    return datetime.datetime.strptime(num_str, "%d%m%Y")
Note that this will also raise an exception for zero values, because the expected behavior is not clear for this input.
If you want a different behavior for zero values, you can implement it with try and except.
For example, if you want to get None for zero values you can do:
def to_date(num_str):
    try:
        return datetime.datetime.strptime(num_str, "%d%m%Y")
    except ValueError:
        return None