python time stamp convert to datetime without a year specified - python

I have a CSV file with a year's worth of time series data, where the time stamps look like the sample below. The data is 30-year averaged hourly weather data, so the time stamps do not specify a year.
Date
01-01T01:00:00
01-01T02:00:00
01-01T03:00:00
01-01T04:00:00
01-01T05:00:00
01-01T06:00:00
01-01T07:00:00
01-01T08:00:00
01-01T09:00:00
01-01T10:00:00
01-01T11:00:00
01-01T12:00:00
01-01T13:00:00
01-01T14:00:00
01-01T15:00:00
01-01T16:00:00
01-01T17:00:00
01-01T18:00:00
01-01T19:00:00
01-01T20:00:00
01-01T21:00:00
01-01T22:00:00
01-01T23:00:00
I can read the csv file just fine:
df = pd.read_csv('weather_cleaned.csv', index_col='Date', parse_dates=True)
If I do a pd.to_datetime(df) this will error out:
ValueError: to assemble mappings requires at least that [year, month, day] be specified: [day,month,year] is missing
Would anyone have any tips to convert my df to datetime?

You can pass the date_parser argument to read_csv (see the docs; pandas 2.0 deprecates it in favor of date_format), e.g.
import pandas as pd
from datetime import datetime
df = pd.read_csv('weather_cleaned.csv', index_col='Date', parse_dates=['Date'],
                 date_parser=lambda x: datetime.strptime(x, '%d-%mT%H:%M:%S'))
print(df.head())
output
Empty DataFrame
Columns: []
Index: [1900-01-01 01:00:00, 1900-01-01 02:00:00, 1900-01-01 03:00:00, 1900-01-01 04:00:00, 1900-01-01 05:00:00]
Of course, you can define a different function, for example to use a different year.
If you want year 2020 instead of 1900, use
date_parser=lambda x: datetime.strptime(x, '%d-%mT%H:%M:%S').replace(year=2020)
Note: I assume the dates are in day-month format; change the format string accordingly.
EDIT: Changed my example to reflect that the Date column should be used as the index.

One thing you can do is prepend a default year before parsing:
pd.to_datetime('2020-' + df['Date'])
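A minimal sketch of the prepend-a-year approach, end to end. The inline CSV text and the temp column are made up to stand in for weather_cleaned.csv, and the format string assumes month-day order (swap %m and %d if the file is day-month). A leap year such as 2020 is chosen so that a 02-29 row, if the averaged data has one, still parses.

```python
import io
import pandas as pd

# Hypothetical stand-in for weather_cleaned.csv; the 'temp' column is invented.
csv_text = (
    "Date,temp\n"
    "01-01T01:00:00,3.1\n"
    "02-29T12:00:00,4.0\n"
    "12-31T23:00:00,2.5\n"
)
df = pd.read_csv(io.StringIO(csv_text))

# Prepend an arbitrary leap year, then parse with an explicit format
# (assumed month-day order).
df["Date"] = pd.to_datetime("2020-" + df["Date"], format="%Y-%m-%dT%H:%M:%S")
df = df.set_index("Date")
print(df)
```

Note that the Date column is read as plain strings here (no index_col or parse_dates), so that the string concatenation works before the index is set.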

Related

How to deal with inconsistent date series in Python?

Inconsistent date formats
As shown in the photo above, the check-in and check-out dates are inconsistent. Whenever I try to convert the entire series to datetime using df['Check-in date'] = pd.to_datetime(df['Check-in date'], errors='coerce') and
df['Check-out date'] = pd.to_datetime(df['Check-out date'], errors='coerce'), the days and months get mixed up. I don't really know what to do now. I also tried splitting the days, months, and years and rearranging them, but still had no luck.
My goal is to get the total number of nights each guest stayed, but because of the inconsistency I end up with negative totals.
I'd appreciate any help here. Thanks!
You can try different formats with strptime and return a datetime object if any of them works.
from datetime import datetime
import pandas as pd
def try_different_formats(value):
    only_date_format = "%d/%m/%Y"
    date_and_time_format = "%Y-%m-%d %H:%M:%S"
    # Try the date-only format first, then the date-and-time format.
    try:
        return datetime.strptime(value, only_date_format)
    except ValueError:
        pass
    try:
        return datetime.strptime(value, date_and_time_format)
    except ValueError:
        # Neither format matched.
        return pd.NaT
In your example:
df = pd.DataFrame({'Check-in date': ['19/02/2022','2022-02-12 00:00:00']})
Check-in date
0 19/02/2022
1 2022-02-12 00:00:00
The apply method runs this function on every value of the Check-in date column. The result is a column of datetime values.
df['Check-in date'].apply(try_different_formats)
0 2022-02-19
1 2022-02-12
Name: Check-in date, dtype: datetime64[ns]
For a more pandas-specific solution, you can check out this answer.
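One possible pandas-only variant (not necessarily what the linked answer does) is to parse the whole column once per known format with errors='coerce' and then fill the gaps. This is only a sketch: the check-out values and the helper name parse_mixed are invented for illustration, and it assumes the two formats from the function above are the only ones present.

```python
import pandas as pd

df = pd.DataFrame({
    "Check-in date": ["19/02/2022", "2022-02-12 00:00:00"],
    "Check-out date": ["21/02/2022", "2022-02-15 00:00:00"],  # invented values
})

def parse_mixed(col):
    # Parse the column once per format; rows that do not match become NaT.
    day_first = pd.to_datetime(col, format="%d/%m/%Y", errors="coerce")
    iso = pd.to_datetime(col, format="%Y-%m-%d %H:%M:%S", errors="coerce")
    # Fill rows the first format missed with results from the second.
    return day_first.fillna(iso)

check_in = parse_mixed(df["Check-in date"])
check_out = parse_mixed(df["Check-out date"])

# Total nights stayed, as whole days between check-in and check-out.
nights = (check_out - check_in).dt.days
print(nights.tolist())  # [2, 3]
```

Because each format is stated explicitly, a value such as 02/03/2022 is always read as 2 March, so days and months cannot silently swap.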

Pandas dt accessor returns wrong day and month

My CSV data looks like this -
Date Time
1/12/2019 12:04AM
1/12/2019 12:09AM
1/12/2019 12:14AM
and so on
And I am trying to read this file using pandas in the following way -
import pandas as pd
import numpy as np
data = pd.read_csv('D 2019.csv',parse_dates=[['Date','Time']])
print(data['Date_Time'].dt.month)
When I access the year through the dt accessor, it prints correctly as 2019.
But when I print the day or the month, the values are completely wrong. The month starts at 1 and ends at 12, when it should be 12 throughout.
The day starts at 12 and ends at 31, when it should start at 1 and end at 31. The file has a total of 8867 entries. Where am I going wrong?
The default format is MM/DD, while yours is DD/MM.
The simplest solution is to set the dayfirst parameter of read_csv:
dayfirst : DD/MM format dates, international and European format (default False)
data = pd.read_csv('D 2019.csv', parse_dates=[['Date', 'Time']], dayfirst=True)
# -------------
>>> data['Date_Time'].dt.month
# 0 12
# 1 12
# 2 12
# Name: Date_Time, dtype: int64
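Try passing the format argument to pd.to_datetime. Combine the Date and Time columns first, and use %I (12-hour clock) rather than %H so that the AM/PM marker %p is honored:
df = pd.read_csv('D 2019.csv')
df["Date_Time"] = pd.to_datetime(df["Date"] + " " + df["Time"], format='%d/%m/%Y %I:%M%p')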
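A small check of how an explicit format treats the AM/PM marker, using the sample value from the question plus one invented PM value. In strptime-style formats, %p only changes the hour when the hour is parsed with %I (12-hour clock); with %H the marker is matched but ignored.

```python
from datetime import datetime
import pandas as pd

times = pd.Series(["1/12/2019 12:04AM", "1/12/2019 1:30PM"])  # second value is invented

# %I reads a 12-hour clock, so %p (AM/PM) shifts the hour as expected.
parsed = pd.to_datetime(times, format="%d/%m/%Y %I:%M%p")
print(parsed.dt.hour.tolist())   # [0, 13]
print(parsed.dt.month.tolist())  # [12, 12]

# With %H the AM/PM marker is matched but does not change the hour.
naive = datetime.strptime("1/12/2019 12:04AM", "%d/%m/%Y %H:%M%p")
print(naive.hour)  # 12
```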
You need to check the data types of your dataframe and convert the "Date" column to datetime:
df["Date"] = pd.to_datetime(df["Date"], dayfirst=True)
Afterwards you can access the day, month, or year using:
dt.day
dt.month
dt.year
Note: Make sure you know the format of the dates (D/M/Y or M/D/Y). The data here is D/M/Y, hence dayfirst=True.
Full code:
import pandas as pd
data = pd.read_csv('D 2019.csv')
data["Date"] = pd.to_datetime(data["Date"], dayfirst=True)
print(data["Date"].dt.day)
print(data["Date"].dt.month)
print(data["Date"].dt.year)

Weird Date formats part of column in YY-mm-dd and the rest of columns YY-dd-mm

This is a strange one. My original Excel file has dates like 10/11/2018, and the problem above happens when I convert the column to datetime using:
df.Date = pd.to_datetime(df['Date'])
So the date column starts as 2018-01-11; then, once the day and month are equal (for example 2018-11-11), the format swaps relative to the previous rows and the following rows are
''2018-11-12''
''2018-11-13''
I've tried to write a for loop over each entry to change the series, but got an error saying the series can't be changed; then I tried writing the loop below, but got a KeyError:
for date_ in jda.Date:
    jda.Date[date_] = jda.Date[date_].strftime('%Y-%m-%d')
KeyError: Timestamp('2019-05-17 00:00:00')
Below is a pic of where the format changes.
Thank you for your help.
Solution if the dates are stored as strings:
I think the problem is wrongly parsed datetimes: by default, 10/11/2018 is parsed as 11 October 2018. If you need it parsed as 10 November 2018, add the dayfirst=True parameter to to_datetime:
df.Date = pd.to_datetime(df['Date'], dayfirst=True)
Or you can specify the format, e.g. %d/%m/%Y for DD/MM/YYYY:
df.Date = pd.to_datetime(df['Date'], format='%d/%m/%Y')
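A quick illustration of both options on a few day-first strings (the sample values are made up, and this assumes the column holds strings rather than dates already converted by Excel):

```python
import pandas as pd

s = pd.Series(["10/11/2018", "11/11/2018", "13/11/2018"])

# Explicit format: every row is read as day/month/year.
explicit = pd.to_datetime(s, format="%d/%m/%Y")

# dayfirst=True: ambiguous values such as 10/11 are read as 10 November.
day_first = pd.to_datetime(s, dayfirst=True)

print(explicit.dt.day.tolist())    # [10, 11, 13]
print(explicit.dt.month.tolist())  # [11, 11, 11]
```

The explicit format is the stricter of the two: a value that does not fit raises an error instead of being reinterpreted.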

Convert date string YYYY-MM-DD to YYYYMM in pandas

Is there a way in pandas to convert my column date which has the following format '1997-01-31' to '199701', without including any information about the day?
I tried a solution of the following form:
df['DATE'] = df['DATE'].apply(lambda x: datetime.strptime(x, '%Y%m'))
but I get this error: 'ValueError: time data '1997-01-31' does not match format '%Y%m''
Probably the reason is that I am not including the day in the format. Is there a better way to go from YYYY-MM-DD format to YYYYMM in pandas?
One way is to convert the column to datetime and then use strftime. Note that the result is a string column, so you lose the datetime functionality.
df = pd.DataFrame({'date':['1997-01-31' ]})
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y%m')
date
0 199701
You might not need to go through the datetime conversion if the data is sufficiently clean (no incorrect strings like 'foo' or '001231'):
df = pd.DataFrame({'date':['1997-01-31', '1997-03-31', '1997-12-18']})
df['date'] = [''.join(x.split('-')[0:2]) for x in df.date]
# date
#0 199701
#1 199703
#2 199712
Or if you have null values:
df['date'] = df.date.str.replace('-', '').str[0:6]
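If losing datetime functionality is a concern, one alternative (not from the answers above, just a sketch) is to keep a monthly Period column alongside the YYYYMM string. The column names month and yyyymm are chosen here for illustration.

```python
import pandas as pd

df = pd.DataFrame({"date": ["1997-01-31", "1997-03-31", "1997-12-18"]})
dates = pd.to_datetime(df["date"])

# Monthly periods drop the day but still sort and compare like dates.
df["month"] = dates.dt.to_period("M")

# Plain YYYYMM strings, as asked for in the question.
df["yyyymm"] = dates.dt.strftime("%Y%m")

print(df["yyyymm"].tolist())  # ['199701', '199703', '199712']
```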

Python ValueError: time data '2001-11-03 ' %Y:%m %d %H:%M:%S' when dates in csv file are month/day/year

I'm having an issue where the date format does not match. In my .csv file the dates are formatted as %m/%d/%Y (e.g. 11/3/2001), but the error refers to %Y/%m/%d or %Y/%d/%m. I've tried all the possible permutations of year, month, and day and I continue to receive the same error: ValueError: time data '2001-11-03 ' %Y:%m %d %H:%M:%S'. Below is my code. Thanks.
df = pd.read_excel('.xlsx', header=None)
df.to_csv('.csv', header=None, index=False)
df= pd.read_csv('.csv', index_col[5,8,9,12], date_parser=lambda x: datetime.datetime.strptime(x, '%Y/%m/$d %H:%M:%S').strptime('%m/%d/%Y))
Note: What I'm trying to do is convert an .xlsx file to .csv and then remove the trailing 0:00 from multiple columns within the .csv file. Hope this helps.
Use parse from dateutil.parser to parse the dates. It is easy to use and does not need an explicit format string.
from dateutil.parser import parse
df = pd.read_csv('filename.csv', date_parser = parse, index_..)
Or you can use to_datetime, which is native to pandas:
pd.to_datetime(df['Date Col'])
To format the date properly on output, use the following:
date_parser=lambda x: parse(x)
#parse from dateutil.parser
df['Date Col'] = df['Date Col'].dt.strftime('%m/%d/%Y')
df.to_csv('New File.csv')
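If the end goal is a CSV without the trailing 0:00, as the question describes, another option is to leave the column as datetime and let to_csv do the formatting through its date_format parameter. A minimal sketch, with an invented two-row frame and an in-memory buffer standing in for the output file:

```python
import io
import pandas as pd

# Invented sample data; 'Date Col' follows the naming used above.
df = pd.DataFrame({
    "Date Col": pd.to_datetime(["2001-11-03", "2001-12-25"]),
    "value": [1, 2],
})

buf = io.StringIO()  # stands in for a real file path
# date_format controls how datetime columns are written out.
df.to_csv(buf, index=False, date_format="%m/%d/%Y")
csv_text = buf.getvalue()
print(csv_text)
```

This keeps the column as datetime in memory, so later date arithmetic still works.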
You can use to_datetime, since you are using pandas.
import pandas as pd
df = pd.DataFrame({"a": ["11/3/2001", '2001-11-03']})
df["a"] = pd.to_datetime(df["a"])
print(df["a"])
Output:
0 2001-11-03
1 2001-11-03
Name: a, dtype: datetime64[ns]
