I am pulling a time series from a CSV file that has dates in "mm/dd/yyyy" format:
import pandas as pd
from datetime import datetime
df = pd.read_csv('lib_file.csv')
df['Date'] = df['Date'].apply(lambda x:datetime.strptime(x,'%m/%d/%Y').strftime('%d/%m/%Y'))
below is the output
I then convert the dtype of ['Date'] from object to datetime64:
df['Date'] = pd.to_datetime(df['Date'])
But that changes my dates as well.
How do I fix it?
Try this:
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
This infers the date format from the first non-NaN element, which is parsed correctly in your case, instead of inferring the format separately for every row of the dataframe.
Just using the code below helped:
df = pd.read_csv('lib_file.csv')
df['Date'] = pd.to_datetime(df['Date'])
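For what it's worth, here is a minimal sketch of parsing with an explicit format so that day and month are never guessed. The sample values are made up and stand in for the CSV column; the idea is to keep the column as datetime64 and only format it as "dd/mm/yyyy" text when displaying it.

```python
import pandas as pd

# Hypothetical sample standing in for the "mm/dd/yyyy" column of the CSV.
df = pd.DataFrame({"Date": ["01/02/2020", "01/13/2020", "12/31/2020"]})

# Parse with an explicit format so day and month are never guessed.
df["Date"] = pd.to_datetime(df["Date"], format="%m/%d/%Y")

# Format only for display; the column itself stays datetime64.
as_text = df["Date"].dt.strftime("%d/%m/%Y")
print(as_text.tolist())  # ['02/01/2020', '13/01/2020', '31/12/2020']
```

Converting to "dd/mm/yyyy" strings first and then calling to_datetime without a format is what lets the parser read the day as the month.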
My idea is to separate the two kinds of strings and then convert both dataframes to the same datetime format. I tried this code:
data['date'] = pd.to_datetime(data['date'])
data['date'] = data['date'].dt.strftime('%Y-%m-%d')
But there are some errors in the output. 13/02/2020 becomes 2020-02-13, which is what I want, but 12/02/2020 becomes 2020-12-02.
My dataframe has two date formats: YYYY-MM-DD and DD/MM/YYYY.
dataframe
I need to split it into two dataframes, with all the rows that have a YYYY-MM-DD date going into df1.
The data type is object.
All the rows that have a DD/MM/YYYY date should go into df2.
Does anyone know how to code this?
If you don't need to convert to datetimes, use Series.str.contains with boolean indexing:
mask = df['date'].str.contains('-')
df1 = df[mask].copy()
df2 = df[~mask].copy()
If you need datetimes, pass errors='coerce' to to_datetime so that values not matching the format become missing values (NaT), then remove the missing values:
df1 = (df.assign(date=pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce'))
         .dropna(subset=['date']))
df2 = (df.assign(date=pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce'))
         .dropna(subset=['date']))
EDIT: If you need the output column filled with correctly parsed datetimes, you can replace the missing values of one Series with the values of another using Series.fillna:
date1 = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')
date2 = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')
df['date'] = date1.fillna(date2)
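To make the behaviour concrete, here is a small self-contained sketch of the same idea. The four sample strings and the column name `parsed` are made up for illustration; they include the 12/02/2020 case that gets swapped when no format is given.

```python
import pandas as pd

# Hypothetical mixed-format column (YYYY-MM-DD and DD/MM/YYYY).
df = pd.DataFrame(
    {"date": ["2020-02-13", "13/02/2020", "12/02/2020", "2020-12-02"]}
)

# Each parse succeeds only for its own format; everything else becomes NaT.
iso = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")
dmy = pd.to_datetime(df["date"], format="%d/%m/%Y", errors="coerce")

# One combined datetime column, plus the two-way split by original format.
df["parsed"] = iso.fillna(dmy)
df1 = df[iso.notna()]  # rows that were YYYY-MM-DD
df2 = df[dmy.notna()]  # rows that were DD/MM/YYYY

print(df["parsed"].dt.strftime("%Y-%m-%d").tolist())
# ['2020-02-13', '2020-02-13', '2020-02-12', '2020-12-02']
```

Because each row is parsed with the one format it actually matches, 12/02/2020 ends up as 12 February rather than 2 December.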
You can use the fact that the separators differ to tell the dates apart.
If your dataframe is in this format:
df = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3],
                   'Date': ["30/8/2020", "30/8/2021", "30/8/2022", "2019-10-24", "2019-10-25", "2020-10-24"]})
with either "-" or "/" separating the date parts, you can write a function that looks for the separator and apply it to the Date column:
def find(string):
    # True for "/"-separated dates (DD/MM/YYYY), False for "-"-separated ones
    return '/' in string

df[df['Date'].apply(find)]
I need to compare the time between two dates in Python. One is given as a string and the other is a datetime.datetime object. I have tried a few ideas, but the error is always "Cannot compare tz-naive and tz-aware datetime-like objects".
Idea 1: Convert the string time into a pandas Timestamp, convert it back to a string, parse that string with datetime.fromisoformat, then compare the result to the datetime.datetime object.
import pandas as pd
from datetime import datetime, timedelta
time_to_compare = datetime.utcnow()-timedelta(minutes=60)
df['Date'] = pd.to_datetime(df['Date'])
df['Date'] = df['Date'].apply(lambda x: str(x))
df['Date'] = df['Date'].apply(lambda x: datetime.fromisoformat(x))
df= df.loc[df['Date']>=time_to_compare]
Idea 2: Change the datetime.datetime object to a Timestamp
time_to_compare = pd.to_datetime(datetime.utcnow()-timedelta(minutes=60))
df['Date']=pd.to_datetime(df['Date'])
df= df.loc[df['Date']>=time_to_compare]
Ideally I want to filter the dataframe so that a row is kept only if its df['Date'] value is at or after time_to_compare.
Data to test with:
d = {'Date':['2020-03-12T13:59:15.739Z','2020-02-28T22:22:06.827Z']}
df = pd.DataFrame(data=d)
With pandas 1.0.1, you can add utc=True when creating time_to_compare:
time_to_compare = pd.to_datetime(datetime.utcnow()-timedelta(minutes=60), utc=True)
to make it timezone aware.
I could not reproduce this, because on my pandas 0.23, df['Date'] = pd.to_datetime(df['Date']) gives a naive pd.Timestamp column, which can be compared to datetime.utcnow()-timedelta(minutes=60), which is naive by definition.
If your system builds df['Date'] as a timezone-aware column, you should just build a timezone-aware time_to_compare with:
from datetime import datetime, timedelta, timezone
time_to_compare = datetime.now(timezone.utc)-timedelta(minutes=60)
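As a rough end-to-end sketch using the test data from the question: both sides are made timezone aware (UTC) before comparing. The fixed `reference` time is an assumption added here only so the result is reproducible; in real use it would be `datetime.now(timezone.utc)`.

```python
from datetime import datetime, timedelta, timezone

import pandas as pd

d = {"Date": ["2020-03-12T13:59:15.739Z", "2020-02-28T22:22:06.827Z"]}
df = pd.DataFrame(data=d)

# utc=True makes the whole column timezone aware (UTC).
df["Date"] = pd.to_datetime(df["Date"], utc=True)

# Assumption for reproducibility: a fixed "now" instead of the real clock.
reference = datetime(2020, 3, 12, 14, 30, tzinfo=timezone.utc)
time_to_compare = reference - timedelta(minutes=60)

# Aware column vs. aware datetime: the comparison no longer raises.
recent = df.loc[df["Date"] >= time_to_compare]
print(len(recent))  # 1
```

The error in the question comes from mixing an aware column with a naive datetime (or the reverse), so either both sides carry a timezone or neither does.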
This is a strange one, but I have an original Excel file containing 10/11/2018, and the problem above happens when I convert the column to datetime using:
df.Date = pd.to_datetime(df['Date'])
So the Date column shows values like 2018-01-11, and once the day and month are equal, for example 2018-11-11, the format swaps relative to the previous rows and the following rows become:
''2018-11-12''
''2018-11-13''
I've tried writing a for loop over each entry to change the series, but I get an error saying I can't change the series. Then I tried writing this loop, but I get the Timestamp error below:
for date_ in jda.Date:
    jda.Date[date_] = jda.Date[date_].strftime('%Y-%m-%d')
KeyError: Timestamp('2019-05-17 00:00:00')
Below is a pic of where the format changes.
Thank you for your help
Solution if the dates are saved as strings:
I think the problem is wrongly parsed datetimes: by default 10/11/2018 is parsed as 11 October 2018, so if you need it parsed as 10 November 2018, add the dayfirst=True parameter to to_datetime:
df.Date = pd.to_datetime(df['Date'], dayfirst=True)
Or you can specify format e.g. %d/%m/%Y for DD/MM/YYYY:
df.Date = pd.to_datetime(df['Date'], format='%d/%m/%Y')
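A quick sketch of the difference, with made-up sample values in DD/MM/YYYY order. The sample deliberately uses days of 12 or less, so the default parse does not fail; it just silently reads the day as the month.

```python
import pandas as pd

# Hypothetical DD/MM/YYYY strings: 10, 11 and 12 November 2018.
raw = pd.Series(["10/11/2018", "11/11/2018", "12/11/2018"])

month_first = pd.to_datetime(raw)                    # default: month first
day_first = pd.to_datetime(raw, dayfirst=True)       # day first
explicit = pd.to_datetime(raw, format="%d/%m/%Y")    # explicit format

print(month_first.dt.month.tolist())  # [10, 11, 12]
print(day_first.dt.month.tolist())    # [11, 11, 11]
```

When the format is known, the explicit `format=` version leaves no room for guessing and gives the same result as `dayfirst=True` here.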
I have a Pandas dataframe with raw dates formatted as such "19990130". I want to convert these into new columns: 'year', 'month', and 'dayofweek'.
I tried using the following:
pd.to_datetime(df['date'], format='%Y%m%d', errors='ignore').values
Which does give me an array of datetime objects. However, the next step I tried was using .to_pydatetime() and then .year to try to get the year out, like this:
pd.to_datetime(df['date'], format='%Y%m%d', errors='ignore').values.to_pydatetime().year
This works when I test a single value, but with a Pandas dataframe I get:
'numpy.ndarray' object has no attribute 'to_pydatetime'
What's the easiest way to extract the year, month, and day of week from this data?
Try:
s = pd.to_datetime(df['date'], format='%Y%m%d', errors='coerce')
s.dt.year
# or
# s.dt.month, etc
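Spelled out for all three requested columns, a minimal sketch might look like this. The sample values are made up, including one bad string to show what errors='coerce' does.

```python
import pandas as pd

# Hypothetical raw dates in YYYYMMDD form, plus one unparseable value.
df = pd.DataFrame({"date": ["19990130", "20000229", "not a date"]})

# Unparseable values become NaT instead of raising.
s = pd.to_datetime(df["date"], format="%Y%m%d", errors="coerce")

# The .dt accessor works on the Series directly; no .values needed.
df["year"] = s.dt.year
df["month"] = s.dt.month
df["dayofweek"] = s.dt.dayofweek  # Monday=0 ... Sunday=6

print(df)
```

Calling `.values` first is what produced the numpy array without `to_pydatetime`; staying on the Series keeps the `.dt` accessor available. Rows that failed to parse end up as NaN in the new columns.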
Is there a way in pandas to convert my column date which has the following format '1997-01-31' to '199701', without including any information about the day?
I tried a solution of the following form:
df['DATE'] = df['DATE'].apply(lambda x: datetime.strptime(x, '%Y%m'))
but I get this error: "ValueError: time data '1997-01-31' does not match format '%Y%m'"
Probably the reason is that I am not including the day in the format. Is there a better way to go from YYYY-MM-DD format to YYYYMM in pandas?
One way is to convert the date to datetime and then use strftime. Just note that you lose the datetime functionality of the column, since the result is a string:
df = pd.DataFrame({'date':['1997-01-31' ]})
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y%m')
#      date
# 0  199701
You might not need to go through the datetime conversion if the data are sufficiently clean (no incorrect strings like 'foo' or '001231'):
df = pd.DataFrame({'date':['1997-01-31', '1997-03-31', '1997-12-18']})
df['date'] = [''.join(x.split('-')[0:2]) for x in df.date]
# date
#0 199701
#1 199703
#2 199712
Or if you have null values:
df['date'] = df.date.str.replace('-', '').str[0:6]
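Here is a small sketch of the null-safe string version on made-up data with a missing value. As an alternative that is not from the answers above, `dt.to_period('M')` gives a monthly period and keeps date functionality instead of producing a plain string.

```python
import pandas as pd

# Hypothetical column with one missing value.
df = pd.DataFrame({"date": ["1997-01-31", None, "1997-12-18"]})

# String route: missing values pass through the .str methods as missing.
yyyymm = df["date"].str.replace("-", "", regex=False).str[0:6]

# Alternative (assumption, not from the answers above): monthly periods.
periods = pd.to_datetime(df["date"]).dt.to_period("M")

print(yyyymm.tolist())
print(periods.tolist())
```

The string route is enough if the result is only used as a label; the period route may be preferable if you still need to sort or do date arithmetic by month.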