How to split a CSV into 2 dataframes based on a condition - python

My idea is to separate the two kinds of date strings, then convert both dataframes to the same datetime format. I tried this code:
data['date'] = pd.to_datetime(data['date'])
data['date'] = data['date'].dt.strftime('%Y-%m-%d')
but the output has errors. 13/02/2020 becomes 2020-02-13, which is what I want, but 12/02/2020 becomes 2020-12-02 (day and month swapped).
My dataframe has 2 date formats: YYYY-MM-DD and DD/MM/YYYY. The data type is object.
I need to split it into 2 dataframes: all the rows with a YYYY-MM-DD date go into df1,
and all the rows with a DD/MM/YYYY date go into df2.
Does anyone know how to code it?

If you don't need to convert to datetimes, use Series.str.contains with boolean indexing:
mask = df['date'].str.contains('-')
df1 = df[mask].copy()
df2 = df[~mask].copy()
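A minimal, runnable sketch of this boolean-indexing split, assuming a small made-up frame with an `id` column and a string `date` column (the sample values are hypothetical). `regex=False` is passed only to make it explicit that "-" is matched literally:

```python
import pandas as pd

# Hypothetical sample data mixing the two formats from the question
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'date': ['2020-02-13', '13/02/2020', '2020-02-12', '12/02/2020']})

# "-" only appears in the YYYY-MM-DD strings
mask = df['date'].str.contains('-', regex=False)
df1 = df[mask].copy()   # YYYY-MM-DD rows
df2 = df[~mask].copy()  # DD/MM/YYYY rows

print(df1)
print(df2)
```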
If you need datetimes, pass errors='coerce' to to_datetime so that values not matching the format become missing (NaT), then remove those missing values:
df1 = (df.assign(date=pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce'))
         .dropna(subset=['date']))
df2 = (df.assign(date=pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce'))
         .dropna(subset=['date']))
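A self-contained run of the coerce-and-drop approach on hypothetical sample data. Because each call uses an explicit format, 12/02/2020 is parsed as 12 February 2020 rather than 2 December, which is the swap the question complains about:

```python
import pandas as pd

# Hypothetical sample data mixing the two formats from the question
df = pd.DataFrame({'id': [1, 2, 3, 4],
                   'date': ['2020-02-13', '13/02/2020', '2020-02-12', '12/02/2020']})

# Strings that don't match the format become NaT and are then dropped
df1 = (df.assign(date=pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce'))
         .dropna(subset=['date']))
df2 = (df.assign(date=pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce'))
         .dropna(subset=['date']))

print(df1)
print(df2)
```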
EDIT: If you need a single output column filled with the correct datetimes, you can fill the missing values of one parsed Series with the values of the other using Series.fillna:
date1 = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')
date2 = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')
df['date'] = date1.fillna(date2)
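A minimal sketch of the fillna approach on hypothetical data, followed by the `strftime('%Y-%m-%d')` step from the question so that every row ends up as a string in one consistent format (the `date_str` column name is illustrative):

```python
import pandas as pd

# Hypothetical sample data mixing the two formats
df = pd.DataFrame({'date': ['2020-02-13', '13/02/2020', '2020-02-12', '12/02/2020']})

date1 = pd.to_datetime(df['date'], format='%Y-%m-%d', errors='coerce')
date2 = pd.to_datetime(df['date'], format='%d/%m/%Y', errors='coerce')
# Fill the NaT left by the first format with the values parsed by the second
df['date'] = date1.fillna(date2)

# Optional: back to strings in one consistent format, as the question asked
df['date_str'] = df['date'].dt.strftime('%Y-%m-%d')
print(df)
```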

You can use the fact that the separators differ to tell the two date formats apart.
If your dataframe is in this format:
import pandas as pd
df = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3],
                   "date": ["30/8/2020", "30/8/2021", "30/8/2022", "2019-10-24", "2019-10-25", "2020-10-24"]})
with either "-" or "/" separating the parts of the date,
you can write a function that looks for the "/" and apply it to the date column:
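def find(string):
    # True if the date uses "/" as its separator (DD/MM/YYYY style);
    # checking for any "/" also handles single-digit days like "1/8/2020"
    return string.find('/') != -1

# Rows in DD/MM/YYYY format; negate the mask with ~ to get the YYYY-MM-DD rows
df[df['date'].apply(find)]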
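As a vectorised alternative to `apply` (a swapped-in technique, not this answer's own), `Series.str.match` with a regular expression can build the same mask and also checks the overall shape of the date. The pattern below is an assumption that days and months have 1 or 2 digits and years have 4:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 3, 3],
                   'date': ["30/8/2020", "30/8/2021", "30/8/2022",
                            "2019-10-24", "2019-10-25", "2020-10-24"]})

# Day and month of 1-2 digits, then a 4-digit year, separated by "/"
is_dmy = df['date'].str.match(r'^\d{1,2}/\d{1,2}/\d{4}$')

df2 = df[is_dmy]    # DD/MM/YYYY rows
df1 = df[~is_dmy]   # YYYY-MM-DD rows
print(df1)
print(df2)
```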

Related

Converting dates to datetime64 results in day and month places getting swapped

I am pulling a time series from a CSV file which has dates in "mm/dd/yyyy" format:
import pandas as pd
from datetime import datetime
df = pd.read_csv('lib_file.csv')
df['Date'] = df['Date'].apply(lambda x: datetime.strptime(x, '%m/%d/%Y').strftime('%d/%m/%Y'))
Below is the output.
Then I convert the dtype of ['Date'] from object to datetime64:
df['Date'] = pd.to_datetime(df['Date'])
but that changes my dates as well (the day and month get swapped).
How do I fix it?
Try this:
df['Date'] = pd.to_datetime(df['Date'], infer_datetime_format=True)
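This infers the format from the first non-NaN element (which in your case is parsed correctly) and applies it to every row, instead of guessing the format row by row. Note that since pandas 2.0, infer_datetime_format is deprecated because this strict inference is now the default.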
Just using the code below helped (parsing the original mm/dd/yyyy strings directly, without reformatting them first):
df = pd.read_csv('lib_file.csv')
df['Date'] = pd.to_datetime(df['Date'])
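A self-contained sketch of the underlying fix, assuming an in-memory CSV with hypothetical values in place of lib_file.csv: parse the original mm/dd/yyyy strings once, with an explicit format, and only produce dd/mm/yyyy text at the end for display. Passing `format=` explicitly is a swapped-in variation on the answers above; it removes any guessing about day versus month order. The `Date_display` column name is illustrative:

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for lib_file.csv (dates are mm/dd/yyyy)
csv_text = "Date,Value\n01/02/2020,10\n03/04/2020,20\n12/31/2020,30\n"
df = pd.read_csv(io.StringIO(csv_text))

# Parse the original strings with an explicit month-first format,
# rather than first rewriting them as dd/mm/yyyy strings
df['Date'] = pd.to_datetime(df['Date'], format='%m/%d/%Y')

# If dd/mm/yyyy text is needed for display, format it at the very end
df['Date_display'] = df['Date'].dt.strftime('%d/%m/%Y')
print(df)
```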

Pandas mistake while sorting values

I'm trying to sort my dataframe based on the 'date' and 'hour' columns. It's sorting 01/11/2020 before dates like 24/10/2020.
df = pd.read_csv("some_folder")
df = df.sort_values(by = ['date','hour']).reset_index(drop=True)
In the picture you can see the sorting error.
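Try converting the date column to datetime with pd.to_datetime before sorting. As strings, the dates are compared character by character, so "01/11/2020" comes before "24/10/2020":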
df = pd.read_csv("some_folder")
df['date'] = pd.to_datetime(df['date'], dayfirst=True) # <-- convert the column to `datetime`
df = df.sort_values(by = ['date','hour']).reset_index(drop=True)
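A small demonstration on hypothetical rows: sorting the raw dd/mm/yyyy strings puts 01/11/2020 first, while sorting after `pd.to_datetime(..., dayfirst=True)` gives chronological order:

```python
import pandas as pd

# Hypothetical rows with dd/mm/yyyy strings
df = pd.DataFrame({'date': ['24/10/2020', '01/11/2020', '24/10/2020'],
                   'hour': [5, 1, 2]})

# As plain strings, '0' < '2', so 01/11/2020 sorts before 24/10/2020
as_text = df.sort_values(by=['date', 'hour'])['date'].tolist()

# After conversion, the sort is chronological (24 Oct before 1 Nov)
df['date'] = pd.to_datetime(df['date'], dayfirst=True)
df = df.sort_values(by=['date', 'hour']).reset_index(drop=True)
print(as_text)
print(df)
```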

Set new column from datetime on dataframe pandas

I am trying to create new columns (day of year and hour).
My datetime consists of a date and an hour; I tried to split it up using
data['dayofyear'] = data['Date'].dt.dayofyear
and
df['Various', 'Day'] = df.index.dayofyear
df['Various', 'Hour'] = df.index.hour
but it always returns an error. I'm not sure how to split this up into new columns.
I think the problem is that the index is not a DatetimeIndex, so convert it with to_datetime first and then assign to new column names:
df.index = pd.to_datetime(df.index)
df['Day'] = df.index.dayofyear
df['Hour'] = df.index.hour
Or use DataFrame.assign:
df.index = pd.to_datetime(df.index)
df = df.assign(Day = df.index.dayofyear, Hour = df.index.hour)
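A runnable sketch with a hypothetical frame whose index holds datetime strings. It also shows the `.dt` accessor variant for when the datetimes live in a column (as in the question's `data['Date']` attempt), which likewise only works after `to_datetime`:

```python
import pandas as pd

# Hypothetical frame whose index holds datetime strings (not yet a DatetimeIndex)
df = pd.DataFrame({'value': [1.0, 2.0, 3.0]},
                  index=['2021-01-01 00:00', '2021-02-01 13:00', '2021-12-31 23:00'])

df.index = pd.to_datetime(df.index)  # convert to a DatetimeIndex
df = df.assign(Day=df.index.dayofyear, Hour=df.index.hour)

# If the datetimes live in a column instead, use the .dt accessor
data = pd.DataFrame({'Date': ['2021-01-01 00:00', '2021-02-01 13:00']})
data['Date'] = pd.to_datetime(data['Date'])
data['dayofyear'] = data['Date'].dt.dayofyear
data['hour'] = data['Date'].dt.hour
print(df)
print(data)
```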

Convert date string YYYY-MM-DD to YYYYMM in pandas

Is there a way in pandas to convert my date column, which has the format '1997-01-31', to '199701', without including any information about the day?
I tried solution of the following form:
df['DATE'] = df['DATE'].apply(lambda x: datetime.strptime(x, '%Y%m'))
but I obtain this error : 'ValueError: time data '1997-01-31' does not match format '%Y%m''
Probably the reason is that I am not including the day in the format. Is there a better way to go from YYYY-MM-DD format to YYYYMM in pandas?
One way is to convert the date to datetime and then use strftime. Note that you lose the datetime functionality of the column, since the result is a string.
df = pd.DataFrame({'date':['1997-01-31' ]})
df['date'] = pd.to_datetime(df['date'])
df['date'] = df['date'].dt.strftime('%Y%m')
date
0 199701
You might not need to go through the datetime conversion if the data are sufficiently clean (no invalid strings like 'foo' or '001231'):
df = pd.DataFrame({'date':['1997-01-31', '1997-03-31', '1997-12-18']})
df['date'] = [''.join(x.split('-')[0:2]) for x in df.date]
# date
#0 199701
#1 199703
#2 199712
Or if you have null values:
df['date'] = df.date.str.replace('-', '').str[0:6]
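A short comparison on hypothetical data that includes a missing value. The list-comprehension version would raise an AttributeError on `None` (it has no `.split`), whereas both the datetime route and the `.str` route below pass missing values through as NaN:

```python
import pandas as pd

# Hypothetical data with one missing value
df = pd.DataFrame({'date': ['1997-01-31', '1997-03-31', None, '1997-12-18']})

# 1) via datetime: the result is a string column; NaT becomes NaN
via_dt = pd.to_datetime(df['date']).dt.strftime('%Y%m')

# 2) pure string handling: .str methods propagate missing values
via_str = df['date'].str.replace('-', '', regex=False).str[0:6]
print(via_dt)
print(via_str)
```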

python pandas date conversion to words

I have dates in the following format in my dataframe:
df:
Date
12-Jun-16
22-Jan-12
I want to convert them to this format:
df:
Date
12-Jan-2015
Any help as to how to do it?
I think you need to convert the column with to_datetime and then, to change the format, add strftime:
df.Date = pd.to_datetime(df.Date).dt.strftime('%d-%b-%Y')
print (df)
Date
0 12-Jun-2016
1 22-Jan-2012
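A self-contained version of this answer with the input format passed explicitly (an assumption layered on the answer, which lets pandas infer it). `%b` matches abbreviated English month names under the default C locale, and `%y` maps two-digit years like 16 and 12 to 2016 and 2012:

```python
import pandas as pd

df = pd.DataFrame({'Date': ['12-Jun-16', '22-Jan-12']})

# Explicit input format: day, abbreviated month name, two-digit year
parsed = pd.to_datetime(df['Date'], format='%d-%b-%y')

# Output format: same layout but with a four-digit year
df['Date'] = parsed.dt.strftime('%d-%b-%Y')
print(df)
```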
