Split dataframe into two using Date as splitting point - python

I have a dataframe with 100,000 rows and 24 columns representing crime over a one-year period, October 2019 to October 2020.
I'm trying to split my df into two: one dataframe with all rows from October 1st to March 31st, and a second with all rows from April 1st to October 31st.
Could anyone help me do this with pandas?

Assuming the column is of datetime type, you can do it like this:
import pandas as pd
# Rows before 1 April 2020 (i.e. all of 31 March and earlier) go into df_1, the rest into df_2.
# Comparing with "< 1 April" rather than "<= 31 March" keeps rows timestamped later on 31 March.
split_date = pd.Timestamp(2020, 4, 1)
df_1 = df.loc[df['Date'] < split_date]
df_2 = df.loc[df['Date'] >= split_date]
If the date column is not of datetime type, convert it first:
df['Date'] = pd.to_datetime(df['Date'])
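A minimal self-contained sketch of the split above, using a few hypothetical rows in place of the real 100,000-row dataframe (the "Crime" column and its values are made up for illustration). Building one boolean mask and using its negation for the second half guarantees every row lands in exactly one of the two dataframes.

```python
import pandas as pd

# Hypothetical sample data standing in for the real crime dataframe.
df = pd.DataFrame({
    "Date": ["2019-10-01 09:00", "2019-12-15 14:20", "2020-03-31 18:30",
             "2020-04-01 08:00", "2020-10-31 23:10"],
    "Crime": ["theft", "burglary", "assault", "theft", "fraud"],
})
df["Date"] = pd.to_datetime(df["Date"])

# One mask and its negation: no row is lost or duplicated between the halves.
first_half = df["Date"] < pd.Timestamp(2020, 4, 1)
df_1 = df[first_half]   # 1 Oct 2019 - 31 Mar 2020
df_2 = df[~first_half]  # 1 Apr 2020 - 31 Oct 2020

print(len(df_1), len(df_2))  # 3 2
```

Note that the 18:30 row on 31 March ends up in df_1, which a `<= pd.Timestamp(2020, 3, 31)` comparison would have missed.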

Related

How to convert date format (dd/mm/yyyy) to days in python csv

I need a function to count the number of days between a start date of 1st Jan 1995 and an end date of 31st Dec 2019 for the 'days' column of a dataframe, taking leap years into account.
Example: 1st Jan 1995 is Day 1, 1st Feb 1995 is Day 32, and so on all the way to 31st Dec 2019.
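If you want to filter a pandas dataframe to the rows between two dates, convert the column to datetime first (comparing dd/mm/yyyy strings with '1995/01/01' compares text, not dates):
import pandas as pd
df['days'] = pd.to_datetime(df['days'], format='%d/%m/%Y')
start_date = pd.Timestamp(1995, 1, 1)
end_date = pd.Timestamp(1995, 2, 1)
df = df[(df['days'] >= start_date) & (df['days'] <= end_date)]
len(df) then gives the number of rows in the filtered dataframe.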
Alternatively, if you want to calculate the number of days between two dates, you can do it without pandas, using datetime:
from datetime import datetime
start_date = '1995/01/01'
end_date = '1995/02/01'
delta = datetime.strptime(end_date, '%Y/%m/%d') - datetime.strptime(start_date, '%Y/%m/%d')
print(delta.days)
Output:
31
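Subtracting datetime objects does take leap years into account, because the difference is computed on real calendar dates. To number the days as in the question (1st Jan 1995 = Day 1), add 1 to delta.days.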
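The same arithmetic can be applied to a whole column at once. This is a sketch assuming the dates are stored as dd/mm/yyyy strings in a column named "date" (a hypothetical name; adjust to your data). Subtracting a Timestamp from a datetime column gives timedeltas, and `.dt.days` turns them into whole days.

```python
import pandas as pd

# Hypothetical data: a "date" column of dd/mm/yyyy strings.
df = pd.DataFrame({"date": ["01/01/1995", "01/02/1995", "01/03/1996", "31/12/2019"]})

start = pd.Timestamp(1995, 1, 1)
parsed = pd.to_datetime(df["date"], format="%d/%m/%Y")

# Elapsed days since the start date, +1 so that 1 Jan 1995 is Day 1.
# Timestamp arithmetic follows the real calendar, so 29 Feb in leap years is counted.
df["days"] = (parsed - start).dt.days + 1

print(df["days"].tolist())  # [1, 32, 426, 9131]
```

The 1996 row shows the leap day being counted: 365 days of 1995, 31 of January and 29 of February 1996 elapse before 1 March, making it Day 426.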

Cleaning date column in python with multiple date formats

I am cleaning up a dataframe that has date of birth and date of death stored as strings. Those columns contain dates in several formats, and some values contain only the year (which is all I need). These are the formats:
Jan 10 2020
1913
10/8/2019
June 14th 1980
All I need is the year from each date. I have not had any luck with pandas to_datetime, since a significant portion of the rows only contain the year to begin with.
Is there a way to pull just the year out of the strings so that each column looks like:
2020
1913
2019
1980
The simplest way is to use dateutil's parser, which accepts these and many other formats:
import pandas as pd
from dateutil import parser
df = pd.DataFrame({"mydates":["Jan 10 2020", "1913", "10/8/2019", "June 14th 1980"]})
df['years'] = df['mydates'].apply(parser.parse).dt.strftime('%Y')
print(df)
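You can also use str.extract to pull out the first four-digit run. The pattern needs a capture group (str.extract raises an error without one), and expand=False returns a Series rather than a DataFrame:
df['BirthDate'] = df['BirthDate'].str.extract(r'(\d{4})', expand=False)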
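A runnable sketch of the str.extract approach on the four sample formats from the question. It assumes every value contains a four-digit year and that no other four-digit number appears before it; "BirthYear" is a hypothetical column name used so the original strings are kept for comparison.

```python
import pandas as pd

df = pd.DataFrame({"BirthDate": ["Jan 10 2020", "1913", "10/8/2019", "June 14th 1980"]})

# Take the first run of four consecutive digits as the year and convert it to int.
df["BirthYear"] = df["BirthDate"].str.extract(r"(\d{4})", expand=False).astype(int)

print(df["BirthYear"].tolist())  # [2020, 1913, 2019, 1980]
```

Unlike a full date parser, this never has to guess month and day order (e.g. whether 10/8/2019 is October or August), which does not matter when only the year is wanted. If some rows may have no year at all, drop the `.astype(int)` or use `pd.to_numeric`, since the extracted NaN cannot be cast to a plain int.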

Trying to sort by date in a grouped dataframe in python

I want to split a dataframe into groups by day (the day of the month) but then order the groups by date. At the moment the code below splits them into dates, e.g. 1-11, 2-11, but 30-10 and 31-10 come after all my November dates.
ResultSet2 = ResultProxy2.fetchall()
df2 = pd.DataFrame(ResultSet2)
resultsrecovery = [group[1] for group in df2.groupby(["day"])]
I want the groups for 30 and 31 October to come before all of the November ones.
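One possible approach, sketched under an assumption not stated in the question: that the query also returns the full date (called "date" here, a hypothetical column name) and not only the day of the month. `groupby` sorts its keys by default, so grouping on the day number orders groups 1, 2, ..., 30, 31, while grouping on the full date orders them chronologically.

```python
import pandas as pd

# Hypothetical stand-in for df2, with a full datetime column alongside "day".
df2 = pd.DataFrame({
    "date": pd.to_datetime(["2020-11-01", "2020-10-30", "2020-11-02", "2020-10-31"]),
    "value": [1, 2, 3, 4],
})
df2["day"] = df2["date"].dt.day

# Group on the calendar date (time stripped by normalize) instead of the day number;
# the sorted group keys are then in chronological order.
resultsrecovery = [group for _, group in df2.groupby(df2["date"].dt.normalize())]

print([g["date"].iloc[0].strftime("%d-%m") for g in resultsrecovery])
# ['30-10', '31-10', '01-11', '02-11']
```

If only the day of the month is available, the month cannot be recovered from it, so the full date would need to be selected in the SQL query.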

How to filter column by date in Python?

I am trying to filter df to only include rows with a date before 2018-11-06.
The column is in datetime format. Running the code below returns only rows with the exact date 2018-11-06 instead of earlier dates. Also, when I use the less-than symbol '<', only dates later than 2018-11-06 are returned. I seem to be doing something wrong.
db4=db3[~(db3['registration_dt']>'2018-11-06')]
It seems like you are comparing the string '2018-11-06' with a datetime.
import datetime as dt
# Selects all rows where registration_dt is after 6 November 2018
df = db3[db3['registration_dt'] > dt.datetime(2018, 11, 6)]
# Selects all rows where registration_dt is before 6 November 2018
df = db3[db3['registration_dt'] < dt.datetime(2018, 11, 6)]
# The ~ operator can be read as "not"
# This selects all rows at or before 6 November 2018 00:00
df = db3[~(db3['registration_dt'] > dt.datetime(2018, 11, 6))]
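A small self-contained check of the three filters, using a hypothetical three-row stand-in for db3. It shows that `<` is strictly earlier, `~(... > ...)` is earlier-or-equal, and `>` is strictly later.

```python
import datetime as dt
import pandas as pd

# Hypothetical stand-in for db3 with a datetime64 registration_dt column.
db3 = pd.DataFrame({
    "registration_dt": pd.to_datetime(["2018-11-05", "2018-11-06", "2018-11-07"]),
})
cutoff = dt.datetime(2018, 11, 6)

before = db3[db3["registration_dt"] < cutoff]            # strictly earlier
on_or_before = db3[~(db3["registration_dt"] > cutoff)]   # earlier or equal
after = db3[db3["registration_dt"] > cutoff]             # strictly later

print(len(before), len(on_or_before), len(after))  # 1 2 1
```

One caveat of the `~` form: a missing value (NaT) compares as False with `>`, so negating the mask keeps NaT rows in `on_or_before`, whereas `<=` would drop them.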

Separate month from year in Python data frame

I have a 1000 x 6 data frame, and one of the columns is "Date", where the dates are in the format "JAN2014", "JUN2002", etc.
I would like to split this column into two separate columns, "Year" and "Month", so JAN goes in the "Month" column, 2014 in the "Year" column, and so on.
Could anyone please tell me how to do this in Python?
You can use the str accessor and indexing:
df['Month'] = df['Date'].str[:3]
df['Year'] = df['Date'].str[3:]
Example:
import pandas as pd
df = pd.DataFrame({'Date':['JAN2014','JUN2002']})
df['Month'] = df['Date'].str[:3]
df['Year'] = df['Date'].str[3:]
print(df)
Output:
      Date  Month  Year
0  JAN2014    JAN  2014
1  JUN2002    JUN  2002
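If the column might contain values that do not follow the MONYYYY pattern exactly, a regex with named groups is a slightly stricter alternative to slicing. This sketch assumes a three-letter month abbreviation followed by a four-digit year; rows that do not match come back as NaN instead of being silently mis-sliced. Converting "Year" to int is an extra step the slicing answer does not do.

```python
import pandas as pd

df = pd.DataFrame({"Date": ["JAN2014", "JUN2002"]})

# Named groups become the column names of the extracted DataFrame.
parts = df["Date"].str.extract(r"^(?P<Month>[A-Za-z]{3})(?P<Year>\d{4})$")
df["Month"] = parts["Month"]
df["Year"] = parts["Year"].astype(int)

print(df)
```

When the format is guaranteed, the plain `.str[:3]` / `.str[3:]` slicing above is simpler and equally correct.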
