Date not updating correctly into another dataframe in python

I'm trying to update df2 with values from df1. However, one of the date columns doesn't seem to update correctly.
For instance, I have df1 and df2:
df1:
SN Date_screened DOB
7983 2017-11-30 00:00:00 2011-02-05 00:00:00
df2:
SN Date_screened DOB
7983 2011-02-05 00:00:00
When I try to update df2 with df1 this is what I get:
df2.update(df1)
df2:
SN Date_screened DOB
7983 2017-11-30 00:00:00 1296864000000000000
I'm really not sure why Date_screened is updated correctly while DOB comes back as a long number after the update instead of 2011-02-05 00:00:00.
For more context, the DOB values in df1 consist entirely of dates; the dtype is '<M8[ns]'. The DOB values in df2, however, are a mix of dates, blanks, strings, etc., as the column has not been cleaned yet; its dtype is 'O' (object).
Any ideas as to why this is happening, and how I might rectify it, would be greatly appreciated. I suspect df2's DOB column dtype, but I'm not sure, because df2's Date_screened column is also read as 'O'. Thanks so much.
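The value 1296864000000000000 is 2011-02-05 00:00:00 expressed as nanoseconds since the Unix epoch, which suggests the datetime64 values were written into the object column as raw integers. Below is a minimal sketch of one possible workaround, not a confirmed diagnosis: convert df2's date columns to datetime before calling update. It assumes SN is the index of both frames and that entries which cannot be parsed may become NaT; the sample frames are made up to mirror the question.

```python
import pandas as pd

# The long number is the timestamp in nanoseconds since 1970-01-01.
recovered = pd.to_datetime(1296864000000000000, unit="ns")
print(recovered)  # 2011-02-05 00:00:00

# Hypothetical frames mirroring the question, indexed by SN.
df1 = pd.DataFrame(
    {
        "Date_screened": pd.to_datetime(["2017-11-30"]),
        "DOB": pd.to_datetime(["2011-02-05"]),
    },
    index=pd.Index([7983], name="SN"),
)
df2 = pd.DataFrame(
    {"Date_screened": [""], "DOB": ["not recorded"]},
    index=pd.Index([7983], name="SN"),
    dtype=object,
)

# Clean the object columns first; values that cannot be parsed become NaT.
for col in ["Date_screened", "DOB"]:
    df2[col] = pd.to_datetime(df2[col], errors="coerce")

# Both frames now hold datetime columns, so update copies dates, not integers.
df2.update(df1)
print(df2)
```

If the column has already been overwritten with integers, pd.to_datetime(..., unit="ns") on those values recovers the dates, as the first two lines show.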

Related

Getting Type Error Trying to create a month-year column from date ranges in pandas

I'm trying to follow the solution provided in "Find all months between two date columns and generate row for each month", but I'm hitting a wall because I get an error. What I want to do is create a Year-Month column with one entry for each year-month in the StartDate to EndDate range of each row. When I follow the linked answer, I get this error:
TypeError: Cannot convert input ... Name: ServiceStartDate, dtype: datetime64[ns]] of type <class 'pandas.core.series.Series'> to Timestamp
I have no idea how to fix this. Please help!
Sample Data
   ID      StartDate   EndDate
1  311566  2021-10-01  2024-09-30
2  235216  2020-11-01  2020-11-30
3  157054  2021-10-01  2023-09-30
4  159954  2021-01-01  2023-12-31
5  255815  2019-11-01  2022-10-31
I have found a solution to my problem (sorry for the long delay in responding). The problem was that my data had a timestamp associated with it. I needed to change the date field to a year-month-01 format (the first day of the month) using the following code.
df['date'] = df['date'].apply(lambda x: x.strftime('%Y-%m-01'))
Then I used the solution below to get all the months/years that exist between the min and max dates as a single column.
df.merge(df.apply(lambda s: pd.date_range(df['date'].min(),
df['date'].max(), freq='MS'), 1).explode("").rename('Month'),
left_index=True, right_index=True)
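For the original goal (one row per month between each row's own StartDate and EndDate), a minimal sketch follows. It assumes both columns are already datetime64 and uses two rows from the sample data; the Month and YearMonth column names are illustrative, not from the question. The TypeError in the question is consistent with passing a whole column to pd.date_range, which expects scalar start and end values, so the sketch calls it once per row.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "ID": [311566, 235216],
        "StartDate": pd.to_datetime(["2021-10-01", "2020-11-01"]),
        "EndDate": pd.to_datetime(["2024-09-30", "2020-11-30"]),
    }
)

# Build the list of month starts for each row from scalar start/end values.
df["Month"] = pd.Series(
    [
        list(pd.date_range(start, end, freq="MS"))
        for start, end in zip(df["StartDate"], df["EndDate"])
    ],
    index=df.index,
    dtype=object,
)

# One output row per month; the other columns are repeated.
monthly = df.explode("Month", ignore_index=True)
monthly["YearMonth"] = pd.to_datetime(monthly["Month"]).dt.strftime("%Y-%m")
print(monthly[["ID", "YearMonth"]])
```

With freq='MS' only month starts inside the range are generated, so if StartDate can fall mid-month, normalising it to the first of the month beforehand (as in the answer above) keeps that month in the result.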

python dataframe datetime condition

I am trying to create a new dataframe from an existing one by filtering on holiday dates. The train dataframe already exists, and I want to create train_holiday from it by matching the day and month values of the holidays dataframe. My goal looks like this:
date values
2015-02-01 10
2015-02-02 20
2015-02-03 30
2015-02-04 40
2015-02-05 50
2015-02-06 60
date
2012-02-02
2012-02-05
The first dataframe is the existing one, and the second shows the holidays. I want to create a new dataframe from the first one that contains only the 2015 holidays, like this:
date values
2015-02-02 20
2015-02-05 50
I tried
train_holiday = train.loc[train["date"].dt.day== holidays["date"].dt.day]
but it raises an error. Could you please help me with this?
In your problem you care only about the month and day components, and one way to extract them is dt.strftime(). Apply that extraction to both date columns and use .isin() to keep the rows of df1 whose month-day matches one in df2.
df1[
    df1['date'].dt.strftime('%m%d').isin(
        df2['date'].dt.strftime('%m%d')
    )
]
Make sure both date columns are in date-time format so that .dt can work. For example,
df1['date'] = pd.to_datetime(df1['date'])
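A self-contained version of the same idea using the sample data from the question, as a sketch. The names train, holidays and train_holiday follow the question; holiday_keys is an illustrative helper name. The original attempt probably failed because == compares two Series element by element and the two frames have different lengths, whereas isin tests membership.

```python
import pandas as pd

train = pd.DataFrame(
    {
        "date": pd.to_datetime(
            ["2015-02-01", "2015-02-02", "2015-02-03",
             "2015-02-04", "2015-02-05", "2015-02-06"]
        ),
        "values": [10, 20, 30, 40, 50, 60],
    }
)
holidays = pd.DataFrame({"date": pd.to_datetime(["2012-02-02", "2012-02-05"])})

# Compare on month and day only, ignoring the year.
holiday_keys = holidays["date"].dt.strftime("%m%d")
train_holiday = train[train["date"].dt.strftime("%m%d").isin(holiday_keys)]
print(train_holiday)
```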

sorting datetime columns pandas

I have a dataframe table that has columns containing datetime information.
As you can see from the table below, the 2019-xx field is between the years 2018 and 2016 so I need to arrange it properly.
I tried to use .sort_index(axis=1, inplace=True) but in vain
(I don't know why it has no effect at all).
Dataframe:
2017-12-31 2018-12-31 2019-12-31 2016-12-31 2020-06-30
Unnamed: 0
WaterFlow -26084000.0 -257404000.0 -84066000.0 135075000.0 NaN
trailing1HourWaterFlow NaN NaN -84066000.0 NaN 6823000.0
The problems are:
1. I don't know how to arrange the column order when the columns are represented as datetime info.
2. The table above seems strange, since the "Unnamed: 0" row is empty and there's a gap between the columns and rows, unlike ordinary dataframes.
I think you need to convert the columns to datetimes and then sort. If Unnamed: 0 is the index name, you can remove it with DataFrame.rename_axis:
df.columns = pd.to_datetime(df.columns)
df = df.sort_index(axis=1).rename_axis(None)
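A runnable sketch of those two lines on the table from the question. It assumes the column labels start out as strings and that "Unnamed: 0" is the index name, neither of which the question confirms.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        [-26084000.0, -257404000.0, -84066000.0, 135075000.0, np.nan],
        [np.nan, np.nan, -84066000.0, np.nan, 6823000.0],
    ],
    index=pd.Index(["WaterFlow", "trailing1HourWaterFlow"], name="Unnamed: 0"),
    columns=["2017-12-31", "2018-12-31", "2019-12-31", "2016-12-31", "2020-06-30"],
)

# Convert the labels to datetimes, sort the columns, drop the index name.
df.columns = pd.to_datetime(df.columns)
df = df.sort_index(axis=1).rename_axis(None)
print(df.columns.strftime("%Y-%m-%d").tolist())
```

Note that sort_index returns a new frame here, so the result has to be assigned back to df.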

Add a column for day of the week based on Date Index

I'm new to the language and have managed to create the dataframe below. It has a MultiIndex and is of size (a, b).
The Date is on the rows, and I'm not fully sure how it is all defined.
I want to add a column holding the day of the week (1, 2, 3, 4, 5, 6, 7) for each day, based on the date stamps in the index on the left.
Can someone please show me how to do it? I'm just confused about how to pull out the index/date column to do calculations on.
Thanks
print(df_3.iloc[:,0])
Date
2019-06-01 8573.84
2019-06-02 8565.47
2019-06-03 8741.75
2019-06-04 8210.99
2019-06-05 7704.34
2019-09-09 10443.23
2019-09-10 10336.41
2019-09-11 10123.03
2019-09-12 10176.82
2019-09-13 10415.36
Name: (bitcoin, Open), Length: 105, dtype: float64
I've used just the first two columns and three of your records to get a possible solution. It's pretty much what Celius did, but with a column conversion using to_datetime.
import pandas as pd
data = [['2019-06-01', 8573.84], ['2019-06-02', 8565.47], ['2019-06-03', 8741.75]]
df = pd.DataFrame(data, columns=['Date', 'Bitcoin'])
df['Date'] = pd.to_datetime(df['Date'])  # parse the strings as datetimes
df['day_of_week'] = df['Date'].dt.dayofweek  # Monday=0 ... Sunday=6; kept separate so Date is not overwritten
The day_of_week column holds 5 for 2019-06-01 (a Saturday), 6 for 2019-06-02 (Sunday) and 0 for 2019-06-03 (Monday). Add 1 if you want the days numbered 1 to 7.
I hope it helps you.
If you are using pandas and your Index is interpreted as a Datetime object, I would try the following (I assume Date is your index given the dataframe you provided as example):
df = df.reset_index(drop=False)  # Move the index into a regular column named `Date`.
df['day_of_week'] = df['Date'].dt.dayofweek #Create new column using pandas `dt.dayofweek`
Edit: Also possible duplicate of Create a day of week column in a pandas dataframe
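If the index is already a DatetimeIndex, the day of the week can also be read from the index directly, without resetting it. The sketch below assumes that, and assumes the columns form a MultiIndex like the (bitcoin, Open) label in the question's output; the (bitcoin, day_of_week) label is an illustrative choice. Adding 1 gives the 1 to 7 numbering the question asks for, with Monday as 1.

```python
import pandas as pd

# Small stand-in for df_3: a Date index and MultiIndex columns.
index = pd.DatetimeIndex(["2019-06-01", "2019-06-02", "2019-06-03"], name="Date")
columns = pd.MultiIndex.from_tuples([("bitcoin", "Open")])
df_3 = pd.DataFrame([[8573.84], [8565.47], [8741.75]], index=index, columns=columns)

# dayofweek is 0 (Monday) to 6 (Sunday); shift it to 1..7.
df_3[("bitcoin", "day_of_week")] = df_3.index.dayofweek + 1
print(df_3)
```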

Comparing date-time values in a pandas DataFrame with a specific date_time value and returning the closest one

I have a date column in a pandas DataFrame as follows:
index date_time
1 2013-01-23
2 2014-01-23
3 2015-8-14
4 2015-10-23
5 2016-10-28
I want to compare the values in the date_time column with a specific date, for example date_x = 2015-9-14, and return the closest date before it, which is 2015-8-14.
I thought about converting the values in date_time column to a list and then compare them with the specific date. However, I do not think it is an efficient solution.
Any solution?
Thank you.
Here is one way using searchsorted. Both of my methods assume the data is already sorted; if not, first run df = df.sort_values('date_time').
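import numpy as np
import pandas as pd
df.date_time = pd.to_datetime(df.date_time)
date_x = '2015-9-14'
idx = np.searchsorted(df.date_time, pd.to_datetime(date_x))  # insertion position of date_x
df.date_time.iloc[idx-1]  # the entry just before that position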
Out[408]:
2 2015-08-14
Name: date_time, dtype: datetime64[ns]
Or we can do
s=df.date_time-pd.to_datetime(date_x)
df.loc[[s[s.dt.days<0].index[-1]]]
Out[417]:
index date_time
2 3 2015-08-14
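Another option, sketched here as an alternative rather than as the answerer's method, is to filter for earlier dates and take the maximum. It does not need sorted data, and it returns NaT when no earlier date exists, whereas idx-1 in the searchsorted version becomes -1 in that case and picks the last row. The function name closest_before is hypothetical, and the sample dates are written zero-padded.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "date_time": pd.to_datetime(
            ["2013-01-23", "2014-01-23", "2015-08-14", "2015-10-23", "2016-10-28"]
        )
    }
)


def closest_before(dates, date_x):
    """Return the latest value in `dates` strictly before `date_x`, or NaT."""
    date_x = pd.Timestamp(date_x)
    return dates[dates < date_x].max()


print(closest_before(df["date_time"], "2015-09-14"))  # 2015-08-14 00:00:00
```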
