Pandas create column based on index condition [duplicate] - python

This question already has an answer here:
Am I doing a df.Merge wrong?
(1 answer)
Closed 1 year ago.
I have two different dataframes:
df1 (2003 rows × 1 columns, index column is RecordDate)
df2 (927 rows × 1 columns, index column is RecordDate)
I'd like to create a new column in df1 with the condition: if df1's RecordDate and df2's RecordDate match, set MoneyDeposited's value on that row; otherwise set that value to zero.
df1['MoneyDeposited'] = df2['MoneyDeposited']
I basically can't do this, because df1 is on a daily basis while df2 only contains the days on which investors deposited money: df1 has 2003 index rows and df2 has 927.
Desired dataframe:

RecordDate  ActiveAccounts  MoneyDeposited
2013-07-05  1               9000.00
2013-07-06  1               0
...
2013-11-06  500             6190.00
2013-11-07  500             0

pd.merge(left=df1, right=df2, how='left', on='RecordDate').fillna(0)
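Since RecordDate is the index of both frames, an index-aligned left join works as well. A minimal sketch with made-up dates (the ActiveAccounts values are invented for illustration):

```python
import pandas as pd

# Made-up data shaped like the question: RecordDate is the index of both frames
df1 = pd.DataFrame(
    {"ActiveAccounts": [1, 1, 2]},
    index=pd.to_datetime(["2013-07-05", "2013-07-06", "2013-07-07"]),
)
df1.index.name = "RecordDate"

df2 = pd.DataFrame(
    {"MoneyDeposited": [9000.00]},
    index=pd.to_datetime(["2013-07-05"]),
)
df2.index.name = "RecordDate"

# A left join on the index keeps every daily row of df1; days with no
# matching deposit come back as NaN, which fillna turns into 0.
out = df1.join(df2, how="left").fillna({"MoneyDeposited": 0})
```

Days present only in df1 get a MoneyDeposited of 0, matching the desired output.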

Related

create a scoring value based on filtering in pandas

Imagine we have 2 dataframes. df1 has 1000 rows and 12 columns and has a projection of a student for the next 12 months. The other dataframe, df2 (1000 by 15), has the credentials of all of our students. What I want is to implement the following logic:
group the students by id and country
if in df2 the column "test" has the value "not given", then give a score of 0 for all 12 months in df1
if in df2 the column "test" has the value "given", then I should count the "given" per id and country (the group-by in step 1)
I have done the following:
# setting the filters
not_given = df2['test'] == 'not given'
given = df2['test'] != 'not given'
# give the score 0 in all columns of df1 based on the filtering of df2
df1.loc[not_given] = 0
# try to find the number of students that have the same id and country based on the groupby in the 1st step
temp_df2 = df2[given]
groups = pd.DataFrame(temp_df2.groupby(["name", "country"]).size())
When I do this I create a dataframe whose index is a MultiIndex of "name" and "country". Now I do the following
groups.reset_index(level=0, inplace=True)
groups.reset_index(level=0, inplace=True)
groups.rename(columns = {0:'counts'}, inplace = True)
Now I have a dataframe of the following form
groups
name country counts
Alex Japan 2
George Italy 1
Now I want to find the values in df2 that have the same name and country and in the corresponding rows in df1 set 10 divided by the value of "counts" in groups dataframe. For example
I have to look in df2 for the rows that have the name Alex and country Japan and
then in df1 I should do 10/2 thus insert 5 in all 12 columns for the rows that I have found in step 1
However, I am not quite sure how to do this matching so I can make the division
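One way to do the matching, sketched on made-up data (column names taken from the question; the 12 month column names in df1 are assumed): broadcast the group sizes back onto the "given" rows with transform, then assign 10/count into df1's month columns via index alignment:

```python
import pandas as pd

# Hypothetical data mirroring the question's setup
df2 = pd.DataFrame({
    "name":    ["Alex", "Alex", "George", "Maria"],
    "country": ["Japan", "Japan", "Italy", "Spain"],
    "test":    ["given", "given", "given", "not given"],
})
month_cols = [f"month_{i}" for i in range(1, 13)]  # assumed column names
df1 = pd.DataFrame(1.0, index=df2.index, columns=month_cols)

not_given = df2["test"] == "not given"
df1.loc[not_given] = 0  # score 0 for students whose test was not given

# Count of 'given' per (name, country), aligned row-by-row with df2,
# so no manual lookup in the groups dataframe is needed
counts = df2.loc[~not_given].groupby(["name", "country"])["test"].transform("size")
scores = 10 / counts  # e.g. Alex/Japan appears twice -> 10/2 = 5

# Assign the per-student score into every month column, aligned by index
for col in month_cols:
    df1.loc[scores.index, col] = scores
```

transform("size") returns the group size on each original row, which avoids the reset_index/rename round trip entirely.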

Is there a pandas function to add in value of a column based on the other dataframe? [duplicate]

This question already has answers here:
Pandas: how to merge two dataframes on a column by keeping the information of the first one?
(4 answers)
Closed 6 months ago.
I would like to add a column to a pandas dataframe based on the value from another dataframe. Here are table 1 and table 2. I would like to update the duration for table 1 based on the value from table 2. E.g., row 1 in Table 1 is Potato, so the duration should be updated to 30 based on the value from table 2.
Table 1

Crops    Entry Time  Duration
Potato   2022-03-01  0
Cabbage  2022-03-02  0
Tomato   2022-03-03  0
Potato   2022-03-0   0

Table 2

Crops    Duration
Potato   30
Cabbage  20
Tomato   25
Thanks.
Just use merge method:
df = df1.merge(df2, on='Crops', how='left')
Before doing that I suggest dropping the Duration column in the first dataframe (df1).
The parameter on defines which column you want to merge on (also called the 'key'), and how='left' returns a dataframe with the length of the first dataframe. Imposing 'left' avoids deleting records with vegetables in df1 that are not present in df2.
Google 'difference between inner, left, right and outer join'.
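Put together, on illustrative data based on the question's tables (the fourth entry date is assumed):

```python
import pandas as pd

df1 = pd.DataFrame({
    "Crops":      ["Potato", "Cabbage", "Tomato", "Potato"],
    "Entry Time": ["2022-03-01", "2022-03-02", "2022-03-03", "2022-03-04"],
    "Duration":   [0, 0, 0, 0],
})
df2 = pd.DataFrame({
    "Crops":    ["Potato", "Cabbage", "Tomato"],
    "Duration": [30, 20, 25],
})

# Drop the placeholder Duration first, otherwise the merge produces
# Duration_x / Duration_y suffix columns instead of a single Duration
df = df1.drop(columns="Duration").merge(df2, on="Crops", how="left")
```

Both Potato rows pick up the same duration of 30, since the merge repeats df2's value for every matching key in df1.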

Merge dataframes based on column values with duplicated rows

I want to merge two dataframes based on equal column values. The problem is that one of my columns has duplicated row values, which cannot be dropped since they are correlated with other columns. Here's an example of my two dataframes:
Essentially, I want to merge these two dataframes based on equal values of the FromPatchID (df1) and Id (df2) columns, in order to get something like this:
FromPatchID  ToPatchID  ...  Id  MMM    LB
1            1          ...  1   26.67  27.67
1            2          ...  1   26.67  27.67
1            3          ...  1   26.67  27.67
2            1          ...  2   26.50  27.50
3            1          ...  3   26.63  27.63
I already tried a simple merge with df_merged = pd.merge(df1, df2, on=['FromPatchID','Id']), but I got a KeyError indicating to check for duplicates in the FromPatchID column.
You have to specify the different column names to match on with left_on and right_on. Also specify how='right' to use only keys from the right frame.
df_merged = pd.merge(df1, df2, left_on='FromPatchID', right_on='Id', how='right')
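On made-up frames shaped like the question's, this reproduces the desired rows, repeating df2's values for every matching FromPatchID:

```python
import pandas as pd

df1 = pd.DataFrame({
    "FromPatchID": [1, 1, 1, 2, 3],
    "ToPatchID":   [1, 2, 3, 1, 1],
})
df2 = pd.DataFrame({
    "Id":  [1, 2, 3],
    "MMM": [26.67, 26.50, 26.63],
    "LB":  [27.67, 27.50, 27.63],
})

# Duplicated keys on the left are fine: each df1 row with FromPatchID == 1
# gets its own copy of df2's Id == 1 values
df_merged = pd.merge(df1, df2, left_on="FromPatchID", right_on="Id", how="right")
```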

How to fix finding same dates in two columns and join rows of two dataframes according to same date [duplicate]

This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 4 years ago.
I am trying to combine two dataframes. I have a date column in df1 and a date1 column in df2. I want to compare the first value of df1's date column to all values of df2's date1 column; if a similar value is found, combine that row of df2 with the first row of df1. Then do the same for the second value of df1's date column, and so on. If a value is found more than once, add more than one row.
I already tried a for loop and if conditions, but I am getting a very strange result: many rows with NaN, and the dataframe's rows increase.
all_df = pd.DataFrame()
df1 = pd.read_csv('.csv')
df2 = pd.read_csv('.csv')
for i in range(len(df1)):
    for j in range(len(df2)):
        if df1['date1'].iloc[i] == df2['date'].iloc[j]:
            print('yes')
            df = pd.concat([df1.iloc[[i]], df2.iloc[[j]]], axis=1)
            all_df = all_df.append(df)
        else:
            print('no')
I only want rows where df1's date1 and df2's date are the same.
https://pandas.pydata.org/pandas-docs/stable/merging.html#database-style-dataframe-joining-merging
import pandas as pd
all_df = pd.merge(df1, df2, left_on='date1', right_on='date')
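A toy example (column names from the question's code; the data is invented) showing that the default inner join keeps only matching dates and duplicates rows when a date matches more than once:

```python
import pandas as pd

df1 = pd.DataFrame({"date1": ["2019-01-01", "2019-01-02", "2019-01-03"],
                    "a": [1, 2, 3]})
df2 = pd.DataFrame({"date": ["2019-01-02", "2019-01-02", "2019-01-04"],
                    "b": [10, 20, 30]})

# Default how='inner': only rows whose dates appear in both frames survive,
# and df1's 2019-01-02 row is repeated for each of df2's two matches
all_df = pd.merge(df1, df2, left_on="date1", right_on="date")
```

No NaN rows appear and the row count grows only where a date genuinely matches multiple times, which is exactly the behavior the nested loop was trying to implement.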

merging two dataframes while moving column positions [duplicate]

This question already has an answer here:
Merge DataFrames based on index columns [duplicate]
(1 answer)
Closed 4 years ago.
I have a dataframe called df1 that is:

               0
103773708  68.50
103773718  57.01
103773730  30.80
103773739  67.62

I have another one called df2 that is:

               0
103773739  37.02
103773708  30.25
103773730  15.50
103773718  60.54
105496332  20.00

I'm wondering how I would get them to combine to end up looking like df3:

               0      1
103773708  30.25  68.50
103773718  60.54  57.01
103773730  15.50  30.80
103773739  37.02  67.62
105496332  20.00   0.00
As you can see, sometimes the index position is not the same, so the data has to be joined on the same index. The goal is to append column 0 from df1 into df2 while pushing column 0 in df2 over one.
result = df2.join(df1.rename(columns={0: 1})).fillna(0)
Simply merge on index, and then relabel the columns:
df = pd.merge(df2, df1, left_index=True, right_index=True, how='outer')
df.columns = [0,1]
df = df.fillna(0)
df1.columns = ['1']  # Rename the column from '0' to '1'. I assume names as strings.
df = df2.join(df1).fillna(0)  # join is LEFT by default
df

               0      1
103773739  37.02  67.62
103773708  30.25  68.50
103773730  15.50  30.80
103773718  60.54  57.01
105496332  20.00   0.00
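For reference, a self-contained version of the join approach on the question's numbers:

```python
import pandas as pd

df1 = pd.Series(
    [68.50, 57.01, 30.80, 67.62],
    index=[103773708, 103773718, 103773730, 103773739],
).to_frame(1)  # df1's values become column 1

df2 = pd.Series(
    [37.02, 30.25, 15.50, 60.54, 20.00],
    index=[103773739, 103773708, 103773730, 103773718, 105496332],
).to_frame(0)  # df2's values stay in column 0

# Left join on df2's index keeps 105496332, which is absent from df1;
# its missing df1 value becomes 0 via fillna
df3 = df2.join(df1).fillna(0)
```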
