Adding Multiple Columns - python

How can I add multiple columns from one dataframe to another dataframe? I have already figured out how to add a single column, but I can't get it to work for multiple columns. I am a newbie.
dfnew['Symbol']= pd.Series(df['Symbol'])
dfnew['Symbol']['Desc']= pd.Series(df['Symbol']['Desc'])

Use:
dfnew['Symbol'],dfnew['Desc']= df['Symbol'],df['Desc']
Or use DataFrame.assign():
dfnew=dfnew.assign(Symbol=df.Symbol,Desc=df.Desc)
If needed, initialize dfnew first with dfnew=pd.DataFrame()
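As a minimal sketch of the two approaches on toy data (the Price and Qty columns and all values are made up for illustration), the assign() call from the answer can be compared with joining a column subset, which is an alternative not mentioned above. Both align rows on the index.

```python
import pandas as pd

# Hypothetical sample data; only Symbol and Desc come from the question.
df = pd.DataFrame({
    "Symbol": ["AAA", "BBB"],
    "Desc": ["first", "second"],
    "Price": [1.0, 2.0],
})
dfnew = pd.DataFrame({"Qty": [10, 20]})

# Approach from the answer: assign() returns a new DataFrame with the extra columns.
out_assign = dfnew.assign(Symbol=df["Symbol"], Desc=df["Desc"])

# Alternative: select the wanted columns as a list and join them on the index.
out_join = dfnew.join(df[["Symbol", "Desc"]])

print(out_join)
```

The join variant scales more easily when the list of columns to copy is built dynamically.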

Related

how to get column names in pandas of getdummies

After I created a dataframe, I called get_dummies on it:
df_final=pd.get_dummies(df,columns=['type'])
I got the new columns that I wanted and everything is working.
My question is: how can I get the names of the new columns created by get_dummies? My dataframe is dynamic, so I can't refer to them statically. I want to save all the new column names in a list.
An option would be:
df_dummy = pd.get_dummies(df, columns=target_cols)
df_dummy.columns.difference(df.columns).tolist()
where df is your original dataframe, df_dummy the output from pd.get_dummies, and target_cols your list of columns to get the dummies.
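A small sketch of this on made-up data (the value column and the category values are assumptions for illustration). Note that Index.difference returns the names sorted; the list comprehension at the end is an alternative that keeps the column order of df_dummy.

```python
import pandas as pd

# Hypothetical sample data; only the 'type' column name comes from the question.
df = pd.DataFrame({"type": ["a", "b", "a"], "value": [1, 2, 3]})
target_cols = ["type"]

df_dummy = pd.get_dummies(df, columns=target_cols)

# Columns present after encoding but not before (sorted by Index.difference).
new_cols = df_dummy.columns.difference(df.columns).tolist()

# Alternative that preserves the original column order of df_dummy.
new_cols_ordered = [c for c in df_dummy.columns if c not in df.columns]

print(new_cols)
```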

Find where three separate DataFrames overlap and create a new DataFrame

I have three separate DataFrames. Each DataFrame has the same columns: ['Email', 'Rating']. There are duplicate values in the Email column across all three DataFrames. I'm trying to find the emails that appear in all three DataFrames and then create a new DataFrame from those rows. So far I have saved all three DataFrames to a list, dfs = [df1, df2, df3], and concatenated them using df = pd.concat(dfs). I tried using groupby from there, but to no avail. Any help would be greatly appreciated.
You want to do a merge. Similar to a join in SQL, you can do an inner merge and treat the email like a foreign key. Here are the docs: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html
It would look something like this:
in_common = pd.merge(df1, df2, on=['Email'], how='inner')
You could try using .isin from pandas, e.g.:
df[df['Email'].isin(df2['Email'])]
This retrieves the rows whose Email value also appears in the second dataframe.
Another idea is to try an inner merge.
Good luck, and post code next time.
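Both answers show the two-DataFrame case. One way to extend the .isin idea to all three DataFrames is sketched below. The sample emails and ratings are invented, and intersecting Python sets is just one possible approach, not necessarily what either answerer had in mind.

```python
import pandas as pd

# Hypothetical sample data with the columns described in the question.
df1 = pd.DataFrame({"Email": ["a@x.com", "b@x.com", "a@x.com"], "Rating": [1, 2, 3]})
df2 = pd.DataFrame({"Email": ["a@x.com", "c@x.com"], "Rating": [4, 5]})
df3 = pd.DataFrame({"Email": ["a@x.com", "b@x.com"], "Rating": [6, 7]})
dfs = [df1, df2, df3]

# Emails that occur in every DataFrame.
common = set(dfs[0]["Email"]).intersection(*(set(d["Email"]) for d in dfs[1:]))

# Keep every row (from any of the three) whose Email is in that common set.
df = pd.concat(dfs, ignore_index=True)
in_all = df[df["Email"].isin(common)]

print(in_all)
```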

Pandas - Merge DataFrame to Series when all column values are the same.

I just started using pandas and I would like to reduce the amount of data that I get by merging my DataFrames in the following way:
Load df
Check in which columns all values are the same
Delete other columns
Reduce df to single Series
Return
def merge_df(in_df):
    alist = []
    for col in in_df.columns:
        if len(in_df[col].unique()) == 1:
            alist.append(col)
    return in_df[alist].T.squeeze()[1]
Is there a more elegant way to do it? E.g. without looping through all columns?
Yes, you can remove duplicate rows with a simple pandas function:
df.drop_duplicates()
See the pandas documentation for drop_duplicates.
To remove duplicates based on particular columns only, pass the column names via the "subset" parameter. The whole row is removed for each duplicate.
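For the loop-free version the question asks about, one possibility is to count distinct values per column with DataFrame.nunique and use the result as a column mask. This is a sketch under assumptions: the function name constant_columns and the sample data are hypothetical, and the input is assumed to have at least one row.

```python
import pandas as pd

def constant_columns(in_df):
    """Return a Series holding the value of each column that is constant."""
    # dropna=False counts NaN as a value, matching len(col.unique()) == 1.
    mask = in_df.nunique(dropna=False) == 1
    # Take the first row of the constant columns; assumes in_df is not empty.
    return in_df.loc[:, mask].iloc[0]

# Hypothetical sample data.
df = pd.DataFrame({"a": [1, 1, 1], "b": [1, 2, 3], "c": ["x", "x", "x"]})
result = constant_columns(df)

print(result)
```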

Create a dataframe by discarding intersections of two dataframes (Pandas)

Does anyone know of an efficient way to create a new dataframe based off of two dataframes in Python/Pandas?
What I am trying to do is check whether a value from df1 is in df2, and if it is, not add that row to df3. I am working with student IDs, and if a student ID from df1 is in df2, I do not want to include it in the new dataframe, df3.
So does anybody know an efficient way to do this? I have googled and looked on SO, but found nothing that works so far.
Assuming the column is called ID.
df3 = df1[~df1["ID"].isin(df2["ID"])].copy()
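If both dataframes have the same length and the same index, you can also compare them row by row. Note that this compares the IDs at matching positions rather than testing whether an ID appears anywhere in df2:
print(df1.loc[df1['ID'] != df2['ID']])
Assign the result to a third dataframe.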
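To make the membership-based answer concrete, here is a sketch on invented student data (the Name column and all values are assumptions). It also shows a left merge with indicator=True as an alternative way to keep only the rows of df1 that have no match in df2.

```python
import pandas as pd

# Hypothetical sample data; only the ID column name comes from the answer.
df1 = pd.DataFrame({"ID": [1, 2, 3, 4], "Name": ["a", "b", "c", "d"]})
df2 = pd.DataFrame({"ID": [2, 4, 5]})

# Rows of df1 whose ID does not appear anywhere in df2.
df3 = df1[~df1["ID"].isin(df2["ID"])].copy()

# Alternative: left merge with an indicator column, then keep 'left_only' rows.
merged = df1.merge(df2[["ID"]].drop_duplicates(), on="ID", how="left", indicator=True)
df3_alt = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

print(df3)
```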

Pandas merge DataFrames based on index/column combination

I have two DataFrames that I want to merge. I have read about merging on multiple columns, and preserving the index when merging. My problem needs to cater for both, and I am having difficulty figuring out the best way to do this.
The first DataFrame looks like this
and the second looks like this
I want to merge these based on the Date and the ID. In the first DataFrame the Date is the index and the ID is a column; in the second DataFrame both Date and ID are part of a MultiIndex.
Essentially, as a result I want a DataFrame that looks like DataFrame 2 with an additional column for the Events from DataFrame 1.
I'd suggest resetting the index (reset_index) and then merging the DataFrames, as you've read. Then you can set the index (set_index) to reproduce your desired MultiIndex.
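The suggestion could look roughly like the sketch below. The original frames were not preserved in the question, so the data here is invented: Date, ID and Events follow the description, while the Value column and all values are assumptions. A left merge is used so that the result keeps every row of the second DataFrame.

```python
import pandas as pd

dates = pd.to_datetime(["2020-01-01", "2020-01-02"])

# First DataFrame: Date is the index, ID is a regular column.
df1 = pd.DataFrame(
    {"ID": [1, 2], "Events": ["x", "y"]},
    index=pd.Index(dates, name="Date"),
)

# Second DataFrame: Date and ID together form a MultiIndex.
idx = pd.MultiIndex.from_tuples(
    [(dates[0], 1), (dates[0], 2), (dates[1], 2)], names=["Date", "ID"]
)
df2 = pd.DataFrame({"Value": [10, 20, 30]}, index=idx)

# Turn the indexes into columns, merge on both keys, then restore the MultiIndex.
merged = (
    df2.reset_index()
    .merge(df1.reset_index(), on=["Date", "ID"], how="left")
    .set_index(["Date", "ID"])
)

print(merged)
```

Rows of the second DataFrame without a matching Date and ID in the first one get NaN in the Events column.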
