Dropping rows with same values from another dataframe - python

I have one dataframe (df) with a column called "id". I have another dataframe (df2) with only one column called "id". I want to drop the rows in df that have the same values in "id" as df2.
How would I go about doing this?

Use boolean indexing with the isin method. The tilde ~ negates the boolean Series returned by df['id'].isin(df2['id']), so only the rows whose id does not appear in df2 are kept:
df[~df['id'].isin(df2['id'])]
query
Using a query string, refer to df2 with the @ symbol:
df.query('id not in @df2.id')
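As a minimal, runnable sketch of both approaches (the frames here are made up):

import pandas as pd

df = pd.DataFrame({'id': [1, 2, 3, 4], 'val': ['a', 'b', 'c', 'd']})
df2 = pd.DataFrame({'id': [2, 4]})

# Boolean indexing: keep only rows whose id does not appear in df2
print(df[~df['id'].isin(df2['id'])])   # rows with id 1 and 3

# Equivalent query form; @df2 refers to the df2 variable in scope
print(df.query('id not in @df2.id'))   # same result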

Related

DataFrame Pandas condition over a column

Dear fellows, I'm having difficulty performing a condition over a column in my DataFrame. I want to iterate over the column and extract only the values that start with the number 6; the values in that column are floats.
The column is called "Vendor".
This is my DataFrame, and I want to sum the values of the column "Amount in loc.curr.2" only for the rows where the value in column "Vendor" starts with 6.
This is what I've been trying, along with a few variations of it, without success.
idx = df_spend['Vendor'].apply(lambda x: str(x).startswith('6'))
This should create a Boolean pandas.Series that you can use as an index.
summed_col = df_spend.loc[idx, "Amount in loc.curr.2"].sum()
summed_col contains the sum of the filtered column. (Note that Series.sum() is used here; calling .apply(sum) on a column of floats would fail, since individual floats are not iterable.)
Definitely take a look at the pandas documentation for the apply function: http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.apply.html
Hope this works! :)
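A runnable sketch of the whole answer, with a made-up df_spend (only the column names come from the question):

import pandas as pd

df_spend = pd.DataFrame({
    'Vendor': [600123.0, 700456.0, 601789.0],
    'Amount in loc.curr.2': [10.0, 20.0, 30.0],
})

# Boolean mask: True where the vendor number, as a string, starts with '6'
idx = df_spend['Vendor'].apply(lambda x: str(x).startswith('6'))

# Sum the amounts only for the masked rows
summed_col = df_spend.loc[idx, 'Amount in loc.curr.2'].sum()
print(summed_col)  # 40.0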

Replacing dataframe column list values with values from another dataframe

I am trying to replace data in one dataframe by comparing its columns with another dataframe, with values like below.
I need to map the 'members' column in df1 to the 'uid' column in df2 and get the corresponding ipv4-address for each member.
Dataframe 1:

     uid                                   members                                      type
42   afea136c-217f-4b1d-857c-dc4075bxxxxx  [08xx-b8xx-4bcf-8axx-5f86xxxxxx, 64xx5c4..  group

Dataframe 2:

      uid                             name                            ipv4-address  type
506   08xx-b8xx-4bcf-8axx-5f86xxxxxx  l_re-exx-xx-xx-19.172.211.0m23  19.172.211.0  network
589   64xx5c4..                       l_re-exx-xx-xx-19.172.211.0m23  19.152.210.0  network
Is it possible to replace the members column values, or just to create a new column in df1 with the ipv4-addresses from df2?
Expected outcome:

     uid                                   members                          type
42   afea136c-217f-4b1d-857c-dc4075bxxxxx  [19.172.211.0, 19.152.210.0,..]  group
If you have already filtered df1 and df2 down to just the rows you need, you can do
ips = df2['ipv4-address'].tolist()
and then set
df1['members'] = ips
Otherwise you'll need a little more logic to pick the right rows to update.
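For illustration, a minimal sketch of this positional approach with made-up, pre-filtered frames; it is only safe when the rows of df1 and df2 already correspond one-to-one:

import pandas as pd

df1 = pd.DataFrame({'members': ['08xx-b8xx', '64xx5c4']})
df2 = pd.DataFrame({'ipv4-address': ['19.172.211.0', '19.152.210.0']})

ips = df2['ipv4-address'].tolist()
df1['members'] = ips  # positional overwrite; assumes identical row order
print(df1)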
Let us try explode, then map:
df1['new'] = df1['members'].explode().map(df2.set_index('uid')['ipv4-address']).groupby(level=0).agg(list)
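A runnable sketch with shortened, made-up values mirroring the tables above:

import pandas as pd

df1 = pd.DataFrame({
    'uid': ['afea136c'],
    'members': [['08xx-b8xx', '64xx5c4']],
    'type': ['group'],
})
df2 = pd.DataFrame({
    'uid': ['08xx-b8xx', '64xx5c4'],
    'ipv4-address': ['19.172.211.0', '19.152.210.0'],
    'type': ['network', 'network'],
})

# explode() turns the list column into one member per row (index preserved),
# map() looks each member up in df2, and groupby(level=0).agg(list)
# collects the addresses back into one list per original row.
df1['new'] = (df1['members']
              .explode()
              .map(df2.set_index('uid')['ipv4-address'])
              .groupby(level=0)
              .agg(list))
print(df1['new'][0])  # ['19.172.211.0', '19.152.210.0']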

Averaging data of dataframe columns based on redundancy of another column

I want to average the data of one column in a pandas dataframe if the rows share the same 'id', which is stored in another column of the same dataframe. To make it simple, I have:
and I want:
where it is clear that the elements of the 'nx' and 'ny' columns have been averaged whenever their 'nodes' value was the same. The column 'maille', on the other hand, has to remain untouched.
I'm trying with groupby, but so far I haven't managed to keep the column 'maille' as it is.
Any idea?
Use GroupBy.transform, specifying the column names to aggregate as a list, and assign back:
cols = ['nx','ny']
df[cols] = df.groupby('nodes')[cols].transform('mean')
print (df)
Another idea with DataFrame.update:
df.update(df.groupby('nodes')[cols].transform('mean'))
print (df)
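A minimal sketch; the data is made up, only the column names 'nodes', 'nx', 'ny', and 'maille' come from the question:

import pandas as pd

df = pd.DataFrame({
    'nodes':  [1, 1, 2],
    'nx':     [0.0, 2.0, 5.0],
    'ny':     [10.0, 20.0, 30.0],
    'maille': ['a', 'b', 'c'],
})

cols = ['nx', 'ny']
# transform('mean') returns a frame aligned with df's index, so the
# assignment overwrites nx/ny in place and leaves 'maille' alone
df[cols] = df.groupby('nodes')[cols].transform('mean')
print(df)  # nx -> [1.0, 1.0, 5.0], ny -> [15.0, 15.0, 30.0]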

How to create a dataframe with the column included in groupby clause?

I have a data frame with columns 'A' and 'Amount'. I have done a group by using 'A'. Now I want to put these values into a new data frame; how can I achieve this?
top_plt=pd.DataFrame(top_plt.groupby('A')['Amount'].sum())
The resulting dataframe contains only the Amount column but the groupby 'A' column is missing.
The DataFrame constructor is not necessary here; better to add as_index=False to the groupby:
top_plt= top_plt.groupby('A', as_index=False)['Amount'].sum()
Or add DataFrame.reset_index:
top_plt= top_plt.groupby('A')['Amount'].sum().reset_index()
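A minimal sketch with made-up data:

import pandas as pd

top_plt = pd.DataFrame({'A': ['x', 'x', 'y'], 'Amount': [1, 2, 5]})

# as_index=False keeps 'A' as a regular column instead of the index
out = top_plt.groupby('A', as_index=False)['Amount'].sum()
print(out)
#    A  Amount
# 0  x       3
# 1  y       5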

Using .loc with a column of values

Suppose I have a slice of a column within a dataframe df where I want to replace float values with other float values. The replacement values come from another dataframe, newdf.
I've tried using
df.loc[row index condition, [column to replace vals]] = newdf[column]
but for some reason the resulting values are all NaN. Why is this so?
The values from newdf need to align with the index of df. If newdf has exactly as many values as you want to insert, you can try using .values:
df.loc[row index condition, [column to replace vals]] = newdf[column].values
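A small sketch of both behaviours, using made-up frames:

import pandas as pd

df = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0]})
newdf = pd.DataFrame({'x': [10.0, 20.0]})  # index 0 and 1

cond = df['x'] > 2.0  # selects rows with index 2 and 3

# Aligned assignment: newdf has no labels 2 or 3, so NaN is written
df.loc[cond, 'x'] = newdf['x']
print(df['x'].tolist())  # [1.0, 2.0, nan, nan]

# Positional assignment: .values drops the index, so the two values
# land in the two selected rows in order
df2 = pd.DataFrame({'x': [1.0, 2.0, 3.0, 4.0]})
df2.loc[df2['x'] > 2.0, 'x'] = newdf['x'].values
print(df2['x'].tolist())  # [1.0, 2.0, 10.0, 20.0]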
