I have two dataframes:
First:
tif_pobrany
0 65926_504019_N-33-127-B-d-3-4.tif
1 65926_504618_N-33-139-D-b-1-3.tif
2 65926_504670_N-33-140-A-a-2-3.tif
3 66533_595038_N-33-79-C-b-3-3.tif
4 66533_595135_N-33-79-D-d-3-4.tif
Second:
url godlo ... row_num nazwa_tifa
0 https://opendata.geoportal.gov.pl/ortofotomapa... M-34-68-C-a-1-2 ... 48004 73231_904142_M-34-68-C-a-1-2.tif
1 https://opendata.geoportal.gov.pl/ortofotomapa... M-34-68-C-a-3-1 ... 48011 73231_904127_M-34-68-C-a-3-1.tif
2 https://opendata.geoportal.gov.pl/ortofotomapa... M-34-68-C-a-3-2 ... 48012 73231_904336_M-34-68-C-a-3-2.tif
3 https://opendata.geoportal.gov.pl/ortofotomapa... M-34-68-C-a-3-3 ... 48013 73231_904286_M-34-68-C-a-3-3.tif
4 https://opendata.geoportal.gov.pl/ortofotomapa... M-34-68-C-a-4-2 ... 48016 73231_904263_M-34-68-C-a-4-2.tif
How can I delete rows in the second dataframe whose 'nazwa_tifa' value appears in the first dataframe's 'tif_pobrany' column?
Something like this:
for index, row in second.iterrows():
    for index2, row2 in first.iterrows():
        if row['nazwa_tifa'] == row2['tif_pobrany']:
            del row
but it didn't work.
Try this with your data:
import pandas as pd
df1 = pd.DataFrame({"col1":[1,2,3,4,5]})
df2 = pd.DataFrame({"col2":[1,3,4,9,8]})
df1.drop(df1[df1.col1.isin(df2.col2)].index, inplace = True)
print(df1)
output:
col1
1 2
4 5
Assuming df1 and df2 are your first and second dataframes respectively:
df2 = df2[~df2['nazwa_tifa'].isin(df1['tif_pobrany'])]
How does it work?
The isin function checks whether each value of a Pandas Series is present in another Series, producing a boolean Series. Negating it with ~ marks the rows whose 'nazwa_tifa' is not in 'tif_pobrany'.
That boolean mask is then passed to df2, which selects only the rows where the condition is True.
Finally, the assignment replaces df2 with the filtered dataframe.
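Applied to the question's column names, a minimal self-contained sketch (the tiny dataframes here are stand-ins for the real data):

```python
import pandas as pd

# Tiny stand-ins for the question's dataframes.
first = pd.DataFrame({"tif_pobrany": ["a.tif", "b.tif"]})
second = pd.DataFrame({"url": ["u1", "u2", "u3"],
                       "nazwa_tifa": ["a.tif", "c.tif", "b.tif"]})

# Keep only rows whose nazwa_tifa does NOT already appear in tif_pobrany.
second = second[~second["nazwa_tifa"].isin(first["tif_pobrany"])]
print(second)
```

Only the row with "c.tif" survives, since "a.tif" and "b.tif" are already downloaded.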
Related
I have a DataFrame with 100 columns (however I provide only three columns here) and I want to build a new DataFrame with two columns. Here is the DataFrame:
import pandas as pd
df = pd.DataFrame()
df['id'] = [1, 2, 3]
df['c1'] = [1, 5, 1]
df['c2'] = [-1, 6, 5]
df
I want to stack the values of all columns for each id into a single column. For example, for id=1 I want the values from columns c1 and c2 stacked in one column. Here is the DataFrame that I want.
Note: df.melt does not solve my problem, since I want to keep the ids as well.
Note 2: I already tried stack with reset_index, and it does not help:
df = df.stack().reset_index()
df.columns = ['id','c']
df
You could first set_index with "id"; then stack + reset_index:
out = (df.set_index('id').stack()
.droplevel(1).reset_index(name='c'))
Output:
id c
0 1 1
1 1 -1
2 2 5
3 2 6
4 3 1
5 3 5
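A runnable version of the pipeline on the sample frame:

```python
import pandas as pd

# Rebuild the question's sample frame.
df = pd.DataFrame({"id": [1, 2, 3], "c1": [1, 5, 1], "c2": [-1, 6, 5]})

# set_index keeps id out of the stacking; stack interleaves c1/c2 per row;
# droplevel(1) discards the old column names; reset_index brings id back.
out = df.set_index("id").stack().droplevel(1).reset_index(name="c")
print(out)
```

Note that stack emits the values row by row (c1 then c2 for each id), which is why the ids come out grouped in the output above.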
For a dataframe:
df = pd.DataFrame({"A":[0,0],"B":[0,1],"C":[1,2],"D":[2,2]})
how can I obtain the column name (or column index) where the value is 2, or some other given value,
and put it in a new column of df, say df["TAG"]:
df = pd.DataFrame({"A":[0,0],"B":[0,1],"C":[1,2],"D":[2,2],"TAG":["D","C"]})
I tried
df["TAG"] = np.where(df[cols] >= 2, df.columns, '')
where cols is the list of df columns.
So far I can only find how to get the row index when matching a value in Pandas. In Excel we can do something similar with MATCH(TRUE,INDEX($A:$D>=2,0),) applied to multiple rows.
Any help or hints are appreciated. Thank you so much in advance.
Try idxmax:
>>> df['TAG'] = df.ge(2).T.idxmax()
>>> df
A B C D TAG
0 0 0 1 2 D
1 0 1 2 2 C
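One caveat worth knowing: on a row with no value >= 2, idxmax over an all-False mask silently returns the first column label. A small sketch (with a hypothetical extra all-zero-ish row added to the sample data) that blanks such rows instead:

```python
import pandas as pd

# Sample frame from the question, plus a hypothetical third row with no value >= 2.
df = pd.DataFrame({"A": [0, 0, 0], "B": [0, 1, 0], "C": [1, 2, 0], "D": [2, 2, 1]})

mask = df.ge(2)
# idxmax picks the first True per row, but on an all-False row it silently
# returns the first column label, so blank those rows out with where().
df["TAG"] = mask.T.idxmax().where(mask.any(axis=1), "")
print(df)
```

The first two rows get D and C as before; the extra row gets an empty string rather than a misleading A.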
I have two DataFrames and want to remove rows in df1 that have the same value in column 'a' as a row in df2. Moreover, each common value in df2 should remove only one row.
df1 = pd.DataFrame({'a':[1,1,2,3,4,4],'b':[1,2,3,4,5,6],'c':[6,5,4,3,2,1]})
df2 = pd.DataFrame({'a':[2,4,2],'b':[1,2,3],'c':[6,5,4]})
result = pd.DataFrame({'a':[1,1,3,4],'b':[1,2,4,6],'c':[6,5,3,1]})
Use Series.isin + Series.duplicated to create a boolean mask and use this mask to filter the rows from df1:
m = df1['a'].isin(df2['a']) & ~df1['a'].duplicated()
df = df1[~m]
Result:
print(df)
a b c
0 1 1 6
1 1 2 5
3 3 4 3
5 4 6 1
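Put together as a self-contained snippet, the mask reads: drop a row iff its 'a' occurs in df2 and it is the first occurrence of that value in df1:

```python
import pandas as pd

df1 = pd.DataFrame({"a": [1, 1, 2, 3, 4, 4], "b": [1, 2, 3, 4, 5, 6], "c": [6, 5, 4, 3, 2, 1]})
df2 = pd.DataFrame({"a": [2, 4, 2], "b": [1, 2, 3], "c": [6, 5, 4]})

# True where 'a' occurs in df2 AND this is its first occurrence in df1.
m = df1["a"].isin(df2["a"]) & ~df1["a"].duplicated()
df = df1[~m]
print(df)
```

Note this removes at most one row per distinct value, regardless of how many times the value appears in df2 (here 2 appears twice in df2 but only one df1 row is dropped, which matches the expected result).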
Try this:
import pandas as pd

df1 = pd.DataFrame({'a':[1,1,2,3,4,4],'b':[1,2,3,4,5,6],'c':[6,5,4,3,2,1]})
df2 = pd.DataFrame({'a':[2,4,2],'b':[1,2,3],'c':[6,5,4]})

df2a = df2['a'].tolist()

def remove_df2_dup(x):
    if x in df2a:
        df2a.remove(x)
        return False
    return True

df1[df1.a.apply(remove_df2_dup)]
It creates a list from df2['a'], then checks each value of df1['a'] against that list, removing a value from the list whenever there is a match, so each value in df2 can only knock out one row of df1.
Try this:
df1 = pd.DataFrame({'a':[1,1,2,3,4,4],'b':[1,2,3,4,5,6],'c':[6,5,4,3,2,1]})
df2 = pd.DataFrame({'a':[2,4,2],'b':[1,2,3],'c':[6,5,4]})

for x in df2.a:
    if x in df1.a.values:  # 'x in df1.a' would test the index labels, not the values
        df1.drop(df1[df1.a == x].index[0], inplace=True)
print(df1)
I have a dataframe that looks like this:
[image of the dataframe]
How can I get the index of the column whose cells contain ".xml", which is column 4 in my dataframe?
Here is my code:
df = fileserver
for index in df:
    df1 = df[index].str.contains(".xml")
    print(df1)
I can see which results are True, but I don't know how to get the column index from that. Please help, and thank you everyone.
You can track which columns contain the pattern .xml using the following code. For the example below, the column of interest has index 2.
First, create a sample dataframe for your example.
import pandas as pd
>>> df = pd.DataFrame([[1,2,"3.xml"], [4,5,"6.xml"]])
>>> df
0 1 2
0 1 2 3.xml
1 4 5 6.xml
Now, check for your pattern among all cells of the dataframe.
>>> df_filtered = df.applymap(lambda cell: str(cell).endswith(".xml"))
>>> df_filtered
0 1 2
0 False False True
1 False False True
Finally, keep columns if at least one cell was found with the desired pattern.
>>> [column for column, count in df_filtered.sum().to_dict().items()
if count > 0]
[2]
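For what it's worth, an equivalent column-wise sketch that returns the matching column labels directly (and avoids applymap, which newer pandas versions deprecate in favor of DataFrame.map):

```python
import pandas as pd

df = pd.DataFrame([[1, 2, "3.xml"], [4, 5, "6.xml"]])

# Per-column boolean frame: does each cell (as text) end with ".xml"?
has_xml = df.apply(lambda col: col.astype(str).str.endswith(".xml"))
# Keep the labels of columns where at least one cell matched.
cols = df.columns[has_xml.any()].tolist()
print(cols)
```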
I have a pandas dataframe that looks like this:
I would like to iterate through column 3 and, whenever an element exists there, add a new row to the dataframe: the value from column 3 becomes the new row's value in column 2, and the values in columns 0 and 1 are copied from the row where it was found:
Here, row 2 is the newly added row. The values in columns 0 and 1 in this row come from the row where "D" was found, and now column 2 of the new row contains the value from column 3 in the first row, "D".
Here is one way to do it, but surely there must be a more general solution, especially if I wish to scan more than a single column:
a = pd.DataFrame([['A','B','C','D'],[1,2,'C']])
b = a.copy()
for tu in a.itertuples(index=False):  # Iterate by row
    if tu[3]:  # If a value exists in column 3
        b = b.append([[tu[0], tu[1], tu[3]]], ignore_index=True)  # Append a new row built from the right tuple elements
You can do this without any loops by creating a new df with the columns you want and appending it to the original.
import pandas as pd

df = pd.DataFrame([['A','B','C','D'],[1,2,'C']])
ndf = df[pd.notnull(df[3])][[0, 1, 3]]
ndf.columns = [0, 1, 2]
df = pd.concat([df, ndf], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
This will leave NaN for the new missing values, which you can then change to None:
df[3] = df[3].where(pd.notnull(df[3]), None)
prints
0 1 2 3
0 A B C D
1 1 2 C None
2 A B D None
This may be a bit more general (assuming your columns are integers and that you always want to fill the preceding columns in this pattern):
import pandas as pd

def append_rows(scan_col, scanned_dataframe):
    new_df = pd.DataFrame()
    for _, row in scanned_dataframe.iterrows():
        if pd.notna(row[scan_col]):  # plain truthiness would misfire if the gap is NaN, which is truthy
            new_row = [row[j] for j in range(scan_col - 1)]
            new_row.append(row[scan_col])
            print(new_row)
            new_df = pd.concat([new_df, pd.DataFrame([new_row])], ignore_index=True)
    return new_df

a = pd.DataFrame([['A','B','C','D'],[1,2,'C']])
b = a.copy()
b = pd.concat([b, append_rows(3, a)], ignore_index=True)  # DataFrame.append was removed in pandas 2.0
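If more than one column needs scanning, the loop-free idea generalizes: build one block of new rows per scanned column and concatenate everything at once (pd.concat, since DataFrame.append was removed in pandas 2.0). A sketch on the sample frame, scanning every column past index 2:

```python
import pandas as pd

df = pd.DataFrame([["A", "B", "C", "D"], [1, 2, "C"]])

# For every column beyond index 2, emit new rows that copy columns 0-1
# and shift the extra value into column 2, then append them all at once.
extras = []
for col in range(3, df.shape[1]):
    ndf = df.loc[df[col].notna(), [0, 1, col]]
    ndf.columns = [0, 1, 2]
    extras.append(ndf)
out = pd.concat([df] + extras, ignore_index=True)
print(out)
```

With this sample frame only column 3 is scanned, so a single new row ['A', 'B', 'D'] is appended, reproducing the earlier result.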