I have a dataframe derived from a massive list of market tickers from a crypto exchange.
The list includes all pairs, but I only need the tickers that are quoted against USD stablecoins.
The first 15 entries of the original dataframe:
Asset Price
0 1INCHBTC 0.00009650
1 1INCHBUSD 5.74340000
2 1INCHUSDT 5.74050000
3 AAVEBKRW 164167.00000000
4 AAVEBNB 0.77600000
5 AAVEBTC 0.00615200
6 AAVEBUSD 365.00200000
7 AAVEDOWNUSDT 2.02505200
8 AAVEETH 0.17212000
9 AAVEUPUSDT 81.89500000
10 AAVEUSDT 365.57600000
11 ACMBTC 0.00018420
12 ACMBUSD 10.91700000
13 ACMUSDT 10.89500000
14 ADAAUD 1.59600000
Now, there are many USD stablecoins, but not every ticker has a pair with one.
So I used the most popular ones to make sure every asset has at least one match.
df = df.loc[(df.Asset.str[-3:] == 'DAI') |
            (df.Asset.str[-4:] == 'USDT') |
            (df.Asset.str[-4:] == 'BUSD') |
            (df.Asset.str[-4:] == 'TUSD')]
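A more compact way to write the same filter, assuming pandas >= 1.4 (where Series.str.endswith accepts a tuple of suffixes):
# equivalent filter: keep rows whose Asset ends in any known stablecoin suffix
stable = ('DAI', 'USDT', 'BUSD', 'TUSD')
df = df.loc[df.Asset.str.endswith(stable)]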
The first 15 entries of the new but 'messy' dataframe:
Asset Price
0 1INCHBUSD 5.74340000
1 1INCHUSDT 5.74050000
2 AAVEBUSD 365.00200000
3 AAVEDOWNUSDT 2.02505200
4 AAVEUPUSDT 81.89500000
5 AAVEUSDT 365.57600000
6 ACMBUSD 10.91700000
7 ACMUSDT 10.89500000
8 ADABUSD 1.21439000
9 ADADOWNUSDT 3.46482700
10 ADATUSD 1.21284000
11 ADAUPUSDT 76.12900000
12 ADAUSDT 1.21394000
13 AERGOBUSD 0.43012000
14 AIONBUSD 0.07210000
How do I filter/merge entries in this dataframe so that duplicates are removed?
I also need the stablecoin substring removed from the end, so I'm left with just the asset and the USD price.
It should look something like this...
Asset Price
0 1INCH 5.74340000
2 AAVE 365.00200000
3 AAVEDOWN 2.02505200
4 AAVEUP 81.89500000
6 ACM 10.91700000
8 ADA 1.21439000
9 ADADOWN 3.46482700
11 ADAUP 76.12900000
13 AERGO 0.43012000
14 AION 0.07210000
This is for a portfolio tracker.
Also, if there is a better way to do this without the middle step, I'm all ears.
According to your expected output, you want to remove duplicates but keep the first item:
df.Asset = df.Asset.str.replace(r"(DAI|USDT|BUSD|TUSD)$", "", regex=True)
df = df.drop_duplicates(subset="Asset", keep="first")
print(df)
Prints:
Asset Price
0 1INCH 5.743400
2 AAVE 365.002000
3 AAVEDOWN 2.025052
4 AAVEUP 81.895000
6 ACM 10.917000
8 ADA 1.214390
9 ADADOWN 3.464827
11 ADAUP 76.129000
13 AERGO 0.430120
14 AION 0.072100
EDIT: To group and average:
df.Asset = df.Asset.str.replace(r"(DAI|USDT|BUSD|TUSD)$", "", regex=True)
df = df.groupby("Asset")["Price"].mean().reset_index()
print(df)
Prints:
Asset Price
0 1INCH 5.741950
1 AAVE 365.289000
2 AAVEDOWN 2.025052
3 AAVEUP 81.895000
4 ACM 10.906000
5 ADA 1.213723
6 ADADOWN 3.464827
7 ADAUP 76.129000
8 AERGO 0.430120
9 AION 0.072100
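As for doing it without the middle step: a minimal sketch that filters and strips the suffix in a single pass with str.extract, using the same suffix list as above (rows without a known suffix become NaN and are dropped):
import pandas as pd

# capture the base asset only when a known stablecoin suffix ends the ticker
pat = r"^(?P<base>.+?)(?:DAI|USDT|BUSD|TUSD)$"
out = (df.assign(Asset=df.Asset.str.extract(pat)["base"])
         .dropna(subset=["Asset"])              # rows that matched no suffix
         .drop_duplicates("Asset", keep="first"))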
Just do:
import numpy as np

con1 = df.Asset.str[-3:] == 'DAI'
con2 = df.Asset.str[-4:] == 'USDT'
con3 = df.Asset.str[-4:] == 'BUSD'
con4 = df.Asset.str[-4:] == 'TUSD'

# np.select takes the boolean Series themselves, not their names as strings;
# 'new' holds the base asset with the matched suffix stripped off
# (default='' is only hit by rows the mask below filters out anyway)
df['new'] = np.select([con1, con2, con3, con4],
                      [df.Asset.str[:-3], df.Asset.str[:-4],
                       df.Asset.str[:-4], df.Asset.str[:-4]],
                      default='')
out = df[con1 | con2 | con3 | con4].groupby('new').head(1)
or
df[con1 | con2 | con3 | con4].drop_duplicates('new')
I have a dataframe as shown below
Token Label StartID EndID
0 Germany Country 0 2
1 Berlin Capital 6 9
2 Frankfurt City 15 18
3 four million Number 21 24
4 Sweden Country 26 27
5 United Kingdom Country 32 34
6 ten million Number 40 45
7 London Capital 50 55
I am trying to get a row based on a certain condition, i.e. associate the label Number with the closest Capital, i.e. Berlin:
3 four million Number 21 24 -> 1 Berlin Capital 6 9
or something like:
df[row3] -> df[row1]
A pseudo-logic:
First check for the rows with the label Number; the assumption is that the matching city is always '2 rows' above or below and has the label Capital. Also, a Capital row is always located after a Country row.
What I have done until now:
import pandas as pd

columnsName = ['Token', 'Label', 'StartID', 'EndID']
df = pd.read_csv('resources/testcsv.csv', index_col=0, skip_blank_lines=True, header=0)
print(df)

# case=False already makes the match case-insensitive, so str.lower() is not needed
key_number = 'Number'
df_with_number = df[df['Label'].str.contains(r"\b{}\b".format(key_number), case=False, regex=True)]
print(df_with_number)

key_capital = 'Capital'
df_with_capitals = df[df['Label'].str.contains(r"\b{}\b".format(key_capital), case=False, regex=True)]
print(df_with_capitals)

key_country = 'Country'
df_with_country = df[df['Label'].str.contains(r"\b{}\b".format(key_country), case=False, regex=True)]
print(df_with_country)
The logic is to compare the indexes and then make the possible relations, i.e.
df[row3] -> [df[row1], df[row7]]
You could use merge_asof with the parameter direction='nearest', for example:
df_nb_cap = pd.merge_asof(df_with_number.reset_index(),
                          df_with_capitals.reset_index(),
                          on='index',
                          suffixes=('_nb', '_cap'),
                          direction='nearest')
print (df_nb_cap)
index Token_nb Label_nb StartID_nb EndID_nb Token_cap Label_cap \
0 3 four_million Number 21 24 Berlin Capital
1 6 ten_million Number 40 45 London Capital
StartID_cap EndID_cap
0 6 9
1 50 55
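If you also need the nearest preceding Country for each Number, a sketch along the same lines (assuming the df_with_country frame from the question; direction='backward' picks the closest earlier row):
df_nb_country = pd.merge_asof(df_with_number.reset_index(),
                              df_with_country.reset_index(),
                              on='index',
                              suffixes=('_nb', '_ctry'),
                              direction='backward')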
from io import StringIO

import numpy as np
import pandas as pd

# adjusted sample data (extra duplicate Number rows appended to exercise the masks)
s = """Token,Label,StartID,EndID
Germany,Country,0,2
Berlin,Capital,6,9
Frankfurt,City,15,18
four million,Number,21,24
Sweden,Country,26,27
United Kingdom,Country,32,34
ten million,Number,40,45
London,Capital,50,55
ten million,Number,40,45
ten million,Number,40,45"""
df = pd.read_csv(StringIO(s))
# create a mask for Number rows where Capital is 2 rows above or below
# and where Country is three rows above or one row below
mask = (df['Label'] == 'Number') & \
       ((df['Label'].shift(2) == 'Capital') |
        (df['Label'].shift(-2) == 'Capital')) & \
       ((df['Label'].shift(3) == 'Country') |
        (df['Label'].shift(-1) == 'Country'))
# create a mask for Capital rows where Number is 2 rows above or below
# and where Country is one row above
mask2 = (df['Label'] == 'Capital') & \
        ((df['Label'].shift(2) == 'Number') |
         (df['Label'].shift(-2) == 'Number')) & \
        (df['Label'].shift(1) == 'Country')
# hstack the two masked frames side by side and build a new frame
new_df = pd.DataFrame(np.hstack([df[mask].to_numpy(), df[mask2].to_numpy()]))
print(new_df)
0 1 2 3 4 5 6 7
0 four million Number 21 24 Berlin Capital 6 9
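The hstacked frame loses the column labels (hence the 0-7 header above); a small sketch to restore them, reusing the _nb/_cap suffix idea from the merge_asof answer:
# label the stacked columns: left half from the Number row, right half from the Capital row
cols = [f'{c}_nb' for c in df.columns] + [f'{c}_cap' for c in df.columns]
new_df = pd.DataFrame(np.hstack([df[mask].to_numpy(), df[mask2].to_numpy()]),
                      columns=cols)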
I am new to this field and stuck on this problem. I have two datasets.
all_batsman_df: this df has 5 columns ('years', 'team', 'pos', 'name', 'salary')
years team pos name salary
0 1991 SF 1B Will Clark 3750000.0
1 1991 NYY 1B Don Mattingly 3420000.0
2 1991 BAL 1B Glenn Davis 3275000.0
3 1991 MIL DH Paul Molitor 3233333.0
4 1991 TOR 3B Kelly Gruber 3033333.0
all_batting_statistics_df: this df has 31 columns
Year Rk Name Age Tm Lg G PA AB R ... SLG OPS OPS+ TB GDP HBP SH SF IBB Pos Summary
0 1988 1 Glen Davis 22 SDP NL 37 89 83 6 ... 0.289 0.514 48.0 24 1 1 0 1 1 987
1 1988 2 Jim Acker 29 ATL NL 21 6 5 0 ... 0.400 0.900 158.0 2 0 0 0 0 0 1
2 1988 3 Jim Adduci* 28 MIL AL 44 97 94 8 ... 0.383 0.641 77.0 36 1 0 0 3 0 7D/93
3 1988 4 Juan Agosto* 30 HOU NL 75 6 5 0 ... 0.000 0.000 -100.0 0 0 0 1 0 0 1
4 1988 5 Luis Aguayo 29 TOT MLB 99 260 237 21 ... 0.354 0.663 88.0 84 6 1 1 1 3 564
I want to merge these two datasets on 'year' and 'name'. But the problem is that the two data frames spell some names differently: the first dataset has 'Glenn Davis' while the second has 'Glen Davis'.
Now, I want to know how I can merge them using the difflib library even though the names differ.
Any help will be appreciated.
Thanks in advance.
I have used this code, which I got from a question asked on this platform, but it is not working for me. I am adding new merge-key columns after matching names in both of the datasets. I know this is not a good approach. Kindly suggest if I can do it in a better way.
import cdifflib  # faster C implementation of difflib's SequenceMatcher
import pandas as pd

df_a = all_batting_statistics_df
df_b = all_batters
df_a = df_a.astype(str)
df_b = df_b.astype(str)
df_a['merge_year'] = df_a['Year']  # we will use these as the merge keys
df_a['merge_name'] = df_a['Name']

for comp_a, addr_a in df_a[['Year', 'Name']].values:
    for ixb, (comp_b, addr_b) in enumerate(df_b[['years', 'name']].values):
        if cdifflib.CSequenceMatcher(None, comp_a, comp_b).ratio() > .6:
            df_b.loc[ixb, 'merge_year'] = comp_a  # creates a merge key in df_b
        if cdifflib.CSequenceMatcher(None, addr_a, addr_b).ratio() > .6:
            df_b.loc[ixb, 'merge_name'] = addr_a  # creates a merge key in df_b

merged_df = pd.merge(df_a, df_b, on=['merge_name', 'merge_year'], how='inner')
You can do:
import difflib

df_b['name'] = df_b['name'].apply(
    lambda x: difflib.get_close_matches(x, df_a['name'])[0])
to replace names in df_b with the closest match from df_a, then do your merge. See also this post.
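Note that get_close_matches returns an empty list when nothing clears its similarity cutoff, so the [0] above can raise an IndexError. A guarded sketch (closest_name is just an illustrative helper):
import difflib

def closest_name(x, candidates, cutoff=0.6):
    # hypothetical helper: fall back to the original name when nothing matches
    matches = difflib.get_close_matches(x, candidates, n=1, cutoff=cutoff)
    return matches[0] if matches else x

df_b['name'] = df_b['name'].apply(lambda x: closest_name(x, df_a['name']))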
Let me get to your problem by assuming that you need to end up with a data set keyed on 2 columns, 1. 'year' and 2. 'name'.
1. We will first fix all the names that are wrong.
I hope you know all the wrong names in all_batting_statistics_df; correct them with something like this (note that replace returns a new frame, so assign it back):
all_batting_statistics_df = all_batting_statistics_df.replace(regex=r'^Glen Davis$', value='Glenn Davis')
Once you have corrected all the spellings, start from the smaller data set that has the names you know, so it doesn't take long.
2. We need both data sets to keep only the key columns, i.e. 'year' and 'name'.
Use this to drop the columns we don't need (note that drop needs columns=, and 'Name' must be kept because we merge on it):
all_batsman_df_1 = all_batsman_df.drop(columns=['team', 'pos', 'salary'])
all_batting_statistics_df_1 = all_batting_statistics_df.drop(columns=['Rk', 'Age', 'Tm', 'Lg', 'G', 'PA', 'AB', 'R', 'Summary'])
I cannot see all 31 columns, so I left the rest out; you will have to add them to the above code.
3. We need to change the column names to match, i.e. 'year' and 'name', using DataFrame.rename:
df_new_1 = all_batting_statistics_df_1.rename(columns={'Year': 'year', 'Name': 'name'})
4. Next, to merge them we will use this (the first frame's 'years' column must be renamed to 'year' as well, and the merge goes on the shared keys rather than mismatched left_on/right_on):
all_batsman_df_1 = all_batsman_df_1.rename(columns={'years': 'year'})
all_batsman_df_1.merge(df_new_1, on=['year', 'name'])
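Putting the four steps together, a sketch (assuming the drop list is completed for the full 31 columns):
import pandas as pd

# 1. fix known misspellings
stats = all_batting_statistics_df.replace(regex=r'^Glen Davis$', value='Glenn Davis')
# 2. keep only the merge keys (extend the drop list as needed)
stats = stats.drop(columns=['Rk', 'Age', 'Tm', 'Lg', 'G', 'PA', 'AB', 'R', 'Summary'])
batsman = all_batsman_df.drop(columns=['team', 'pos', 'salary'])
# 3. align the column names on both sides
stats = stats.rename(columns={'Year': 'year', 'Name': 'name'})
batsman = batsman.rename(columns={'years': 'year'})
# 4. merge on the shared keys
merged = batsman.merge(stats, on=['year', 'name'])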
FINAL THOUGHTS:
If you don't want to do all this, find a way to export the data sets to Google Sheets or Microsoft Excel and edit them with those tools. If you like pandas, it's not that difficult; you will find a way. All the best!
I have searched a lot but couldn't find a solution to this particular case. I want to remove any rows whose lists contain too few strings or items (3 or fewer, as shown further down).
I'm preparing LDA topic modelling on a large Swedish database in pandas and have limited the test case to 1000 rows. I'm only concerned with a specific column, and my approach so far has been as follows:
import sqlite3

import pandas as pd

con = sqlite3.connect('/Users/mo/EXP/NAV/afm.db')
sql = """
select * from stillinger limit 1000
"""
dfs = pd.read_sql(sql, con)

plb = """
select PLATSBESKRIVNING from stillinger limit 1000
"""
dfp = pd.read_sql(plb, con)
dfp
Then I defined a regular expression where the first pattern removes any meta characters while keeping the Swedish- and Norwegian-specific letters, and the second removes words of three characters or fewer:
rep = {
    'PLATSBESKRIVNING': {
        r'[^A-Za-zÅåÄäÖöÆØÅæøå]+': ' ',
        r'\W*\b\w{1,3}\b': ' '}
}
p0 = (pd.DataFrame(dfp['PLATSBESKRIVNING'].str.lower())
        .replace(rep, regex=True)
        .drop_duplicates('PLATSBESKRIVNING')
        .reset_index(drop=True))
p0
PLATSBESKRIVNING
0 medrek rekrytering söker uppdrag manpower h...
1 familj barn tjejer kille söker pair ...
2 uppgift blir tillsammans medarbetare leda ...
3 behov operasjonssykepleiere langtidsoppdr...
4 detta perfekta jobbet arbetstiderna vardaga...
5 familj paris barn söker älskar barn v...
6 alla inom cafe restaurang förekommande arbets...
.
.
Creating a pandas Series:
s0 = p0['PLATSBESKRIVNING']
Then:
ts = s0.str.lower().str.split()
ts
0 [medrek, rekrytering, söker, uppdrag, manpower...
1 [familj, barn, tjejer, kille, söker, pair, vil...
2 [uppgift, blir, tillsammans, medarbetare, leda...
3 [behov, operasjonssykepleiere, langtidsoppdrag...
4 [detta, perfekta, jobbet, arbetstiderna, varda...
5 [familj, paris, barn, söker, älskar, barn, vil...
6 [alla, inom, cafe, restaurang, förekommande, a...
7 [diskare, till, cafe, dubbel, sökes, arbetet, ...
8 [diskare, till, thelins, konditori, sökes, arb...
Removing the stop words from the database (mswl is my Swedish stop word list):
r = s0.str.split().apply(lambda x: [item for item in x if item not in mswl])
r
0 [uppdrag, bemanningsföretag, erbjuds, tillägg,...
1 [föräldrarna, citycentre, stort, tomt, mamman,...
2 [utveckla, övergripande, strategiska, frågor, ...
3 [erfaring, sykepleier, legitimasjon]
4 [arbetstiderna, vardagar, härliga, människor, ...
5 [paris, utav, badrum, båda, yngsta, endast, fö...
6 [förekommande, emot, utbildning]
7 []
8 [thelins]
9 [paris, baby, månader, våning, delar, badrum, ...
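Note: mswl above is defined elsewhere in the script; a minimal sketch of building such a list, assuming NLTK and its stopwords corpus are installed:
# hypothetical construction of mswl (requires a prior nltk.download('stopwords'))
from nltk.corpus import stopwords
mswl = set(stopwords.words('swedish'))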
Creating a new DataFrame and removing the empty brackets:
dr = pd.DataFrame(r)
dr0 = dr[dr.astype(str)['PLATSBESKRIVNING'] != '[]'].reset_index(drop=True)
dr0
PLATSBESKRIVNING
0 [uppdrag, bemanningsföretag, erbjuds, tillägg,...
1 [föräldrarna, citycentre, stort, tomt, mamman,...
2 [utveckla, övergripande, strategiska, frågor, ...
3 [erfaring, sykepleier, legitimasjon]
4 [arbetstiderna, vardagar, härliga, människor, ...
5 [paris, utav, badrum, båda, yngsta, endast, fö...
6 [förekommande, emot, utbildning]
7 [thelins]
8 [paris, baby, månader, våning, delar, badrum, ...
Converting each list to its string representation:
dr1 = dr0['PLATSBESKRIVNING'].apply(str)
len(dr1), type(dr1), dr1
0 ['uppdrag', 'bemanningsföretag', 'erbjuds', 't...
1 ['föräldrarna', 'citycentre', 'stort', 'tomt',...
2 ['utveckla', 'övergripande', 'strategiska', 'f...
3 ['erfaring', 'sykepleier', 'legitimasjon']
4 ['arbetstiderna', 'vardagar', 'härliga', 'männ...
5 ['paris', 'utav', 'badrum', 'båda', 'yngsta', ...
6 ['förekommande', 'emot', 'utbildning']
7 ['thelins']
8 ['paris', 'baby', 'månader', 'våning', 'delar'...
My issue now is that I want to remove any rows whose lists contain 3 or fewer strings, e.g. rows 3, 6 and 7. The desired result would be like this:
0 ['uppdrag', 'bemanningsföretag', 'erbjuds', 't...
1 ['föräldrarna', 'citycentre', 'stort', 'tomt',...
2 ['utveckla', 'övergripande', 'strategiska', 'f...
3 ['arbetstiderna', 'vardagar', 'härliga', 'männ...
4 ['paris', 'utav', 'badrum', 'båda', 'yngsta', ...
5 ['paris', 'baby', 'månader', 'våning', 'delar'...
.
.
How can I obtain this? I'm also wondering if this could be done in a neater way; my approach seems clumsy and cumbersome.
I would also like to drop both the index and the column name for the LDA topic modelling, so that I can write the column to a text file without the header and the index digits. I have tried:
dr1.to_csv('LDA1.txt', header=None, index=False)
But this wraps each list of strings in the file in quotation marks: "['word1', 'word2', 't.. ]".
Any suggestions would be much appreciated.
Best regards
Mo
Just measure the number of items in each list and keep the rows with more than 3 items (which matches your expected output):
dr0['length'] = dr0['PLATSBESKRIVNING'].apply(len)
cond = dr0['length'] > 3
dr0 = dr0[cond]
You can apply len and then select the data, storing it in whatever dataframe variable you like, i.e.
df[df['PLATSBESKRIVNING'].apply(len) > 3]
Output:
PLATSBESKRIVNING
0 [uppdrag, bemanningsföretag, erbjuds, nice]
1 [föräldrarna, citycentre, stort, tomt]
2 [utveckla, övergripande, strategiska, fince]
4 [arbetstiderna, vardagar, härliga, männ]
5 [paris, utav, badrum, båda, yngsta]
8 [paris, baby, månader, våning, delar]
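To also address the to_csv part of the question: writing each token list as a plain space-separated line avoids the csv quoting entirely. A sketch, assuming dr0 is the filtered frame and its column still holds lists (not their str form as in dr1):
# one space-separated document per line; no header, index, or quoting
docs = dr0['PLATSBESKRIVNING'].apply(' '.join)
with open('LDA1.txt', 'w', encoding='utf-8') as f:
    f.write('\n'.join(docs) + '\n')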