how to drop similar values in pandas using levenshtein function

I have a dataframe which looks like this:
    ML_ENTITY_NAME        EDT_ENTITY_NAME
1   ABC BANK              HABIB METROPOLITAN BANK
2   ABC BANK              HABIB METROPOLITIAN BANK
3   BANK OF AMERICA       HSBC BANK MALAYSIA BHD
4   BANK OF AMERICA       HSBC BANK MALAYSIA SDN BHD
5   BANK OF NEW ZEALAND   HUA NAN COMMERCIAL BANK
6   BANK OF NEW ZEALAND   HUA NAN COMMERCIAL BANK LTD
7   CITIBANK N.A.         CHINA GUANGFA BANK CO LTD
8   CITIBANK N.A.         CHINA GUANGFA BANK CO.,LTD
9   SECURITY BANK CORP.   SECURITY BANK CORP
10  SIAM COMMERCIAL BANK  THE SIAM COMMERCIAL BANK PCL
11  TEMU                  ANZ BANK SAMOA LTD
I have written a Levenshtein function which looks like this:
import Levenshtein

def fm(s1, s2):
    score = Levenshtein.distance(s1, s2)
    if score == 0.0:
        score = 1.0
    else:
        score = 1 - (score / len(s1))
    return score
I want to write code so that if the Levenshtein score of two EDT_ENTITY_NAME values is greater than .75, the shorter value is dropped and the longer one is retained. Only rows with the same ML_ENTITY_NAME should be compared.
My final output should look like this:
    ML_ENTITY_NAME        EDT_ENTITY_NAME
1   ABC BANK              HABIB METROPOLITIAN BANK
2   BANK OF AMERICA       HSBC BANK MALAYSIA SDN BHD
3   BANK OF NEW ZEALAND   HUA NAN COMMERCIAL BANK LTD
4   CITIBANK N.A.         CHINA GUANGFA BANK CO.,LTD
5   SECURITY BANK CORP.   SECURITY BANK CORP
6   SIAM COMMERCIAL BANK  THE SIAM COMMERCIAL BANK PCL
7   TEMU                  ANZ BANK SAMOA LTD
Currently my approach is to sort the df, loop over the rows, and, where the ML_ENTITY_NAME values are the same, calculate the Levenshtein score for EDT_ENTITY_NAME. I have added a new column, delete, and I set it to 1 if the above conditions are satisfied and one EDT_ENTITY_NAME is shorter than the other.
My code looks like this:
df.sort_values(by=['ML_ENTITY_NAME','EDT_ENTITY_NAME'],inplace=True)
df['delete']=0
for row1 in df.itertuples():
    for row2 in df.itertuples():
        if (str(row1.ML_ENTITY_NAME) == str(row2.ML_ENTITY_NAME)) and (1>fm(str(row1.EDT_ENTITY_NAME),str(row2.EDT_ENTITY_NAME))>.74):
            if(len(row1.EDT_ENTITY_NAME)>len(row2.EDT_ENTITY_NAME)):
                df.loc[row2.Index,row2[2]]=1
print(df)
Currently it's giving the wrong output.
Can someone help me with some answers, hints, or suggestions?

I believe you need:
# self-join on ML_ENTITY_NAME to pair each row with every row in the same group
df1 = df.merge(df, on='ML_ENTITY_NAME', how='outer')
# remove pairs where both names are identical (similarity 1)
df1 = df1[df1['EDT_ENTITY_NAME_x'] != df1['EDT_ENTITY_NAME_y']]
# apply the function and compare against the threshold
m1 = df1.apply(lambda x: fm(x['EDT_ENTITY_NAME_x'], x['EDT_ENTITY_NAME_y']), axis=1) > .75
# keep the pair member with the longer name
m2 = df1['EDT_ENTITY_NAME_x'].str.len() > df1['EDT_ENTITY_NAME_y'].str.len()
# filtering
df2 = df1.loc[m1 & m2, ['ML_ENTITY_NAME','EDT_ENTITY_NAME_x']]
# remove `_x` (regex=True is needed in pandas >= 2.0, where the default is a literal match)
df2.columns = df2.columns.str.replace('_x$', '', regex=True)
# add rows whose ML_ENTITY_NAME occurs only once
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
df2 = pd.concat([df2, df[~df['ML_ENTITY_NAME'].duplicated(keep=False)]]).reset_index(drop=True)
print(df2)
   ML_ENTITY_NAME        EDT_ENTITY_NAME
0  ABC BANK              HABIB METROPOLITIAN BANK
1  BANK OF AMERICA       HSBC BANK MALAYSIA SDN BHD
2  BANK OF NEW ZEALAND   HUA NAN COMMERCIAL BANK LTD
3  CITIBANK N.A.         CHINA GUANGFA BANK CO.,LTD
4  SECURITY BANK CORP.   SECURITY BANK CORP
5  SIAM COMMERCIAL BANK  THE SIAM COMMERCIAL BANK PCL
6  TEMU                  ANZ BANK SAMOA LTD

Could you specify what exactly is wrong with the output you are getting? The only deviation from your goal I see in the code is that you only set the delete flag to 1 for row pairs with 0.74 < fm(...) < 1, while it should rather be 0.75 < fm(...).
As a side note, sorting is redundant in your code, since you end up comparing every possible pair of rows anyway. What you possibly had in mind when implementing the sorting was going through each consecutive pair of rows, which would improve the complexity of the loop from O(n^2) to O(n).
Another side note is that you don't need the if statement in your fm function: the statement score = 1 - score / len(s1) covers both cases, because a distance of 0 already yields 1.
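The two side notes above can be sketched as follows. This is only an illustration under stated assumptions: the pure-Python levenshtein helper is a stand-in for Levenshtein.distance so the snippet runs without extra packages, plain tuples replace the DataFrame rows, and s1 is assumed to be non-empty.

```python
def levenshtein(s1: str, s2: str) -> int:
    """Dynamic-programming edit distance (stand-in for Levenshtein.distance)."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def fm(s1: str, s2: str) -> float:
    # No special case: a distance of 0 already gives 1 - 0 = 1.0.
    # Assumes s1 is not empty (otherwise this divides by zero).
    return 1 - levenshtein(s1, s2) / len(s1)


# (ML_ENTITY_NAME, EDT_ENTITY_NAME) pairs; sorting puts similar names next to each other.
rows = sorted([
    ("ABC BANK", "HABIB METROPOLITAN BANK"),
    ("ABC BANK", "HABIB METROPOLITIAN BANK"),
    ("TEMU", "ANZ BANK SAMOA LTD"),
])

kept = []
for ml, edt in rows:
    # Compare each row only with the last kept row: one pass after sorting.
    if kept and kept[-1][0] == ml and fm(kept[-1][1], edt) > 0.75:
        if len(edt) > len(kept[-1][1]):
            kept[-1] = (ml, edt)  # same group and similar: keep the longer name
    else:
        kept.append((ml, edt))

print(kept)
```

Comparing only neighbours relies on similar names sorting next to each other, which holds for the sample data but is not guaranteed in general.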

Related

Create multiple new pandas column based on other columns in a loop

Assuming I have the following toy dataframe, df:
Country Population Region HDI
China 100 Asia High
Canada 15 NAmerica V.High
Mexico 25 NAmerica Medium
Ethiopia 30 Africa Low
I would like to create new columns based on the population, region, and HDI of Ethiopia in a loop. I tried the following method, but it is time-consuming when a lot of columns are involved.
df['Population_2'] = df['Population'][df['Country'] == "Ethiopia"]
df['Region_2'] = df['Region'][df['Country'] == "Ethiopia"]
df['Population_2'].fillna(method='ffill')
My final DataFrame df should look like:
Country Population Region HDI Population_2 Region_2 HDI_2
China 100 Asia High 30 Africa Low
Canada 15 NAmerica V.High 30 Africa Low
Mexico 25 NAmerica Medium 30 Africa Low
Ethiopia 30 Africa Low 30 Africa Low

How about this?
for col in ['Population', 'Region', 'HDI']:
    df[col + '_2'] = df.loc[df.Country=='Ethiopia', col].iat[0]
I don't quite understand the broader point of what you're trying to do, and if Ethiopia could have multiple values the solution might be different. But this works for the problem as you presented it.

You can use:
# select Ethiopia row and add suffix "_2" to the columns (except Country)
s = (df.drop(columns='Country')
       .loc[df['Country'].eq('Ethiopia')].add_suffix('_2').squeeze()
     )
# broadcast as new columns
df[s.index] = s
output:
Country Population Region HDI Population_2 Region_2 HDI_2
0 China 100 Asia High 30 Africa Low
1 Canada 15 NAmerica V.High 30 Africa Low
2 Mexico 25 NAmerica Medium 30 Africa Low
3 Ethiopia 30 Africa Low 30 Africa Low

You can use assign, assuming that you have only one row corresponding to Ethiopia:
d = dict(zip(df.columns.drop('Country').map('{}_2'.format),
             df.set_index('Country').loc['Ethiopia']))
df = df.assign(**d)
print(df):
Country Population Region HDI Population_2 Region_2 HDI_2
0 China 100 Asia High 30 Africa Low
1 Canada 15 NAmerica V.High 30 Africa Low
2 Mexico 25 NAmerica Medium 30 Africa Low
3 Ethiopia 30 Africa Low 30 Africa Low

Need help in matching strings from phrases from multiple columns of a dataframe in python

I need help matching phrases in the data given below, where I need to match phrases from both TextA and TextB.
The following code did not work for me. How can I address this? I have hundreds of them to match.
#sorting jumbled phrases
def sorts(string_value):
    sorted_string = sorted(string_value.split())
    sorted_string = ' '.join(sorted_string)
    return sorted_string
#Removing punctuations in string
punc = '''!()-[]{};:'"\,<>./?##$%^&*_~'''
def punt(test_str):
    for ele in test_str:
        if ele in punc:
            test_str = test_str.replace(ele, "")
    return(test_str)
#matching strings
def lets_match(x):
    for text1 in TextA:
        for text2 in TextB:
            try:
                if sorts(punt(x[text1.casefold()])) == sorts(punt(x[text2.casefold()])):
                    return True
            except:
                continue
    return False
df['result'] = df.apply(lets_match,axis =1)
Even after sorting the words, removing punctuation, and ignoring case, I am still getting those strings as not matching. Am I missing something here? Can someone help me achieve this?

Actually, you can use difflib to match two texts. Here's what you can try:
from difflib import SequenceMatcher

def similar(a, b):
    a = str(a).lower()
    b = str(b).lower()
    return SequenceMatcher(None, a, b).ratio()

def lets_match(d):
    # d is one row; compare its first two columns by position
    print(d.iloc[0], " --- ", d.iloc[1])
    result = similar(d.iloc[0], d.iloc[1])
    print(result)
    return result > 0.6

df["result"] = df.apply(lets_match, axis=1)
You can play with the 0.6 threshold.
For more information, see the difflib documentation. There are other sequence-matching libraries too, such as textdistance, but I found difflib easy to use, so I tried it.
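As a rough illustration of how the ratio behaves, the sketch below combines SequenceMatcher with the word-sorting and punctuation-stripping idea from the question. The normalize helper and the three sample pairs are chosen here for demonstration (the names come from the data quoted in the other answers); this is standard-library code only, not a tuned matcher.

```python
import string
from difflib import SequenceMatcher


def normalize(text: str) -> str:
    """Case-fold, strip punctuation, and sort the words of a phrase."""
    cleaned = str(text).casefold().translate(str.maketrans("", "", string.punctuation))
    return " ".join(sorted(cleaned.split()))


def similar(a: str, b: str) -> float:
    # Ratio in [0, 1]; 1.0 means the normalized phrases are identical.
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


pairs = [
    ("AKIL KUMAR SINGH", "Singh, Akil Kumar"),
    ("CNOC LIMITED", "CNOOC LIMITED"),
    ("MURAN", "MURMAN"),
]
scores = {pair: similar(*pair) for pair in pairs}
for pair, score in scores.items():
    print(pair, round(score, 3))
```

Reordered names become exact matches after normalization, while misspellings still need a threshold below 1.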

Are there any issues with using the fuzzywuzzy library? The implementation is pretty straightforward and works well, given that the data above is relatively similar. I've performed the steps below without preprocessing.
import pandas as pd
""" Install the libs below via terminal:
$pip install fuzzywuzzy
$pip install python-Levenshtein
"""
from fuzzywuzzy import fuzz
from fuzzywuzzy import process
#creating the data frames
text_a = ['AKIL KUMAR SINGH','OUSMANI DJIBO','PETER HRYB','CNOC LIMITED','POLY NOVA INDUSTRIES LTD','SAM GAWED JR','ADAN GENERAL LLC','CHINA MOBLE LIMITED','CASTAR CO., LTD.','MURAN','OLD SAROOP FOR CAR SEAT COVERS','CNP HEALTHCARE, LLC','GLORY PACK LTD','AUNCO VENTURES','INTERNATIONAL COMPANY','SAMEERA HEAT AND ENERGY FUND']
text_b = ['Singh, Akil Kumar','DJIBO, Ousmani Illiassou','HRYB, Peter','CNOOC LIMITED','POLYNOVA INDUSTRIES LTD.','GAWED, SAM','ADAN GENERAL TRADING FZE','CHINA MOBILE LIMITED','CASTAR GROUP CO., LTD.','MURMAN','Old Saroop for Car Seat Covers','CNP HEATHCARE, LLC','GLORY PACK LTD.','AUNCO VENTURE','INTL COMPANY','SAMEERA HEAT AND ENERGY PROPERTY FUND']
df_text_a = pd.DataFrame(text_a, columns=['text_a'])
df_text_b = pd.DataFrame(text_b, columns=['text_b'])
def lets_match(txt: str, chklist: list) -> str:
    return process.extractOne(txt, chklist, scorer=fuzz.token_set_ratio)
#match Text_A against Text_B
result_txt_ab = df_text_a.apply(lambda x: lets_match(str(x), text_b), axis=1, result_type='expand')
result_txt_ab.rename(columns={0:'Return Match', 1:'Match Value'}, inplace=True)
df_text_a[result_txt_ab.columns]=result_txt_ab
df_text_a
text_a Return Match Match Value
0 AKIL KUMAR SINGH Singh, Akil Kumar 100
1 OUSMANI DJIBO DJIBO, Ousmani Illiassou 72
2 PETER HRYB HRYB, Peter 100
3 CNOC LIMITED CNOOC LIMITED 70
4 POLY NOVA INDUSTRIES LTD POLYNOVA INDUSTRIES LTD. 76
5 SAM GAWED JR GAWED, SAM 100
6 ADAN GENERAL LLC ADAN GENERAL TRADING FZE 67
7 CHINA MOBLE LIMITED CHINA MOBILE LIMITED 79
8 CASTAR CO., LTD. CASTAR GROUP CO., LTD. 81
9 MURAN SAMEERA HEAT AND ENERGY PROPERTY FUND 41
10 OLD SAROOP FOR CAR SEAT COVERS Old Saroop for Car Seat Covers 100
11 CNP HEALTHCARE, LLC CNP HEATHCARE, LLC 58
12 GLORY PACK LTD GLORY PACK LTD. 100
13 AUNCO VENTURES AUNCO VENTURE 56
14 INTERNATIONAL COMPANY INTL COMPANY 74
15 SAMEERA HEAT AND ENERGY FUND SAMEERA HEAT AND ENERGY PROPERTY FUND 86
#match Text_B against Text_A
result_txt_ba= df_text_b.apply(lambda x: lets_match(str(x), text_a), axis=1, result_type='expand')
result_txt_ba.rename(columns={0:'Return Match', 1:'Match Value'}, inplace=True)
df_text_b[result_txt_ba.columns]=result_txt_ba
df_text_b
text_b Return Match Match Value
0 Singh, Akil Kumar AKIL KUMAR SINGH 100
1 DJIBO, Ousmani Illiassou OUSMANI DJIBO 100
2 HRYB, Peter PETER HRYB 100
3 CNOOC LIMITED CNOC LIMITED 74
4 POLYNOVA INDUSTRIES LTD. POLY NOVA INDUSTRIES LTD 74
5 GAWED, SAM SAM GAWED JR 86
6 ADAN GENERAL TRADING FZE ADAN GENERAL LLC 86
7 CHINA MOBILE LIMITED CHINA MOBLE LIMITED 81
8 CASTAR GROUP CO., LTD. CASTAR CO., LTD. 100
9 MURMAN ADAN GENERAL LLC 33
10 Old Saroop for Car Seat Covers OLD SAROOP FOR CAR SEAT COVERS 100
11 CNP HEATHCARE, LLC CNP HEALTHCARE, LLC 56
12 GLORY PACK LTD. GLORY PACK LTD 100
13 AUNCO VENTURE AUNCO VENTURES 53
14 INTL COMPANY INTERNATIONAL COMPANY 50
15 SAMEERA HEAT AND ENERGY PROPERTY FUND SAMEERA HEAT AND ENERGY FUND 100

I don't think you can do it without some notion of string distance. One option is to use, for example, record linkage.
I won't go into details, but I'll show an example of how to use it for this case.
import pandas as pd
import recordlinkage as rl
from recordlinkage.preprocessing import clean
# creating first dataframe
df_text_a = pd.DataFrame({
"Text A":[
"AKIL KUMAR SINGH",
"OUSMANI DJIBO",
"PETER HRYB",
"CNOC LIMITED",
"POLY NOVA INDUSTRIES LTD",
"SAM GAWED JR",
"ADAN GENERAL LLC",
"CHINA MOBLE LIMITED",
"CASTAR CO., LTD.",
"MURAN",
"OLD SAROOP FOR CAR SEAT COVERS",
"CNP HEALTHCARE, LLC",
"GLORY PACK LTD",
"AUNCO VENTURES",
"INTERNATIONAL COMPANY",
"SAMEERA HEAT AND ENERGY FUND"]
}
)
# creating second dataframe
df_text_b = pd.DataFrame({
"Text B":[
"Singh, Akil Kumar",
"DJIBO, Ousmani Illiassou",
"HRYB, Peter",
"CNOOC LIMITED",
"POLYNOVA INDUSTRIES LTD. ",
"GAWED, SAM",
"ADAN GENERAL TRADING FZE",
"CHINA MOBILE LIMITED",
"CASTAR GROUP CO., LTD.",
"MURMAN ",
"Old Saroop for Car Seat Covers",
"CNP HEATHCARE, LLC",
"GLORY PACK LTD.",
"AUNCO VENTURE",
"INTL COMPANY",
"SAMEERA HEAT AND ENERGY PROPERTY FUND"
]
}
)
# preprocessing is very important for the results; you have to find what fits your problem well.
cleaned_a = pd.DataFrame(clean(df_text_a["Text A"], lowercase=True))
cleaned_b = pd.DataFrame(clean(df_text_b["Text B"], lowercase=True))
# creating an index to use for the comparison; there are various types of indexing, see the documentation.
indexer = rl.Index()
indexer.full()
# generating all possible pairs
pairs = indexer.index(cleaned_a, cleaned_b)
# starting evaluation phase
compare = rl.Compare(n_jobs=-1)
compare.string("Text A", "Text B", method='jarowinkler', label = 'text')
matches = compare.compute(pairs, cleaned_a, cleaned_b)
matches is now a DataFrame with a MultiIndex. What you want to do next is, for each value of the first index level, find the row with the maximum score across the second level. That gives you the results you need.
Results can be improved by working on the distance measure, the indexing, and/or the preprocessing.
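The "max on the second index by first index" step could look roughly like this in pandas. The scores below are invented to mimic the shape of the compare.compute output (a MultiIndex of pairs with a 'text' column); they are not real results.

```python
import pandas as pd

# Hypothetical similarity scores: first level = row in Text A, second level = row in Text B.
index = pd.MultiIndex.from_tuples([(0, 0), (0, 1), (1, 0), (1, 1)])
matches = pd.DataFrame({"text": [0.95, 0.40, 0.35, 0.88]}, index=index)

# For each first-level value, find the (a, b) label with the highest score.
best_idx = matches.groupby(level=0)["text"].idxmax()

# Select those rows; the second level of the index is the best match in Text B.
best = matches.loc[best_idx.tolist()]
print(best)
```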

How to combine common rows in DataFrame

I'm running some analysis on bank statements (csv's). Some items like McDonalds each have their own row (due to having different addresses).
I'm trying to combine these rows by a common phrase. So for this example the obvious phrase, or string, would be "McDonalds". I think it'll be an if statement.
Also, the column has a dtype of "object". Will I have to convert it to string format?
Here is an example of the output from printing totali = df.Item.value_counts() in my code.
Ideally I'd want that line to output McDonalds as just a single row.
In the csv they are 2 separate rows.
foo 14
Restaurant Boulder CO 8
McDonalds Boulder CO 5
McDonalds Denver CO 5
Here's what the column data consists of
'Sukiya Greenwood Vil CO' 'Sei 34179 Denver CO' 'Chambers Place Liquors 303-3731100 CO' "Mcdonald's F26593 Fort Collins CO" 'Suh Sushi Korean Bbq Fort Collins CO' 'Conoco - Sei 26927 Fort Collins CO'

OK. I think I ginned up something that can be helpful. Realize that the task of inferring categories or names from text strings can be huge, depending on how detailed you want to get. You can dive into regex or other learning models. People make careers of it! Obviously, your bank is doing some of this as they categorize things when you get a year-end summary.
Anyhow, here is a simple way to generate some categories and use them as a basis for the grouping that you want to do.
import pandas as pd
item=['McDonalds Denver', 'Sonoco', 'ATM Fee', 'Sonoco, Ft. Collins', 'McDonalds, Boulder', 'Arco Boulder']
txn = [12.44, 4.00, 3.00, 14.99, 19.10, 52.99]
df = pd.DataFrame([item, txn]).T
df.columns = ['item_orig', 'charge']
print(df)
# let's add an extra column to catch the conversions...
df['item'] = pd.Series(dtype=str)
# we'll use the "contains" function in pandas as a simple converter... quick demo
temp = df.loc[df['item_orig'].str.contains('McDonalds')]
print('\nitems that containt the string "McDonalds"')
print(temp)
# let's build a simple conversion table in a dictionary
conversions = { 'McDonalds': 'McDonalds - any',
                'Sonoco': 'gas',
                'Arco': 'gas'}
# let's loop over the orig items and put conversions into the new column
# (there is probably a faster way to do this, but for data with < 100K rows, who cares.)
for key in conversions:
    # a single .loc call avoids chained assignment, which may silently fail to update df
    df.loc[df['item_orig'].str.contains(key), 'item'] = conversions[key]
# see how we did...
print('converted...')
print(df)
# now move over anything that was NOT converted
# in this example, this is just the ATM Fee item...
df.loc[df['item'].isnull(), 'item'] = df['item_orig']
# now we have decent labels to support grouping!
print('\n\n *** sum of charges by group ***')
print(df.groupby('item')['charge'].sum())
Yields:
item_orig charge
0 McDonalds Denver 12.44
1 Sonoco 4
2 ATM Fee 3
3 Sonoco, Ft. Collins 14.99
4 McDonalds, Boulder 19.1
5 Arco Boulder 52.99
items that containt the string "McDonalds"
item_orig charge item
0 McDonalds Denver 12.44 NaN
4 McDonalds, Boulder 19.1 NaN
converted...
item_orig charge item
0 McDonalds Denver 12.44 McDonalds - any
1 Sonoco 4 gas
2 ATM Fee 3 NaN
3 Sonoco, Ft. Collins 14.99 gas
4 McDonalds, Boulder 19.1 McDonalds - any
5 Arco Boulder 52.99 gas
*** sum of charges by group ***
item
ATM Fee 3.00
McDonalds - any 31.54
gas 71.98
Name: charge, dtype: float64
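For the "faster way" mentioned in the comments, one possible loop-free variant is sketched below. It assumes the same toy data and conversion table as above, and that a row matching several keys should take the first keyword that appears in the text.

```python
import re

import pandas as pd

df = pd.DataFrame({
    "item_orig": ["McDonalds Denver", "Sonoco", "ATM Fee",
                  "Sonoco, Ft. Collins", "McDonalds, Boulder", "Arco Boulder"],
    "charge": [12.44, 4.00, 3.00, 14.99, 19.10, 52.99],
})
conversions = {"McDonalds": "McDonalds - any", "Sonoco": "gas", "Arco": "gas"}

# One regex with all keys as alternatives; escape them in case they contain special characters.
pattern = "(" + "|".join(re.escape(key) for key in conversions) + ")"

# Extract the matched keyword (NaN when none matches), translate it,
# and fall back to the original text for unmatched rows.
keyword = df["item_orig"].str.extract(pattern, expand=False)
df["item"] = keyword.map(conversions).fillna(df["item_orig"])

totals = df.groupby("item")["charge"].sum()
print(totals)
```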

Filter and drop rows by proportion python

I have a dataframe called wine that contains a bunch of rows I need to drop.
How do I drop all rows whose 'country' value makes up less than 1% of the whole?
Here are the proportions:
#proportion of wine countries in the data set
wine.country.value_counts() / len(wine.country)
US 0.382384
France 0.153514
Italy 0.100118
Spain 0.070780
Portugal 0.062186
Chile 0.056742
Argentina 0.042835
Austria 0.034767
Germany 0.028928
Australia 0.021434
South Africa 0.010233
New Zealand 0.009069
Israel 0.006133
Greece 0.004493
Canada 0.002526
Hungary 0.001755
Romania 0.001558
...
I got lazy and didn't include all of the results, but I think you catch my drift. I need to drop all rows with proportions less than .01.
Here is the head of my dataframe:
country designation points price province taster_name variety year price_category
Portugal Avidagos 87 15.0 Douro Roger Voss Portuguese Red 2011.0 low

You can use something like this (it assumes the proportions are stored in a column named proportion):
df = df[df.proportion >= .01]
From that dataset it should give you something like this:
US 0.382384
France 0.153514
Italy 0.100118
Spain 0.070780
Portugal 0.062186
Chile 0.056742
Argentina 0.042835
Austria 0.034767
Germany 0.028928
Australia 0.021434
South Africa 0.010233

Figured it out:
country_filter = wine.country.value_counts(normalize=True) > 0.01
country_index = country_filter[country_filter.values == True].index
wine = wine[wine.country.isin(list(country_index))]
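A more compact variant of the same idea is sketched below: map each row's country to its share of the data and filter on that. The small wine frame is made up for the demonstration, and the cut-off uses >= 0.01 to match "drop everything below 1%".

```python
import pandas as pd

# Toy data: 200 rows, so Romania's single row is 0.5% of the whole.
wine = pd.DataFrame({"country": ["US"] * 150 + ["France"] * 49 + ["Romania"]})

# Share of each country, broadcast back onto every row.
share = wine["country"].map(wine["country"].value_counts(normalize=True))

# Keep only rows whose country accounts for at least 1% of the data.
filtered = wine[share >= 0.01]
print(filtered["country"].value_counts())
```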

Having trouble merging two dataframes in python

I am new to Python and I am trying to merge two datasets for my research together:
df1 has the column names: companyname, ticker, and Dscode,
df2 has companyname, ticker, grouptcode, and Dscode.
I want to merge the grouptcode from df2 into df1; however, the companyname is slightly different, though very similar, between the two dataframes.
For each ticker, there is an associated Dscode. However, multiple companies have the same ticker, and therefore the same Dscode.
Problem
I am only interested in merging the grouptcode for the associated ticker and Dscode that matches the companyname (which at times is slightly different - this part is what I cannot get past). The code I have been using is below.
Code
import pandas as pd
import os
# set working directory
path = "/Users/name/Desktop/Python"
os.chdir(path)
os.getcwd() # Prints the working directory
# read in excel file
file = "/Users/name/Desktop/Python/Excel/DSROE.xlsx"
x1 = pd.ExcelFile(file)
print(x1.sheet_names)
df1 = x1.parse('Sheet1')
df1.head()
df1.tail()
file2 = "/Users/name/Desktop/Python/Excel/tcode2.xlsx"
x2 = pd.ExcelFile(file2)
print(x2.sheet_names)
df2 = x2.parse('Sheet1')
df2['companyname'] = df2['companyname'].str.upper() ## make column uppercase
df2.head()
df2.tail()
df2 = df2.dropna()
x3 = pd.merge(df1, df2,how = 'outer') # merge
Data
df1
Dscode ticker companyname
65286 8933TC 3pl 3P LEARNING LIMITED
79291 9401FP a2m A2 MILK COMPANY LIMITED
1925 14424Q aac AUSTRALIAN AGRICULTURAL COMPANY LIMITED
39902 675493 aad ARDENT LEISURE GROUP
1400 133915 aba AUSWIDE BANK LIMITED
74565 922472 abc ADELAIDE BRIGHTON LIMITED
7350 26502C abp ABACUS PROPERTY GROUP
39202 675142 ada ADACEL TECHNOLOGIES LIMITED
80866 9661AD adh ADAIRS
80341 9522QV afg AUSTRALIAN FINANCE GROUP LIMITED
45327 691938 agg ANGLOGOLD ASHANTI LIMITED
2625 14880E agi AINSWORTH GAME TECHNOLOGY LIMITED
75090 923040 agl AGL ENERGY LIMITED
19251 29897X ago ATLAS IRON LIMITED
64409 890588 agy ARGOSY MINERALS LIMITED
24151 31511D ahg AUTOMOTIVE HOLDINGS GROUP LIMITED
64934 8917JD ahy ASALEO CARE LIMITED
42877 691152 aia AUCKLAND INTERNATIONAL AIRPORT LIMITED
61433 88013C ajd ASIA PACIFIC DATA CENTRE GROUP
44452 691704 ajl AJ LUCAS GROUP LIMITED
700 13288C ajm ALTURA MINING LIMITED
19601 29929D akp AUDIO PIXELS HOLDINGS LIMITED
79816 951404 alk ALKANE RESOURCES LIMITED
56008 865613 all ARISTOCRAT LEISURE LIMITED
51807 771351 alq ALS LIMITED
44277 691685 alu ALTIUM LIMITED
42702 68625C alx ATLAS ARTERIA GROUP
30101 41162F ama AMA GROUP LIMITED
67386 902201 amc AMCOR LIMITED
33426 50431L ami AURELIA METALS LIMITED
df2
companyname grouptcode ticker
524 3P LEARNING LIMITED.. tpn1 3pl
1 THE A2 MILK COMPANY LIMITED a2m1 a2m
2 AUSTRALIAN AGRICULTURAL COMPANY LIMITED. aac2 aac
3 AAPC LIMITED. aad1 aad
6 ADVANCE BANK AUSTRALIA LIMITED aba1 aba
7 ADELAIDE BRIGHTON CEMENT HOLDINGS LIMITED abc1 abc
8 ABACUS PROPERTY GROUP abp1 abp
9 ADACEL TECHNOLOGIES LIMITED ada1 ada
288 ADA CORPORATION LIMITED khs1 ada
10 AERODATA HOLDINGS LIMITED adh1 adh
11 ADAMS (HERBERT) HOLDINGS LIMITED adh2 adh
12 ADAIRS LIMITED adh3 adh
431 ALLCO FINANCE GROUP LIMITED rcd1 afg
13 AUSTRALIAN FINANCE GROUP LTD afg1 afg
14 ANGLOGOLD ASHANTI LIMITED agg1 agg
15 APGAR INDUSTRIES LIMITED agi1 agi
16 AINSWORTH GAME TECHNOLOGY LIMITED agi2 agi
17 AUSTRALIAN GAS LIGHT COMPANY (THE) agl1 agl
18 ATLAS IRON LIMITED ago1 ago
393 ACM GOLD LIMITED pgo2 ago
19 AUSTRALIAN GYPSUM INDUSTRIES LIMITED agy1 agy
142 ARGOSY MINERALS INC cio1 agy
21 ARCHAEAN GOLD NL ahg1 ahg
22 AUSTRALIAN HYDROCARBONS N.L. ahy1 ahy
23 ASALEO CARE LIMITED ahy2 ahy
24 AUCKLAND INTERNATIONAL AIRPORT LIMITED aia1 aia
25 ASIA PACIFIC DATA CENTRE GROUP ajd1 ajd
26 AJ LUCAS GROUP LIMITED ajl1 ajl
27 AJAX MCPHERSON'S LIMITED ajm1 ajm
29 ALKANE EXPLORATION (TERRIGAL) N.L. alk1 alk
Dscode
524 8933TC
1 9401FP
2 14424Q
3 675493
6 133915
7 922472
8 26502C
9 675142
288 675142
10 9661AD
11 9661AD
12 9661AD
431 9522QV
13 9522QV
14 691938
15 14880E
16 14880E
17 923040
18 29897X
393 29897X
19 890588
142 890588
21 31511D
22 8917JD
23 8917JD
24 691152
25 88013C
26 691704
27 13288C
29 951404
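One possible way to approach this, sketched under assumptions: join the two frames on ticker and Dscode (which do match exactly), score every candidate pair of company names with difflib, and keep the best-scoring candidate per ticker. The four-row frames below are a hand-copied subset of the data above, the _df2 suffix and the score column are names invented for this sketch, and in practice a minimum-score threshold would probably be needed to reject poor matches.

```python
from difflib import SequenceMatcher

import pandas as pd

df1 = pd.DataFrame({
    "Dscode": ["9661AD", "9522QV"],
    "ticker": ["adh", "afg"],
    "companyname": ["ADAIRS", "AUSTRALIAN FINANCE GROUP LIMITED"],
})
df2 = pd.DataFrame({
    "companyname": ["AERODATA HOLDINGS LIMITED", "ADAIRS LIMITED",
                    "ALLCO FINANCE GROUP LIMITED", "AUSTRALIAN FINANCE GROUP LTD"],
    "grouptcode": ["adh1", "adh3", "rcd1", "afg1"],
    "ticker": ["adh", "adh", "afg", "afg"],
    "Dscode": ["9661AD", "9661AD", "9522QV", "9522QV"],
})

# Pair each df1 row with every df2 candidate sharing its ticker and Dscode.
pairs = df1.merge(df2, on=["ticker", "Dscode"], suffixes=("", "_df2"))

# Similarity between the df1 name and each candidate df2 name.
pairs["score"] = [
    SequenceMatcher(None, a, b).ratio()
    for a, b in zip(pairs["companyname"], pairs["companyname_df2"])
]

# Keep the highest-scoring candidate for each ticker/Dscode combination.
best = pairs.loc[pairs.groupby(["ticker", "Dscode"])["score"].idxmax()]
print(best[["ticker", "companyname", "companyname_df2", "grouptcode"]])
```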
