How to avoid case sensitivity in Python

I have this code and want to compare two lists:
list2 = [('Tom', '100'), ('Alex', '200')]
list3 = [('tom', '100'), ('alex', '200')]
non_match = []
for line in list2:
    if line not in list3:
        non_match.append(line)
print(non_match)
The result is:
[('Tom', '100'), ('Alex', '200')]
because of case sensitivity. Is there any way to avoid case sensitivity in this comparison? I don't want to change the lists to upper or lower case.
Or is there any other method that can match these lists?

Use lower() to convert the name in each tuple to lower case for the comparison:
list2 = [('Tom', '100'), ('Alex', '200')]
list3 = [('tom', '100'), ('alex', '200')]
non_match = []
for line in list2:
    name, val = line
    if (name.lower(), val) not in list3:
        non_match.append(line)
print(non_match)
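A note on complexity: `not in list3` rescans the whole list for every line. If the lists are large, building a set of canonicalized tuples once makes each lookup O(1). A minimal sketch with the same sample data, using casefold(), which is a slightly more aggressive lowercasing than lower():

```python
list2 = [('Tom', '100'), ('Alex', '200')]
list3 = [('tom', '100'), ('alex', '200')]

# Build the set once; each membership test is then O(1)
canonical3 = {(name.casefold(), val) for name, val in list3}
non_match = [line for line in list2
             if (line[0].casefold(), line[1]) not in canonical3]
print(non_match)  # every entry matches, so this prints []
```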

You can't avoid transforming your data to some case-insensitive form at some point.
What you can do is avoid recreating the full lists:
def make_canonical(line):
    name, number = line
    return (name.lower(), number)

non_match = []
for line2 in list2:
    search = make_canonical(line2)
    for line3 in list3:
        canonical = make_canonical(line3)
        if search == canonical:
            break
    else:
        # Did not hit the break: no match found for line2
        non_match.append(line2)
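The inner search loop above can also be expressed with any(), which avoids the for/else idiom. A sketch with the same helper; 'Alex' is deliberately given a different value here so the example produces a visible non-match:

```python
def make_canonical(line):
    name, number = line
    return (name.casefold(), number)

list2 = [('Tom', '100'), ('Alex', '200')]
list3 = [('tom', '100'), ('alex', '999')]

# any() stops at the first canonical match, just like the break above
non_match = [line2 for line2 in list2
             if not any(make_canonical(line2) == make_canonical(line3)
                        for line3 in list3)]
print(non_match)  # [('Alex', '200')]
```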

You also need to compare the tuples element by element inside the inner loop:
non_match = []
for line2 in list2:
    matched = False
    for line3 in list3:
        if len(line3) == len(line2):
            # compare every position case-insensitively
            if all(a.lower() == b.lower() for a, b in zip(line2, line3)):
                matched = True
                break
    if not matched:
        non_match.append(line2)
print(non_match)

You can make the comparison even more generic, mixing ints and stray whitespace into the game, by creating two dicts from your tuple lists and comparing those:
def unify(v):
    return str(v).lower().strip()

list2 = [('Tom ', '100'), (' AleX', 200)]
list3 = [('toM', 100), ('aLex ', '200')]
d2 = {unify(k): unify(v) for k, v in list2}  # create a dict
d3 = {unify(k): unify(v) for k, v in list3}  # create another dict
print(d2 == d3)  # dicts compare (key, value)-wise
This converts integers to strings, strips whitespace, and then compares the dicts.
Output:
True
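One caveat worth knowing: a dict comprehension silently collapses duplicate keys, so lists that repeat a name can wrongly compare equal. A small demonstration using the same unify helper:

```python
def unify(v):
    return str(v).lower().strip()

# 'Tom' appears twice in list2, so only the last value survives as
# the dict entry, and the comparison reports a false equality.
list2 = [('Tom', '100'), ('Tom', '300')]
list3 = [('tom', '300')]
d2 = {unify(k): unify(v) for k, v in list2}
d3 = {unify(k): unify(v) for k, v in list3}
print(d2 == d3)  # True, although the lists clearly differ
```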

This worked for me! Both lists are converted to lower case first:
list2 = [('Tom', '100'), ('Alex', '200'), ('Tom', '13285')]
list3 = [('tom', '100'), ('ALex', '200'), ('Tom', '13285')]

def make_canonical(line):
    name, number = line
    return (name.lower(), number)

list22 = []
for line2 in list2:
    list22.append(make_canonical(line2))
list33 = []
for line3 in list3:
    list33.append(make_canonical(line3))
non_match = []
for line in list22:
    if line not in list33:
        non_match.append(line)
print(non_match)

Related

How can I create a dictionary from 2 lists, with one as the keys and the other as the values, using only loops? Without using zip() or enumerate()

I want to achieve this without any libraries or special functions, just loops. I want a main program that takes in 2 inputs, which are the 2 lists, and returns the dictionary shown below.
Please enter the item names: Cans, bottles, boxes, jugs
Please enter quantities: 20,34,10
Output: {'Cans':'20','bottles':'34','boxes':'10','jugs':'0'}
If the list of items is longer than the quantities, then the quantity automatically becomes 0, as it did with the jugs above.
If the list of quantities is longer than the items list, then the item should automatically become 'unknown object_1', with the number increasing accordingly.
Split on the comma delimiter, then pad the values with zeros for a number of iterations equal to the difference in length between keys and values.
Then build the dict with a plain loop:
keys = 'a,b,c,d'
values = '1,2,3'
keys = keys.split(',')
values = values.split(',')
for i in range(len(keys) - len(values)):
    values.append('0')
dct = {}
for i in range(len(keys)):
    dct[keys[i]] = values[i]
print(dct)
Output:
{'a': '1', 'b': '2', 'c': '3', 'd': '0'}
This uses only built-in calls, so it fits the requirements: per the OP's constraints, it does not use the zip function.
item_names = ['Cans', 'Bottles', 'boxes', 'jugs']
quantities = [20, 34, 10]
output_dict = {}
for i, item in enumerate(item_names):
    if i > len(quantities) - 1:
        output_dict.update({item: 0})
    else:
        output_dict.update({item: quantities[i]})
a = input().split(',')
b = list(map(int, input().split(',')))
res = {}
for i in range(len(a)):
    res[a[i]] = b[i] if i < len(b) else 0
print(res)
list1 = ['cans', 'Bottles', 'Boxes', 'Jugs']
list2 = [1, 2, 3]
res = {}
for i, element in enumerate(list1):
    try:
        res[element] = list2[i]
    except IndexError:
        res[element] = 0
print(res)
Edited code without enumerate or zip:
list1 = ['cans', 'Bottles', 'Boxes', 'Jugs']
list2 = [1, 2, 3]
res = {}
i = 0
for element in list1:
    try:
        res[element] = list2[i]
    except IndexError:
        res[element] = 0
    i += 1
print(res)
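None of the snippets above covers the second requirement, where extra quantities should produce 'unknown object_n' entries. A loop-only sketch (no zip or enumerate), using made-up sample inputs:

```python
items = ['Cans', 'bottles']
quantities = ['20', '34', '10', '7']

result = {}
i = 0
for item in items:                       # pair up what we can; pad with '0'
    result[item] = quantities[i] if i < len(quantities) else '0'
    i += 1
n = 1
while i < len(quantities):               # leftover quantities get generated names
    result['unknown object_' + str(n)] = quantities[i]
    i += 1
    n += 1
print(result)
# {'Cans': '20', 'bottles': '34', 'unknown object_1': '10', 'unknown object_2': '7'}
```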

Find duplicates in a list of strings differing only in upper and lower case writing

I have a list of strings that contains 'literal duplicates' and 'pseudo-duplicates' which differ only in lower- and uppercase writing. I am looking for a function that treats all literal duplicates as one group, returns their indices, and finds all pseudo-duplicates for these elements, again returning their indices.
Here's an example list:
a = ['bar','bar','foo','Bar','Foo','Foo']
And this is the output I am looking for (a list of lists of lists):
dupe_list = [[[0,1],[3]],[[2],[4,5]]]
Explanation: 'bar' appears twice at the indexes 0 and 1 and there is one pseudo-duplicate 'Bar' at index 3. 'foo' appears once at index 2 and there are two pseudo-duplicates 'Foo' at indexes 4 and 5.
Here is one solution. (You didn't clarify what the ordering of the result should be; I assumed you want groups keyed by the lowercase form, in the order items are met from left to right in the list. Let me know if it must be different.)
d = {k.lower(): [[], []] for k in a}
for i in range(len(a)):
    if a[i] in d:
        d[a[i]][0].append(i)
    else:
        d[a[i].lower()][1].append(i)
result = list(d.values())
Output:
>>> print(result)
[[[0, 1], [3]], [[2], [4, 5]]]
Here's how I would achieve it. But you should consider using a dictionary and not a list of lists of lists. Dictionaries are excellent data structures for problems like this.
# sample input
a = ['bar', 'bar', 'foo', 'Bar', 'Foo', 'Foo']

# initialize a dictionary whose keys are the distinct values from a
a_dict = {}
for i in a:
    a_dict[i] = None

# loop through the keys of the dictionary (the distinct values from a);
# for each key, collect the indices of exact matches and of
# case-insensitive ("similar") matches, then store both lists
# as the key's value
for k, v in a_dict.items():
    index_exact = []
    index_similar = []
    for i in range(len(a)):
        if a[i] == str(k):
            index_exact.append(i)
        elif a[i].lower() == str(k):
            index_similar.append(i)
    a_dict[k] = [index_exact, index_similar]

# print out the dictionary values to check the answer
print(a_dict.items())

# segregate the values from the dictionary into their own list
dup_list = []
for v in a_dict.values():
    dup_list.append(v)
print(dup_list)
Here is a solution that also handles the situations where only pseudo-duplicates or only literal duplicates are present:
a = ['bar', 'bar', 'foo', 'Bar', 'Foo', 'Foo', 'ka']
# Dictionaries to store the positions of words
literal_duplicates = dict()
pseudo_duplicates = dict()
for index, item in enumerate(a):
    # Treat a word as a literal duplicate if it is in lower case
    if item.islower():
        if item in literal_duplicates:
            literal_duplicates[item].append(index)
        else:
            literal_duplicates[item] = [index]
        # Handle the case where only literal duplicates are present
        if item not in pseudo_duplicates:
            pseudo_duplicates[item] = []
    # Treat a word as a pseudo-duplicate if it is not in lower case
    else:
        item_lower = item.lower()
        if item_lower in pseudo_duplicates:
            pseudo_duplicates[item_lower].append(index)
        else:
            pseudo_duplicates[item_lower] = [index]
        # Handle the case where only pseudo-duplicates are present
        # (check the lowercased form, so existing entries are not wiped)
        if item_lower not in literal_duplicates:
            literal_duplicates[item_lower] = []

# Form the final list from the dictionaries
dupe_list = [[v, pseudo_duplicates[k]] for k, v in literal_duplicates.items()]
Here is a simple and easy-to-understand answer:
a = ['bar', 'bar', 'foo', 'Bar', 'Foo', 'Foo']
dupe_list = []
for i in range(len(a)):
    if a[i] != 'Null':
        ilist = []
        ilist2 = []
        for j in range(i + 1, len(a)):
            samecase = -1
            dupecase = -1
            if i not in ilist:
                ilist.append(i)
            if a[i] == a[j]:
                samecase = j
                a[j] = 'Null'  # mark as consumed
            elif a[i] == a[j].casefold():
                dupecase = j
                a[j] = 'Null'
            if samecase != -1:
                ilist.append(samecase)
            if dupecase != -1:
                ilist2.append(dupecase)
        dupe_list.append([ilist, ilist2])
        a[i] = 'Null'
print(dupe_list)
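For comparison, here is a compact one-pass sketch for the same problem. It assumes Python 3.7+ dict insertion ordering and defines a "literal" duplicate as one matching the first-seen spelling of the word:

```python
a = ['bar', 'bar', 'foo', 'Bar', 'Foo', 'Foo']

# Group indices by the casefolded word; slot 0 holds indices whose
# spelling matches the first occurrence exactly, slot 1 the variants.
groups = {}
first_seen = {}
for i, word in enumerate(a):
    key = word.casefold()
    if key not in groups:
        groups[key] = [[], []]
        first_seen[key] = word
    groups[key][0 if word == first_seen[key] else 1].append(i)
dupe_list = list(groups.values())
print(dupe_list)  # [[[0, 1], [3]], [[2], [4, 5]]]
```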

How to write faster Python code?

My code
with open('data1.txt', 'r') as f:
    lst = [int(line) for line in f]
l1 = lst[::3]
l2 = lst[1::3]
l3 = lst[2::3]
print len(l1)
print len(l2)
print len(l3)
b = []
for i in range(3200000):
    b.append(i + 1)
print len(b)
mapping = dict(zip(l1, b))
matches = [mapping[value] for value in l2 if value not in mapping]
print matches
My aim here is to compare the lists; they are expected to have the same elements.
Works fine
3200000
3200000
3200000
3200000
[]
But the problem is that the code is very slow, and I will have more calculations later. How can I improve this?
My Python version:
Python 2.7.6
This will not be as efficient in terms of memory, but it is very efficient in execution speed.
It seems you do not use l3. diff will contain everything not present in both lists:
import itertools
with open('data1.txt', 'r') as f:
    lines = map(int, f)
l1 = itertools.islice(lines, 0, None, 3)
l2 = itertools.islice(lines, 1, None, 3)
diff = set(l1) ^ set(l2)
First, I don't see how that can work:
[mapping[value] for value in l2 if value not in mapping]
I suppose the values are always in mapping, so the list is always empty; otherwise it would raise a KeyError, since the key would not be found.
Then try something like this, with no useless memory allocation:
mapping = {}
l2 = []
with open('data1.txt', 'r') as f:
    for i, line in enumerate(f):
        v = int(line)
        if i % 3 == 0:
            mapping[v] = i + 1
        elif i % 3 == 1:
            l2.append(v)
matches = [mapping[value] for value in l2 if value not in mapping]  # ??
print(matches)
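If the real goal is only to check that both columns contain the same numbers, plain set operations skip the mapping machinery entirely. A minimal sketch, assuming the values fit in memory and duplicates within a column don't matter:

```python
# Symmetric difference: values that appear in one column but not both
l1 = [5, 3, 8]
l2 = [8, 5, 3]
missing = set(l1) ^ set(l2)
print(sorted(missing))  # [] when both columns hold the same values
```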

Increasing Speed of Fuzzy Matching words on two lists

I have a list of about 500 items. I'd like to replace all fuzzy-matched items in that list with the item of smallest length.
Is there a way to speed up my implementation of fuzzy match?
Note: I posted a similar question before, but I'm reframing it due to lack of response.
My implementation:
def find_fuzzymatch_samelist(list1, list2, cutoff=90):
    """
    list1 = list(ds1.Title)
    list2 = list(ds1.Title)
    """
    matchdict = defaultdict(list)
    for i, u in enumerate(list1):
        for i1, u1 in enumerate(list2):
            # Since the list orders are the same, this makes sure
            # we are not comparing an item with itself.
            if i != i1:
                if fuzz.partial_token_sort_ratio(u, u1) >= cutoff:
                    pair = (u, u1)
                    # Because there are potential duplicates, the key must be
                    # deterministic; otherwise, using list1's item as the key
                    # would let both duplicate items serve as keys.
                    # Potential problem: what if there are different shortstrs?
                    shortstr = min(pair, key=len)
                    longstr = max(pair, key=len)
                    matchdict[shortstr].append(longstr)
    return matchdict
I will assume you have installed python-Levenshtein; that alone gives roughly a 4x speed-up.
Optimising the loop and the dictionary access:
import itertools

def find_fuzzymatch_samelist(list1, list2, cutoff=90):
    matchdict = dict()
    # permutations yields every ordered pair of distinct indices
    for i1, i2 in itertools.permutations(range(len(list1)), 2):
        u1 = list1[i1]
        u2 = list2[i2]
        if fuzz.partial_token_sort_ratio(u1, u2) >= cutoff:
            shortstr = min(u1, u2, key=len)
            longstr = max(u1, u2, key=len)
            matchdict.setdefault(shortstr, []).append(longstr)
    return matchdict
This is as fast as it gets besides the fuzz call. If you read the source, you see that some preprocessing is done for each string, in every iteration. We can do it all at once:
def _asciionly(s):
    # PY3, translation_table and bad_chars come from fuzzywuzzy's utils
    if PY3:
        return s.translate(translation_table)
    else:
        return s.translate(None, bad_chars)

def full_pre_process(s, force_ascii=False):
    s = _asciionly(s)
    # Keep only letters and numbers (see the Unicode docs).
    string_out = StringProcessor.replace_non_letters_non_numbers_with_whitespace(s)
    # Force into lowercase.
    string_out = StringProcessor.to_lower_case(string_out)
    # Remove leading and trailing whitespace.
    string_out = StringProcessor.strip(string_out)
    out = ''.join(sorted(string_out))
    return out.strip()

def find_fuzzymatch_samelist(list1, list2, cutoff=90):
    matchdict = dict()
    if list1 is not list2:
        list1 = [full_pre_process(each) for each in list1]
        list2 = [full_pre_process(each) for each in list2]
    else:
        # When comparing a list to itself, preprocess it only once.
        list1 = [full_pre_process(each) for each in list1]
        list2 = list1
    for i1, i2 in itertools.permutations(range(len(list1)), 2):
        u1 = list1[i1]
        u2 = list2[i2]
        if fuzz.partial_ratio(u1, u2) >= cutoff:
            pair = (u1, u2)
            shortstr = min(pair, key=len)
            longstr = max(pair, key=len)
            matchdict.setdefault(shortstr, []).append(longstr)
    return matchdict
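If pulling in fuzzywuzzy internals feels fragile, the standard library's difflib can serve as a rough stand-in. Note the differences: get_close_matches scores with SequenceMatcher's ratio rather than token-sort logic, and the 0.9 cutoff, the helper name find_close_pairs, and the sample strings below are all assumptions for illustration:

```python
import difflib

def find_close_pairs(items, cutoff=0.9):
    """Map each short string to the longer near-duplicates it matches."""
    matchdict = {}
    for i, u in enumerate(items):
        others = [o.lower() for o in items[:i] + items[i + 1:]]
        for hit in difflib.get_close_matches(u.lower(), others,
                                             n=3, cutoff=cutoff):
            short, long_ = sorted((u.lower(), hit), key=len)
            matchdict.setdefault(short, []).append(long_)
    return matchdict

print(find_close_pairs(["apple pie", "Apple Pie!", "banana"]))
```

Each qualifying pair is seen from both directions, so expect duplicate entries in the value lists unless you deduplicate them afterwards.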

Merge nested list items based on a repeating value

Although poorly written, this code:
marker_array = [['hard','2','soft'], ['heavy','2','light'], ['rock','2','feather'], ['fast','3'], ['turtle','4','wet']]
marker_array_DS = []
for i in range(len(marker_array)):
    if marker_array[i-1][1] != marker_array[i][1]:
        marker_array_DS.append(marker_array[i])
print marker_array_DS
Returns:
[['hard', '2', 'soft'], ['fast', '3'], ['turtle', '4', 'wet']]
It accomplishes part of the task which is to create a new list containing all nested lists except those that have duplicate values in index [1]. But what I really need is to concatenate the matching index values from the removed lists creating a list like this:
[['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
The values in index [1] must not be concatenated. I kind of managed to do the concatenation part using a tip from another post:
newlist = [i + n for i, n in zip(list_a, list_b)]
But I am struggling with figuring out the way to produce the desired result. The "marker_array" list will be already sorted in ascending order before being passed to this code. All like-values in index [1] position will be contiguous. Some nested lists may not have any values beyond [0] and [1] as illustrated above.
Quick stab at it... use itertools.groupby to do the grouping for you, but run it over a generator that pads 2-element lists to 3 elements.
from itertools import groupby
from operator import itemgetter

marker_array = [['hard','2','soft'], ['heavy','2','light'], ['rock','2','feather'], ['fast','3'], ['turtle','4','wet']]

def my_group(iterable):
    temp = ((el + [''])[:3] for el in iterable)
    for k, g in groupby(temp, key=itemgetter(1)):
        fst, snd = map(' '.join, zip(*map(itemgetter(0, 2), g)))
        yield filter(None, [fst, k, snd])

print list(my_group(marker_array))
from collections import defaultdict

d1 = defaultdict(list)
d2 = defaultdict(list)
for pxa in marker_array:
    d1[pxa[1]].extend(pxa[:1])
    d2[pxa[1]].extend(pxa[2:])
res = [[' '.join(d1[x]), x, ' '.join(d2[x])] for x in sorted(d1)]
If you really need 2-element entries for the short case (which I think is unlikely):
for p in res:
    if not p[-1]:
        p.pop()
marker_array = [['hard','2','soft'], ['heavy','2','light'], ['rock','2','feather'], ['fast','3'], ['turtle','4','wet']]
marker_array_DS = []
marker_array_hit = []
for i in range(len(marker_array)):
    if marker_array[i][1] not in marker_array_hit:
        marker_array_hit.append(marker_array[i][1])
for i in marker_array_hit:
    lists = [item for item in marker_array if item[1] == i]
    temp = []
    first_part = ' '.join([str(item[0]) for item in lists])
    temp.append(first_part)
    temp.append(i)
    second_part = ' '.join([str(item[2]) for item in lists if len(item) > 2])
    if second_part != '':
        temp.append(second_part)
    marker_array_DS.append(temp)
print marker_array_DS
I learned python for this because I'm a shameless rep whore
marker_array = [
    ['hard', '2', 'soft'],
    ['heavy', '2', 'light'],
    ['rock', '2', 'feather'],
    ['fast', '3'],
    ['turtle', '4', 'wet'],
]
data = {}
for arr in marker_array:
    if len(arr) == 2:
        arr.append('')
    (first, index, last) = arr
    firsts, lasts = data.setdefault(index, [[], []])
    firsts.append(first)
    lasts.append(last)
results = []
for key in sorted(data.keys()):
    current = [
        " ".join(data[key][0]),
        key,
        " ".join(data[key][1]),
    ]
    if current[-1] == '':
        current = current[:-1]
    results.append(current)
print results
--output:--
[['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
A different solution based on itertools.groupby:
from itertools import groupby

# normalizes the list of markers so all markers have 3 elements
def normalized(markers):
    for marker in markers:
        yield marker + [""] * (3 - len(marker))

def concatenated(markers):
    # use groupby to iterate over runs of markers sharing the same key
    for key, markers_in_category in groupby(normalized(markers), lambda m: m[1]):
        # get separate lists of left and right words
        lefts, rights = zip(*[(m[0], m[2]) for m in markers_in_category])
        # remove empty strings from both lists
        lefts, rights = filter(bool, lefts), filter(bool, rights)
        # yield the concatenated entry for this key, dropping the trailing empty string if necessary
        yield filter(bool, [" ".join(lefts), key, " ".join(rights)])
The generator concatenated(markers) yields the results. This code correctly handles the ['fast', '3'] case and does not emit an empty third element in such cases.
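The answers above are written for Python 2 (where filter() returns a list). A sketch of the same groupby idea for Python 3; the helper name merged is my own:

```python
from itertools import groupby

marker_array = [['hard', '2', 'soft'], ['heavy', '2', 'light'],
                ['rock', '2', 'feather'], ['fast', '3'], ['turtle', '4', 'wet']]

def merged(markers):
    # pad 2-element markers to 3 elements so unpacking is uniform
    padded = (m + [''] * (3 - len(m)) for m in markers)
    for key, group in groupby(padded, key=lambda m: m[1]):
        lefts, rights = zip(*((m[0], m[2]) for m in group))
        entry = [' '.join(filter(None, lefts)), key, ' '.join(filter(None, rights))]
        # drop the empty third element for short markers like ['fast', '3']
        yield [part for part in entry if part]

print(list(merged(marker_array)))
# [['hard heavy rock', '2', 'soft light feather'], ['fast', '3'], ['turtle', '4', 'wet']]
```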
