How to write faster Python code?

My code:
with open('data1.txt','r') as f:
    lst = [int(line) for line in f]
l1=lst[::3]
l2=lst[1::3]
l3=lst[2::3]
print len(l1)
print len(l2)
print len(l3)
b = []
for i in range(3200000):
    b.append(i+1)
print len(b)
mapping = dict(zip(l1, b))
matches = [mapping[value] for value in l2 if value not in mapping]
print matches
My aim here is to compare two lists; they are expected to have the same elements.
It works fine:
3200000
3200000
3200000
3200000
[]
But the problem is that the code is very slow, and I will have more calculations later. How can I improve this?
My Python version:
Python 2.7.6

This will not be as memory-efficient, but it is VERY efficient in terms of execution speed.
It seems that you do not use l3. diff will contain every value that appears in only one of the two lists.
import itertools
with open('data1.txt','r') as f:
    # list() keeps this correct on Python 3 too, where map() returns a one-shot iterator
    lines = list(map(int, f))
l1 = itertools.islice(lines, 0, None, 3)
l2 = itertools.islice(lines, 1, None, 3)
diff = set(l1) ^ set(l2)
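A small in-memory sketch of the same idea, with made-up sample values standing in for the contents of data1.txt (the data is an assumption, not taken from the question). The `^` operator on sets gives the symmetric difference, i.e. the values that appear in exactly one of the two lists:

```python
# Hypothetical sample data: every third value belongs to the same list.
values = [10, 10, 99, 20, 20, 99, 30, 31, 99]

l1 = values[0::3]   # [10, 20, 30]
l2 = values[1::3]   # [10, 20, 31]

# Symmetric difference: elements present in exactly one of the two sets.
diff = set(l1) ^ set(l2)
print(sorted(diff))  # [30, 31]
```

An empty set means both lists hold the same distinct values; duplicates and ordering are ignored by this check.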

First, I don't see how that can work:
[mapping[value] for value in l2 if value not in mapping]
I suppose the value is always in mapping, so the list is always empty. Otherwise it would raise a KeyError, since the key would not be found.
Then, try something like this, which avoids the needless memory allocation:
mapping = {}
l2 = []
with open('data1.txt','r') as f:
    for i,line in enumerate(f):
        v = int(line)
        if i % 3 == 0:
            # i // 3 + 1 reproduces the 1-based position that b provides in the question
            mapping[v] = i // 3 + 1
        elif i % 3 == 1:
            l2.append(v)
matches = [mapping[value] for value in l2 if value not in mapping] # ??
print(matches)
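To see why the comprehension marked `# ??` above can only ever produce an empty list or an exception, here is a tiny sketch with made-up values (the dictionary and list below are illustrative only):

```python
mapping = {10: 1, 20: 2}   # hypothetical: value from l1 -> position
l2 = [10, 20, 30]          # 30 is not a key of mapping

# The filter keeps only values that are NOT keys, so mapping[value] must fail.
try:
    matches = [mapping[value] for value in l2 if value not in mapping]
except KeyError as exc:
    matches = None
    missing_key = exc.args[0]
    print("KeyError for", missing_key)

# Listing the unmatched values themselves avoids the failing lookup.
unmatched = [value for value in l2 if value not in mapping]
print(unmatched)  # [30]
```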

Related

Searching for an exact values in a Dictionary

I have a dictionary and I want to search for the keys that correspond to the particular values I need (S009 and S007 in this case).
I wrote the following code, but I get nothing from it.
Here is my code:
def find():
    L = [{"V": "S001"},
         {"V": "S002"},
         {"V": "S001"},
         {"V": "S001"},
         {"V": "S001"},
         {"V1": "S002"},
         {"V111": "S005"},
         {"V2": "S005"},
         {"V": "S009"},
         {"V3": "S007"}]
    L1 = []
    for y in range (len(L)) :
        for j in L[y].values():
            L1.append(j)
    L2=[]
    for z in L1:
        if z not in L2:
            L2.append(z)
    count =0
    l3=[]
    s = set(L1)
    for z in L2:
        for y in L1:
            if z in L2:
                count =count +1
        if count == 2:
            l3.append(z)
    for s in l3:
        print(s)
def main():
    find()
main()
My code explained: First, I collect all the values in a list called L1. Then I put all the values, without repeats, in L2. Then I want to check how often each element of L2 occurs in L1. After this loop, if the count is only one, this is the value I'm looking for, and I append it to an empty list called l3.
You can do it in two steps. First extract all the values from L:
values = []
for i in L:
    for v in i.values():
        values.append(v)
Or as a list comprehension:
values = [v for i in L for v in i.values()]
Then filter out the items with count more than 1:
result = [i for i in values if values.count(i) == 1]
print (result)
Result:
['S009', 'S007']
What you've defined above as L is a list of individual dictionaries. I'm not sure this is what was intended. You said your expected output should be 'S009' and 'S007', so I'm going to assume that, perhaps, you intended L to just be a list of the values of each individual dictionary. If that's the case,
L = ["S001", "S002", "S001", "S001", "S001", "S002", "S005", "S005", "S009", "S007"]
One of the easiest ways to count the number of items of a list is to use a Counter from the collections module.
Then just create the Counter with the L as the only argument
from collections import Counter
c = Counter(L)
print(c)
Counter({'S001': 4, 'S002': 2, 'S005': 2, 'S009': 1, 'S007': 1})
Now you can see how many instances of each element of L exist. From there you can just use a little list comprehension to filter out anything that doesn't have one instance.
result = [key for key, value in c.items() if value == 1]
print(result)
['S009', 'S007']
All the code:
from collections import Counter
L = ["S001", "S002", "S001", "S001", "S001", "S002", "S005", "S005", "S009", "S007"]
c = Counter(L)
result = [key for key, value in c.items() if value == 1]
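If L really is the list of single-entry dictionaries from the question, the two approaches combine naturally. The following is a sketch under that assumption; `find_unique_values` is a name made up for illustration:

```python
from collections import Counter

def find_unique_values(dicts):
    """Return the values that occur exactly once across all dictionaries."""
    # Flatten the values of every dictionary and count each one.
    counts = Counter(v for d in dicts for v in d.values())
    # Keep only the values that were seen a single time.
    return [value for value, n in counts.items() if n == 1]

L = [{"V": "S001"}, {"V": "S002"}, {"V": "S001"}, {"V": "S001"},
     {"V": "S001"}, {"V1": "S002"}, {"V111": "S005"}, {"V2": "S005"},
     {"V": "S009"}, {"V3": "S007"}]

print(sorted(find_unique_values(L)))  # ['S007', 'S009']
```

Counting once with Counter avoids calling `list.count` for every element, which matters on long lists.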

How to avoid case sensitivity in Python

I have this code and want to compare two lists.
list2= [('Tom','100'),('Alex','200')]
list3= [('tom','100'),('alex','200')]
non_match = []
for line in list2:
    if line not in list3:
        non_match.append(line)
print(non_match)
The results will be:
[('Tom', '100'), ('Alex', '200')]
because of case sensitivity! Is there any way to avoid the case sensitivity in this case? I don't want to change the lists to upper or lower case.
Or any other method which can match these lists?
Use lower() to convert the name in each tuple to lower case for the comparison (this works here because list3 is already lower case):
list2= [('Tom','100'),('Alex','200')]
list3= [('tom','100'),('alex','200')]
non_match = []
for line in list2:
    name, val = line
    if (name.lower(), val) not in list3:
        non_match.append(line)
print(non_match)
You can't avoid transforming your data to some case-insensitive format, at some point.
What you can do is to avoid recreating the full lists:
def make_canonical(line):
    name, number = line
    return (name.lower(), number)
non_match = []
for line2 in list2:
    search = make_canonical(line2)
    for line3 in list3:
        canonical = make_canonical(line3)
        if search == canonical:
            break
    else:
        # Did not hit the break: line2 has no case-insensitive match in list3
        non_match.append(line2)
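When the lists are large, the nested loop can be replaced by a single pass over a set of canonical tuples. This is a sketch rather than the answer's own code: it uses `str.casefold()` (Python 3) instead of `lower()`, the helper name `canonical` is made up, and the 'Bob' entry is added purely for illustration:

```python
def canonical(line):
    """Return a case-insensitive version of a (name, number) tuple."""
    name, number = line
    return (name.casefold(), number)

list2 = [('Tom', '100'), ('Alex', '200'), ('Bob', '300')]  # 'Bob' is illustrative
list3 = [('tom', '100'), ('alex', '200')]

# Build the lookup set once; membership tests are then O(1) on average.
seen = {canonical(line) for line in list3}

# Keep the original tuples from list2 that have no case-insensitive match.
non_match = [line for line in list2 if canonical(line) not in seen]
print(non_match)  # [('Bob', '300')]
```

Neither input list is modified; only the temporary set holds the case-folded values.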
You also need to iterate over the tuples inside the loop:
non_match = []
for line2 in list2:
    match = False
    for line3 in list3:
        if len(line3) == len(line2):
            lenth = len(line3)
            successCount = 0
            for i in range(lenth):
                # Compare element by element, ignoring case
                if line2[i].lower() == line3[i].lower():
                    successCount = successCount + 1
            if successCount == lenth:
                match = True
    if not match:
        non_match.append(line2)
print(non_match)
enjoy.....
You can make the comparison even more generic, so that it also tolerates ints and stray whitespace, by creating two dicts from your lists of tuples and comparing the dicts:
def unify(v):
    return str(v).lower().strip()
list2= [('Tom ','100'),(' AleX',200)]
list3= [('toM',100),('aLex ','200')]
d2 = {unify(k):unify(v) for k,v in list2} # create a dict
d3 = {unify(k):unify(v) for k,v in list3} # create another dict
print(d2 == d3) # dicts compare (key,value) wise
The applied methods will make strings from integers, strip whitespaces and then compare the dicts.
Output:
True
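One caveat with the dict approach: a dict keeps only one entry per name, so repeated names with different numbers collapse into one. A hedged alternative sketch compares sets of unified pairs instead; the sample data with a repeated name is made up for illustration:

```python
def unify(v):
    """Normalise a value: stringify, lower-case, strip whitespace."""
    return str(v).lower().strip()

# Hypothetical data in which 'tom' appears twice with different numbers.
list2 = [('Tom ', '100'), (' AleX', 200), ('Tom', 13285)]
list3 = [('toM', 100), ('aLex ', '200'), ('TOM', '13285')]

# Sets of (name, number) pairs keep both 'tom' entries.
s2 = {(unify(k), unify(v)) for k, v in list2}
s3 = {(unify(k), unify(v)) for k, v in list3}

print(s2 == s3)          # True
print(sorted(s2 - s3))   # []
```

The set difference `s2 - s3` lists the unified pairs that are present only in list2.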
This worked for me! Both lists will be converted to lower case.
list2= [('Tom','100'),('Alex','200'),('Tom', '13285')]
list3= [('tom','100'),('ALex','200'),('Tom', '13285')]
def make_canonical(line):
    name, number = line
    return (name.lower(), number)
list22 = []
for line2 in list2:
    search = make_canonical(line2)
    list22.append(search)
list33 =[]
for line3 in list3:
    search1 = make_canonical(line3)
    list33.append(search1)
non_match = []
for line in list22:
    if line not in list33:
        non_match.append(line)
print(non_match)

Grouping the nested attribute list in Python

I have a list
lst = ['orb|2|3|4', 'obx|2|3|4', 'orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3','obx|1|2|3']
How can I group the list by the first three characters of each item, so that in the end it looks like the result below? If an item starts with "orb", then the subsequent items are added to the group that begins with that item. Thanks for the answer.
result = [['orb|2|3|4', 'obx|2|3|4'], ['orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3','obx|1|2|3']]
Here is an algorithm of O(N) complexity:
res = []
tmp = []
for x in lst:
    if x.startswith('orb'):
        if tmp:
            res.append(tmp)
        tmp = [x]
    elif tmp:
        tmp.append(x)
res.append(tmp)
result:
In [133]: res
Out[133]:
[['orb|2|3|4', 'obx|2|3|4'],
['orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3', 'obx|1|2|3']]
You can use itertools.groupby:
import itertools, re
lst = ['orb|2|3|4', 'obx|2|3|4', 'orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3','obx|1|2|3']
new_result = [list(b) for _, b in itertools.groupby(lst, key=lambda x:re.findall(r'^\w+', x)[0])]
final_result = [new_result[i]+new_result[i+1] for i in range(0, len(new_result), 2)]
Output:
[['orb|2|3|4', 'obx|2|3|4'], ['orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3', 'obx|1|2|3']]
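Note that the groupby version relies on runs of 'orb' and 'obx' items strictly alternating. A reusable variant of the loop-based approach can be written as a generator; this is a sketch, and the function name `group_from_marker` and its `marker` parameter are made up for illustration:

```python
def group_from_marker(lines, marker='orb'):
    """Yield lists that each start at a line beginning with `marker`."""
    group = []
    for line in lines:
        if line.startswith(marker):
            # A new marker line closes the previous group, if any.
            if group:
                yield group
            group = [line]
        elif group:
            # Lines that come before the first marker are skipped.
            group.append(line)
    if group:
        yield group

lst = ['orb|2|3|4', 'obx|2|3|4', 'orb|2|3|4', 'obx|1|2|3', 'obx|1|2|3', 'obx|1|2|3']
result = list(group_from_marker(lst))
print(result)
```

Unlike the plain loop, this version yields nothing for an empty input instead of a single empty group.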

Creating list checker in Python. Compares lists and provides difference between them. Struggling with tail end of script

I am writing a short script that is supposed to look at a master list and a (for lack of a better term) slave list, and output inconsistencies in the slave list when compared to the master list. I am very much a novice when it comes to coding. Any insight would be much appreciated.
The data I'm trying to sort are in text files. They look something like the following:
12345678-Bananas
23456789-Apples
12345678-StarWars
23456789-RedBall
11223344-RedRover
22334455-Rabbit
I sort the data into lists (I think) using the following bit of code:
filename = utils.SelectOpenFile(0, "Text file (*.txt)")
print (filename[0])
if not filename[0]:
    return
text_file = open(filename[0], "r")
Part_List = text_file.read( ).splitlines( )
PN_List = [ ]
index = 0
for line in Part_List:
    PN_List.append (Part_List[index][:8])
    PN_List[index] = PN_List[index].lower( )
    index += 1
List1 = sorted(list(Counter(PN_List).items( )))
Which nets me:
List1 = [('12345678',2),('23456789',2),('11223344',1),('22334455',1)]
Then I compare it to another list that will either be identical or slightly different, like so:
List2 = [('12345678',3),('23456789',1),('11223344',1)]
I'm trying to see if List2 matches the "master" List1. What I would like as an output would be:
List3 = [('12345678',-1),('23456789',1),('22334455',1)]
So far I have tried the following:
for x in List2:
    if x[0] in List1:
        if x[1] < x[1] in List1:
            print ("Too Few Parts")
        elif x[1] > x[1] in List1:
            print ("Too Many Parts")
        else:
            print ("Perfecto")
    else:
        print ("Extra Part")
for y in List1:
    if y[0] in List2:
        return
    else:
        print ("Missing Part")
I'm open to better solutions.
I take it that
List1 = [('12345678',2),('23456789',2),('11223344',1),('22334455',1)]
represents a set of part numbers and a quantity on hand. You are finding this difficult because your data structure doesn't match the problem you are trying to solve. Convert the master list to a dictionary:
>>> master = dict(List1)
>>> master
{'12345678': 2, '23456789': 2, '11223344': 1, '22334455': 1}
and do the same with the slave list:
>>> slave = dict(List2)
Now you will have a much easier time of matching up the part numbers.
>>> result = master.copy()
>>> for k,v in slave.items():
...     if k in master:
...         result[k] -= v
...     else:
...         print (f"Extra part {k}")
Now convert the resulting dictionary back to a list of (part-number, quantity) tuples. You could just do List3 = list(result.items()) but it appears you want to drop a part number from the master list if the quantity goes to zero.
>>> List3 = [(k,v) for (k,v) in result.items() if v != 0]
>>> List3
[('12345678', -1), ('23456789', 1), ('22334455', 1)]
You can use Counter for this too, but first you need to do it the hard way so that you understand what Counter is doing. The solution is identical. The only difference is that Counter does the subtracting for you in one function call.
>>> from collections import Counter
>>> c1 = Counter(dict(List1))
>>> c2 = Counter(dict(List2))
>>> c3=c1.copy()
>>> c3.subtract(c2)
>>> c3
Counter({'23456789': 1, '22334455': 1, '11223344': 0, '12345678': -1})
>>> List3 = [(k,v) for (k,v) in c3.items() if v != 0]
>>> List3
[('12345678', -1), ('23456789', 1), ('22334455', 1)]
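Putting the pieces together from raw text lines: the sketch below counts part numbers (the first 8 characters of each line, as in the question) and subtracts the two counts. The slave lines are made up so that they reproduce List2, and `count_parts` is a hypothetical helper name:

```python
from collections import Counter

def count_parts(lines):
    """Count occurrences of the 8-character part-number prefix of each line."""
    return Counter(line[:8].lower() for line in lines if line.strip())

master_lines = ["12345678-Bananas", "23456789-Apples", "12345678-StarWars",
                "23456789-RedBall", "11223344-RedRover", "22334455-Rabbit"]
# Hypothetical slave file contents, chosen to match List2 in the question.
slave_lines = ["12345678-A", "12345678-B", "12345678-C",
               "23456789-D", "11223344-E"]

diff = count_parts(master_lines)
diff.subtract(count_parts(slave_lines))   # keeps zero and negative counts

# Positive: missing from the slave list; negative: surplus in the slave list.
List3 = sorted((k, v) for k, v in diff.items() if v != 0)
print(List3)  # [('12345678', -1), ('22334455', 1), ('23456789', 1)]
```

`Counter.subtract` is used rather than the `-` operator because `c1 - c2` discards zero and negative results, which would hide surplus parts.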

I want to write a function that takes a list and returns a count of total number of duplicate elements in the list

I have tried this. For some unknown reason, when it prints h, it prints None, so I thought that if it counts the number of None values printed and divides by 2, it will give the number of duplicates, but I can't use the function count here.
a= [1,4,"hii",2,4,"hello","hii"]
def duplicate(L):
    li=[]
    lii=[]
    h=""
    for i in L:
        y= L.count(i)
        if y>1:
            h=y
            print h
    print h.count(None)
duplicate(a)
Use the Counter container:
from collections import Counter
c = Counter(['a', 'b', 'a'])
c is now a dictionary-like object with the data: Counter({'a': 2, 'b': 1})
If you want to get a list with all duplicated elements (with no repetition), you can do as follows:
duplicates = [k for k, n in c.items() if n > 1]
If you only want to count the duplicates, you can then just set
duplicates_len = len(duplicates)
You can use a set to get the count of unique elements, and then compare the sizes - something like that:
def duplicates(l):
    uniques = set(l)
    return len(l) - len(uniques)
I found an answer, which is:
a= [1,4,"hii",2,4,"hello",7,"hii"]
def duplicate(L):
    li=[]
    for i in L:
        y= L.count(i)
        if y>1:
            li.append(i)
    # Dividing by 2 assumes every duplicated value appears exactly twice
    print len(li)/2
duplicate(a)
The answer by egualo is much better, but here is another way using a dictionary.
def find_duplicates(arr):
    duplicates = {}
    duplicate_elements = []
    for element in arr:
        if element not in duplicates:
            duplicates[element] = False
        else:
            if duplicates[element] == False:
                duplicate_elements.append(element)
            duplicates[element] = True
    return duplicate_elements
It's pretty simple and doesn't go through the list twice, which is kind of nice.
>>> test = [1,2,3,1,1,2,2,4]
>>> find_duplicates(test)
[1, 2]
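Since "number of duplicates" can mean two different things, here is a sketch that reports both using Counter. The function name `duplicate_stats` is made up for illustration:

```python
from collections import Counter

def duplicate_stats(items):
    """Return (distinct values that repeat, surplus copies beyond the first)."""
    counts = Counter(items)
    # How many distinct values occur more than once.
    repeated = sum(1 for n in counts.values() if n > 1)
    # How many elements would have to be removed to make the list unique.
    surplus = sum(n - 1 for n in counts.values())
    return repeated, surplus

a = [1, 4, "hii", 2, 4, "hello", "hii"]
print(duplicate_stats(a))                         # (2, 2)
print(duplicate_stats([1, 2, 3, 1, 1, 2, 2, 4]))  # (2, 4)
```

The second number is the same as `len(items) - len(set(items))`; the first is the length of the list returned by `find_duplicates`.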
