Fast way to find the number of common items in Python

I have a large data set of 145,000 items (a bill of materials), and I want to compute the percentage of items shared between two bills of materials.
Two nested for loops and the other methods I tried all take about the same amount of time.
What is the fastest way to do this?
FirstBill and SecondBill are the lists with the components in them:
def shared_fraction(FirstBill, SecondBill):
    CommonChild = 0
    for FKid in FirstBill:
        for SKid in SecondBill:
            CommonChild = (CommonChild + 1) if FKid == SKid else CommonChild
    return CommonChild / len(FirstBill)

Converting one of the lists to a set is close to optimal, because each membership test then takes constant time on average:
# Python program to illustrate the intersection
# of two lists in the simplest way
def intersection(lst1, lst2):
    temp = set(lst2)
    lst3 = [value for value in lst1 if value in temp]
    return lst3
# Driver code
lst1 = [4, 9, 1, 17, 11, 26, 28, 54, 69]
lst2 = [9, 9, 74, 21, 45, 11, 63, 28, 26]
# print(intersection(lst1, lst2))
quantity = len(intersection(lst1, lst2))

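Assuming that the ids in each bill are unique, a simpler answer would be:
percentage = sum(1 for fkid in FirstBill if fkid in SecondBill) / len(FirstBill) * 100
or
percentage = len(set(FirstBill).intersection(SecondBill)) / len(FirstBill) * 100
The first form still scans the list SecondBill once for every item of FirstBill, so for large bills the set-based second form is the faster of the two.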
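The set-based idea can be wrapped up as a small function. This is only a sketch: the name `shared_percentage` and the sample ids are made up for illustration, and it assumes the ids are hashable and unique within each bill.

```python
def shared_percentage(first_bill, second_bill):
    """Percentage of items in first_bill that also occur in second_bill."""
    if not first_bill:
        return 0.0  # avoid dividing by zero for an empty bill
    # Building the set costs O(len(second_bill)); each lookup is then O(1) on average.
    second_ids = set(second_bill)
    common = sum(1 for item in first_bill if item in second_ids)
    return common / len(first_bill) * 100


first = ["A1", "B2", "C3", "D4"]   # hypothetical component ids
second = ["B2", "D4", "E5"]
print(shared_percentage(first, second))  # 50.0
```

With two bills of 145,000 items this does roughly 290,000 set operations instead of about 21 billion comparisons in the nested loops.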

Related

Fastest method to update all list entries with union of all intersecting entries

I am looking for a fast method to traverse a list of sets, and to expand each set by finding its union with any other element of the list with which it shares at least one element.
For example, suppose that I have four rows of data, where each row corresponds to a set of unique elements
0, 5, 101
8, 9, 19, 21
78, 79
5, 7, 63, 64
The first and the last rows have the intersecting element 5 and so after performing my operation I want to have the unions
0, 5, 7, 63, 64, 101
8, 9, 19, 21
78, 79
0, 5, 7, 63, 64, 101
Right now, I can nearly do this with two loops:
import numpy as np
def consolidate_list(arr):
    """
    arr (list) : A list of lists, where the inner lists correspond to sets of unique integers
    """
    arr_out = list()
    for item1 in arr:
        item_additional = list()  # a list containing all overlapping elements
        for item2 in arr:
            if len(np.intersect1d(item1, item2)) > 0:
                item_additional.append(np.copy(item2))
        out_val = np.unique(np.hstack([np.copy(item1)] + item_additional))  # find union of all lists
        arr_out.append(out_val)
    return arr_out
The issue with this approach is that it needs to be run multiple times, until the output stops changing. Since the input might be jagged (i.e., a different number of elements per set), I can't see a way to vectorize this function.
This problem is about creating disjoint sets and so I would use union-find methods.
Now Python is not particularly known for being fast, but for the sake of showing the algorithm, here is an implementation of a DisjointSet class without libraries:
class DisjointSet:
    class Element:
        def __init__(self):
            self.parent = self
            self.rank = 0
    def __init__(self):
        self.elements = {}
    def find(self, key):
        el = self.elements.get(key, None)
        if not el:
            el = self.Element()
            self.elements[key] = el
        else:  # Path splitting algorithm
            while el.parent != el:
                # Targets are assigned left to right, so el.parent must come
                # first: point el at its grandparent, then step to the old parent.
                el.parent, el = el.parent.parent, el.parent
        return el
    def union(self, key=None, *otherkeys):
        if key is not None:
            root = self.find(key)
            for otherkey in otherkeys:
                el = self.find(otherkey)
                if el != root:
                    # Union by rank
                    if root.rank < el.rank:
                        root, el = el, root
                    el.parent = root
                    if root.rank == el.rank:
                        root.rank += 1
    def groups(self):
        result = {el: [] for el in self.elements.values()
                  if el.parent == el}
        for key in self.elements:
            result[self.find(key)].append(key)
        return result
Here is how you could use it for this particular problem:
def solve(lists):
    disjoint = DisjointSet()
    for lst in lists:
        disjoint.union(*lst)
    groups = disjoint.groups()
    return [lst and groups[disjoint.find(lst[0])] for lst in lists]
Example call:
data = [
    [0, 5, 101],
    [8, 9, 19, 21],
    [],
    [78, 79],
    [5, 7, 63, 64]
]
result = solve(data)
The result will be:
[[0, 5, 101, 7, 63, 64], [8, 9, 19, 21], [], [78, 79], [0, 5, 101, 7, 63, 64]]
Note that I added an empty list to the input to illustrate that this boundary case is left unaltered.
NB: There are libraries that provide union-find/disjoint-set functionality, each with a slightly different API, and I suppose that using one of those could give better performance.
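For comparison, here is a more compact sketch of the same union-find idea that uses a plain dict instead of a class. It is not the answer's implementation: the name `consolidate` is made up, it uses path halving but no union by rank, and it returns each union sorted so that the result matches the layout asked for in the question.

```python
def consolidate(lists):
    """Replace every list by the sorted union of all lists it is connected to."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)            # first sighting: x is its own root
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Link every element of a list to the list's first element.
    for lst in lists:
        if lst:
            find(lst[0])                   # register single-element lists too
        for item in lst[1:]:
            parent[find(item)] = find(lst[0])

    # Collect the members of each group under its root.
    groups = {}
    for item in parent:
        groups.setdefault(find(item), []).append(item)

    return [sorted(groups[find(lst[0])]) if lst else [] for lst in lists]


data = [[0, 5, 101], [8, 9, 19, 21], [78, 79], [5, 7, 63, 64]]
print(consolidate(data))
# [[0, 5, 7, 63, 64, 101], [8, 9, 19, 21], [78, 79], [0, 5, 7, 63, 64, 101]]
```

Because all merging happens in one pass over the input, there is no need to rerun the function until the output stops changing.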
Do you mean something like this?
from itertools import combinations
l1 = [0, 5, 7, 63, 64, 101]
l2 = [8, 9, 19]
l3 = [78, 79]
l4 = [5, 4, 34]
print([v for x, y in combinations([l1, l2, l3, l4], 2) for v in {*x} & {*y}])
Output:
[5]

sum of surrounding elements in a list

I'm writing code that replaces each number in a list with the sum of itself and the numbers beside it.
For example, given list1 = [10, 20, 30, 40, 50], the new list is [30 (10+20), 60 (10+20+30), 90 (20+30+40), 120 (30+40+50), 90 (40+50)], so the final list is [30, 60, 90, 120, 90].
My idea was to use a for loop, but my attempt was totally off.
You can do it by creating triplets using zip:
# pad for first and last triplet
lst = [0] + original + [0]
# summarize triplets
sums = [sum(triplet) for triplet in zip(lst, lst[1:], lst[2:])]
Example:
>>> original = [10, 20, 30, 40, 50]
>>> lst = [0] + original + [0]
>>> sums = [sum(triplet) for triplet in zip(lst, lst[1:], lst[2:])]
>>> sums
[30, 60, 90, 120, 90]
>>>
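Since the question mentions a for loop, here is a sketch of a loop-based version that needs no padding. The function name `neighbor_sums` is made up for illustration; it relies on the fact that list slicing clips at the boundaries.

```python
def neighbor_sums(values):
    """Sum each element with its left and right neighbours, where they exist."""
    result = []
    for i in range(len(values)):
        # The slice covers positions i-1 .. i+1 and is clipped at both ends.
        window = values[max(i - 1, 0):i + 2]
        result.append(sum(window))
    return result


print(neighbor_sums([10, 20, 30, 40, 50]))  # [30, 60, 90, 120, 90]
```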
Check out the flatten function in "What is the fastest way to flatten arbitrarily nested lists in Python?"
Take the flattened result and sum it normally with a for loop, or with a library that provides a count utility for collections.

Perform operation on every array element pair quickly

I have an array A
A = [5,2,8,14,6,13]
I want to get an array where each element is added to every other element, so the first five elements would be 5 + each element, then the next four would be 2 + each element etc.
So the result would be
B = [7,13,19,11,18, 10,16,8,15, 22,14,21, 20,27, 19]
What is the quickest way to do this without using for loops?
Note: The problem I am trying to solve involves large boolean arrays instead of integers and the actual operation is a boolean 'and', not merely addition. I have simplified the question for ease of explanation. I have been using for loops up to now, but I am looking for a faster alternative.
Use itertools.combinations:
from itertools import combinations
a = [5,2,8,14,6,13]
print([sum(i) for i in list(combinations(a, 2))])
There is no need for list(), thanks to @PeterWood:
print([sum(i) for i in combinations(a, 2)])
Output:
[7, 13, 19, 11, 18, 10, 16, 8, 15, 22, 14, 21, 20, 27, 19]
You could do it recursively:
def add_value_to_rest(sequence):
    if not sequence:
        return []
    else:
        additional = sequence[0]
        # Pair the first value only with the values after it, not with itself
        return ([additional + value for value in sequence[1:]] +
                add_value_to_rest(sequence[1:]))
With generators, in Python 3:
def add_value_to_rest(sequence):
    if sequence:
        additional = sequence[0]
        for value in sequence[1:]:
            yield additional + value
        yield from add_value_to_rest(sequence[1:])
Or with Python 2.7:
def add_value_to_rest(sequence):
    if sequence:
        additional = sequence[0]
        for value in sequence[1:]:
            yield additional + value
        for value in add_value_to_rest(sequence[1:]):
            yield value
A = [5,2,8,14,6,13]
B = []
for i, x in enumerate(A):
    for l in range(len(A) - i - 1):
        B.append(A[i] + A[i + l + 1])
print(B)
#[7, 13, 19, 11, 18, 10, 16, 8, 15, 22, 14, 21, 20, 27, 19]
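The question notes that the real operation is a boolean 'and' rather than addition. A minimal sketch of how the combinations approach might be generalised to any binary operation is shown below; the helper name `pairwise_apply` is made up for illustration.

```python
from itertools import combinations
import operator


def pairwise_apply(values, op):
    """Apply op to every unordered pair, in the same order as the nested loops."""
    return [op(x, y) for x, y in combinations(values, 2)]


A = [5, 2, 8, 14, 6, 13]
print(pairwise_apply(A, operator.add))
# [7, 13, 19, 11, 18, 10, 16, 8, 15, 22, 14, 21, 20, 27, 19]

flags = [True, False, True]
print(pairwise_apply(flags, operator.and_))  # [False, True, False]
```

`operator.and_` is the `&` operator, which NumPy boolean arrays also apply element-wise, so the same helper should carry over to a list of such arrays. Whether that is fast enough for large inputs would need to be measured.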

How to find the maximum number in a list using a loop?

So I have this list and variables:
nums = [14, 8, 9, 16, 3, 11, 5]
big = nums[0]
spot = 0
I'm confused about how to actually do it. I want to use this exercise as a starting point. How do I do that in Python?
Usually, you could just use
max(nums)
If you explicitly want to use a loop, try:
max_value = None
for n in nums:
    if max_value is None or n > max_value:
        max_value = n
Here you go...
nums = [14, 8, 9, 16, 3, 11, 5]
big = max(nums)
spot = nums.index(big)
This would be the Pythonic way of achieving this. If you want to use a loop, then loop with the current max value and check if each element is larger, and if so, assign to the current max.
nums = [14, 8, 9, 16, 3, 11, 5]
big = None
spot = None
for i, v in enumerate(nums):
    if big is None or v > big:
        big = v
        spot = i
Python already has a built-in function for this kind of requirement.
numbers = [3,8,2,9]  # avoid naming the variable "list", which would shadow the built-in
max_number = max(numbers)
print(max_number) # it will print 9 as the biggest number
However, if you want to find the max number the classic way, you can use a loop.
numbers = [3,8,2,9]
current_max_number = numbers[0]
for number in numbers:
    if number > current_max_number:
        current_max_number = number
print(current_max_number) # it will display 9 as the biggest number
Why not simply use the built-in max() function?
>>> m = max(nums)
By the way, some answers to similar questions might be useful:
Pythonic way to find maximum value and its index in a list?
How to find all positions of the maximum value in a list?
To address your second question, you can use a for loop:
for i in range(len(nums)):
    pass  # do whatever
You should note that range() can take 3 arguments: start, end, and step. Start is the number to start with (if not supplied, it is 0); start is inclusive. End is where to stop (this has to be given); end is exclusive: if you do range(100), it will give you 0-99. Step is also optional; it is the interval to use. If step is not provided, it will be 1. In Python 3, range() returns a lazy object, so wrap it in list() to see the values. For example:
>>> x = list(range(10, 100, 5)) # start at 10, stop before 100, and use an interval of 5
>>> x
[10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95] # note that it does not hit 100
Since end is exclusive, to include 100, we could do:
>>> x = list(range(10, 101, 5)) # start at 10, stop before 101, and use an interval of 5
>>> x
[10, 15, 20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70, 75, 80, 85, 90, 95, 100] # note that it does hit 100
For the CodeHS "Max in List" exercise, I've managed to get most of the autograder checks to pass using this code:
numbers = [-3,-8,-2,0]
current_max_number = numbers[0]
for number in numbers:
    if number > current_max_number:
        current_max_number = number
print(current_max_number)
def max_int_in_list():
    print("Here")
I'm not sure where max_int_in_list goes, though. It needs to have exactly 1 parameter.
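One way the pieces might fit together is to move the loop inside max_int_in_list and pass the list in as its single parameter. This is a sketch based on a guess at what the autograder expects (a function that returns the largest value); the parameter name `my_list` is an assumption.

```python
def max_int_in_list(my_list):
    """Return the largest number in my_list using an explicit loop."""
    current_max_number = my_list[0]  # assumes the list is not empty
    for number in my_list:
        if number > current_max_number:
            current_max_number = number
    return current_max_number


print(max_int_in_list([-3, -8, -2, 0]))  # 0
```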
To print the index of the largest number in a list:
numbers = [1,2,3,4,5,6,9]
N = numbers[0]  # start from the first element so that all-negative lists work too
for num in range(len(numbers)):
    if numbers[num] > N:
        N = numbers[num]
print(numbers.index(N))
student_scores = [1,2,3,4,5,6,7,8,9]
max_score = student_scores[0]
for n in range(0, len(student_scores)):
    if student_scores[n] >= max_score:
        max_score = student_scores[n]
print(max_score)
# Use a for loop to go through all items in the list and assign the biggest value to a variable, here called max_score.
min_score = student_scores[0]
for n in range(0, len(student_scores)):
    if student_scores[n] <= min_score:
        min_score = student_scores[n]
print(min_score)
# Use a for loop to go through all items in the list and assign the smallest value to a variable, here called min_score.
Note: the code above picks out the max and min with a for loop, an approach that carries over to other programming languages. In Python, however, the built-in max() and min() functions are the easiest way to get the same results.
I would add this for reference too. You can sort the list and then print the last number. Note that nums.sort() reorders the list in place.
nums = [14, 8, 9, 16, 3, 11, 5]
nums.sort()
print("Highest number is: ", nums[-1])
scores = [12, 13, 14, 15, 16, 17, 18, 19, 21, 22, 23, 24, 25, 26, 27,
          28, 29, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 31, 31, 37,
          56, 75, 23, 565]
# initialize highest to the first score, so the loop also works for negative scores
highest = scores[0]
for mark in scores:
    if highest < mark:
        highest = mark
print(highest)

Python: Partial sum of numbers [duplicate]

Can you help me with code that returns the partial sums of the numbers in a text file?
I must read the text file and then write the code for the partial sums myself, without ready-made tools, etc.
My input:
4
13
23
21
11
The output should be (without brackets or commas):
4
17
40
61
72
I tried to write the code in Python, but I could only compute the total sum, not the partial ones.
If I use the += operator on a generator, it gives me an error!
Well, since everyone seems to be giving their favourite idiom for solving the problem, how about itertools.accumulate in Python 3:
>>> import itertools
>>> nums = [4, 13, 23, 21, 11]
>>> list(itertools.accumulate(nums))
[4, 17, 40, 61, 72]
There are a number of ways to create your sequence of partial sums. I think the most elegant is to use a generator.
def partial_sums(iterable):
    total = 0
    for i in iterable:
        total += i
        yield total
You can run it like this:
nums = [4, 13, 23, 21, 11]
sums = list(partial_sums(nums)) # [ 4, 17, 40, 61, 72]
Edit To read the data values from your file, you can use another generator, and chain them together. Here's how I'd do it:
with open("filename.in") as f_in:
    # Sums generator that "feeds" from a generator expression that reads the file
    sums = partial_sums(int(line) for line in f_in)
    # Do output:
    for value in sums:
        print(value)
    # If you need to write to a file, comment the loop above and uncomment this:
    # with open("filename.out", "w") as f_out:
    #     f_out.writelines("%d\n" % value for value in sums)
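A self-contained sketch of the same chaining idea that can be run without a file on disk is shown below. The helper `partial_sums_from_lines` and its blank-line handling are assumptions added for illustration, and `io.StringIO` stands in for the opened text file.

```python
import io


def partial_sums(iterable):
    """Yield the running total after each value."""
    total = 0
    for value in iterable:
        total += value
        yield total


def partial_sums_from_lines(lines):
    """Yield running totals for an iterable of text lines, skipping blank lines."""
    numbers = (int(line) for line in lines if line.strip())
    return partial_sums(numbers)


# io.StringIO behaves like a file opened for reading.
fake_file = io.StringIO("4\n13\n23\n21\n11\n\n")
for value in partial_sums_from_lines(fake_file):
    print(value)
```

This prints 4, 17, 40, 61 and 72 on separate lines, without brackets or commas, as the question asks.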
numpy.cumsum will do what you want.
If you're not using numpy, you can write your own.
def cumsum(i):
    s = 0
    for elt in i:
        s += elt
        yield s
Try this:
nums = [4, 13, 23, 21, 11]
output = [nums[0]]
for i in range(1, len(nums)):
    # add the current number to the previous partial sum, not to the previous input
    output.append(output[i-1] + nums[i])
print(output)
Use the cumulative sum in numpy:
import numpy as np
nums = np.array([4, 13, 23, 21, 11])
output = nums.cumsum()
Result:
>>> output
array([ 4, 17, 40, 61, 72])
Or if you need a list, you may convert output to a list:
>>> output = output.tolist()
>>> output
[4, 17, 40, 61, 72]
This is an alternative solution using reduce:
from functools import reduce  # a built-in in Python 2, must be imported in Python 3
nums = [4, 13, 23, 21, 11]
partial_sum = lambda a, b: a + [a[-1] + b]
sums = reduce(partial_sum, nums[1:], nums[0:1])
The two pluses in the lambda are not the same operator: the first is list concatenation and the second is the sum of two integers. Although Blckknght's answer may be clearer, this one is shorter and also works in Python 2.7.
Something like this:
>>> lst = [4, 13, 23, 21 ,11]
>>> [sum(lst[:i+1]) for i, x in enumerate(lst)]
[4, 17, 40, 61, 72]
