Two values in two lists, simplify the code - python

I have two lists, and I want to compare every value in one list with every value in the other, counting the pairs whose difference falls within a certain range. Here is the first version of my code:
m = [1,3,5,7]
n = [1,4,7,9,5,6,34,52]
k=0
for i in xrange(0, len(m)):
    for j in xrange(0, len(n)):
        if abs(m[i] - n[j]) <=0.5:
            k+=1
        else:
            continue
The output is 3. I also tried a second version:
t=0
for i, j in zip(m,n):
    if abs(i - j) <=0.5:
        t+=1
    else:
        continue
The output is 1, which is wrong. Is there simpler and more efficient code for the first version? I have a large amount of data to deal with. Thank you!

The first thing you could do is remove the else: continue, since it doesn't add anything. You can also iterate directly with for a in m instead of iterating over a range and indexing.
If you want to write it more succinctly, you could use itertools.
import itertools
m = [1,3,5,7]
n = [1,4,7,9,5,6,34,52]
k = sum(abs(a - b) <= 0.5 for a, b in itertools.product(m, n))
The runtime of this (and your solution) is O(m * n), where m and n are the lengths of the lists.
If you need a more efficient algorithm, you can use a sorted data structure like a binary tree or a sorted list to achieve better lookup.
import bisect
m = [1,3,5,7]
n = [1,4,7,9,5,6,34,52]
n.sort()
k = 0
for a in m:
    i = bisect.bisect_left(n, a - 0.5)
    j = bisect.bisect_right(n, a + 0.5)
    k += j - i
The runtime is O((m + n) * log n). That's n * log n for sorting and m * log n for lookups. So you'd want to make n the shorter list.
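Because the test abs(a - b) <= 0.5 is symmetric, either list can play the role of the sorted one. A minimal sketch that wraps the bisect idea in a function and sorts whichever list is shorter; the name count_within and the tol parameter are illustrative choices, not part of the answer above:

```python
import bisect


def count_within(xs, ys, tol=0.5):
    """Count pairs (x, y) with abs(x - y) <= tol, one value from each list."""
    # The condition is symmetric, so sort and search the shorter list.
    if len(ys) > len(xs):
        xs, ys = ys, xs
    ys = sorted(ys)
    count = 0
    for x in xs:
        lo = bisect.bisect_left(ys, x - tol)   # first value >= x - tol
        hi = bisect.bisect_right(ys, x + tol)  # first value > x + tol
        count += hi - lo
    return count


print(count_within([1, 3, 5, 7], [1, 4, 7, 9, 5, 6, 34, 52]))  # 3
```

Using sorted() instead of list.sort() leaves the caller's lists untouched.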

More pythonic version of your first version:
ms = [1, 3, 5, 7]
ns = [1, 4, 7, 9, 5, 6, 34, 52]
k = 0
for m in ms:
    for n in ns:
        if abs(m - n) <= 0.5:
            k += 1
I don't think it will run faster, but it's simpler (more readable).

It's simpler, and probably slightly faster, to simply iterate over the lists directly rather than to iterate over range objects to get index values. You already do this in your second version, but you're not constructing all possible pairs with that zip() call. Here's a modification of your first version:
m = [1,3,5,7]
n = [1,4,7,9,5,6,34,52]
k=0
for x in m:
    for y in n:
        if abs(x - y) <=0.5:
            k+=1
You don't need the else: continue part, which does nothing at the end of a loop, so I left it out.
If you want to explore generator expressions to do this, you can use:
k = sum(sum( abs(x-y) <= 0.5 for y in n) for x in m)
That should run reasonably fast using just the core language and no imports.

Your two code snippets are doing two different things. The first one is comparing each element of n with each element of m, but the second one is only doing a pairwise comparison of corresponding elements of m and n, stopping when the shorter list runs out of elements. We can see exactly which elements are being compared in the second case by printing the zip:
>>> m = [1,3,5,7]
>>> n = [1,4,7,9,5,6,34,52]
>>> zip(m,n)
[(1, 1), (3, 4), (5, 7), (7, 9)]
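The session above is Python 2. In Python 3, zip returns a lazy iterator, so it has to be wrapped in list() to be displayed. A small sketch, assuming Python 3, that contrasts the pairs zip produces with the full set of pairs from itertools.product (the variable names are illustrative):

```python
from itertools import product

m = [1, 3, 5, 7]
n = [1, 4, 7, 9, 5, 6, 34, 52]

zipped_pairs = list(zip(m, n))   # stops when the shorter list runs out
all_pairs = list(product(m, n))  # every element of m with every element of n

print(zipped_pairs)                       # [(1, 1), (3, 4), (5, 7), (7, 9)]
print(len(zipped_pairs), len(all_pairs))  # 4 32
```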
pawelswiecki has posted a more Pythonic version of your first snippet. Generally, it's better to iterate directly over containers rather than using an indexed loop, unless you actually need the index. And even then, it's more Pythonic to use enumerate() to generate the index than to use xrange(len(m)). E.g.
>>> for i, v in enumerate(m):
...     print i, v
...
0 1
1 3
2 5
3 7
A rule of thumb is that if you find yourself writing for i in xrange(len(m)), there's probably a better way to do it. :)
William Gaul has made a good suggestion: if your lists are sorted you can break out of the inner loop once the absolute difference gets bigger than your threshold of 0.5. However, Paul Draper's answer using bisect is my favourite. :)
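The early-exit idea for sorted lists mentioned above can be sketched roughly as follows. This is only one possible reading of that suggestion, assuming it is acceptable to sort copies of both lists first; the function name and the tol parameter are made up for illustration:

```python
def count_close_pairs(ms, ns, tol=0.5):
    """Count pairs (a, b) with abs(a - b) <= tol using two sorted lists."""
    ms = sorted(ms)
    ns = sorted(ns)
    count = 0
    start = 0  # values before this index are too small for every later a
    for a in ms:
        # Skip values that are too small for a, and hence for all later a.
        while start < len(ns) and ns[start] < a - tol:
            start += 1
        j = start
        # Stop as soon as the difference exceeds the threshold.
        while j < len(ns) and ns[j] <= a + tol:
            count += 1
            j += 1
    return count


print(count_close_pairs([1, 3, 5, 7], [1, 4, 7, 9, 5, 6, 34, 52]))  # 3
```

The inner loop only visits values that actually match, so after sorting the work is proportional to the list lengths plus the number of matches.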

Related

What is the math problem I'm trying to solve in python?

I am trying to solve this math problem in python, and I'm not sure what it is called:
The answer X is always 100
Given a list of 5 integers, their sum would equal X
Each integer has to be between 1 and 25
The integers can appear one or more times in the list
I want to find all the possible unique lists of 5 integers that match.
These would match:
20,20,20,20,20
25,25,25,20,5
10,25,19,21,25
along with many more.
I looked at itertools.permutations, but I don't think that handles duplicate integers in the list. I'm thinking there must be a standard math algorithm for this, but my search queries must be poor.
Only other thing to mention is if it matters that the list size could change from 10 integers to some other length (6, 24, etc).
This is a constraint satisfaction problem. Problems like this can often be solved by recursive backtracking: you fix one part of the solution and then solve the remaining subproblem. In Python, we can implement this approach with a recursive function:
def csp_solutions(target_sum, n, i_min=1, i_max=25):
    domain = range(i_min, i_max + 1)
    if n == 1:
        if target_sum in domain:
            return [[target_sum]]
        else:
            return []
    solutions = []
    for i in domain:
        # Check if a solution is still possible when i is picked:
        if (n - 1) * i_min <= target_sum - i <= (n - 1) * i_max:
            # Construct solutions recursively, passing the bounds along:
            solutions.extend([[i] + sol
                              for sol in csp_solutions(target_sum - i, n - 1,
                                                       i_min, i_max)])
    return solutions
all_solutions = csp_solutions(100, 5)
This yields 23746 solutions, in agreement with the answer by Alex Reynolds.
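If only the number of ordered solutions is needed, it can be cross-checked without enumerating them, using stars and bars with inclusion-exclusion over the entries that exceed the upper bound. A sketch of that count, assuming Python 3.8+ for math.comb; the function name is illustrative:

```python
from math import comb  # available from Python 3.8


def count_bounded_compositions(total, n, low=1, high=25):
    """Count ordered n-tuples of ints in [low, high] that sum to total."""
    shifted = total - n * low  # shift every entry to the range 0..width
    width = high - low
    count = 0
    for j in range(n + 1):
        # j entries are forced above the upper bound (inclusion-exclusion).
        remaining = shifted - j * (width + 1)
        if remaining < 0:
            break
        count += (-1) ** j * comb(n, j) * comb(remaining + n - 1, n - 1)
    return count


print(count_bounded_compositions(100, 5))  # 23746
```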
Another approach with Numpy:
#!/usr/bin/env python
import numpy as np
start = 1
end = 25
entries = 5
total = 100
a = np.arange(start, end + 1)
c = np.array(np.meshgrid(a, a, a, a, a)).T.reshape(-1, entries)
assert(len(c) == pow(end, entries))
s = c.sum(axis=1)
#
# filter all combinations for those that meet sum criterion
#
valid_combinations = c[np.where(s == total)]
print(len(valid_combinations)) # 23746
#
# filter those combinations for unique permutations
#
unique_permutations = set(tuple(sorted(x)) for x in valid_combinations)
print(len(unique_permutations)) # 376
You want combinations_with_replacement from the itertools library. Here is what the code would look like:
from itertools import combinations_with_replacement
values = range(1, 26)
candidates = []
for tuple5 in combinations_with_replacement(values, 5):
    if sum(tuple5) == 100:
        candidates.append(tuple5)
On this problem I get 376 candidates. As mentioned in the comments above, if each ordering of the five integers is to be counted separately, you would then need to look at the permutations of each candidate, which are not always all distinct. For example, (20,20,20,20,20) is the same however you arrange it, whereas (21,20,20,20,19) has several distinct arrangements.
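The number of distinct arrangements of one candidate is its multinomial coefficient: 5! divided by the factorial of each value's multiplicity. A short sketch (helper name chosen here for illustration) that sums this over all candidates, which should reproduce the ordered count of 23746 quoted elsewhere in this thread:

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import factorial


def distinct_arrangements(values):
    """Number of distinct orderings of a tuple that may contain repeats."""
    count = factorial(len(values))
    for multiplicity in Counter(values).values():
        count //= factorial(multiplicity)
    return count


candidates = [c for c in combinations_with_replacement(range(1, 26), 5)
              if sum(c) == 100]
total_ordered = sum(distinct_arrangements(c) for c in candidates)

print(distinct_arrangements((20, 20, 20, 20, 20)))  # 1
print(distinct_arrangements((21, 20, 20, 20, 19)))  # 20
print(len(candidates), total_ordered)               # 376 23746
```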
I think this could be what you are searching for: given a target number SUM, a left threshold L, a right threshold R and a size K, find all the possible lists of K elements between L and R whose sum is SUM. As far as I was able to find, there isn't a specific name for this problem, though.
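For a general size K, the unique lists can be generated directly by only ever building non-decreasing tuples and pruning branches that can no longer reach the target. This is a sketch under that framing, with hypothetical names (total, k, low, high stand for SUM, K, L and R):

```python
def bounded_partitions(total, k, low=1, high=25):
    """Yield non-decreasing k-tuples of ints in [low, high] summing to total."""
    if k == 0:
        if total == 0:
            yield ()
        return
    for first in range(low, high + 1):
        rest = total - first
        # The remaining k - 1 values all lie between first and high.
        if (k - 1) * first <= rest <= (k - 1) * high:
            for tail in bounded_partitions(rest, k - 1, first, high):
                yield (first,) + tail


solutions = list(bounded_partitions(100, 5))
print(len(solutions))  # 376
print(solutions[-1])   # (20, 20, 20, 20, 20)
```

Because it is a generator, the lists can be consumed one at a time for larger K without holding all of them in memory.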

Python list with zeros and numbers repeating

My intent is to create an empty list and then append a numeric sequence to it, so that it contains zeros except that every third place holds 3 and then its successive multiples, that is a[(0,0,3,0,0,6,0,0,9)], and it needs to hold 100 values.
I first create the list and then use a 'for' loop over range(0,100). I am sure I need to use % so that whenever a number in my sequence from 1 to 100 is exactly divisible by 3 it gives back 3 (and not 0), and then keeps going with 6, 9, 12... How do I do this?
for i in range(0,100):
    if i%3==0:
        return 0
    else
        return 3
Of course this is completely wrong, but I am new to programming in general. Thanks in advance.
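You could try this:
data = [0] * 100
for i in range(3, 101, 3):
    data[i - 1] = i
The list has to exist (and be filled with zeros) before you can assign by index. The "step" argument of range makes i run over the multiples of 3 only, and i - 1 is the zero-based index of the i-th place, so each multiple lands in the third, sixth, ninth, ... position.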
@Mark's comment is very relevant and makes good use of the modulo operator and a list comprehension in a very simple way. Moreover, his code easily adapts to any value.
However, the modulo operation adds a cost for every element, so here are other ways to achieve the same result.
Method 1
We can iterate over range(3, 101, 3), that is the multiples of 3 from 3 to 99, and add [0, 0, i] for each of them. Since that leaves zeros missing at the end of the list, we must append as many as the remainder of dividing 100 by 3, which is 1.
data = [num for i in range(3, 101, 3) for num in [0, 0, i]] + [0] * 1
Method 2
With the same idea, we can use .extend() to add two 0s before each term.
data = []
for i in range(3, 101, 3):
    data.extend([0, 0, i])
data.append(0)
Method 3
The simplest idea, we create a list of 100 zeros, and then we modify the value every 3 terms.
data = [0] * 100
for i in range(2, 101, 3):
    data[i] = i + 1
Comparison
Using timeit, here is a comparison of the speed of each algorithm. No number argument is passed, so each statement is run timeit's default of 1,000,000 times.
import timeit
print(timeit.timeit("data = [0 if n % 3 else n for n in range(1, 101)]"))
print(timeit.timeit("data = [num for i in range(3, 101, 3) for num in [0, 0, i]] + [0] * 1"))
print(timeit.timeit("""
data = []
for i in range(3, 101, 3):
    data.extend([0, 0, i])
data.append(0)
"""))
print(timeit.timeit("""
data = [0] * 100
for i in range(2, 101, 3):
    data[i] = i + 1
"""))
Output:
4.137781305000317
3.8176420609997876
2.4403464719998738
1.4861199529996156
The last algorithm is thus the fastest, almost 3 times faster than using modulus.
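Before reading too much into the timings, it is worth confirming that the four variants really build the same list. A quick check, with variable names chosen here for illustration:

```python
# Variant from the comment: modulo inside a list comprehension
by_modulo = [0 if n % 3 else n for n in range(1, 101)]

# Method 1: nested comprehension plus one trailing zero
method_1 = [num for i in range(3, 101, 3) for num in [0, 0, i]] + [0] * 1

# Method 2: extend with two zeros before each multiple
method_2 = []
for i in range(3, 101, 3):
    method_2.extend([0, 0, i])
method_2.append(0)

# Method 3: start from zeros and overwrite every third slot
method_3 = [0] * 100
for i in range(2, 101, 3):
    method_3[i] = i + 1

print(by_modulo == method_1 == method_2 == method_3)  # True
print(len(by_modulo), by_modulo[:9])  # 100 [0, 0, 3, 0, 0, 6, 0, 0, 9]
```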
A function stops running when it encounters a return, so in your code the loop body would only ever execute once.
But what if we could change that? Do you know what a generator is? Have a look at this:
def multiples_of_three_or_zero():
    for i in range(1, 101):
        if i % 3 == 0:
            yield i  # a multiple of three
        else:
            yield 0
That's a generator. yield doesn't end the execution, but rather suspends it. You use it like this:
for i in multiples_of_three_or_zero():
print(i)
Or, if you really want all the elements in a list, just make a list from it:
list(multiples_of_three_or_zero())
OK, I practiced a little bit and I found this solution, which best suits my needs (it was for a take-home exercise):
A=[]
for a in range (1,101):
    if a%3==0:
        A.append(a)
    else:
        A.append(0)
print(A)
Thank you all!

How to create a function with a variable number of 'for' loops, each with a distinct index?

The Problem:
Consider a d-dimensional simple cubic lattice.
If the lattice has width L, then the number of lattice sites is L^d. I want to create a list that contains all the positions of the lattice sites, for a general d and L.
For example, when L = 2 and d = 2 this would be [(0, 0), (1, 0), (0, 1), (1, 1)].
My Attempt:
Whilst I can do this for general L, I have not been able to generalise the dimension d.
Below is my solution for d = 3 using three for loops.
def Positions(L):
    PositionList = []
    for i in range(L):
        for j in range(L):
            for k in range(L):
                PositionList.append([k, j, i])
    return PositionList
It's easy to see how I would change it to increase or decrease the dimension d as I could simply add or remove the for loops, but obviously it is very tedious to write this out for large d.
I thought about using a recursive for loop, so I would only be using one for loop for any d, but I don't know how to do this whilst preserving the indexing I need to write out the positions.
In Conclusion:
Is it possible to have a variable number of for loops, each one having a distinct index to solve this problem?
Or is there a better solution without using for loops?
Recursion is indeed the way to go
The idea is:
If you assume your function works for d-1 dimensions, then you can take that result and append to each of the results the value of i (the loop variable), and do that repeatedly for each value of i.
The base case is when d=0, in that case you have just a single, empty result.
Here is how that can be coded:
def Positions(L, d):
    if d == 0:  # base case
        return [[]]
    return [
        [i] + res  # prepend i to the results coming from recursion
        for i in range(L)
        for res in Positions(L, d-1)
    ]
If you are not familiar with the list-comprehension syntax used in the final statement, then here is how you would do it without that syntax:
def Positions(L, d):
    if d == 0:  # base case
        return [[]]
    positions = []
    for i in range(L):
        for res in Positions(L, d-1):
            positions.append([i] + res)
    return positions
One easy way is using itertools cartesian product:
from itertools import product
L, D = 2, 2
print(list(product(range(L), repeat=D)))
Result
[(0, 0), (0, 1), (1, 0), (1, 1)]
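Note that product varies the last coordinate fastest, whereas the example in the question, [(0, 0), (1, 0), (0, 1), (1, 1)], varies the first coordinate fastest. If that ordering matters, one possible sketch is to reverse each tuple; the function name lattice_sites is made up for illustration:

```python
from itertools import product


def lattice_sites(L, d):
    """All sites of a d-dimensional lattice of width L, first index fastest."""
    return [site[::-1] for site in product(range(L), repeat=d)]


print(lattice_sites(2, 2))  # [(0, 0), (1, 0), (0, 1), (1, 1)]
```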
Use recursion. The first part is the base case, and the second part prepends every number from 0 to L-1 to every site of the lattice of the next lower dimension.
def positions(L, d):
    if d == 0:
        return [()]
    lower = positions(L, d - 1)  # build the (d-1)-dimensional lattice once
    return [(x,) + site for x in range(L) for site in lower]

You have a list of integers, and for each index you want to find the product of every integer except the integer at that index. (Can't use division)

I am currently working on practice interview questions. Attached is a screenshot of the question I'm working on below
I tried with the brute force approach of using nested loops with the intent of refactoring out the nested loop, but it still failed the tests in the brute force approach.
Here is the code that I tried:
def get_products_of_all_ints_except_at_index(int_list):
    # Make a list with the products
    products = []
    for i in int_list:
        for j in int_list:
            if(i != j):
                k = int_list[i] * int_list[j]
                products.append(k)
    return products
I am curious about both the brute force solution and the more efficient solution without nested loops.
Linear algorithm using cumulative products from the right side and from the left side
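def productexcept(l):
    n = len(l)
    # right[i] holds the product of all elements to the right of index i
    right = [1]*n
    for i in reversed(range(n-1)):
        right[i] = right[i + 1] * l[i+1]
    #print(right)
    # prod accumulates the product of all elements to the left of index i
    prod = 1
    for i in range(n):
        t = l[i]
        l[i] = prod * right[i]  # note: overwrites the input list in place
        prod *= t
    return l
print(productexcept([2,3,7,5]))
# prints [105, 70, 30, 42]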
If you are allowed to use imports and really ugly list comprehensions you could try this:
from functools import reduce
l = [1,7,3,4]
[reduce(lambda x,y: x*y, [l[i] for i in range(len(l)) if i != k],1) for k,el in enumerate(l)]
If you are not allowed to use functools you can write your own function:
def prod(x):
    result = 1
    for i in x:
        result = result * i
    return result
l = [1,7,3,4]
[prod([l[i] for i in range(len(l)) if i != k]) for k,el in enumerate(l)]
I leave it as an exercise to the reader to put the two solutions in a function.
If you want the index of each element in a list, you should use for i in range(len(int_list)). for i in int_list gives you the values in the list, not the indices.
So the brute force solution should be:
def get_products_of_all_ints_except_at_index(int_list):
    # Make a list with the products
    products = []
    for i in range(len(int_list)):
        k = 1
        for j in range(len(int_list)):
            if(i != j):
                k *= int_list[j]
        products.append(k)
    return products
I've come up with this:
def product(larg):
    result = 1
    for n in larg:
        result *= n
    return result
List = [1, 7, 3, 4]
N = len(List)
BaseList = [List[0:i] + List[i+1:N] for i in range(N)]
Products = [product(a) for a in BaseList]
print(Products)
From the input list List, you create a list of lists where the proper integer is removed in each one. Then you just build a new list with the products of those sublists.
Here's a solution using a recursive function instead of a loop:
from functools import reduce
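def get_products_of_all_ints_except_at_index(int_list, n=0, results=None):
    if results is None:  # avoid sharing one default list between calls
        results = []
    if n == len(int_list):
        return results
    new_list = int_list.copy()
    new_list.pop(n)
    # the initial value 1 keeps reduce working when new_list is empty
    results.append(reduce((lambda x, y: x * y), new_list, 1))
    return get_products_of_all_ints_except_at_index(int_list, n+1, results)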
int_list = [1, 7, 3, 4]
print(get_products_of_all_ints_except_at_index(int_list))
# expected results [84, 12, 28, 21]
Output:
[84, 12, 28, 21]
Assume N to be a power of 2.
Brute force takes N(N-2) products. You can improve on that by precomputing the N/2 products of pairs of elements, then N/4 pairs of pairs, then pairs of pairs of pairs, until you have a single pair left. This takes N-2 products in total.
Next, you form all the requested products by multiplying the required partial products, in a dichotomic way. This takes Lg(N)-1 multiplies per product, hence a total of N(Lg(N)-1) multiplies.
This is an O(N Log N) solution.
Illustration for N=8:
Using 6 multiplies,
a b c d e f g h
ab cd ef gh
abcd efgh
Then with 16 multiplies,
b.cd.efgh
a.cd.efgh
ab.d.efgh
ab.c.efgh
abcd.f.gh
abcd.e.gh
abcd.ef.h
abcd.ef.g
The desired expressions can be obtained from the binary structure of the numbers from 0 to N-1.
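A simplified divide-and-conquer variant of this idea can be sketched in a few lines. It is not the exact scheme above: instead of precomputing the tree of pair products, it recomputes the product of each half while recursing, which still gives O(N log N) multiplications and also works when N is not a power of 2. The function names are illustrative:

```python
def products_except_self(values):
    """Product of all other elements for each index, without division."""
    result = [1] * len(values)

    def fill(lo, hi, outside):
        # outside is the product of every element not in values[lo:hi]
        if hi - lo == 1:
            result[lo] = outside
            return
        mid = (lo + hi) // 2
        left = 1
        for v in values[lo:mid]:
            left *= v
        right = 1
        for v in values[mid:hi]:
            right *= v
        fill(lo, mid, outside * right)  # left half sees the right half
        fill(mid, hi, outside * left)   # right half sees the left half

    if values:
        fill(0, len(values), 1)
    return result


print(products_except_self([1, 7, 3, 4]))  # [84, 12, 28, 21]
```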
Brute force
Your brute force approach does not work for multiple reasons:
(I'm assuming at least fixed indentation)
List index out of range
for i in int_list already gives you i as an element of the list, not an index. When i is 7, int_list[i] is not possible any more.
The loops should be for ... in range(len(int_list)).
With that fixed, the result contains too many elements. There are 12 elements in the result, but only 4 are expected. That's because of another indentation issue at products.append(...). It needs to be outdented 2 steps.
With that fixed, most k are overwritten by a new value each time i*j is calculated. To fix that, start k at the identity element for multiplication, which is 1, and then multiply int_list[j] onto it.
The full code is now
def get_products_of_all_ints_except_at_index(int_list):
    products = []
    for i in range(len(int_list)):
        k = 1
        for j in range(len(int_list)):
            if i != j:
                k *= int_list[j]
        products.append(k)
    return products
Optimization
I would propose the "brute force" solution as an answer first. Don't worry about performance as long as there are no performance requirements. That would be premature optimization.
The solution with division would be:
def get_products_of_all_ints_except_at_index(int_list):
    products = []
    product = 1
    for i in int_list:
        product *= i
    for i in int_list:
        products.append(product//i)
    return products
and thus does not need a nested loop and has linear Big-O time complexity.
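A little exponent trick avoids writing the division operator: simply multiply by the inverse. Note that i ** -1 is a float, so the result can be slightly off for large products; round() is safer here than int(), which truncates:
for i in int_list:
    products.append(round(product * i ** -1))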

How to find last "K" indexes of vector satisfying condition (Python) ? (Analogue of Matlab's "find" )

Consider some vector:
import numpy as np
v = np.arange(10)
Assume we need to find last 2 indexes satisfying some condition.
For example in Matlab it would be written e.g.
find(v <5 , 2,'last')
answer = [ 3 , 4 ] (Note: Matlab indexing from 1)
Question: What would be the clearest way to do that in Python ?
"Nice" solution should STOP search when it finds 2 desired results, it should NOT search over all elements of vector.
So np.where does not seem to be "nice" in that sense.
We can easily write that using "for", but is there any alternative way?
I am wary of using "for" since it might be slow (at least it is very much so in Matlab).
This attempt doesn't use numpy, and it is probably not very idiomatic.
Nevertheless, if I understand it correctly, zip, filter and reversed are all lazy iterators that take only the elements that they really need. Therefore, you could try this:
from itertools import islice
x = list(range(10))
res = reversed(list(map(
    lambda xi: xi[1],
    islice(
        filter(
            lambda xi: xi[0] < 5,
            zip(reversed(x), reversed(range(len(x))))
        ),
        2
    )
)))
print(list(res))
Output:
[3, 4]
What it does (from inside to outside):
create index range
reverse both array and indices
zip the reversed array with indices
filter the two (value, index)-pairs that you need, extract them by islice
Throw away the values, retain only indices with map
reverse again
Even though it looks somewhat monstrous, it should all be lazy, and stop after it finds the first two elements that you are looking for. I haven't compared it with a simple loop, maybe just using a loop would be both simpler and faster.
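The same lazy idea can probably be written more compactly with a generator expression that walks the indices backwards, with islice cutting it off after k hits. A sketch without numpy; the name last_k_indices and its parameters are invented for illustration:

```python
from itertools import islice


def last_k_indices(values, predicate, k):
    """Indices of the last k elements satisfying predicate, in ascending order."""
    # Lazily scan from the end; nothing is evaluated until islice asks for it.
    hits = (i for i in range(len(values) - 1, -1, -1) if predicate(values[i]))
    return sorted(islice(hits, k))


print(last_k_indices(list(range(10)), lambda x: x < 5, 2))  # [3, 4]
```

For this example the predicate is only evaluated on the values 9 down to 3, so the scan stops as soon as the second hit is found.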
Any solution you'd find will iterate over the list even if the loop is 'hidden' inside a function.
The solution to your problem depends on the assumptions you can make e.g. is the list sorted?
For the general case I'd iterate over the list starting at the end:
def find(condition, k, v):
    indices = []
    for i, var in enumerate(reversed(v)):
        if condition(var):
            indices.append(len(v) - i - 1)
            if len(indices) >= k:
                break
    return indices
The condition should then be passed as a function, so you can use a lambda:
v = range(10)
find(lambda x: x < 5, 3, v)
will output
[4, 3, 2]
I'm not aware of a "good" numpy solution to short-circuiting.
The most principled way to go would be using something like Cython which to brutally oversimplify it adds fast loops to Python. Once you have set that up it would be easy.
If you do not want to do that you'd have to employ some gymnastics like:
import numpy as np
def find_last_k(vector, condition, k, minchunk=32):
    if k > minchunk:
        minchunk = k
    l, r = vector.size - minchunk, vector.size
    found = []
    n_found = 0
    while r > 0:
        if l <= 0:
            l = 0
        found.append(l + np.where(condition(vector[l:r]))[0])
        n_found += len(found[-1])
        if n_found >= k:
            break
        l, r = 3 * l - 2 * r, l
    return np.concatenate(found[::-1])[-k:]
This tries balancing loop overhead and numpy "inflexibility" by searching in chunks, which we grow exponentially until enough hits are found.
Not exactly pretty, though.
This is what I've found that seems to do this job for the example described (using argwhere which returns all indices that meet the criteria and then we find the last two of these as a numpy array):
ind = np.argwhere(v<5)
ind[-2:]
This searches through the entire array so is not optimal but is easy to code.
