So I was reading the Wikipedia article on the Sieve of Eratosthenes and it included a Python implementation:
http://en.wikipedia.org/wiki/Sieve_of_Eratosthenes#Algorithm_complexity_and_implementation
def eratosthenes_sieve(n):
    # Create a candidate list within which non-primes will be
    # marked as None; only candidates below sqrt(n) need be checked.
    candidates = range(n+1)
    fin = int(n**0.5)
    # Loop over the candidates, marking out each multiple.
    for i in xrange(2, fin+1):
        if not candidates[i]:
            continue
        candidates[2*i::i] = [None] * (n//i - 1)
    # Filter out non-primes and return the list.
    return [i for i in candidates[2:] if i]
It looks like a very simple and elegant implementation. I've seen other implementations, even in Python, and I understand how the Sieve works. But I'm getting a little confused by the particular way this implementation works. Whoever wrote that page seems pretty clever.
I get that it's iterating through the list, finding primes, and then marking multiples of primes as non-prime.
But what does this line do exactly:
candidates[2*i::i] = [None] * (n//i - 1)
I've figured out that it's slicing candidates from 2*i to the end, stepping by i, so that means all multiples of i: start at 2*i, then 3*i, then 4*i, until you reach the end of the list.
But what does [None] * (n//i - 1) mean? Why not just set it to False?
Thanks. Kind of a specific question with a single answer, but I think this is the place to ask it. I would sure appreciate a clear explanation.
candidates[2*i::i] = [None] * (n//i - 1)
is just a terse way of writing
for j in range(2 * i, n + 1, i):
    candidates[j] = None
which works by assigning a list of Nones to a slice of candidates.
L * N creates and concatenates N (shallow) copies of L, so [None] * (n//i - 1) gives a list of n//i - 1 Nones, which is exactly the number of multiples of i between 2*i and n. Slice assignment (L[start:end:step] = new_L) overwrites the items of the list the slice touches with the items of new_L.
You are right, one could set the items to False as well - I think that would be preferable; the author of the code evidently thought None was a better indicator of "crossed out". But None works just as well, since bool(None) is False and the trailing if i in the list comprehension is essentially if bool(i).
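For example, here is a quick interactive illustration (with n = 12 and i = 3, values I picked just for demonstration) of what the slice assignment does:

>>> n, i = 12, 3
>>> candidates = list(range(n + 1))
>>> [None] * (n//i - 1)
[None, None, None]
>>> candidates[2*i::i] = [None] * (n//i - 1)
>>> candidates
[0, 1, 2, 3, 4, 5, None, 7, 8, None, 10, 11, None]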
Since the problem isn't new and there are a lot of algorithms that solve it, I suspected this question might be a duplicate, but I didn't find one.
There is a set of elements. The task is to determine whether there is a subset with a sum equal to some variable s.
The primitive solution is straightforward and runs in exponential time. The recursive DP approach adds memoization to reduce the complexity, or works bottom-up with a 2D array.
I found another one in a comment on geeksforgeeks but can't understand how it works.
def is_subset_sum(a, s):
    n = len(a)
    res = [False] * (s + 1)
    res[0] = True
    for j in range(n):
        i = s
        while i >= a[j]:
            res[i] = res[i] or res[i - a[j]]
            i -= 1
    return res[s]
Could someone please explain the algorithm? What do the elements of the array actually mean? I'm trying to trace it but can't get a handle on it.
Putting words to the code: trying each element in the list in turn, set a temporary variable, i, to the target sum. While i is not smaller than the current element, a[j], the sum equal to the current value of i is either (1) already reachable and marked so, or (2) is reachable by adding the current element, a[j], to the sum equal to subtracting the current element from the current value of i, which we may have already marked. We thus enumerate all the possibilities in O(s * n) time and O(s) space. (i might be a poor choice for that variable name since it's probably most commonly seen representing an index rather than a sum. Although, in this case, the sums we are checking are themselves also indexes.)
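To make that concrete, here is a small hand trace and usage example (my own example values, not from the original post):

# Trace of is_subset_sum([3, 4], 7):
# start:          res = [True, False, False, False, False, False, False, False]
# after a[0] = 3: res[3] becomes True (0 + 3)
# after a[1] = 4: res[4] (0 + 4) and res[7] (3 + 4) become True
# res[7] is True, so a subset summing to 7 exists ({3, 4}).
print(is_subset_sum([3, 4], 7))  # True
print(is_subset_sum([3, 4], 6))  # False -- no subset of [3, 4] sums to 6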
For one of my programming questions, I am required to define a function that accepts two variables, a list of length l and an integer w. I then have to find the maximum sum of a sublist with length w within the list.
Conditions:
1<=w<=l<=100000
Each element in the list ranges from [1, 100]
Currently, my solution works in O(n^2) (correct me if I'm wrong, code attached below), which the autograder does not accept, since we are required to find an even simpler solution.
My code:
def find_best_location(w, lst):
    best = 0
    n = 0
    while n <= len(lst) - w:
        lists = lst[n: n + w]
        cur = sum(lists)
        best = cur if cur > best else best
        n += 1
    return best
If anyone is able to find a more efficient solution, please do let me know! Also if I computed my big-O notation wrongly do let me know as well!
Thanks in advance!
1) Find the sum of the first w elements (call it current) and assign it to best.
2) Starting from i = w: current = current + lst[i] - lst[i-w], then best = max(best, current).
3) Done. A sketch of these steps in code follows below.
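Here is a minimal sketch of those steps (my own illustration, keeping the original function's signature; not the answerer's implementation):

def find_best_location(w, lst):
    # Sum of the first window of length w.
    current = sum(lst[:w])
    best = current
    # Slide the window one step at a time: add the element entering
    # the window and subtract the element leaving it.
    for i in range(w, len(lst)):
        current = current + lst[i] - lst[i - w]
        best = max(best, current)
    return best

For example, find_best_location(2, [1, 2, 3, 4]) returns 7 (the window [3, 4]).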
Your solution is indeed O(n^2) (or O(n*w) if you want a tighter bound).
You can do it in O(n) by creating an aux array sums, where:
sums[0] = l[0]
sums[i] = sums[i-1] + l[i]
Then, by iterating over it and checking sums[i] - sums[i-w], you can find your solution in linear time.
You can even calculate the sums array on the fly to reduce space complexity, but if I were you, I'd start with this and see whether I can upgrade my solution afterwards.
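One possible sketch of that prefix-sum idea (my own wording of it; it uses a leading zero so the first window needs no special case):

def find_best_location(w, lst):
    # prefix[i] is the sum of the first i elements of lst.
    prefix = [0] * (len(lst) + 1)
    for i in range(len(lst)):
        prefix[i + 1] = prefix[i] + lst[i]
    # The sum of the window ending at index i-1 is prefix[i] - prefix[i - w].
    return max(prefix[i] - prefix[i - w] for i in range(w, len(lst) + 1))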
I have two lists of users (users1 and users2) and I am comparing them with the following code:
def lev(seq1, seq2):
    oneago = None
    thisrow = range(1, len(seq2) + 1) + [0]
    for x in xrange(len(seq1)):
        twoago, oneago, thisrow = oneago, thisrow, [0] * len(seq2) + [x + 1]
        for y in xrange(len(seq2)):
            delcost = oneago[y] + 1
            addcost = thisrow[y - 1] + 1
            subcost = oneago[y - 1] + (seq1[x] != seq2[y])
            thisrow[y] = min(delcost, addcost, subcost)
    return thisrow[len(seq2) - 1]
for x in users1_list:
    for y in users2_list:
        if 3 >= lev(x, y) > 1:
            print x, "seems a lot like", y
Can I use a list comprehension to improve the nested for loop?
Can you use a list comprehension to improve the nested for loop?
In the lev function, I don't think so--at least not in the sense of "this is bad, and a list comprehension is the natural and direct thing that would clean it up."
Yes, you could use a list comprehension there, but several factors argue against comprehensions:
You're calculating a lot of things. This means there are many characters required for the resulting expressions (or subexpressions). It would be a very long comprehension expression, making quality formatting difficult and making it harder to hold all of the pieces in your head all at once.
You've nicely named the sub-expression components in ways that make logical sense. Spread out into multiple statements, the code is clear about how the deletion, addition, and substitution costs are calculated. That's nice. It aids comprehension, especially for you or someone else who comes back to this code after some time and has to understand it all over again. If you condensed it into one long expression to make a list comprehension, you'd lose the clarity of those sub-expressions.
You do a lot of indexing. That is usually an anti-pattern / bad practice in Python, which has good "iterate over loop items" features. But there are algorithms--and this seems to be one of them--where indexing is the clear method of access. It's very consistent with what you will find in similar programs from other sources, or in reference materials. So using a more primitive indexing approach--something that often doesn't make sense in many Python contexts--works pretty well here.
In the second section, where you can loop over items not indices neatly, you do so. It's not like you're trying to avoid Pythonic constructs.
It does jump out at me that you're recalculating len(seq2) all the time, even though it seems to be a constant during this function. I'd calculate it once and reuse a stored value. And do you ever really use twoago? I didn't see it. So a revised snippet might be:
def lev(seq1, seq2):
    oneago = None
    len2 = len(seq2)
    thisrow = range(1, len2 + 1) + [0]
    for x in xrange(len(seq1)):
        oneago, thisrow = thisrow, [0] * len2 + [x + 1]
        for y in xrange(len2):
            delcost = oneago[y] + 1
            addcost = thisrow[y - 1] + 1
            subcost = oneago[y - 1] + (seq1[x] != seq2[y])
            thisrow[y] = min(delcost, addcost, subcost)
    return thisrow[len2 - 1]
Finally, Stack Overflow tends to be problem-related. It has a sister site, Code Review, that might be more appropriate for detailed code-improvement suggestions (much as Programmers is better for more theoretical programming questions).
>>> list1 = ['Bret', 'Jermaine', 'Murray']
>>> list2 = ['Jermaine', 'Murray', 'Mel']
If the entries in the lists are unique, it might make sense to convert them into sets. You could then see which things are common:
>>> set(list1).intersection(set(list2))
{'Jermaine', 'Murray'}
The union of both sets can be returned:
>>> set(list1).union(set(list2))
{'Bret', 'Jermaine', 'Mel', 'Murray'}
To measure the commonality between the two sets, you could calculate the Jaccard index (see http://en.wikipedia.org/wiki/Jaccard_index for more details):
>>> len(set(list1).intersection(set(list2))) / float(len(set(list1).union(set(list2))))
0.5
This is the number of common elements divided by the total number of elements.
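If you need this calculation repeatedly, you could wrap it in a small helper (just a sketch, not part of the original answer):

def jaccard_index(a, b):
    # Convert both inputs to sets and compare the overlap against the union.
    set_a, set_b = set(a), set(b)
    union = set_a | set_b
    if not union:
        return 0.0  # both empty: return 0.0 here (a choice; conventions vary)
    return len(set_a & set_b) / float(len(union))

With list1 and list2 from above, jaccard_index(list1, list2) gives 0.5.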
Just doing a review of my Python class and noticed that I forgot how to do this.
def outsideIn2(lst):
    '''(list)->list
    Returns a new list where the middle two elements have been
    removed and placed at the beginning of the result. Assume all lists are an even
    length.
    >>> outsideIn2(['C','a','r','t','o','n'])
    ['r','t','C','a','o','n'] # rt moves to front
    >>> outsideIn2(['H','i'])
    ['H','i'] # Hi moves to front so output remains the same.
    >>> outsideIn2(['B','a','r','b','a','r','a',' ','A','n','n','e'])
    ['r','a','B','a','r','b','a',' ','A','n','n','e'] # ra moves to front.
    '''
    length = len(lst)
    middle1 = lst.pop((len(lst) / 2) - 1)
    middle2 = lst.pop((len(lst) / 2) + 1)
    lst.insert([0], middle1)
    lst.insert([1], middle2)
    return lst
I'm getting this error:
middle1 = lst.pop((len(lst) / 2) - 1)
TypeError: integer argument expected, got float
What am I doing wrong?
When you upgraded to Python 3, the / operator changed from integer division to true division. Switch to the // operator.
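For example, in Python 3:

>>> 7 / 2    # true division: always returns a float
3.5
>>> 7 // 2   # floor division: stays an int for int operands
3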
You can use // operator:
middle1 = lst.pop((len(lst) // 2) - 1)
The other answers explained why you are getting the error. You need to use // instead of / (also, just for the record, you need to give list.insert integers, not lists).
However, I'd like to suggest a different approach that uses Python's slice notation:
def outsideIn2(lst):
    x = len(lst)//2
    return lst[x-1:x+1] + lst[:x-1] + lst[x+1:]
This method should be significantly faster than using list.pop and list.insert.
As proof, I made the below script to compare the two methods with timeit.timeit:
from timeit import timeit

def outsideIn2(lst):
    length = len(lst)
    middle1 = lst.pop((len(lst) // 2) - 1)
    middle2 = lst.pop((len(lst) // 2) + 1)
    lst.insert(0, middle1)
    lst.insert(1, middle2)
    return lst

print(timeit("outsideIn2(['B','a','r','b','a','r','a',' ','A','n','n','e'])", "from __main__ import outsideIn2"))

def outsideIn2(lst):
    x = len(lst)//2
    return lst[x-1:x+1] + lst[:x-1] + lst[x+1:]

print(timeit("outsideIn2(['B','a','r','b','a','r','a',' ','A','n','n','e'])", "from __main__ import outsideIn2"))
The results were as follows:
6.255111473664949
4.465956427423038
As you can see, my proposed method was ~2 seconds faster. However, you can run more tests if you would like to validate mine.
Using pop and insert (especially inserting at positions 0 and 1) can be fairly slow with Python lists. Since the underlying storage for the list is an array, inserting at position 0 means that the element at position n-1 has to be moved to position n, then the element at n-2 has to be moved to n-1 and so on. pop has to do the same in reverse. So imagine in your little method how many element moves must be done. Roughly:
pop #1 - move n/2 elements
pop #2 - move n/2 elements
insert 0 - move n elements
insert 1 - move n elements
So approximately 3n moves are done in this code.
Breaking the list into 3 slices and reassembling a new list may be more optimal:
def outsideIn2(lst):
    midstart = len(lst)//2 - 1
    left, mid, right = lst[0:midstart], lst[midstart:midstart+2], lst[midstart+2:]
    return mid + left + right
Plus you won't run into any weird issues by pop changing the length of the list between the first and second call to pop. And the slices implicitly guard against index errors when you get a list that is shorter than 2 characters.
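For example, checking it against the first sample from the docstring:

>>> outsideIn2(['C','a','r','t','o','n'])
['r', 't', 'C', 'a', 'o', 'n']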
Maybe it is a stupid question, but I was wondering if you could provide the shortest source to find prime numbers with Python.
I was also wondering how to find prime numbers by using map() or filter() functions.
Thank you (:
EDIT: When I say fastest/shortest, I mean the way with the fewest characters/words. Don't consider it a competition, anyway: I was wondering whether a one-line source is possible, without removing the indentation always used with for loops.
EDIT 2: The problem is not meant for huge numbers. I think we can stay under a million (range(2, 1000000)).
EDIT 3: Shortest, but still elegant. As I said in the first EDIT, you don't need to reduce variables' names to single letters. I just need a one-line, elegant source.
Thank you!
The Sieve of Eratosthenes in two lines.
primes = set(range(2,1000000))
for n in [2]+range(3,1000000/2,2): primes -= set(range(2*n,1000000,n))
Edit: I've realized that the above is not a true Sieve of Eratosthenes because it filters on the set of odd numbers rather than the set of primes, making it unnecessarily slow. I've fixed that in the following version, and also included a number of common optimizations as pointed out in the comments.
primes = set([2] + range(3, 1000000, 2))
for n in range(3, int(1000000**0.5)+1, 2): primes -= set(range(n*n,1000000,2*n) if n in primes else [])
The first version is still shorter and does generate the proper result, even if it takes longer.
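Note that these snippets are Python 2 (range returns a list, and 1000000/2 is integer division). A Python 3 rendering of the fixed version might look like this (my adaptation, not from the original answer):

primes = set([2] + list(range(3, 1000000, 2)))
for n in range(3, int(1000000**0.5)+1, 2): primes -= set(range(n*n, 1000000, 2*n) if n in primes else [])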
Since one can just cut and paste the first million primes from the net:
map(int,open('primes.txt'))
This is somewhat similar to the question I asked yesterday, where wim provided a fairly short answer:
is this primes generator pythonic
Similar to the above, but not as cheeky as Robert King's answer:
from itertools import ifilter, imap

def primes(max=3000):
    r = set(); [r.add(n) for n in ifilter(lambda c: all(imap(c.__mod__, r)), xrange(2, max+1))]; return sorted(r)
This uses more characters, but it's readable:
def primes_to(n):
    cands = set(xrange(2, n))
    for i in xrange(2, int(n ** 0.5) + 1):
        for ix in xrange(i ** 2, n, i):
            cands.discard(ix)
    return list(cands)
EDIT
A new way, similar to the above, but with fewer wasted attempts at discard:
def primes_to(n):
    cands = set(xrange(3, n, 2))
    for i in xrange(3, int(n ** 0.5) + 1, 2):
        for ix in xrange(i ** 2, n, i * 2):
            cands.discard(ix)
    return [2] + list(cands)